Speech to Text using Google Cloud Speech API

Do you want to convert speech to text? Google Cloud provides a Speech-to-Text API that generates a transcript from an audio file (a WAV file, for example).

Google Cloud accepts two types of media input: microphone and file upload. In this article, we will take the second option. We will upload an audio file in WAV format, send it to Google Cloud, receive the transcript, and download it as a text file.

Speech-to-Text has three main methods to perform speech recognition.

  • Synchronous Recognition: These requests are limited to audio data of 1 minute or less in duration.
  • Asynchronous Recognition: Use these requests for audio data of any duration up to 480 minutes.
  • Streaming Recognition: Streaming recognition provides interim results while audio is being captured, allowing results to appear, for example, while a user is still speaking.
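
To make the choice between the three methods concrete, here is a minimal sketch that picks one based on the limits listed above. The function name and the returned strings are my own illustration and are not part of the Speech API.

```php
<?php
// Hypothetical helper: the name and return values are illustrative only.
// The limits (1 minute, 480 minutes) come from the list above.
function chooseRecognitionMethod(float $durationSeconds, bool $isLiveAudio = false): string
{
    // Live audio (e.g. a microphone) needs interim results while it is captured.
    if ($isLiveAudio) {
        return 'streaming';
    }

    // Short recordings fit into a single synchronous request.
    if ($durationSeconds <= 60) {
        return 'synchronous';
    }

    // Longer recordings, up to 480 minutes, go through the asynchronous method.
    if ($durationSeconds <= 480 * 60) {
        return 'asynchronous';
    }

    throw new InvalidArgumentException('Audio is longer than the 480-minute limit.');
}
```

For example, chooseRecognitionMethod(300) returns 'asynchronous' for a five-minute recording.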

For this tutorial, we will write code for Asynchronous Recognition.

That being said, let’s take a look at how to convert speech to text using Google Cloud in PHP.

Create a Service Account

In order to interact with the Google Speech API, you need to download the credentials of your service account. Follow the steps below to get the JSON file containing your credentials.

  • In the Cloud Console, go to the Create service account page.
  • Create or select a project.
  • Enable the Cloud Speech-to-Text API for that project.
  • Create a service account.
  • Download a private key as JSON.

Next, set the environment variable GOOGLE_APPLICATION_CREDENTIALS to the path of the downloaded JSON file.
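
If you cannot set the variable at the server or shell level, one option is to set it from PHP before the client is created. The sketch below assumes the key file is stored somewhere readable by PHP; the function name is hypothetical, and the path you pass in is your own.

```php
<?php
// Hypothetical helper: points GOOGLE_APPLICATION_CREDENTIALS at a key file
// for the current PHP process. Returns false if the file cannot be read.
function useServiceAccountKey(string $keyFile): bool
{
    // Refuse directories and unreadable or missing files early,
    // so the problem shows up here and not as an authentication error later.
    if (!is_file($keyFile) || !is_readable($keyFile)) {
        return false;
    }

    // putenv() only affects the current process and its children.
    putenv('GOOGLE_APPLICATION_CREDENTIALS=' . $keyFile);

    return true;
}
```

Call it before new SpeechClient(), for example useServiceAccountKey(__DIR__.'/service-account.json'), where the file name is a placeholder for your downloaded key. As far as I know, the client constructor also accepts a 'credentials' option with the path to the key file, which avoids the environment variable altogether.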

Google Cloud supports a large number of languages for speech recognition. The language code must be a BCP-47 identifier such as en-US. The Speech-to-Text documentation lists the supported languages and their BCP-47 codes. Pick your language code from that list.
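
A quick sanity check can catch typos such as en_US before the request is sent. This is only a loose shape check under my own assumptions (a 2 to 3 letter language followed by hyphen-separated subtags). It does not implement the full BCP-47 grammar and does not tell you whether Google supports the language. The function name is hypothetical.

```php
<?php
// Hypothetical helper: loose check that a string is shaped like a BCP-47 tag.
// It only looks at the overall shape, e.g. "en-US" or "cmn-Hans-CN".
function looksLikeLanguageCode(string $code): bool
{
    // 2-3 letters for the language, then zero or more "-subtag" parts
    // of 2 to 8 letters or digits each.
    return preg_match('/^[A-Za-z]{2,3}(-[A-Za-z0-9]{2,8})*$/', $code) === 1;
}
```

With this check, looksLikeLanguageCode('en-US') passes while looksLikeLanguageCode('en_US') fails because of the underscore.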

Speech to Text using Google Cloud Speech API

Head over to the project directory and run the command below to install the Google Cloud Speech API library.

composer require google/cloud-speech

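I am using a sample.wav file in English (en-US). Adjust these parameters for your own file. The code below calls the Cloud API and performs the conversion. Because it sends the audio inline with setContent(), it is suited to short clips. For audio longer than about a minute, the API expects the file to be uploaded to Cloud Storage and referenced with setUri() instead.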

<?php
require_once 'vendor/autoload.php';

use Google\Cloud\Speech\V1\SpeechClient;
use Google\Cloud\Speech\V1\RecognitionAudio;
use Google\Cloud\Speech\V1\RecognitionConfig;
use Google\Cloud\Speech\V1\RecognitionConfig\AudioEncoding;

try {
    $audioFile = __DIR__.'/sample.wav';

    // change these variables if necessary
    $encoding = AudioEncoding::LINEAR16;
    $languageCode = 'en-US';

    // get contents of a file into a string
    $content = file_get_contents($audioFile);
    if ($content === false) {
        throw new Exception('Unable to read audio file: '.$audioFile);
    }

    // set string as audio content
    $audio = (new RecognitionAudio())
        ->setContent($content);

    // set config
    $config = (new RecognitionConfig())
        ->setEncoding($encoding)
        ->setLanguageCode($languageCode);

    // create the speech client
    $client = new SpeechClient();

    // create the asynchronous recognize operation and wait for it to finish
    $operation = $client->longRunningRecognize($config, $audio);
    $operation->pollUntilComplete();

    if ($operation->operationSucceeded()) {
        $response = $operation->getResult();

        // each result is for a consecutive portion of the audio. iterate
        // through them to get the transcripts for the entire audio file.
        $final_transcript = '';
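        foreach ($response->getResults() as $result) {
            $alternatives = $result->getAlternatives();
            // skip results that came back without any alternative
            if (count($alternatives) === 0) {
                continue;
            }
            // the first alternative is the most likely one
            $mostLikely = $alternatives[0];
            $final_transcript .= $mostLikely->getTranscript();
        }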

        // release the client now: exit() below ends the script before
        // the $client->close() call after this if/else is reached
        $client->close();

        // save the transcript, then send it to the browser as a download
        $file = "transcript.txt";
        $txt = fopen($file, "w") or die("Unable to open file!");
        fwrite($txt, $final_transcript);
        fclose($txt);

        header('Content-Description: File Transfer');
        header('Content-Disposition: attachment; filename="'.basename($file).'"');
        header('Expires: 0');
        header('Cache-Control: must-revalidate');
        header('Pragma: public');
        header('Content-Length: ' . filesize($file));
        header("Content-Type: text/plain");
        readfile($file);
        exit();
    } else {
        print_r($operation->getError());
    }

    $client->close();
} catch(Exception $e) {
    echo $e->getMessage();
}

Run this code and you should get your transcription file downloaded automatically.
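
If the API rejects your file or returns an empty transcript, it is worth checking that the WAV file really matches the LINEAR16 encoding set in the script, which means uncompressed 16-bit PCM samples. The sketch below reads the format fields from the WAV header. It assumes the common layout where the "fmt " chunk directly follows the RIFF/WAVE header, so files with extra chunks in front are reported as unsupported. The function name is hypothetical.

```php
<?php
// Hypothetical helper: reads basic format info from the start of a WAV file.
// Assumes the canonical layout: "RIFF", size, "WAVE", "fmt ", fmt fields.
function readWavFormat(string $bytes): array
{
    if (
        strlen($bytes) < 36
        || substr($bytes, 0, 4) !== 'RIFF'
        || substr($bytes, 8, 4) !== 'WAVE'
        || substr($bytes, 12, 4) !== 'fmt '
    ) {
        throw new InvalidArgumentException('Not a WAV file with a canonical header.');
    }

    // The format fields start at byte 20.
    // v = unsigned 16-bit little-endian, V = unsigned 32-bit little-endian.
    $fmt = unpack(
        'vformat/vchannels/VsampleRate/VbyteRate/vblockAlign/vbitsPerSample',
        substr($bytes, 20, 16)
    );

    return [
        // Format tag 1 means uncompressed PCM; LINEAR16 needs 16-bit samples.
        'isPcm16'    => $fmt['format'] === 1 && $fmt['bitsPerSample'] === 16,
        'channels'   => $fmt['channels'],
        'sampleRate' => $fmt['sampleRate'],
    ];
}
```

In the script above you could call readWavFormat($content) right after reading the file and stop early when 'isPcm16' is false. The 'sampleRate' and 'channels' values are the ones you would pass to setSampleRateHertz() and setAudioChannelCount() on the config if you decide to set them explicitly.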
