Speech-To-Text using Amazon Transcribe in PHP

Recently I worked on a project where I was introduced to the Amazon Transcribe service. The client wanted to convert speech to text programmatically. After some research, we found that Amazon Transcribe was a good fit for the task. Amazon Transcribe uses automatic speech recognition (ASR), a machine learning technique, to convert audio to text. Some common uses of Amazon Transcribe are:

  • Get insights from customer conversations
  • Search and analyze media content
  • Create subtitles and meeting notes
  • Improve clinical documentation

Depending on your business model, you can use the transcripts this service produces in your own products and workflows.

In this article, we study how to convert speech to text using Amazon Transcribe in PHP. AWS provides an SDK for PHP, which we are going to use for this tutorial.

Getting Started

To get started, you should have an AWS account. Log in to your AWS account and grab your security credentials (an access key and a secret access key). We will need these credentials later in the tutorial. Our PHP script will communicate with the AWS services using these credentials.

[Screenshot: AWS Credentials]
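
Rather than typing the keys directly into the script, one option is to read them from environment variables. The sketch below assumes the variables AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_REGION are set on the server; the helper name aws_client_config() and the us-east-1 fallback are only illustrative.

```php
<?php
// Build the options array that the AWS SDK clients accept, reading the
// credentials from environment variables instead of hardcoding them.
// aws_client_config() is a hypothetical helper name, not part of the SDK.
function aws_client_config(): array
{
    $key    = getenv('AWS_ACCESS_KEY_ID');
    $secret = getenv('AWS_SECRET_ACCESS_KEY');

    if ($key === false || $key === '' || $secret === false || $secret === '') {
        throw new RuntimeException('AWS credentials are not set in the environment.');
    }

    return [
        'version'     => 'latest',
        // Assumed fallback region; use the region of your S3 bucket.
        'region'      => getenv('AWS_REGION') ?: 'us-east-1',
        'credentials' => [
            'key'    => $key,
            'secret' => $secret,
        ],
    ];
}
```

The returned array has the same shape as the options passed to the client constructors later in this tutorial, so it could be used as new S3Client(aws_client_config()).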

Next, install the AWS SDK for PHP using the Composer command:

composer require aws/aws-sdk-php

To convert speech to text through Amazon Transcribe, you must have a supported media file. The allowed media formats are mp3, mp4, wav, flac, ogg, amr, and webm. In addition, the speech must be in one of the supported languages. You can find the list of language codes in the Amazon Transcribe documentation.
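
The format check can be sketched as a small helper. It lowercases the file extension before comparing it with the formats listed above, so that a name like Talk.MP3 is accepted too. The function name media_format_for() is made up for this example. As far as I know, the same lowercase value can also be passed as the optional MediaFormat parameter when starting a transcription job, but treat that as an assumption to verify against the API reference.

```php
<?php
// Return the lowercase media format for a file name, or null when the
// extension is not one of the formats listed above.
// media_format_for() is a hypothetical helper, not an SDK function.
function media_format_for(string $filename): ?string
{
    $supported = ['mp3', 'mp4', 'wav', 'flac', 'ogg', 'amr', 'webm'];
    $extension = strtolower(pathinfo($filename, PATHINFO_EXTENSION));

    return in_array($extension, $supported, true) ? $extension : null;
}
```

A null result means the upload should be rejected before anything is sent to AWS.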

The integration of Amazon Transcribe involves the following steps.

  • Upload the media file to an S3 bucket.
  • Instantiate an Amazon Transcribe client.
  • Start a transcription job with Amazon Transcribe. The job requires the media URL of the S3 object and a unique job name, which serves as the job id.
  • Wait for the job to finish. Amazon Transcribe may take a few minutes to complete the transcription.
  • Download the transcript after AWS completes the transcription job.

Let’s see how to handle this flow by writing the actual PHP code.

Speech-To-Text using Amazon Transcribe in PHP

First, create the HTML form to browse the media file. Upon form submission, we take the media file for further processing and send the transcribed text back to the browser as a .txt file.

<form method="post" enctype="multipart/form-data">
    <p><input type="file" name="media" accept="audio/*,video/*" /></p>
    <input type="submit" name="submit" value="Submit" />
</form>

On the PHP end, you have to send the media file to the AWS service. We first upload this media file to the S3 bucket and then start the transcription job. Include the Composer autoloader and import the AWS client classes in your application.

<?php
require 'vendor/autoload.php';
 
use Aws\S3\S3Client;
use Aws\TranscribeService\TranscribeServiceClient;

// process the media file

Next, upload the media file to the S3 bucket and grab the S3 URL of the uploaded media.

if ( isset($_POST['submit']) ) {
    // Check if the media file extension is supported (case-insensitive)
    $allowed_extensions = ['mp3', 'mp4', 'wav', 'flac', 'ogg', 'amr', 'webm'];
    $media_format = strtolower(pathinfo($_FILES['media']['name'], PATHINFO_EXTENSION));
    if ( !in_array($media_format, $allowed_extensions, true) ) {
        die('File type is not allowed');
    }
 
    // pass AWS API credentials
    $region = 'PASS_REGION';
    $access_key = 'ACCESS_KEY';
    $secret_access_key = 'SECRET_ACCESS_KEY';
    
    // Specify S3 bucket name
    $bucketName = 'PASS_BUCKET_NAME';
 
    $key = basename($_FILES['media']['name']);
 
    // upload file on S3 Bucket
    try {
        // Instantiate an Amazon S3 client.
        $s3 = new S3Client([
            'version' => 'latest',
            'region'  => $region,
            'credentials' => [
                'key'    => $access_key,
                'secret' => $secret_access_key
            ]
        ]);

        $result = $s3->putObject([
            'Bucket' => $bucketName,
            'Key'    => $key,
            'Body'   => fopen($_FILES['media']['tmp_name'], 'r'),
            'ACL'    => 'public-read',
        ]);
        $audio_url = $result->get('ObjectURL');
 
        // Code for Amazon Transcribe Service - Start here

    }  catch (Exception $e) {
        echo $e->getMessage();
    }
}

Make sure to replace the placeholders with the actual values. Note that 'ACL' => 'public-read' makes the uploaded file publicly readable until the script deletes it. The S3 URL of the uploaded media will be sent to the Amazon Transcribe service. To start a transcription job, we need a unique job name (the TranscriptionJobName parameter) to tell multiple jobs apart. This unique id can be created using the uniqid() function.
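
A bare uniqid() value works, but a more readable job name makes it easier to find the job in the AWS console. The sketch below builds one from the uploaded file name. It assumes that job names may only contain letters, digits, dots, underscores, and hyphens and must not exceed 200 characters, which is my reading of the Transcribe API reference; make_job_name() is a made-up helper name.

```php
<?php
// Build a unique, readable transcription job name from a file name.
// make_job_name() is a hypothetical helper, not part of the AWS SDK.
function make_job_name(string $filename): string
{
    // Keep only the base name without its extension, e.g. "My Talk.mp3" => "My Talk"
    $base = pathinfo($filename, PATHINFO_FILENAME);

    // Replace every run of characters outside the assumed allowed set with "-"
    $base = trim(preg_replace('/[^0-9a-zA-Z._-]+/', '-', $base), '-');
    if ($base === '') {
        $base = 'job';
    }

    // uniqid() is time-based; the random suffix lowers the chance of a clash
    // when two uploads arrive at nearly the same moment.
    $suffix = uniqid() . '-' . bin2hex(random_bytes(4));

    // Leave room for the separator and suffix inside the assumed 200-character limit
    $base = substr($base, 0, 200 - strlen($suffix) - 1);

    return $base . '-' . $suffix;
}
```

The result can be used wherever the tutorial code uses $job_id.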

// Create Amazon Transcribe Client
$awsTranscribeClient = new TranscribeServiceClient([
    'region' => $region,
    'version' => 'latest',
    'credentials' => [
        'key'    => $access_key,
        'secret' => $secret_access_key
    ]
]);

// Start a Transcription Job
$job_id = uniqid();
$transcriptionResult = $awsTranscribeClient->startTranscriptionJob([
        'LanguageCode' => 'en-US',
        'Media' => [
            'MediaFileUri' => $audio_url,
        ],
        'TranscriptionJobName' => $job_id,
]);

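// Poll the job status every 5 seconds until it completes or fails
while (true) {
    $status = $awsTranscribeClient->getTranscriptionJob([
        'TranscriptionJobName' => $job_id
    ]);
    $job_status = $status->get('TranscriptionJob')['TranscriptionJobStatus'];

    if ($job_status == 'COMPLETED') {
        break;
    }

    // A failed job never reaches COMPLETED, so stop instead of looping forever
    if ($job_status == 'FAILED') {
        throw new Exception('Transcription failed: ' . $status->get('TranscriptionJob')['FailureReason']);
    }

    sleep(5);
}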

// delete s3 object
$s3->deleteObject([
    'Bucket' => $bucketName,
    'Key' => $key,
]);

// download the converted txt file

In the above code, we instantiate the Amazon Transcribe client and start the transcription job. It may take a few minutes to complete the transcription. I have handled the wait inside the while loop using the sleep() function. I check the job status every 5 seconds and break out of the loop once it is completed.
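
The waiting logic can also be pulled into a small function that stops on either terminal status, COMPLETED or FAILED, and gives up after a fixed number of checks. The function wait_for_job() and its parameters are hypothetical; it takes a callable that returns the current status string, so the logic can be tried out without calling AWS.

```php
<?php
// Poll a job until it reaches a terminal status or the attempts run out.
// $fetchStatus is any callable returning the current status string,
// e.g. 'QUEUED', 'IN_PROGRESS', 'COMPLETED' or 'FAILED'.
function wait_for_job(callable $fetchStatus, int $maxAttempts = 60, int $delaySeconds = 5): string
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $status = $fetchStatus();

        if ($status === 'COMPLETED' || $status === 'FAILED') {
            return $status;
        }

        // Do not sleep after the final check
        if ($attempt < $maxAttempts) {
            sleep($delaySeconds);
        }
    }

    throw new RuntimeException("Job did not finish after $maxAttempts checks.");
}

// Example with a fake status source standing in for getTranscriptionJob()
$fakeStatuses = ['QUEUED', 'IN_PROGRESS', 'COMPLETED'];
$result = wait_for_job(function () use (&$fakeStatuses) {
    return array_shift($fakeStatuses);
}, 10, 0);

echo $result; // COMPLETED
```

With the real client, the callable would wrap the getTranscriptionJob() call from the code above and return its TranscriptionJobStatus value.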

Once the transcription job is over, we don’t need the media file anymore, so we delete it from the S3 bucket.

You can follow the transcription job on the AWS dashboard under Amazon Transcribe => Transcription jobs, as shown in the screenshot below.

[Screenshot: Transcription Job]

Once the transcript is ready, download it and save it as a text file using the code below.

$url = $status->get('TranscriptionJob')['Transcript']['TranscriptFileUri'];
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_HEADER, false);
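$data = curl_exec($curl);
if ($data === false) {
    // Stop here: without the JSON there is nothing to write to the file
    $error_msg = curl_error($curl);
    curl_close($curl);
    throw new Exception('Unable to download the transcript: ' . $error_msg);
}
curl_close($curl);
$arr_data = json_decode($data);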

// Send converted txt file to a browser
$file = $job_id.".txt";
$txt = fopen($file, "w") or die("Unable to open file!");
fwrite($txt, $arr_data->results->transcripts[0]->transcript);
fclose($txt);

header('Content-Description: File Transfer');
header('Content-Disposition: attachment; filename='.basename($file));
header('Expires: 0');
header('Cache-Control: must-revalidate');
header('Pragma: public');
header('Content-Length: ' . filesize($file));
header("Content-Type: text/plain");
readfile($file);
exit();

This code sends the transcript to the browser as a text file, which is downloaded automatically.
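
The transcript lookup assumes that the downloaded JSON always has the expected shape. A small helper can make that assumption explicit and fail with a clear message when it does not hold. The sketch below is illustrative: extract_transcript() is a made-up name, and the sample JSON is a trimmed, hand-written stand-in for a real transcript file that keeps only the results->transcripts[0]->transcript path used above.

```php
<?php
// Pull the transcript text out of the JSON document downloaded from Transcribe.
// extract_transcript() is a hypothetical helper, not part of the AWS SDK.
function extract_transcript(string $json): string
{
    $data = json_decode($json);
    if (json_last_error() !== JSON_ERROR_NONE) {
        throw new RuntimeException('Invalid transcript JSON: ' . json_last_error_msg());
    }

    if (!isset($data->results->transcripts[0]->transcript)) {
        throw new RuntimeException('Transcript text not found in the JSON.');
    }

    return $data->results->transcripts[0]->transcript;
}

// Hand-written sample, not real service output
$sample = '{"results":{"transcripts":[{"transcript":"Hello world."}]}}';
echo extract_transcript($sample); // Hello world.
```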

Final Sample Code

The code above was presented in chunks. The complete sample code follows. You can copy it and use it in your application.

<?php
set_time_limit(0);

require 'vendor/autoload.php';
 
use Aws\S3\S3Client;
use Aws\TranscribeService\TranscribeServiceClient;

if ( isset($_POST['submit']) ) {
    // Check if the media file extension is supported (case-insensitive)
    $allowed_extensions = ['mp3', 'mp4', 'wav', 'flac', 'ogg', 'amr', 'webm'];
    $media_format = strtolower(pathinfo($_FILES['media']['name'], PATHINFO_EXTENSION));
    if ( !in_array($media_format, $allowed_extensions, true) ) {
        die('File type is not allowed');
    }
 
    // pass AWS API credentials
    $region = 'PASS_REGION';
    $access_key = 'ACCESS_KEY';
    $secret_access_key = 'SECRET_ACCESS_KEY';
    
    // Specify S3 bucket name
    $bucketName = 'PASS_BUCKET_NAME';
 
    // Instantiate an Amazon S3 client.
    $s3 = new S3Client([
        'version' => 'latest',
        'region'  => $region,
        'credentials' => [
            'key'    => $access_key,
            'secret' => $secret_access_key
        ]
    ]);
 
    $key = basename($_FILES['media']['name']);
 
    // upload file on S3 Bucket
    try {
        $result = $s3->putObject([
            'Bucket' => $bucketName,
            'Key'    => $key,
            'Body'   => fopen($_FILES['media']['tmp_name'], 'r'),
            'ACL'    => 'public-read',
        ]);
        $audio_url = $result->get('ObjectURL');
 
        // Code for Amazon Transcribe Service - Start here
        // Create Amazon Transcribe Client
        $awsTranscribeClient = new TranscribeServiceClient([
            'region' => $region,
            'version' => 'latest',
            'credentials' => [
                'key'    => $access_key,
                'secret' => $secret_access_key
            ]
        ]);
        
        // Start a Transcription Job
        $job_id = uniqid();
        $transcriptionResult = $awsTranscribeClient->startTranscriptionJob([
                'LanguageCode' => 'en-US',
                'Media' => [
                    'MediaFileUri' => $audio_url,
                ],
                'TranscriptionJobName' => $job_id,
        ]);
        
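        // Poll the job status every 5 seconds until it completes or fails
        while (true) {
            $status = $awsTranscribeClient->getTranscriptionJob([
                'TranscriptionJobName' => $job_id
            ]);
            $job_status = $status->get('TranscriptionJob')['TranscriptionJobStatus'];

            if ($job_status == 'COMPLETED') {
                break;
            }

            // A failed job never reaches COMPLETED, so stop instead of looping forever
            if ($job_status == 'FAILED') {
                throw new Exception('Transcription failed: ' . $status->get('TranscriptionJob')['FailureReason']);
            }

            sleep(5);
        }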

        // delete s3 object
        $s3->deleteObject([
            'Bucket' => $bucketName,
            'Key' => $key,
        ]);
        
        // download the converted txt file
        $url = $status->get('TranscriptionJob')['Transcript']['TranscriptFileUri'];
        $curl = curl_init();
        curl_setopt($curl, CURLOPT_URL, $url);
        curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($curl, CURLOPT_HEADER, false);
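        $data = curl_exec($curl);
        if ($data === false) {
            // Stop here: without the JSON there is nothing to write to the file
            $error_msg = curl_error($curl);
            curl_close($curl);
            throw new Exception('Unable to download the transcript: ' . $error_msg);
        }
        curl_close($curl);
        $arr_data = json_decode($data);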
        
        // Send converted txt file to a browser
        $file = $job_id.".txt";
        $txt = fopen($file, "w") or die("Unable to open file!");
        fwrite($txt, $arr_data->results->transcripts[0]->transcript);
        fclose($txt);
        
        header('Content-Description: File Transfer');
        header('Content-Disposition: attachment; filename='.basename($file));
        header('Expires: 0');
        header('Cache-Control: must-revalidate');
        header('Pragma: public');
        header('Content-Length: ' . filesize($file));
        header("Content-Type: text/plain");
        readfile($file);
        exit();
    }  catch (Exception $e) {
        echo $e->getMessage();
    }
}
?>

<form method="post" enctype="multipart/form-data">
    <p><input type="file" name="media" accept="audio/*,video/*" /></p>
    <input type="submit" name="submit" value="Submit" />
</form>

Conclusion

We have seen how the Amazon Transcribe service can be used to convert speech to text. To get the job done, you should pass your speech in a supported media format. The PHP script written above will then give you the transcribed text.


4 thoughts on “Speech-To-Text using Amazon Transcribe in PHP”

  1. Would be great to see an example with a Queue like SQS. Let the user wait in front of a PHP script running sleep(5) in an endless loop is not very friendly.
