Convert Text to Speech using Google Cloud and PHP

I previously wrote an article on Text-To-Speech using Amazon Polly in PHP. Recently, while working on the Google Cloud platform, I found that Google provides a similar service to convert text to voice. In this article, we study how to convert text to speech using Google Cloud and PHP.

Google Cloud Text-to-Speech

“Text-to-Speech allows developers to create natural-sounding, synthetic human speech as playable audio.”

This API can be used in applications that play human speech as audio for users. It converts your strings, words, and sentences into the sound of a person's voice (male or female).

Text-to-Speech service takes two types of input: raw text or SSML-formatted data.

When you pass plain raw text, the service creates raw audio data of natural, human speech.

You can enhance the speech produced by Text-to-Speech using Speech Synthesis Markup Language (SSML). It allows you to insert pauses and control pronunciation in the audio data. To use SSML, you need to wrap your text in the tags that SSML provides.

For example, this tutorial uses the SSML-formatted text below, which adds a 3-second pause in the middle of the speech.

<speak>
    Japan's national soccer team <break time="3s" /> won against Colombia!
</speak>

The <speak> element is the root element, and the <break> element controls pausing. More SSML tags are available for modifying your speech. Read more about them in the documentation.
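The SSML snippet above can also be assembled in PHP rather than typed by hand. The following is a minimal sketch, assuming you want the same pause between every pair of text segments. The function name buildSsmlWithPauses is hypothetical and not part of the Google client library.

```php
<?php
// Hypothetical helper (not part of the Google client library): wraps text
// segments in <speak> and puts a <break> of the given length between them.
function buildSsmlWithPauses(array $segments, string $pause = '3s'): string
{
    // Accept only simple durations such as "3s" or "250ms".
    if (!preg_match('/^\d+(\.\d+)?(ms|s)$/', $pause)) {
        throw new InvalidArgumentException("Invalid pause duration: {$pause}");
    }

    // Escape &, < and > so the text cannot break the XML markup.
    $escaped = array_map(
        static fn (string $s): string => htmlspecialchars($s, ENT_XML1 | ENT_NOQUOTES, 'UTF-8'),
        $segments
    );

    $break = sprintf(' <break time="%s" /> ', $pause);

    return '<speak>' . implode($break, $escaped) . '</speak>';
}

echo buildSsmlWithPauses(["Japan's national soccer team", 'won against Colombia!']), PHP_EOL;
```

Escaping the text matters when it comes from users, because a stray & or < would otherwise make the SSML invalid.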

When generating an audio file, you can also choose the voice along with the language. Here is the list of Supported voices and languages. We will use these parameters when writing the code.

Create a Service Account on Google Cloud

To interact with the Text-to-Speech API, you need to create a service account in the Cloud Console. Follow the steps below. You can also get instructions on this page.

  • In the Cloud Console, go to the Create service account page.
  • Create a project. You can also select an existing one.
  • Enable the Cloud Text-to-Speech API for the project.
  • Create a service account.
  • Download a private key as JSON.

Next, set the environment variable GOOGLE_APPLICATION_CREDENTIALS to the path of this downloaded JSON file.

On Windows, you can set it as follows.

For Command Prompt:

set GOOGLE_APPLICATION_CREDENTIALS=KEY_PATH

For PowerShell:

$env:GOOGLE_APPLICATION_CREDENTIALS="KEY_PATH"

Replace KEY_PATH with the actual path of the JSON file.

On Linux or macOS, you can set it using:

export GOOGLE_APPLICATION_CREDENTIALS="KEY_PATH"
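
Before calling the API, it can help to confirm that PHP actually sees the variable and that it points to a usable key file. Below is a minimal sketch of such a check. The function name checkCredentialsPath is hypothetical, and the test for the type, client_email, and private_key fields assumes a standard service account JSON key.

```php
<?php
// Hypothetical pre-flight check; the function name is illustrative only.
function checkCredentialsPath(?string $path): string
{
    if ($path === null || $path === '') {
        return 'GOOGLE_APPLICATION_CREDENTIALS is not set.';
    }
    if (!is_file($path) || !is_readable($path)) {
        return "Key file is not readable: {$path}";
    }

    // Assumption: a service account key is a JSON object with these fields.
    $key = json_decode((string) file_get_contents($path), true);
    if (
        !is_array($key)
        || ($key['type'] ?? null) !== 'service_account'
        || !isset($key['client_email'], $key['private_key'])
    ) {
        return 'Key file does not look like a service account JSON key.';
    }

    return 'OK';
}

// getenv() returns false when the variable is missing.
$value = getenv('GOOGLE_APPLICATION_CREDENTIALS');
echo checkCredentialsPath($value === false ? null : $value), PHP_EOL;
```

Note that when PHP runs under a web server, the variable must be visible to the server process, not only to the shell where you ran the command.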

Convert Text to Voice using Google Cloud Text-to-Speech API

Google provides different packages to interact with their APIs. For the Text-to-Speech API, install the package using the command below.

composer require google/cloud-text-to-speech

Next, we are going to write code for both raw text and SSML-formatted data. Let’s first do it for raw text.

<?php
require_once 'vendor/autoload.php';

use Google\Cloud\TextToSpeech\V1\AudioConfig;
use Google\Cloud\TextToSpeech\V1\AudioEncoding;
use Google\Cloud\TextToSpeech\V1\SynthesisInput;
use Google\Cloud\TextToSpeech\V1\TextToSpeechClient;
use Google\Cloud\TextToSpeech\V1\VoiceSelectionParams;

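try {
    // Credentials are read from GOOGLE_APPLICATION_CREDENTIALS.
    // Note: this call style (positional arguments to synthesizeSpeech) matches
    // the client class imported above; newer major releases of the package may
    // expect a different client class and a request object instead.
    $textToSpeechClient = new TextToSpeechClient();

    // The text to synthesize.
    $input = new SynthesisInput();
    $input->setText('Japan\'s national soccer team won against Colombia!');

    // Language and, optionally, a specific voice.
    $voice = new VoiceSelectionParams();
    $voice->setLanguageCode('en-US');
    $voice->setName('en-US-Standard-C'); // optional

    // Request MP3 output.
    $audioConfig = new AudioConfig();
    $audioConfig->setAudioEncoding(AudioEncoding::MP3);

    $resp = $textToSpeechClient->synthesizeSpeech($input, $voice, $audioConfig);

    // Send the audio to the browser as a file download.
    $resultData = $resp->getAudioContent();
    header('Content-Type: audio/mpeg');
    header('Content-Length: ' . strlen($resultData));
    header('Content-Disposition: attachment; filename="text-to-speech.mp3"');
    header('X-Pad: avoid browser bug');
    header('Cache-Control: no-cache');
    echo $resultData;

    $textToSpeechClient->close();
} catch (Exception $e) {
    echo $e->getMessage();
}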

In the above code, I referred to the list of Supported voices and languages to set the values for the setLanguageCode and setName methods. I chose English (US) and a female voice (en-US-Standard-C), respectively. You can adjust these parameters as per your requirements.

Run the above code in the browser and you will get an MP3 file of your raw text.
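
If you run the script from the command line instead of the browser, you may prefer to write the audio to disk rather than send download headers. Here is a minimal sketch of that variation. The saveAudio function and the demo file name are hypothetical, and the placeholder string stands in for the bytes returned by $resp->getAudioContent().

```php
<?php
// Hypothetical helper: writes synthesized audio bytes to a file and
// returns the number of bytes written.
function saveAudio(string $audioContent, string $path): int
{
    if ($audioContent === '') {
        throw new InvalidArgumentException('No audio data to write.');
    }

    // LOCK_EX avoids two processes writing the same file at once.
    $bytes = file_put_contents($path, $audioContent, LOCK_EX);
    if ($bytes === false) {
        throw new RuntimeException("Could not write audio to {$path}");
    }

    return $bytes;
}

// Placeholder bytes; in the real script pass $resp->getAudioContent().
$demoPath = sys_get_temp_dir() . '/text-to-speech-demo.mp3';
echo saveAudio('placeholder-audio-bytes', $demoPath), " bytes written to {$demoPath}", PHP_EOL;
```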

In the case of SSML, you just need to replace setText with setSsml and pass the SSML-formatted data to this method.

$input->setSsml('<speak>
                Japan\'s national soccer team <break time="3s" /> won against Colombia!
            </speak>');

Here, the SSML adds a pause of 3 seconds. You can format your text following the SSML guidelines.
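
If the same script has to handle both kinds of input, one option is to check whether the string is wrapped in a speak element and pick the setter accordingly. This is only a rough sketch: the isSsml function is hypothetical, and it checks the outer tags without validating the markup inside.

```php
<?php
// Hypothetical helper: true when the input is wrapped in <speak>...</speak>.
// It does not validate the SSML inside the root element.
function isSsml(string $input): bool
{
    $trimmed = trim($input);

    return stripos($trimmed, '<speak') === 0
        && substr($trimmed, -8) === '</speak>';
}

$text = '<speak>Hello <break time="1s" /> world</speak>';

// Prints the name of the SynthesisInput setter you would call for $text.
echo isSsml($text) ? 'setSsml' : 'setText', PHP_EOL;
```

In the real script, the result would decide between $input->setSsml($text) and $input->setText($text).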

I hope you understand how to convert text to speech using Google Cloud and PHP. Please share your thoughts and suggestions in the comment section below.
