How to Convert Live Speech to Text using JavaScript

Recently, one of our readers asked about converting live speech to text, and the topic sounded interesting to me. I have written about converting speech to text using Amazon Transcribe and Google Cloud Speech, but those services require you to pass audio files, which are then converted into text.

Here, the question is about live speech. So I explored the options and came across the Web Speech API. It provides two features: speech recognition and speech synthesis. Speech recognition is the part that turns speech into text.

Speech recognition receives speech from your device’s microphone. The word or phrase is checked by a speech recognition service and then returned as a text string.

In this tutorial, we’ll convert live speech to text using the Web Speech API and additionally create a PDF of this speech.

Note that the Web Speech API is currently supported in only a limited set of browsers. You can use it in the latest versions of Chrome or Safari.

Getting Started

To see the flow in action, I’ll create the HTML with a few elements. We’ll have two buttons, Start and Stop, to start and end speech recognition. When you click the Start button, the browser first asks for permission to use the microphone. Once you grant it, you can start speaking into your microphone. The words will appear on the page as you speak.

To end speech recognition, simply click the Stop button. As soon as you click it, a new Save to PDF button will appear. This button converts your speech to a PDF and downloads it through the browser.

Create the index.html file and add the following code to it.

<p>
	<input type="button" name="start" value="Start" class="start" />
</p>
<p>
	<input type="button" name="stop" value="Stop" class="stop" />
</p>
<div class="transcript"></div>
<div class="btn-pdf" style="display:none;">
	<button onclick="save_pdf()">Save to PDF</button>
</div>

<script src="https://cdnjs.cloudflare.com/ajax/libs/html2canvas/1.4.1/html2canvas.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/jspdf/2.5.1/jspdf.umd.min.js"></script>
<script src="custom.js"></script>

Here, I am not adding any styling to the elements. The main purpose is to build the actual functionality. The design part will differ for each user.

I have included the html2canvas and jsPDF libraries via CDN. Together they generate the PDF from the HTML passed to them. The page also loads custom.js, where we write the actual code for speech recognition and PDF generation.

In the HTML, I’ve added a div container with the class transcript. The recognized text is appended inside this div container at runtime.

Convert Live Speech to Text

First, we must check whether the browser supports speech recognition and alert the user if it doesn’t.

if ("SpeechRecognition" in window || "webkitSpeechRecognition" in window) {
	// actual code here
} else {
	alert("speech recognition API not supported");
}
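
The check and the constructor lookup can also be folded into one small helper, so the two never disagree. This is only a sketch: getSpeechRecognition is a hypothetical name, not part of the Web Speech API, and the scope parameter exists so the function can be tried outside a browser.

```javascript
// Hypothetical helper: returns the recognition constructor, or null when unsupported.
function getSpeechRecognition(scope) {
	return scope.SpeechRecognition || scope.webkitSpeechRecognition || null;
}

// In a browser, pass window; elsewhere (for example Node) fall back to an empty object.
const scope = typeof window !== "undefined" ? window : {};
const Recognition = getSpeechRecognition(scope);
console.log(Recognition ? "speech recognition available" : "speech recognition API not supported");
```

If the helper returns a constructor, you can create the recognition object from it directly. If it returns null, show your fallback message instead.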

Next, we have to create an object of the SpeechRecognition class. It has a few properties and methods to interact with.

  • continuous: Set this property to true if you want speech converted continuously while you speak. It keeps speech recognition on until you explicitly end it.
  • start(): This method starts the speech recognition service.
  • stop(): As the name suggests, this method stops the speech recognition process.

const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;

const recognition = new SpeechRecognition();

recognition.continuous = true;

document.querySelector(".start").onclick = () => {
	document.querySelector(".btn-pdf").style.display = "none";
	recognition.start();
};
document.querySelector(".stop").onclick = () => {
	document.querySelector(".btn-pdf").style.display = "block";
	recognition.stop();
};
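
Recognition can also fail or end on its own, for example when microphone permission is denied. The API reports this through the onerror and onend handlers. Below is a minimal sketch of wiring them up. attachStatusHandlers and the message strings are hypothetical, and only two of the possible error codes are handled explicitly.

```javascript
// Hypothetical helper: forwards error and end events to a status callback.
function attachStatusHandlers(recognition, onStatus) {
	recognition.onerror = (event) => {
		// event.error is a short code such as "not-allowed" or "no-speech".
		if (event.error === "not-allowed") {
			onStatus("Microphone permission was denied.");
		} else if (event.error === "no-speech") {
			onStatus("No speech was detected.");
		} else {
			onStatus("Recognition error: " + event.error);
		}
	};
	// onend fires whenever the service stops, including after stop() is called.
	recognition.onend = () => onStatus("Recognition stopped.");
	return recognition;
}

// Usage in the browser, assuming the recognition object from this tutorial:
// attachStatusHandlers(recognition, (message) => console.log(message));
```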

While you talk into the microphone, the Web Speech API recognizes words or phrases, which we need to catch and print on the page. For this, we use the onresult event handler of the SpeechRecognition object.

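let transcript = "";
recognition.onresult = (event) => {
	// Append only finalized results, starting from the first one that changed.
	for (let i = event.resultIndex; i < event.results.length; i++) {
		if (event.results[i].isFinal) {
			transcript += event.results[i][0].transcript;
		}
	}
	// Update the page once per event; textContent keeps the speech from being parsed as HTML.
	document.querySelector(".transcript").textContent = transcript;
};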

This code receives the recognized text at runtime and keeps appending it to the specified div container. The process continues until you hit the Stop button.
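
By default, only final results are delivered. If you also want to show words while they are still being recognized, the API has an interimResults property. Here is a minimal sketch, assuming that property is set to true. splitResults is a hypothetical helper name. The function only reads the event, so it can be exercised with a plain object.

```javascript
// Hypothetical helper: separates finalized text from still-changing interim text
// for the results that changed in this event.
function splitResults(event) {
	let finalText = "";
	let interimText = "";
	for (let i = event.resultIndex; i < event.results.length; i++) {
		const result = event.results[i];
		if (result.isFinal) {
			finalText += result[0].transcript;
		} else {
			interimText += result[0].transcript;
		}
	}
	return { finalText, interimText };
}

// Browser wiring (assumes the recognition object, transcript variable,
// and .transcript div from this tutorial):
// recognition.interimResults = true;
// recognition.onresult = (event) => {
// 	const { finalText, interimText } = splitResults(event);
// 	transcript += finalText;
// 	document.querySelector(".transcript").textContent = transcript + interimText;
// };
```

Only the final text is stored. The interim text is shown after it and is replaced on the next event.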

Convert Speech to PDF

Once you are done, you might want to save the speech as a PDF for offline use. To generate the PDF from your text, write the code below in the save_pdf() function.

function save_pdf() {
	// The UMD build exposes the constructor as window.jspdf.jsPDF.
	const { jsPDF } = window.jspdf;

	const doc = new jsPDF();

	// Source HTMLElement or a string containing HTML.
	const elementHTML = document.querySelector(".transcript");

	doc.html(elementHTML, {
		callback: function (doc) {
			// Save the PDF
			doc.save("speech.pdf");
		},
		x: 15,
		y: 15,
		width: 170, // target width in the PDF document
		windowWidth: 650 // window width in CSS pixels
	});
}

It takes all the content of the div with the class transcript and passes it to the jsPDF library, which then generates the PDF.
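
Since the transcript is plain text, an alternative is to write it to the PDF directly instead of rendering the HTML. This is a different technique from the doc.html() call above, sketched under several assumptions. save_text_pdf is a hypothetical name, the jsPDF constructor is passed in as a parameter, the page uses jsPDF's default A4 size in millimetres, and the text is assumed to fit on one page.

```javascript
// Hypothetical alternative to save_pdf(): writes plain text instead of rendering HTML.
// JsPDF is the constructor, for example window.jspdf.jsPDF in the browser.
function save_text_pdf(JsPDF, text, filename = "speech.pdf") {
	const doc = new JsPDF();
	// Wrap the text to 170 units wide (millimetres by default).
	const lines = doc.splitTextToSize(text, 170);
	// Draw the wrapped lines starting 15 units from the left and top edges.
	doc.text(lines, 15, 15);
	doc.save(filename);
	// Return the number of wrapped lines, which is handy for a quick sanity check.
	return lines.length;
}

// Usage in the browser:
// save_text_pdf(window.jspdf.jsPDF, document.querySelector(".transcript").textContent);
```

This sketch does not add pages, so a long transcript would run off the bottom of the first page.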

The final code of the custom.js file will be as follows.

if ("SpeechRecognition" in window || "webkitSpeechRecognition" in window) {
	const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;

	const recognition = new SpeechRecognition();

	recognition.continuous = true;

	let transcript = "";
	recognition.onresult = (event) => {
		// Append only finalized results, starting from the first one that changed.
		for (let i = event.resultIndex; i < event.results.length; i++) {
			if (event.results[i].isFinal) {
				transcript += event.results[i][0].transcript;
			}
		}
		// Update the page once per event; textContent keeps the speech from being parsed as HTML.
		document.querySelector(".transcript").textContent = transcript;
	};

	document.querySelector(".start").onclick = () => {
		document.querySelector(".btn-pdf").style.display = "none";
		recognition.start();
	};
	document.querySelector(".stop").onclick = () => {
		document.querySelector(".btn-pdf").style.display = "block";
		recognition.stop();
	};
} else {
	alert("speech recognition API not supported");
}

function save_pdf() {
	// The UMD build exposes the constructor as window.jspdf.jsPDF.
	const { jsPDF } = window.jspdf;

	const doc = new jsPDF();

	// Source HTMLElement or a string containing HTML.
	const elementHTML = document.querySelector(".transcript");

	doc.html(elementHTML, {
		callback: function (doc) {
			// Save the PDF
			doc.save("speech.pdf");
		},
		x: 15,
		y: 15,
		width: 170, // target width in the PDF document
		windowWidth: 650 // window width in CSS pixels
	});
}

You’re done with converting live speech to text using JavaScript. Give it a try and let me know your thoughts in the comment section below.
