How to Read Text from Image in PHP

Do you want to know how to read text from an image in PHP? Sometimes you need to extract the text written on an image programmatically. For example, you may want to check whether the text in an image is abusive, or identify an image by the text it contains. In this article, we look at how to read text from an image in PHP.

Tesseract OCR is an open-source OCR (optical character recognition) engine that detects text in images. It can be installed on Windows, macOS, and Linux, and its documentation has installation instructions for each platform. In this tutorial, I'll explain how to install Tesseract OCR on Linux and Windows.

Alternatively, you can read text from images with Google Cloud Vision or Amazon Textract. Both are cloud services and require nothing on your machine except their PHP client libraries. They are paid services, but if you want to try them, follow the linked articles.

Install Tesseract OCR on Windows

First, download the Tesseract installer for Windows. Choose the 32-bit or 64-bit installer to match your system, then complete the installation.

Once Tesseract OCR is installed, add C:\Program Files\Tesseract-OCR to your PATH environment variable. Restart your system afterwards, because the change sometimes does not take effect until you do.
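To confirm that the PATH change took effect, you can check from PHP whether a tesseract executable is reachable through PATH. The sketch below uses a hypothetical helper, findExecutableOnPath(), written only for illustration; it is not part of Tesseract or of the PHP library used later. It assumes the executable is named tesseract.exe on Windows and tesseract elsewhere.

```php
<?php
/**
 * Hypothetical helper (illustration only): returns the full path of the first
 * executable named $name in the directories listed in PATH, or null.
 * On Windows, the ".exe" suffix is appended to $name.
 */
function findExecutableOnPath(string $name): ?string
{
    $path = getenv('PATH');
    if ($path === false || $path === '') {
        return null;
    }
    if (PHP_OS_FAMILY === 'Windows') {
        $name .= '.exe';
    }
    // PATH_SEPARATOR is ";" on Windows and ":" on Linux and macOS.
    foreach (explode(PATH_SEPARATOR, $path) as $dir) {
        if ($dir === '') {
            continue;
        }
        $candidate = rtrim($dir, '\\/') . DIRECTORY_SEPARATOR . $name;
        if (is_file($candidate) && is_executable($candidate)) {
            return $candidate;
        }
    }
    return null;
}

$found = findExecutableOnPath('tesseract');
echo $found !== null
    ? "Tesseract found at: $found" . PHP_EOL
    : "Tesseract is not on PATH; check the environment variable." . PHP_EOL;
```

If the script reports that Tesseract is not on PATH even though it is installed, run it again from a new terminal (or after a restart), since already-open terminals keep the old PATH.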

With Tesseract OCR, you can read text in many languages. All you need to do is download the required language data file from the linked location. For example, to read text written in German, download the deu.traineddata file and place it in C:\Program Files\Tesseract-OCR\tessdata.
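Before running OCR in a given language, you may want to verify that its data file is actually in place. The sketch below uses a hypothetical helper, missingLanguages(), and assumes each language's data is stored as a file named after its code with the .traineddata extension, directly inside the tessdata directory, as with deu.traineddata.

```php
<?php
/**
 * Hypothetical helper (illustration only): returns the codes from $codes
 * whose "<code>.traineddata" file does not exist in $tessdataDir.
 */
function missingLanguages(string $tessdataDir, array $codes): array
{
    $missing = [];
    foreach ($codes as $code) {
        $file = rtrim($tessdataDir, '\\/') . DIRECTORY_SEPARATOR . $code . '.traineddata';
        if (!is_file($file)) {
            $missing[] = $code;
        }
    }
    return $missing;
}

// Example: the default Windows install location used in this article.
// Single quotes keep the backslashes literal.
$missing = missingLanguages('C:\Program Files\Tesseract-OCR\tessdata', ['eng', 'deu']);
echo $missing === []
    ? "All language files are installed." . PHP_EOL
    : "Missing: " . implode(', ', $missing) . PHP_EOL;
```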

Install Tesseract OCR on Linux

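On Debian-based Linux distributions such as Ubuntu, you can install Tesseract OCR with the apt command-line utility. The command below installs Tesseract, with its language data under /usr/share/tesseract-ocr/4.00/tessdata (the version part of this path depends on the Tesseract release your distribution ships).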

sudo apt install tesseract-ocr

This command also installs the English language data. To install an additional language, append its language code to the package name. Here, I am installing German, whose code is deu:

sudo apt install tesseract-ocr-deu
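To see which languages the installed engine can use, you can run tesseract --list-langs, which prints a header line followed by one language code per line. The sketch below parses that output in PHP. parseListLangs() is a hypothetical helper, and because the header wording differs between Tesseract versions, it only matches the header's opening words.

```php
<?php
/**
 * Hypothetical helper (illustration only): extracts language codes from the
 * text printed by `tesseract --list-langs`. Lines up to and including the
 * header starting with "List of available languages" are skipped.
 */
function parseListLangs(string $output): array
{
    $langs = [];
    $inList = false;
    foreach (preg_split('/\R/', $output) as $line) {
        $line = trim($line);
        if (!$inList) {
            // Match only the start of the header, since its wording varies.
            $inList = strpos($line, 'List of available languages') === 0;
            continue;
        }
        if ($line !== '') {
            $langs[] = $line;
        }
    }
    return $langs;
}

// Usage: requires the tesseract binary on PATH. "2>&1" also captures the
// listing in case your Tesseract version prints it to stderr.
$output = shell_exec('tesseract --list-langs 2>&1');
if (is_string($output)) {
    print_r(parseListLangs($output));
}
```

After installing tesseract-ocr-deu, deu should appear in the printed list.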

Read Text from Image in PHP

Next, install the thiagoalessio/tesseract_ocr library, a PHP wrapper around the Tesseract command-line program, by running the command below from your project's root directory.

composer require thiagoalessio/tesseract_ocr

With the Tesseract OCR engine and the PHP library in place, detecting text in an image takes only a few lines of PHP code. Let's say you want to read the content of the image below.

[Image: sample image containing a line of text]

Save this image as text.png in the images directory of your project. The PHP code to read its text is as follows:

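<?php
// Load Composer's autoloader so the TesseractOCR class can be found.
require_once __DIR__ . "/vendor/autoload.php";

use thiagoalessio\TesseractOCR\TesseractOCR;

try {
    // Run Tesseract on the image and print the recognized text.
    echo (new TesseractOCR('images/text.png'))
        ->run();
} catch (Exception $e) {
    // Thrown, for example, when the image or the tesseract executable cannot be found.
    echo $e->getMessage();
}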

The final output should be as follows:

The quick brown fox jumps over the lazy dog.

To read text written in another language, pass its language code to the lang() method:

echo (new TesseractOCR('IMAGE_PATH'))
    ->lang('deu')
    ->run();
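Tesseract can also recognize several languages in one pass; on its command line, the codes are joined with a plus sign, as in -l eng+deu. If you prefer to call the tesseract binary directly instead of going through the library, the sketch below builds such a command. buildTesseractCommand() is a hypothetical helper; it assumes tesseract is on PATH and uses stdout as the output base so the recognized text is printed rather than written to a file.

```php
<?php
/**
 * Hypothetical helper (illustration only): builds a shell command that runs
 * the tesseract CLI on $imagePath with the given language codes and prints
 * the recognized text to stdout.
 */
function buildTesseractCommand(string $imagePath, array $langs): string
{
    if ($langs === []) {
        throw new InvalidArgumentException('At least one language code is required.');
    }
    foreach ($langs as $lang) {
        // This sketch accepts only letters, digits and underscores (e.g. chi_sim).
        if (!preg_match('/^[A-Za-z0-9_]+$/', $lang)) {
            throw new InvalidArgumentException("Invalid language code: $lang");
        }
    }
    // escapeshellarg() protects paths containing spaces or shell metacharacters.
    return 'tesseract ' . escapeshellarg($imagePath)
        . ' stdout -l ' . escapeshellarg(implode('+', $langs));
}

$command = buildTesseractCommand('images/text.png', ['eng', 'deu']);
echo $command . PHP_EOL;
// To run it (requires tesseract plus the eng and deu language data):
// echo shell_exec($command);
```

Validating the language codes before building the command keeps untrusted input from being passed to the shell.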

That's it! I hope this article helped you learn how to read text from an image in PHP. I would like to hear your thoughts and suggestions in the comments.
