How to Read Text from Image in PHP

Do you want to read text from an image in PHP? There are scenarios where you need to programmatically extract the text written on an image. Maybe you need to check whether the text on an image is abusive, or you need to identify an image by the text it contains. This can be done using OCR (Optical Character Recognition), a technology that recognizes text within an image. In this article, we study how to programmatically read text from an image in PHP.

Tesseract OCR is an open-source OCR engine that allows us to detect text in an image. It’s quite popular and has a long history: development began at Hewlett-Packard around 1985, and it was released as open source in 2005.

The Tesseract OCR engine is available for all popular operating systems, including Windows, macOS, and Linux. You will find installation instructions in its documentation. In this tutorial, I’ll explain the installation of Tesseract OCR on Linux and Windows machines.

Alternative ways of reading text from an image are Google Cloud Vision and Amazon Textract. Both are cloud services and do not require installing anything on your machine except their PHP libraries. They are paid services, but if you want to give them a try, follow the linked articles.

Install Tesseract OCR on Windows

To get started, download the Tesseract installer for Windows. Choose the 32-bit or 64-bit installer based on your machine configuration, and complete the installation process as prompted.

Once you have installed Tesseract OCR on your Windows machine, add the path C:\Program Files\Tesseract-OCR to your Path environment variable. Follow the steps below to set it.

  • Open File Explorer. Right-click on This PC and choose Properties.
  • From the right sidebar, click on Advanced system settings.
  • It’ll open a small window. In it, click on the Environment Variables button.
  • Under System variables, edit Path and add a new value C:\Program Files\Tesseract-OCR to it.

After setting the environment path, restart your system. Tesseract OCR might not run until you do.
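
Once the path is set, you may want to confirm that PHP can actually start Tesseract. The snippet below is a minimal sketch of such a check. The helper name isTesseractAvailable() is my own and not part of any library, and it assumes that exec() is enabled on your PHP installation and that the program is called tesseract on your system.

```php
<?php
// Sketch: check whether the Tesseract command-line program can be started from PHP.
// isTesseractAvailable() is a hypothetical helper, not part of any library.

function isTesseractAvailable(string $binary = 'tesseract'): bool
{
    $output = [];
    $exitCode = 1;

    // "--version" prints version details and exits with code 0 on success.
    // "2>&1" folds error output into the captured lines so nothing leaks
    // onto the page when the program is missing.
    exec(escapeshellarg($binary) . ' --version 2>&1', $output, $exitCode);

    return $exitCode === 0;
}

echo isTesseractAvailable()
    ? "Tesseract is reachable from PHP.\n"
    : "Tesseract was not found. Check the environment path.\n";
```

If the second message appears even though Tesseract is installed, the environment path is the first thing to re-check.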

Tesseract OCR can read text in various languages. All you need to do is download the required language file from this location. Let’s say you want to read text written in German. For this, download the deu.traineddata file from the linked page, and keep it inside C:\Program Files\Tesseract-OCR\tessdata. You can store as many language files as you wish.

Install Tesseract OCR on Linux

You can easily install Tesseract OCR on Debian-based Linux distributions such as Ubuntu using the apt command-line utility. The command below installs Tesseract; with the 4.x package, the language data files are stored under /usr/share/tesseract-ocr/4.00/tessdata.

sudo apt install tesseract-ocr

This command also installs the English language data. To install an additional language, append its language code to the package name, as in the command below. Here, I am installing German, which has the code deu.

sudo apt install tesseract-ocr-deu
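
To see which language files Tesseract can find, you can ask the command-line program itself with its --list-langs option. The following is a rough sketch that parses that output in PHP. The helper names parseLanguageList() and hasLanguage() are made up for this example, and the parsing assumes that the header line ends with a colon and that each following line holds one language code.

```php
<?php
// Sketch: parse the output of "tesseract --list-langs" into an array of codes.
// parseLanguageList() and hasLanguage() are hypothetical helper names.

function parseLanguageList(array $lines): array
{
    $languages = [];
    foreach ($lines as $line) {
        $line = trim($line);
        // Skip blank lines and the header line, which is assumed to end
        // with a colon, e.g. "List of available languages (2):".
        if ($line === '' || substr($line, -1) === ':') {
            continue;
        }
        $languages[] = $line;
    }
    return $languages;
}

function hasLanguage(array $languages, string $code): bool
{
    return in_array($code, $languages, true);
}

// Usage: run the command and check for German before using it for OCR.
$output = [];
$exitCode = 1;
exec('tesseract --list-langs 2>&1', $output, $exitCode);

if ($exitCode === 0) {
    $languages = parseLanguageList($output);
    echo hasLanguage($languages, 'deu')
        ? "German language data is installed.\n"
        : "German language data is missing.\n";
} else {
    echo "Could not run tesseract.\n";
}
```

The same check works on Windows after you copy a .traineddata file into the tessdata directory.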

Read Text From Image in PHP

Now that Tesseract OCR is installed, we can integrate it with PHP. For this, install the Tesseract OCR library for PHP in your project using Composer. Run the command below from your project root directory.

composer require thiagoalessio/tesseract_ocr

With this library, you can detect text in an image with just a few lines of PHP code. Let’s say you want to read the content of the image below.

[Image: sample image containing text]

Save this image as text.png inside the images directory of your project. To read the text of this image, your PHP code will be as follows:

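<?php
// Load Composer's autoloader so the library classes are available.
require_once "vendor/autoload.php";

use thiagoalessio\TesseractOCR\TesseractOCR;

try {
    // Run OCR on the image and print the recognized text.
    echo (new TesseractOCR('images/text.png'))
        ->run();
} catch(Exception $e) {
    // Print the error message if the OCR run fails.
    echo $e->getMessage();
}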

It’ll print the output as follows:

The quick brown fox jumps over the lazy dog.
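
Once you have the text as a string, you can process it like any other string. As an illustration of the abusive-text check mentioned at the start, here is a small sketch that looks for blocked words in OCR output. The function containsBlockedWord() and the word list are hypothetical, and the matching is deliberately simple: it compares whole words and only handles ASCII letters and digits.

```php
<?php
// Sketch: flag OCR output that contains a blocked word.
// containsBlockedWord() and the word list are hypothetical examples.

function containsBlockedWord(string $ocrText, array $blockedWords): bool
{
    // OCR output often carries line breaks and punctuation, so lowercase it
    // and split on anything that is not an ASCII letter or digit.
    $words = preg_split('/[^a-z0-9]+/', strtolower($ocrText), -1, PREG_SPLIT_NO_EMPTY);

    foreach ($blockedWords as $blocked) {
        // Strict comparison against whole words only, not substrings.
        if (in_array(strtolower($blocked), $words, true)) {
            return true;
        }
    }
    return false;
}

$text = "The quick brown fox\njumps over the lazy dog.";
var_dump(containsBlockedWord($text, ['lazy'])); // bool(true)
var_dump(containsBlockedWord($text, ['laz']));  // bool(false), whole words only
```

In a real project, you would pass the string returned by run() instead of the hard-coded $text.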

To read text written in another language, pass the language code to the lang() method as shown below. Here, I am reading German (deu).

echo (new TesseractOCR('IMAGE_PATH'))
    ->lang('deu')
    ->run();
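
For comparison, the Tesseract command-line program takes the language through its -l option, and several languages can be joined with a plus sign. The sketch below builds such a command string in PHP. buildTesseractCommand() is a hypothetical helper written for this article, and it is not a description of how the library builds its commands internally.

```php
<?php
// Sketch: build a Tesseract command line for one or more languages.
// buildTesseractCommand() is a hypothetical helper for illustration.

function buildTesseractCommand(string $imagePath, string ...$languages): string
{
    // Using "stdout" as the output base makes Tesseract print the text
    // instead of writing it to a file.
    $command = 'tesseract ' . escapeshellarg($imagePath) . ' stdout';

    if ($languages !== []) {
        // Several languages are joined with "+", for example "eng+deu".
        $command .= ' -l ' . escapeshellarg(implode('+', $languages));
    }

    return $command;
}

// Print the command for an image that mixes English and German text.
echo buildTesseractCommand('images/text.png', 'eng', 'deu'), "\n";
```

You could pass the resulting string to exec() or shell_exec() if you ever need to call Tesseract without the library.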

That’s it! It is that simple. I hope you now know how to read text from an image in PHP. Feel free to share your thoughts and suggestions in the comment section below.
