Extract Text From An Image using Amazon Textract and PHP

In the past, I wrote articles on how to extract text from an image, covering Tesseract OCR and Google Cloud Vision.

In this article, I’ll discuss Amazon Textract, a service that also lets you extract text from an image. Reading the text in an image is sometimes necessary: for example, you may want to detect whether the text is abusive or inappropriate for your audience. Checking every image manually doesn’t scale, so the robust solution is to build a program and automate the workflow.

Usually, OCR (optical character recognition) software like Tesseract OCR is used for this purpose. But this kind of software has to be installed and configured manually on your computer.

Amazon Textract extracts text using machine learning. It’s a better choice than locally installed software because it runs in the cloud, so you don’t need to keep the software and its configuration up to date yourself.

That being said, let’s study how one can extract text from an image using PHP and Amazon Textract.

Get Your AWS Security Credentials

Amazon provides an SDK for PHP applications. We’ll use this SDK to call the Textract service on AWS.

To use the AWS SDK, you need an AWS account. After creating one, get the security credentials (an access key ID and a secret access key) required for the SDK integration. The SDK uses these credentials to perform operations against your AWS account.

Extract Text From An Image

Let’s assume you want to read the text of the following image. I’ll save this image as 1.jpg on my local system.

Note: Textract accepts JPEG and PNG images here (a reader reports that PDF is accepted as well). If you pass any other format, you’ll get the error Request has unsupported document format.

[Image: dummy image]
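The format restriction in the note above can be checked locally before any request is sent. The following is a minimal sketch, not part of the AWS SDK: the function name detectImageFormat is hypothetical, and it only recognizes the two image formats named in the note by looking at the leading "magic" bytes of the file.

```php
<?php
/**
 * Hypothetical helper (not an AWS SDK function): guess the image format
 * from the first bytes of the file contents.
 * Returns 'jpeg', 'png', or null when neither signature matches.
 */
function detectImageFormat(string $bytes): ?string
{
    // JPEG files start with the bytes FF D8 FF.
    if (strncmp($bytes, "\xFF\xD8\xFF", 3) === 0) {
        return 'jpeg';
    }

    // PNG files start with the 8-byte signature 89 50 4E 47 0D 0A 1A 0A.
    if (strncmp($bytes, "\x89PNG\r\n\x1A\n", 8) === 0) {
        return 'png';
    }

    return null;
}

// Example usage (assumes the image sits next to this script):
// $bytes = file_get_contents(__DIR__ . '/1.jpg');
// if (detectImageFormat($bytes) === null) {
//     exit('Unsupported image format, skipping the Textract call.');
// }
```

Checking the content rather than the file extension catches files that were simply renamed. If your setup accepts further formats, the same pattern can be extended with their signatures.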

Next, install the AWS SDK for PHP using the Composer command as follows.

composer require aws/aws-sdk-php

After installing the library, load the Composer autoloader and initialize TextractClient as follows.

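<?php
// Load the Composer autoloader so the SDK classes are available.
require 'vendor/autoload.php';

use Aws\Textract\TextractClient;

$textractClient = new TextractClient([
    'version' => 'latest',
    'region' => 'us-west-2', // pass your region
    'credentials' => [
        'key'    => 'ACCESS_KEY_ID',     // replace with your access key ID
        'secret' => 'ACCESS_KEY_SECRET', // replace with your secret access key
    ],
]);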
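Hardcoding the keys is fine for a quick test, but you may prefer to keep them out of the source file. Here is a minimal sketch of one way to do that, assuming the keys are exported as the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY. The helper textractConfigFromEnv is hypothetical, and reading the region from AWS_REGION is also an assumption of this sketch.

```php
<?php
/**
 * Hypothetical helper: build the TextractClient configuration array
 * from environment variables instead of hardcoded strings.
 */
function textractConfigFromEnv(string $defaultRegion = 'us-west-2'): array
{
    $key    = getenv('AWS_ACCESS_KEY_ID');
    $secret = getenv('AWS_SECRET_ACCESS_KEY');

    // getenv() returns false when a variable is not set.
    if ($key === false || $secret === false) {
        throw new RuntimeException('AWS credentials are not set in the environment.');
    }

    return [
        'version' => 'latest',
        // Assumption: fall back to a default region when AWS_REGION is not set.
        'region' => getenv('AWS_REGION') ?: $defaultRegion,
        'credentials' => [
            'key'    => $key,
            'secret' => $secret,
        ],
    ];
}

// Example usage (requires the SDK and the autoloader from the step above):
// $textractClient = new Aws\Textract\TextractClient(textractConfigFromEnv());
```

The SDK can also look up credentials on its own when the 'credentials' key is left out of the configuration, so treat this helper as one option rather than the required approach.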

Now, to read the text, pass the contents of the image in the request format that AWS expects.

try {
    // Send the raw image bytes to Textract.
    $result = $textractClient->detectDocumentText([
        'Document' => [
            'Bytes' => file_get_contents(getcwd().'/1.jpg'),
        ],
    ]);

    // Print only the WORD blocks and skip every other block type.
    foreach ($result->get('Blocks') as $block) {
        if ($block['BlockType'] !== 'WORD') {
            continue;
        }

        echo $block['Text']." ";
    }
} catch (Aws\Textract\Exception\TextractException $e) {
    // Output the error message if the request fails.
    echo $e->getMessage();
}

Run this file and it prints ‘How TO SEND EMAIL USING GMAIL API WITH PHPMAILER’.

Here, I am printing the Text from the WORD Block objects. If you use print_r($result), you will see both LINE and WORD Block objects. You can use either type of Block to print the text.
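To illustrate the choice between LINE and WORD blocks, here is a small sketch that collects the text for one block type at a time. The helper collectText and the sample array are made up for illustration; the array only mimics the shape of $result->get('Blocks') and is not real Textract output.

```php
<?php
/**
 * Hypothetical helper: return the Text of every block of the given type.
 * $blocks is assumed to have the same shape as $result->get('Blocks').
 */
function collectText(array $blocks, string $blockType = 'LINE'): array
{
    $texts = [];
    foreach ($blocks as $block) {
        // Skip blocks of other types and blocks without a Text field.
        if (($block['BlockType'] ?? null) === $blockType && isset($block['Text'])) {
            $texts[] = $block['Text'];
        }
    }

    return $texts;
}

// Made-up sample data, for illustration only.
$sampleBlocks = [
    ['BlockType' => 'PAGE'],
    ['BlockType' => 'LINE', 'Text' => 'HELLO WORLD'],
    ['BlockType' => 'LINE', 'Text' => 'FROM PHP'],
    ['BlockType' => 'WORD', 'Text' => 'HELLO'],
    ['BlockType' => 'WORD', 'Text' => 'WORLD'],
    ['BlockType' => 'WORD', 'Text' => 'FROM'],
    ['BlockType' => 'WORD', 'Text' => 'PHP'],
];

// One entry per detected line, joined with line breaks.
echo implode("\n", collectText($sampleBlocks, 'LINE')), "\n";

// One entry per detected word, joined with spaces.
echo implode(' ', collectText($sampleBlocks, 'WORD')), "\n";
```

LINE blocks are convenient when you want to keep the line structure of the image, while WORD blocks are easier to work with if you plan to check individual words.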
