Extract Text From An Image using Amazon Textract and PHP

In the past, I wrote articles on how to extract text from an image. In those articles, I discussed achieving this goal using Tesseract OCR and Google Cloud Vision.

In this article, I’ll discuss the Amazon Textract service, which also allows you to extract text from an image. Reading the text in an image is useful for various purposes. For example, you may want to detect whether the text is abusive or not appropriate for your audience. Checking every image manually does not scale, so a more robust solution is to build a program and automate the workflow.

Usually, OCR (optical character recognition) software like Tesseract OCR is used for this purpose. But this kind of software has to be installed and configured manually.

Amazon Textract extracts text using machine learning. It can be a more convenient choice than locally installed OCR software because it runs in the cloud, so you don’t need to keep the software and its configuration up to date yourself.

That being said, let’s see how to extract text from an image using PHP and Amazon Textract.

Get Your AWS Security Credentials

Amazon provides an SDK for PHP applications. We’ll use this SDK to call the Textract service from our PHP code.

To use the AWS SDK, you need an AWS account. After creating the account, get the security credentials (an access key ID and a secret access key) that the SDK requires. The SDK uses these credentials to perform operations against your AWS account.


Extract Text From An Image

Let’s assume we want to read the text of the following image. I have saved this image as 1.jpg on my local system.

[Image: youtube-thumbnail-phpmailer]

Next, install the AWS SDK for PHP with Composer by running the command below.

composer require aws/aws-sdk-php

After installing the library, include the Composer autoloader and initialize TextractClient as follows.

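<?php
// Load the classes installed by Composer
require 'vendor/autoload.php';

use Aws\Textract\TextractClient;

// Create the Textract client with your region and security credentials
$textractClient = new TextractClient([
    'version' => 'latest',
    'region' => 'us-west-2', // pass your region
    'credentials' => [
        'key'    => 'ACCESS_KEY_ID',     // replace with your access key ID
        'secret' => 'ACCESS_KEY_SECRET', // replace with your secret access key
    ],
]);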
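
Hardcoding the keys in the source file is fine for a quick test, but the keys can easily end up in version control. As a minimal sketch, assuming the keys are exported as the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY (and optionally AWS_REGION), the client options could be built like this. The helper name buildTextractConfig is hypothetical and not part of the SDK.

```php
<?php
// Hypothetical helper (not part of the AWS SDK): builds the TextractClient
// options from environment variables so the keys stay out of the source code.
function buildTextractConfig(string $defaultRegion = 'us-west-2'): array
{
    // getenv() returns false when the variable is not set
    $key    = getenv('AWS_ACCESS_KEY_ID');
    $secret = getenv('AWS_SECRET_ACCESS_KEY');
    $region = getenv('AWS_REGION');

    if ($key === false || $key === '' || $secret === false || $secret === '') {
        throw new RuntimeException('AWS credentials are not set in the environment.');
    }

    return [
        'version'     => 'latest',
        // Fall back to the default region when AWS_REGION is missing or empty
        'region'      => ($region === false || $region === '') ? $defaultRegion : $region,
        'credentials' => [
            'key'    => $key,
            'secret' => $secret,
        ],
    ];
}

// Usage (requires the SDK installed through Composer):
// $textractClient = new Aws\Textract\TextractClient(buildTextractConfig());
```

With this approach, the same script can run against different accounts or regions by changing only the environment, not the code.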

Now, to read the text, pass the content of the image to the detectDocumentText method in the request format AWS expects.

try {
    // Send the raw bytes of the image to Textract
    $result = $textractClient->detectDocumentText([
        'Document' => [
            'Bytes' => file_get_contents(getcwd().'/1.jpg'),
        ]
    ]);

    // The response contains a list of Block objects; print only the words
    foreach ($result->get('Blocks') as $block) {
        if ($block['BlockType'] !== 'WORD') {
            continue;
        }

        echo $block['Text']." ";
    }
} catch (Aws\Textract\Exception\TextractException $e) {
    // output error message if the request fails
    echo $e->getMessage();
}

Run this file and you should get the output ‘How TO SEND EMAIL USING GMAIL API WITH PHPMAILER’.

Here, I am printing the Text from the WORD Block objects. If you use print_r($result), you will see both LINE and WORD Block objects. You can use either type of Block object to print the text.
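
To illustrate switching between LINE and WORD blocks, here is a small sketch. The helper name textByBlockType and the sample data are made up for illustration. The sample only mimics the BlockType and Text keys used above, so the snippet runs without calling AWS.

```php
<?php
// Hypothetical helper: collects the Text of every block of the given BlockType.
function textByBlockType(array $blocks, string $type): array
{
    $texts = [];
    foreach ($blocks as $block) {
        // Skip blocks of another type and blocks that carry no Text
        if (($block['BlockType'] ?? null) === $type && isset($block['Text'])) {
            $texts[] = $block['Text'];
        }
    }
    return $texts;
}

// Made-up sample shaped like the 'Blocks' list (not real Textract output)
$sampleBlocks = [
    ['BlockType' => 'PAGE'],
    ['BlockType' => 'LINE', 'Text' => 'HELLO WORLD'],
    ['BlockType' => 'WORD', 'Text' => 'HELLO'],
    ['BlockType' => 'WORD', 'Text' => 'WORLD'],
];

// One entry per detected line, each on its own row
echo implode(PHP_EOL, textByBlockType($sampleBlocks, 'LINE')), PHP_EOL;

// All detected words joined with spaces
echo implode(' ', textByBlockType($sampleBlocks, 'WORD')), PHP_EOL;
```

With a real response, you would pass $result->get('Blocks') instead of $sampleBlocks. LINE blocks are handy when you want to keep the text grouped as it appears in the image, while WORD blocks suit word-level checks.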
