Smart Text Detection and Manipulation in Images
Every day, web developers look for better ways to improve the user experience around images. User-uploaded and third-party images often contain sensitive information, which creates a need to manipulate text embedded in the image itself. Car registration numbers, identity cards, road signs, and advertisements are some of the scenarios in which you may need to manipulate the text content of an image.
The requirement gets even more interesting when applied to more advanced scenarios, where you translate text written in a foreign language.
Doing this manually seems simple, but if hundreds of images are being uploaded regularly, automation is necessary for efficiency. Services like Cloudinary can simplify the complex process of extracting and manipulating text in images. Cloudinary is a cloud-based, end-to-end image and video management service. Storage, manipulation, transformation, and media delivery are what Cloudinary does best. Its wide range of manipulations includes character recognition, extraction, and manipulation of text in images.
Optical character recognition (OCR), available as an add-on, is powered by the Google Vision API.
OCR for Manipulation
The first thing we want to attempt with OCR is manipulating an image based on the characters found in it. For example, on a real estate website, you may want to hide the agent's contact details. Though you may be able to restrict agents from displaying their contact information in the listing, they may find other ways to leave these details in the image, as shown below:
http://res.cloudinary.com/demo/image/upload/w_1.1/home_4_sale.jpg
The sign clearly shows the agent’s phone number, which might violate your terms and conditions. With OCR, you can replace the text with your own contact information.
To achieve this with Cloudinary, we need to use three parameters:
1. The overlay image (l_call_text): the image that will cover the detected text.
2. Gravity set to ocr_text (g_ocr_text), so the overlay is positioned on the detected text.
3. The fl_region_relative flag, so the overlay width (w_1.1) is relative to the detected text element.
http://res.cloudinary.com/demo/image/upload/l_call_text,fl_region_relative,w_1.1,g_ocr_text/home_4_sale.jpg
To replicate the above example, you'll need a free Cloudinary account. Once you have created your account, upload the image above to your provisioned cloud, and start manipulating the image URL as we're doing above.
Of course, you can use an SDK to achieve this. Here’s an example using the JavaScript SDK to deliver a transformed image:
js
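cloudinary.image("home_4_sale.jpg", {
  overlay: "call_text",     // image that covers the detected text
  flags: "region_relative", // size the overlay relative to the text region
  width: "1.1",
  gravity: "ocr_text"       // position the overlay on the detected text
});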
Rather than covering the text with another image, we could pixelate it if we don’t want to display any contact information on the image at all.
Using the same example:
http://res.cloudinary.com/demo/image/upload/e_pixelate_region:15,w_1.1,g_ocr_text/home_4_sale.jpg
So, instead of using an overlay, we set e_pixelate_region to pixelate the region, with 15 being the level of pixelation applied. Notice that g_ocr_text is still there to specify the OCR instruction.
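The two delivery URLs above differ only in their transformation string, so they can be assembled programmatically. The following is a minimal sketch, not part of the Cloudinary SDK: buildOcrUrl is a hypothetical helper, and the demo cloud name and home_4_sale.jpg image are taken from the examples above.

```js
// Hypothetical helper (not part of the Cloudinary SDK): joins transformation
// parameters into a delivery URL of the form used in the examples above.
function buildOcrUrl(cloudName, publicId, params) {
  const transformation = params.join(",");
  return `http://res.cloudinary.com/${cloudName}/image/upload/${transformation}/${publicId}`;
}

// Cover the detected text with the "call_text" overlay image.
const overlayUrl = buildOcrUrl("demo", "home_4_sale.jpg", [
  "l_call_text",
  "fl_region_relative",
  "w_1.1",
  "g_ocr_text",
]);

// Pixelate the detected text instead of covering it.
const pixelatedUrl = buildOcrUrl("demo", "home_4_sale.jpg", [
  "e_pixelate_region:15",
  "w_1.1",
  "g_ocr_text",
]);

console.log(overlayUrl);
console.log(pixelatedUrl);
```

Keeping the parameters in an array makes it easy to swap the overlay for the pixelation effect without touching the rest of the URL.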
OCR for Text Extraction
Another common use case is retrieving the text detected in the image. The extracted text can then be analyzed further to fit the user's needs. You can retrieve this text while uploading or updating an image stored in your Cloudinary account. Let's upload an image from Pexels to Cloudinary and extract the text found in it.
To get started, you need a Cloudinary account. Once you have an account, a cloud is provisioned for you to store your images and transform them as you wish. You are also given your API credentials, which include the cloud name, API key, and API secret. Retrieve these credentials and store them safely.
Next, enable the OCR add-on by going to your add-on settings and selecting the free option under the OCR add-on configuration.
Then create a simple Node environment by running:
```bash
# Create a Node project
npm init -y
# Add an entry point
touch index.js
```
The second command creates the index.js entry point for the example. Before heading into this file, we need to install the Cloudinary SDK:
bash
npm install --save cloudinary
You can now head back into the index.js entry file and configure a Cloudinary instance to connect to your Cloudinary cloud:
```js
const cloudinary = require('cloudinary');

// Replace the placeholders with the credentials from your Cloudinary account.
cloudinary.config({
  cloud_name: 'CLOUD_NAME',
  api_key: 'API_KEY',
  api_secret: 'API_SECRET'
});
```
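Since the credentials should be stored safely, one option is to read them from environment variables rather than hard-coding them in index.js. This is a minimal sketch under assumptions: cloudinaryConfigFromEnv is a hypothetical helper and the three environment variable names are made up for illustration. Only the returned keys (cloud_name, api_key, api_secret) are the ones cloudinary.config expects.

```js
// Hypothetical helper: builds the object passed to cloudinary.config()
// from environment variables. The variable names below are assumptions.
function cloudinaryConfigFromEnv(env) {
  const required = [
    "CLOUDINARY_CLOUD_NAME",
    "CLOUDINARY_API_KEY",
    "CLOUDINARY_API_SECRET",
  ];

  // Fail early with a clear message if any credential is missing.
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }

  return {
    cloud_name: env.CLOUDINARY_CLOUD_NAME,
    api_key: env.CLOUDINARY_API_KEY,
    api_secret: env.CLOUDINARY_API_SECRET,
  };
}

// Usage in index.js:
// cloudinary.config(cloudinaryConfigFromEnv(process.env));
```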
You're now set to upload images and retrieve their text content. Here is how:
js
cloudinary.v2.uploader.upload(
  "https://static.pexels.com/photos/164542/pexels-photo-164542.jpeg",
  { ocr: "adv_ocr" }, // ask the OCR add-on to extract text during upload
  function (error, result) {
    if (error) {
      console.log(error);
      return;
    }
    // The first annotation holds the full text detected in the image.
    console.log(result.info.ocr.adv_ocr.data[0].textAnnotations[0].description);
  }
);
The upload method sends images to your Cloudinary account. To have the upload response include the text detected in the image, set the ocr option to adv_ocr.
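An image with no readable text may not produce the nested fields accessed in the callback above, in which case that property chain would throw. The sketch below reads the same path defensively. extractOcrText is a hypothetical helper, and sampleResult is a hand-written stand-in for an upload response, not real Cloudinary output.

```js
// Hypothetical helper: returns the full detected text, or null when the
// upload response carries no OCR text.
function extractOcrText(result) {
  const annotations = result?.info?.ocr?.adv_ocr?.data?.[0]?.textAnnotations;
  if (!Array.isArray(annotations) || annotations.length === 0) {
    return null;
  }
  return annotations[0].description ?? null;
}

// Hand-written stand-in for an upload response (illustrative values only).
const sampleResult = {
  info: {
    ocr: {
      adv_ocr: {
        data: [{ textAnnotations: [{ description: "FOR SALE" }] }],
      },
    },
  },
};

console.log(extractOcrText(sampleResult)); // prints "FOR SALE"
console.log(extractOcrText({}));           // prints null
```

Inside the upload callback, this would replace the long property chain with a call such as extractOcrText(result).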
Run the app with the following command and watch the output in the console:
bash
node index.js
Running the script prints the text detected in the uploaded image to the console.
This text can be used as per your requirements.
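As one illustration of further analysis, the extracted text could be scanned for phone numbers, which ties back to the real estate scenario earlier. This is only a sketch: findPhoneNumbers is a hypothetical helper, and the pattern assumes North American style numbers, so it will miss other formats.

```js
// Hypothetical helper: finds phone-number-like substrings in OCR text.
// The pattern assumes North American formatting such as 555-123-4567 or
// (555) 987-6543, optionally preceded by a one-digit country code.
function findPhoneNumbers(text) {
  const pattern = /(?:\+?\d[\s.-]?)?(?:\(\d{3}\)|\d{3})[\s.-]?\d{3}[\s.-]?\d{4}/g;
  return text.match(pattern) ?? [];
}

const found = findPhoneNumbers("FOR SALE Call 555-123-4567 or (555) 987-6543");
console.log(found); // [ '555-123-4567', '(555) 987-6543' ]

const none = findPhoneNumbers("Open house Sunday");
console.log(none); // []
```

A non-empty result could, for example, flag the image for the overlay or pixelation treatment described earlier.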