Optical character recognition (OCR) is a technology that allows machines to recognize and convert printed or handwritten text into digital form. It has become an important part of many industries, including finance, healthcare, and education. OCR can be used to automate data entry, improve document management, and enhance the accessibility of information.

In this article, we will explore how to perform OCR using PaddleOCR, a popular OCR library in Python. It is fast and accurate library that can process image for text detection and recognition.

Installing PaddleOCR

PaddleOCR is an open-source OCR toolkit developed by PaddlePaddle, an AI framework. It supports a wide range of languages and is optimized for both accuracy and speed. To install PaddleOCR, you can use pip:

pip install paddlepaddle paddleocr

Once you have installed PaddleOCR, you can import it into your Python script. In this tutorial, we will use both Text Detection and Recognition using PaddleOCR.

Text Detection

Text detection involves locating the region in an image that contains text. PaddleOCR provides a set of pre-trained models for text detection, which can be used to detect text in images. To perform text detection, we need to first create an OCR object using the PaddleOCR API. We can then use the ocr method to detect text in an image. Let's see how we can do this.

from paddleocr import PaddleOCR

# Declare the PaddleOCR class
ocr = PaddleOCR()

img_path ='./test.png'

# Perform prediction
result = ocr.ocr(img_path, rec=False)
print(result)
[
  [[[73.0, 377.0], [590.0, 377.0], [590.0, 401.0], [73.0, 401.0]], 
  [[73.0, 348.0], [641.0, 348.0], [641.0, 368.0], [73.0, 368.0]], 
  [[71.0, 316.0], [688.0, 317.0], [688.0, 338.0], [71.0, 337.0]], 
  [[72.0, 228.0], [205.0, 230.0], [204.0, 279.0], [71.0, 276.0]], 
  [[72.0, 162.0], [636.0, 168.0], [636.0, 218.0], [71.0, 211.0]], 
  [[74.0, 104.0], [670.0, 104.0], [670.0, 146.0], [74.0, 146.0]], 
  [[71.0, 53.0], [516.0, 53.0], [516.0, 76.0], [71.0, 76.0]]]
]

This method returns a list of lists, where each list corresponds to a text region detected in the image. Now, we can use this text coorindates and draw on image or extract a specific part of image using points. 

In this example, we will first convert values from float to int, then we can read an image using opencv-python and can draw box around selected text or crop image for a box.

import cv2
import numpy as np

# read image using opencv python
image = cv2.imread('./test.png')

for row in result[0]:
    # conver float to integer
    row = [[int(r[0]), int(r[1])] for r in row]
    # Draw polyline on image
    cv2.polylines(image, [np.array(row)], True, (255, 0, 0), 1)

cv2.imwrite("out.jpg", image)
img01

Text Recognition

Text recognition involves recognizing the text in an image. PaddleOCR provides pre-trained models for text recognition, which can be used to recognize text in an image. To perform text recognition, we need to first create an OCR object using the PaddleOCR API. We can then use the ocr method to recognize the text in an image. Let's see how we can do this.

result = ocr.ocr("./test.png")
print(result)

The ocr method returns a list of list, where each list corresponds to a line of text detected in the image. Each list contains the following data:

  • coordinates: Coordinates of recognized text
  • confidence: The confidence score of the recognized text
  • text: The recognized text

Check example output for code above.

[
  [
    [[[71.0, 53.0], [516.0, 53.0], [516.0, 76.0], [71.0, 76.0]], ('Introducing Affiliate Template for Webflow', 0.9333747029304504)], 
    [[[74.0, 104.0], [670.0, 104.0], [670.0, 146.0], [74.0, 146.0]], ('Easy-to-use Template', 0.9670184850692749)], 
    [[[72.0, 162.0], [636.0, 168.0], [636.0, 218.0], [71.0, 211.0]], ('for building Affiliate', 0.9301523566246033)], 
    [[[72.0, 228.0], [205.0, 230.0], [204.0, 279.0], [71.0, 276.0]], ('Sites', 0.9647798538208008)], 
    [[[71.0, 316.0], [688.0, 317.0], [688.0, 338.0], [71.0, 337.0]], ('Affiliate is a Webflow template made for entrepreneurs whc', 0.9189010858535767)], 
    [[[73.0, 348.0], [641.0, 348.0], [641.0, 368.0], [73.0, 368.0]], ('Want a professional and polished site ready to start and', 0.9063881039619446)], 
    [[[73.0, 377.0], [590.0, 377.0], [590.0, 401.0], [73.0, 401.0]], ('grow their affiliate marketing business in any niche', 0.9410164952278137)]
  ]
]

As we can see in output above that each list contains text as well as confidance of a detection with their coodinates. Now we can also write text for each detection on image using opencv-python. Check example below.

import cv2
import numpy as np

image = cv2.imread('./test.png')
for row in result[0]:
    bbox = [[int(r[0]), int(r[1])] for r in row[0]]
    
    # write text output on image at each line
    cv2.putText(image, row[1][0], bbox[0], cv2.FONT_HERSHEY_COMPLEX_SMALL, 0.8, (255, 0, 0), 1)
    cv2.polylines(image, [np.array(bbox)], True, (255, 0, 0), 1)

# save image
cv2.imwrite("out_with_text.jpg", image)

In output below we can see text for each line drawn on image.

img01

Or, we can have text concatenated for each detection on new line.

concat_output = "\n".join(row[1][0] for row in result[0])
print(concat_output)
Introducing Affiliate Template for Webflow
Easy-to-use Template
for building Affiliate
Sites
Affiliate is a Webflow template made for entrepreneurs whc
Want a professional and polished site ready to start and
grow their affiliate marketing business in any niche

Conclusion

In this article, we have explored how to perform OCR using PaddleOCR in Python. We have seen how to detection recognize text in an image and draw bounding box and output text on image from the recognized text. OCR is a powerful technology that can automate many manual tasks and improve the accessibility of information. With PaddleOCR, performing OCR in Python has become easier than ever.

For more details on PaddleOCR, visit their official documentation.

https://github.com/PaddlePaddle/PaddleOCR