Rectify text lines before recognition

Status: Open. robertknight opened this issue 1 year ago (2 comments).

Ocrs does not currently apply any perspective correction to extracted text lines before running recognition. The recognition model is trained to handle skewed and rotated inputs, but this only works for moderate rotation. Text lines with significant rotation have their characters squashed in the vertical direction during preprocessing, because recognition inputs are resized to a fixed height of 64px. This harms recognition accuracy.
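To make the squashing concrete, here is a rough back-of-the-envelope sketch. It assumes, hypothetically, that preprocessing resizes the whole axis-aligned bounding box around a rotated line to the 64px input height. The 500x40 px line size and the angles are made-up illustration values, and glyph_height_after_resize is a hypothetical helper, not part of ocrs.

```python
import math


def glyph_height_after_resize(line_width, line_height, angle_deg, target_height=64):
    """Approximate glyph height (px) in the recognition input.

    Hypothetical model: the axis-aligned bounding box of the rotated line is
    resized so that its height equals target_height.
    """
    theta = math.radians(angle_deg)
    # Height of the axis-aligned box enclosing a w x h rectangle rotated by theta.
    bbox_height = line_width * abs(math.sin(theta)) + line_height * abs(math.cos(theta))
    scale = target_height / bbox_height
    # The text itself is only line_height tall, so it shrinks by the same factor.
    return line_height * scale


for angle in (0, 5, 15, 30):
    print(f"{angle:>2} deg -> glyphs ~{glyph_height_after_resize(500, 40, angle):.1f}px tall")
```

With these numbers a horizontal line fills the full 64px, while at 15 degrees the glyphs end up only about 15px tall, because most of the 64px is spent on the empty space above and below the slanted line.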

The library should rectify line images before recognition to better handle rotated/skewed inputs.

robertknight, Feb 27 '24 09:02

Example of an image where this comes up (source):

[Image: slide]

Text line images from the slides currently look like this when input to the recognition model (see the output of ocrs image.jpeg --text-line-images):

If the lines were rectified first, accuracy should improve considerably.

robertknight, Mar 01 '24 15:03

Here is a reference implementation, rectify.py, using OpenCV's image transform functions.

Usage:

python rectify.py slide.jpeg line.png '1105,316;1630,458;1622,498;1105,356' 517,64

Note that the corner coordinates are listed clockwise, starting from the top-left corner. This produces line.png:

[Image: line.png]

From the rectified image, Ocrs is able to correctly extract the text, whereas from the original the output is garbage.
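The thread does not reproduce rectify.py itself. As a hedged illustration of the core step, here is a minimal, dependency-free Python sketch of perspective rectification: it solves for the 3x3 homography that maps an output rectangle onto the four source corners (given clockwise from top left, as in the usage above), then fills each output pixel by sampling the source image at the mapped location. This is the same job OpenCV's cv2.getPerspectiveTransform and cv2.warpPerspective perform, but it is not the author's code. The helper names, the grayscale list-of-rows image format, nearest-neighbour sampling, and the reading of 517,64 as the output width and height are all assumptions.

```python
import math


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate quadrilateral (three collinear corners?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def perspective_transform(src, dst):
    """Homography H (3x3, H[2][2] = 1) mapping each src point to its dst point."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0*x + h1*y + h2) / (h6*x + h7*y + 1), rearranged to be linear in h.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def apply_homography(h, x, y):
    """Map (x, y) through H, dividing by the homogeneous coordinate."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


def parse_quad(spec):
    """Parse 'x,y;x,y;x,y;x,y' (clockwise from top left) into four (x, y) tuples."""
    points = [tuple(float(v) for v in p.split(",")) for p in spec.split(";")]
    if len(points) != 4:
        raise ValueError("expected exactly four corner points")
    return points


def rectify(image, quad, out_w, out_h, fill=0):
    """Warp the quad region of a grayscale image (list of rows) to out_w x out_h."""
    rect = [(0, 0), (out_w, 0), (out_w, out_h), (0, out_h)]
    # Inverse mapping: output coordinates -> source coordinates, so every
    # output pixel receives exactly one sample.
    h = perspective_transform(rect, quad)
    src_h, src_w = len(image), len(image[0])
    out = []
    for oy in range(out_h):
        row = []
        for ox in range(out_w):
            sx, sy = apply_homography(h, ox + 0.5, oy + 0.5)  # pixel centre
            ix, iy = math.floor(sx), math.floor(sy)  # nearest-neighbour sample
            inside = 0 <= ix < src_w and 0 <= iy < src_h
            row.append(image[iy][ix] if inside else fill)
        out.append(row)
    return out


# Corners from the usage example; 517x64 is assumed to be output width x height.
quad = parse_quad("1105,316;1630,458;1622,498;1105,356")
h = perspective_transform([(0, 0), (517, 0), (517, 64), (0, 64)], quad)
print(apply_homography(h, 517, 64))  # approximately (1622.0, 498.0)
```

Nearest-neighbour sampling keeps the sketch short; a practical implementation would normally interpolate (warpPerspective defaults to bilinear, INTER_LINEAR) to avoid jagged glyph edges at the 64px output height.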

robertknight, Mar 02 '24 09:03