How does OCR work?


The engine receives the document image. The first step is to segment the image into smaller elements and isolate the regions that contain characters. Each element is then compared against candidate characters whose patterns could match it. Every candidate is ranked by assigning it a confidence index, which reflects the probability that the candidate is the character shown in the image. Once all candidates have been compared, the one with the highest confidence index is chosen.
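
The ranking step can be illustrated with a small sketch. It assumes a deliberately simplified setup that is not taken from any particular OCR engine: each extracted element is a tiny 3x3 grid of pixels, the candidate characters are a hypothetical `TEMPLATES` table, and the confidence index is simply the fraction of pixels that agree. Real engines typically use far richer classifiers, so treat this only as a picture of "score every candidate, keep the best".

```python
from typing import Dict, List, Tuple

# A glyph is a tuple of rows; '#' is an ink pixel, '.' is background.
Glyph = Tuple[str, ...]

# Hypothetical 3x3 templates, invented for this illustration only.
TEMPLATES: Dict[str, Glyph] = {
    "I": (".#.", ".#.", ".#."),
    "L": ("#..", "#..", "###"),
    "T": ("###", ".#.", ".#."),
}


def confidence(glyph: Glyph, template: Glyph) -> float:
    """Return the fraction of pixels on which glyph and template agree."""
    total = 0
    same = 0
    for glyph_row, template_row in zip(glyph, template):
        for g, t in zip(glyph_row, template_row):
            total += 1
            same += int(g == t)
    return same / total


def rank_candidates(glyph: Glyph,
                    templates: Dict[str, Glyph] = TEMPLATES
                    ) -> List[Tuple[str, float]]:
    """Score every candidate character, highest confidence first."""
    scored = [(char, confidence(glyph, tpl)) for char, tpl in templates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def recognize(glyph: Glyph) -> Tuple[str, float]:
    """Pick the candidate with the highest confidence index."""
    return rank_candidates(glyph)[0]


# An "L" with one missing pixel in the bottom-right corner.
noisy_l: Glyph = ("#..", "#..", "##.")
best_char, best_conf = recognize(noisy_l)
print(best_char, round(best_conf, 2))
```

Here the damaged glyph still matches the "L" template on 8 of its 9 pixels, so "L" receives the highest confidence index and is selected.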

The identified characters are then reassembled into words, sentences and paragraphs to reconstruct the document text. For some uses, the OCR engine also returns each character together with its coordinates within the document image, for example to create a double-layer PDF in which the text is superimposed on the image.
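
As a rough sketch of the reassembly step, the example below rebuilds lines and words from characters and their coordinates. The `CharBox` record, the `reassemble` function and the two thresholds (`line_tol`, `space_gap`) are assumptions made for illustration; an actual engine's output format and layout analysis will differ.

```python
from typing import List, NamedTuple


class CharBox(NamedTuple):
    """One recognized character and its bounding box (hypothetical format)."""
    char: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def reassemble(boxes: List[CharBox],
               line_tol: float = 0.5,
               space_gap: float = 0.5) -> str:
    """Group characters into lines by vertical position, then into words
    by the horizontal gap between neighbouring boxes."""
    lines: List[List[CharBox]] = []
    for box in sorted(boxes, key=lambda b: (b.y, b.x)):
        # Same line if the top edge is close to the line's first character.
        if lines and abs(box.y - lines[-1][0].y) <= line_tol * box.h:
            lines[-1].append(box)
        else:
            lines.append([box])

    text_lines: List[str] = []
    for line in lines:
        line.sort(key=lambda b: b.x)  # left-to-right reading order
        text = line[0].char
        for prev, cur in zip(line, line[1:]):
            gap = cur.x - (prev.x + prev.w)
            if gap > space_gap * prev.w:  # wide gap: start a new word
                text += " "
            text += cur.char
        text_lines.append(text)
    return "\n".join(text_lines)


# Characters arrive in arbitrary order, with slight vertical jitter.
boxes = [
    CharBox("k", 11, 20, 10, 12),
    CharBox("y", 30, 0, 10, 12),
    CharBox("H", 0, 0, 10, 12),
    CharBox("o", 0, 20, 10, 12),
    CharBox("i", 11, 1, 10, 12),
    CharBox("o", 41, 0, 10, 12),
]
result = reassemble(boxes)
print(result)
```

The same coordinates are what a double-layer PDF needs: each piece of recognized text can be placed at the position of its box so that it sits over the matching region of the image.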