Considerations about OCR accuracy
This topic describes the factors that affect the quality of the Optical Character Recognition (OCR) results that the providers return.
Full text search
Full text search is a technique that complements the IBM RPA Studio OCR APIs. It indexes the words in a text field and examines every word in the document to find matches for the search criteria. Combined with OCR, full text search makes searching across documents faster and more precise.
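
The word indexing described above can be illustrated with a minimal sketch of an inverted index built over OCR output. This is not how IBM RPA implements full text search; the function names (`tokenize`, `build_index`, `search`) and the sample document texts are hypothetical and exist only for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of document ids whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical OCR output for three scanned documents (illustrative data only).
ocr_texts = {
    "invoice_001": "Invoice number 4821 Total due 150.00",
    "invoice_002": "Invoice number 4822 Total due 75.50",
    "receipt_001": "Receipt Thank you for your purchase",
}

index = build_index(ocr_texts)
matches = search(index, "invoice total")
print(sorted(matches))
```

Because the index is built once, each query only looks up the words it contains instead of rescanning every document. Note that a word the OCR provider misreads is indexed in its misread form, so search quality still depends on OCR accuracy.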
What affects OCR accuracy?
OCR results are not always reliable and can vary depending on several factors. The following table lists the factors that affect OCR accuracy.
Item that affects OCR accuracy | Description |
---|---|
Image quality | Lower quality images provide less accurate results. |
Text direction | OCR providers are more likely to return whole words for horizontal text, whereas for vertical text they tend to return individual letters on each line. |
Legibility | Small font sizes and handwriting might reduce OCR accuracy. |
Colors | Switching between color and grayscale image input might affect OCR results. |
Image background | Text on colorful backgrounds might be recognized less accurately. |
Image resolution | Lower resolution images reduce OCR accuracy. The suggested pixel density for images that you want to parse with OCR is 300 dpi. |
Algorithm version | Different versions of an OCR provider might return different results even when scanning the same document. |
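
To make the 300 dpi suggestion in the table concrete, the following sketch works through the arithmetic: pixel density is the number of pixels divided by the physical size in inches. The helper names and the sample page and scan dimensions are assumptions chosen for illustration and are not part of any IBM RPA API.

```python
SUGGESTED_DPI = 300  # pixel density suggested in the table above


def pixels_needed(inches, dpi=SUGGESTED_DPI):
    """Pixels required along one dimension to reach the given density."""
    return round(inches * dpi)


def effective_dpi(pixels, inches):
    """Pixel density of an image along one dimension."""
    return pixels / inches


# Assumed example: a US Letter page measuring 8.5 x 11 inches.
width_px = pixels_needed(8.5)   # 8.5 * 300 = 2550
height_px = pixels_needed(11)   # 11 * 300 = 3300

# Assumed example: a scan of the same page that is only 1700 pixels wide.
scan_dpi = effective_dpi(1700, 8.5)          # 1700 / 8.5 = 200.0
meets_suggestion = scan_dpi >= SUGGESTED_DPI  # False, below 300 dpi

print(width_px, height_px, scan_dpi, meets_suggestion)
```

In this example the 1700-pixel scan falls short of the suggested density, so rescanning at a higher resolution would be worth trying before sending the image to an OCR provider.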