RecognitionOCRPL

The RecognitionOCRPL action library performs full page recognition on images.

Details

Page-level recognition results can be processed by actions in other action libraries and ultimately displayed for verification or exported to external repositories or file systems.

Machine print can be recognized. Cursive text and hand print are not supported.

It is strongly recommended that you refer to the document Best Practices for optimal text recognition to understand how to get the best results from recognition.

Languages Supported for OCR

The expected language must be set for correct recognition. Not specifying the languages on the page could reduce the quality of recognition.
Important: When setting the language parameter, make sure that the language is spelled exactly as written below and without spaces. An invalid language name causes the action to abort.
  • English
  • German
  • Dutch
  • French
  • Italian
  • Latin
  • Norwegian
  • Portuguese
  • Spanish
  • Swedish
  • Swiss
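
Because an invalid language name causes the action to abort, it can help to check a value before it is placed in a rule. The following is a minimal sketch in Python, not part of Datacap: the function names is_valid_pl_language and suggest_pl_language are hypothetical, and the set of names is copied from the list above. It assumes the comparison is exact and case-sensitive, as the Important note implies.

```python
from typing import Optional

# Language names copied from the list above. This is an illustrative
# pre-check only; it is not a Datacap API.
SUPPORTED_LANGUAGES = frozenset({
    "English", "German", "Dutch", "French", "Italian", "Latin",
    "Norwegian", "Portuguese", "Spanish", "Swedish", "Swiss",
})


def is_valid_pl_language(name: str) -> bool:
    """Return True only if name matches a listed language exactly.

    The match is case-sensitive and rejects any leading, trailing,
    or embedded spaces (assumed behavior, based on the note above).
    """
    return name in SUPPORTED_LANGUAGES


def suggest_pl_language(name: str) -> Optional[str]:
    """Return the listed spelling that matches name once whitespace
    and letter case are ignored, or None if nothing matches."""
    cleaned = "".join(name.split()).casefold()
    for language in SUPPORTED_LANGUAGES:
        if language.casefold() == cleaned:
            return language
    return None
```

For instance, is_valid_pl_language("german") is False under this sketch, while suggest_pl_language(" german ") returns the listed spelling "German", which can then be used in the rule.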

The language can be bound to the DCO object by selecting it on the OCR/PL tab within the Zones tab of Datacap Studio. When a language is selected on the OCR/PL tab, the variable pl_language is set to that language. The language can also be set within rules by using the rrSet action to set the pl_language variable to the desired language.

By default, Raw reading is used as the language.

For example: rrSet("German", "@X.pl_language") sets the language to German.

Setting Engine TimeOut

Timeout, in seconds, for processing the image. This is not a hard limit. When the timeout is reached, the engine stops processing further pages or text lines and the action returns false. If this happens, increase the timeout in proportion to the number of images being processed.

The timeout value can be set using the variable pl_timeOut.

For example: rrSet("40", "@X.pl_timeOut") sets the timeout to 40 seconds.

The CCO File

The CCO is a representation of the recognized text on a page and includes character confidence information. Each recognized character has its own confidence score, which indicates how accurate the engine judges the recognition to be. The CCO is required for many Datacap operations, such as using the click-n-key feature of a verify panel. Actions that operate on full-page text, such as Locate actions, also require the CCO file. The RecognitionOCRPL action Recognize performs recognition and creates a layout file. This layout file must be converted to a CCO using the action CreateCcoFromLayout in the SharedRecognitionTools action library. Once the CCO is created, features that require it can be used.
Note: Images should be preprocessed before they are passed to the Recognize action.

Refer to "Improving Image Quality for Better Recognition" in "Best Practices for optimal text recognition in IBM Datacap" if any image enhancement, such as rotation, is required.

All the configuration parameters provided in this action library are case-sensitive.

OCRPL engine processing logs

OCRPL engine processing logs can be generated by setting the variable pl_PathToLog to the appropriate path.

Example

rrSet("C:\OCRPL_Logs", "@X.pl_PathToLog")

In this example, after processing, the OCRPL engine logs are saved in the configured location, C:\OCRPL_Logs.