The RecognitionOCRPL action library performs full page recognition on images.
The page-level recognition results can then be processed by actions in other action libraries and ultimately displayed for verification or exported to external repositories or file systems.
Machine print can be recognized. Cursive text and hand print are not supported.
It is strongly recommended that you refer to the document "Best Practices for optimal text recognition" to understand how to get the best results from recognition.
Languages Supported for OCR
The language can be bound to the DCO object by selecting it in the OCR/PL tab in the zones tab of Datacap Studio. When a language is selected in the OCR/PL tab, the variable pl_language is set to that language. The language can also be set within rules by using the rrSet action to set the pl_language variable to the desired language.
By default, Raw reading is used as the language.
For example: rrSet("German", "@X.pl_language") sets the language to German.
Setting Engine TimeOut
The timeout, in seconds, for processing the image. This is not a hard limit. When the timeout is reached, the engine stops processing further pages or text lines and the action returns false. In this case, increase the timeout to suit the number of images being processed.
The timeout value can be set using the variable pl_timeOut.
For example: rrSet("40", "@X.pl_timeOut") sets the timeout to 40 seconds.
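The advice to raise the timeout with the number of images can be illustrated with a small helper. This is only a sketch and not part of Datacap: the function names, the 40-seconds-per-image budget, and the 40-second floor are hypothetical values chosen for illustration. It computes a timeout and formats the corresponding rrSet call as text that could be copied into a rule.

```python
def scaled_timeout(image_count: int, seconds_per_image: int = 40, floor: int = 40) -> int:
    """Return a timeout in seconds that grows with the number of images.

    seconds_per_image and floor are illustrative assumptions, not
    values recommended by the product documentation.
    """
    if image_count < 0:
        raise ValueError("image_count must not be negative")
    return max(floor, image_count * seconds_per_image)


def timeout_action(image_count: int) -> str:
    """Format the rrSet call that stores the timeout in pl_timeOut."""
    return f'rrSet("{scaled_timeout(image_count)}", "@X.pl_timeOut")'


print(timeout_action(3))  # prints: rrSet("120", "@X.pl_timeOut")
```

A suitable per-image budget depends on image size and quality, so treat the numbers above as a starting point to adjust after observing real processing times.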
The CCO File
If any image enhancement is required, such as rotation, refer to "Improving Image Quality for Better Recognition" in "Best Practices for optimal text recognition in IBM Datacap".
All the configuration parameters provided in this action library are case-sensitive.
OCRPL engine processing logs
OCRPL engine processing logs can be generated by setting the variable pl_PathToLog to the appropriate path.
rrSet("C:\OCRPL_Logs", "@X.pl_PathToLog")
In this example, after processing, the OCRPL engine logs are saved in the configured location, C:\OCRPL_Logs.