RecognizePageOCR_S

This action performs a full page recognition and populates the fingerprint (CCO) file of the page with the results.

Member of namespace

OCR_SR

Syntax

bool RecognizePageOCR_S ()

Parameters

None.

Returns

False if the ruleset with this action is not bound to a Page object of the Document Hierarchy. Otherwise, True.

Level

Page only.

Details

This action uses the settings on the OCR/S tab in the Zones tab of Datacap Studio to recognize all characters on a page, and populates the page's CCO file with the recognition results.

Automatic retry is supported if the operation timeout is reached.

Attention: Call the NormalizeCCO action from the CCO2CCO action library after RecognizePageOCR_S if the application uses navigation and pattern match actions to find recognized text on a page or to perform pattern matching, or uses click-n-key in a verification panel.

The DCO variable s_charReplace specifies characters that are replaced with an alternate character during the recognition step. Use it when replacing a specific character in a post-processing step is not feasible. It lets you restrict the characters that full-page recognition produces, much as you can restrict characters with field-level recognition. The replacement string must contain pairs of comma-separated decimal values that represent the characters to replace, in the form original,new,original,new, and so on. Use this feature only when it is absolutely necessary; it is usually better to adjust characters later in a follow-on step.

The s_charReplace variable is only used during full-page recognition. It is not used for field-level recognition.
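To make the pair format concrete, here is a small Python sketch. It is not Datacap code: `parse_char_replace` and `apply_char_replace` are hypothetical helpers that only illustrate how an original,new,... string maps one character to another. The sketch assumes that each decimal value is a Unicode code point, as in the dash example later on this page.

```python
def parse_char_replace(spec: str) -> dict[str, str]:
    """Hypothetical helper: parse 'original,new,original,new,...' decimal code points."""
    values = [int(v) for v in spec.split(",") if v.strip()]
    if len(values) % 2 != 0:
        raise ValueError("s_charReplace must contain an even number of values")
    # Consecutive values form (original, new) pairs.
    return {chr(orig): chr(new) for orig, new in zip(values[0::2], values[1::2])}


def apply_char_replace(text: str, spec: str) -> str:
    """Hypothetical helper: replace each original character with its substitute."""
    table = str.maketrans(parse_char_replace(spec))
    return text.translate(table)
```

For example, `apply_char_replace("2019\u20142020", "8212,45")` returns `"2019-2020"`. An odd number of values is rejected because the last original character would have no replacement.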

Attention: If a CCO file does not exist when this action is called, the action creates one. If a CCO file already exists, it is replaced.

This action recognizes images only. Other input documents, such as PDF, are not directly supported. To use this action with document types that are not supported, use one of Datacap's conversion actions to change the source document into an image that can be recognized.

For example, use PDFFREDocumentToImage in the Convert action library to create an image for each page of a PDF document. This action can then recognize the created pages.

Additionally, images should use a lossless compression like FAX or LZW, and should not use a lossy compression such as JPEG. If JPEG is used at any step in the process, it degrades the crispness of character lines, reducing recognition quality.

Automatic Rotation, Deskew and Border Removal

Automatic rotation, deskew and border removal can be performed at the same time as recognition if the feature is enabled. Refer to the OCR_SR actions for more information about enabling automatic rotation.

Example

AnalyzeImage()
RotateImage()
RecognizePageOCR_S()
NormalizeCCO("")

This sequence analyzes the current page image to determine whether it needs to be rotated, and rotates it if necessary. Full-page recognition then takes place according to the settings in the OCR/S tab of the Recognition Options Setup dialog. The recognition results are stored in the page's CCO file, which is created if it does not already exist. Finally, NormalizeCCO sorts the words and lines in the CCO for use by navigation and pattern match actions.
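This page does not describe how NormalizeCCO sorts words and lines internally. Purely as a generic illustration of what "sorting words and lines" means, the following Python sketch shows one common way to put recognized words into reading order: group them into lines from top to bottom, then order each line from left to right. The `Word` type, its fields, and the `tolerance` rule are assumptions for this sketch. They are not the CCO format or NormalizeCCO's algorithm.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical word box; not the CCO file structure.
    text: str
    left: int
    top: int
    height: int


def reading_order(words: list[Word], tolerance: float = 0.5) -> list[list[Word]]:
    """Group words into lines (top to bottom), each sorted left to right.

    A word joins the current line if its top edge is within
    tolerance * height of the top edge of that line's first word.
    """
    lines: list[list[Word]] = []
    for word in sorted(words, key=lambda w: (w.top, w.left)):
        if lines:
            anchor = lines[-1][0]
            if abs(word.top - anchor.top) <= tolerance * anchor.height:
                lines[-1].append(word)
                continue
        lines.append([word])
    # Within each line, order words by horizontal position.
    return [sorted(line, key=lambda w: w.left) for line in lines]
```

The tolerance lets slightly misaligned words, for example after a small residual skew, stay on the same line instead of each starting a new one.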

Example

This example shows how to use the character replacement string.

rrSet("8212,45","@X.s_charReplace") 
RecognizePageOCR_S()

In this case, any characters recognized as an em dash (Unicode U+2014, 8212 in decimal) are converted to a standard hyphen-minus character (45 in decimal).

To replace additional characters, append more original,new pairs to the same string.
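Working out decimal code points by hand is error-prone. The example above already shows the problem: the em dash is 8212, while 8213 is a different character (U+2015, horizontal bar). A small Python sketch can build the value from the characters themselves. `build_char_replace` is a hypothetical helper, not part of Datacap, and it assumes the values are Unicode code points.

```python
def build_char_replace(pairs: list[tuple[str, str]]) -> str:
    """Hypothetical helper: build an s_charReplace value from (original, new) pairs."""
    values = []
    for original, replacement in pairs:
        if len(original) != 1 or len(replacement) != 1:
            raise ValueError("each pair must map one character to one character")
        # ord() returns the decimal Unicode code point of a character.
        values.extend([str(ord(original)), str(ord(replacement))])
    return ",".join(values)
```

For example, `build_char_replace([("\u2014", "-"), ("\u2013", "-")])` returns `"8212,45,8211,45"`, which converts both em dashes and en dashes to hyphen-minus. You can then pass the string to rrSet in the same way as in the example above.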