Recognize

Performs document layout analysis and OCR, and generates a layout XML file such as TM000001_layout.xml.

Member of namespace

OCR_SR

Syntax

bool Recognize()

Returns

False if the ruleset that contains this action is not bound to a page object of the document hierarchy. Otherwise, True.

Level

Page level.

Details

The layout file groups text into blocks, similar to how a person would see and identify the structure of the document. (In contrast, other actions produce a CCO file that groups text into lines that span the width of the page.) For example, a page can contain tables, paragraphs, and lines, each of which is a type of block. A block is either of the default type or of a specific type such as title or table; the type depends on how the recognition engine interprets the block. For information about the block types, see DocumentAnalytics actions.

To navigate the block structure, use the actions in the Locate action library, such as GoSiblingBlockNext.
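To make the block structure concrete, the following Python sketch walks a small, made-up layout document and lists each block with its type. The element and attribute names (Page, Block, type, Line) are assumptions chosen for illustration and are not the actual layout XML schema; inspect a generated file such as TM000001_layout.xml for the real names.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout XML: the element and attribute names are illustrative
# only and are NOT the schema that the Recognize action actually writes.
SAMPLE_LAYOUT = """
<Page>
  <Block type="title"><Line>Invoice 1001</Line></Block>
  <Block type="table"><Line>Item Qty</Line><Line>Pen 2</Line></Block>
  <Block><Line>Thank you for your order.</Line></Block>
</Page>
""".strip()


def list_blocks(layout_xml):
    """Return (block_type, text) pairs in document order."""
    root = ET.fromstring(layout_xml)
    blocks = []
    for block in root.iter("Block"):
        # A block without an explicit type is treated as the default type.
        block_type = block.get("type", "default")
        # Join the text of every line that belongs to this block.
        text = " ".join(line.text or "" for line in block.iter("Line"))
        blocks.append((block_type, text))
    return blocks


if __name__ == "__main__":
    for block_type, text in list_blocks(SAMPLE_LAYOUT):
        print(f"{block_type}: {text}")
```

The point of the sketch is that text is reached through its enclosing block (title, table, default) rather than through page-wide lines, which is the difference from a CCO file described above.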

The layout XML file also retains the font and color attributes of the extracted text in CSS format. This text is used for extracting data and for reconstructing the document in a new format.

Example:

In the following example, the Recognize action first creates an XML layout file, and then the CreateCcoFromLayout action creates a CCO file from that layout file. Both files are created for the current page.

Recognize() 
CreateCcoFromLayout()

Supported File Formats

This action can process color images and PDF files. It processes PDF documents in the following manner:

Extracts embedded text within the PDF document
Performs recognition only on those areas that contain data but do not contain embedded text

This behavior improves the speed and overall performance of PDF processing, because recognition runs only where no embedded text is available.
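The two-step PDF handling described above can be modeled with a short Python sketch. It is not the engine's implementation: the Region class, its fields, and the run_ocr callback are hypothetical names introduced only to show the decision between reusing embedded text and running recognition.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """Hypothetical page area; the field names are assumptions for this sketch."""
    name: str
    has_data: bool            # True if the area contains any content
    embedded_text: str = ""   # text already stored in the PDF, if any


def process_page(regions, run_ocr):
    """Reuse embedded text where it exists; run OCR only on the remaining areas."""
    results = {}
    for region in regions:
        if region.embedded_text:
            # Step 1: embedded text is extracted directly, without recognition.
            results[region.name] = region.embedded_text
        elif region.has_data:
            # Step 2: content without embedded text (for example, a scanned
            # image) is the only case that needs recognition.
            results[region.name] = run_ocr(region)
        # Areas with no data are skipped entirely.
    return results


if __name__ == "__main__":
    page = [
        Region("header", has_data=True, embedded_text="Invoice"),
        Region("stamp", has_data=True),
        Region("margin", has_data=False),
    ]
    print(process_page(page, run_ocr=lambda r: f"<OCR result for {r.name}>"))
```

In this model the recognition callback is invoked once, for the stamp area only, which illustrates where the time saving comes from.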

Tip: You can also process the following types of documents by first converting them to searchable PDF files with actions from the Convert library:

Microsoft Excel
Microsoft Word
HTML
RTF
TXT

For example, you can convert a Microsoft Excel document to PDF by calling the action ExcelWorkbookToPdf. After the PDF document is created, you can process it with the Recognize action.
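The convert-then-recognize flow can be summarized as a small routing function. This Python sketch is only a planning aid under stated assumptions: the file extensions in both sets are guesses at how the listed formats usually appear on disk, and the step labels are descriptive strings, not action names or signatures from the Convert library.

```python
from pathlib import Path

# Assumed extensions; adjust both sets to the formats in your own workflow.
DIRECT = {".pdf", ".tif", ".tiff", ".jpg", ".jpeg", ".png"}
CONVERT_FIRST = {".xls", ".xlsx", ".doc", ".docx", ".html", ".htm", ".rtf", ".txt"}


def plan_steps(filename):
    """Return the ordered processing steps for a file, based on its extension."""
    ext = Path(filename).suffix.lower()
    if ext in DIRECT:
        # Images and PDF files can go straight to recognition.
        return ["Recognize"]
    if ext in CONVERT_FIRST:
        # Office, HTML, RTF, and text files are converted to searchable PDF first.
        return ["convert to searchable PDF", "Recognize"]
    raise ValueError(f"No processing plan for file type: {ext or filename}")


if __name__ == "__main__":
    for name in ("report.xlsx", "scan.pdf", "notes.txt"):
        print(name, "->", plan_steps(name))
```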

Language Detection

The Recognize action does not support language detection. However, this feature is available in the Recognize action of the OCR_A action library.