Recognize
Performs document layout analysis and OCR and also generates a layout XML file such as TM000001_layout.xml.
Member of namespace
OCR_SR
Syntax
bool Recognize()
Returns
False if the ruleset with this action is not bound to a page object of the document hierarchy. Otherwise, this action returns True.
Level
Page level.
Details
Performs document layout analysis and OCR and also generates an XML layout file such as TM000001_layout.xml.
The layout file groups text into blocks, similar to how a person would see and identify the structure of the document. (In contrast, other actions produce a CCO file that groups text into lines that span the width of the page.) For example, a page can contain tables, paragraphs, and lines, each of which is a type of block. A block is either of the default type or of a specific type such as title or table, depending on how the recognition engine interprets the block. For information about the block types, see DocumentAnalytics actions. To navigate the block structure, use the actions in the Locate action library, such as GoSiblingBlockNext.
The layout XML file also retains the font and color attributes of the extracted text in CSS format. This text is used both to extract data and to reconstruct the document in a new format.
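The difference between block grouping and page-wide line grouping can be illustrated with a small Python sketch. It is not Datacap code and does not reflect the actual layout XML or CCO schema: the `Word` class, the `block` labels, and the coordinates are all hypothetical and exist only to show why the two groupings read differently on a two-column page.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Word:
    text: str
    x: int      # left edge, in arbitrary page units (hypothetical)
    y: int      # text row, in arbitrary page units (hypothetical)
    block: str  # hypothetical block label assigned by layout analysis


def by_page_lines(words):
    """Group words into lines that span the full width of the page."""
    ordered = sorted(words, key=lambda w: (w.y, w.x))
    return [" ".join(w.text for w in row)
            for _, row in groupby(ordered, key=lambda w: w.y)]


def by_blocks(words):
    """Group words by block, keeping reading order inside each block."""
    blocks = {}
    for w in sorted(words, key=lambda w: (w.y, w.x)):
        blocks.setdefault(w.block, []).append(w.text)
    return {name: " ".join(texts) for name, texts in blocks.items()}


# A made-up page: one title above two side-by-side columns.
WORDS = [
    Word("Invoice", 10, 0, "title"),
    Word("Ship", 10, 20, "left"), Word("to:", 40, 20, "left"),
    Word("Bill", 300, 20, "right"), Word("to:", 330, 20, "right"),
    Word("Acme", 10, 40, "left"),
    Word("Globex", 300, 40, "right"),
]

print(by_page_lines(WORDS))  # columns are interleaved on each line
print(by_blocks(WORDS))      # each column stays together as one block
```

In this sketch the line-based view mixes the two columns on every row, while the block-based view keeps each column's text together, which is the property the layout file is described as preserving.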
Example
In the following example, the Recognize action first creates an XML layout file, and then the CreateCcoFromLayout action creates a CCO file from that layout file. Both files are created for the current page.
Recognize()
CreateCcoFromLayout()
Supported File Formats
For PDF documents, the action does the following:
- Extracts embedded text within the PDF document
- Performs recognition only on those areas that contain data but do not contain embedded text
This behavior improves the processing speed and overall performance of PDF document processing.
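The PDF handling described above can be sketched in Python as a simple per-region decision. This is an illustration only, not the engine's implementation: the `Region` class, the `recognize_regions` function, and the `ocr` callback are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    has_data: bool           # True if the area contains any content
    embedded_text: str = ""  # empty when the PDF carries no text here


def recognize_regions(regions, ocr):
    """Reuse embedded text where it exists; run OCR only where it is missing."""
    results = {}
    for region in regions:
        if region.embedded_text:
            # Embedded text is taken as-is; no recognition is needed.
            results[region.name] = region.embedded_text
        elif region.has_data:
            # Content without embedded text (for example, a scanned image).
            results[region.name] = ocr(region)
        # Regions with no data are skipped entirely.
    return results


OCR_CALLS = []


def fake_ocr(region):
    """Stand-in for a real OCR engine; records which regions it was asked to read."""
    OCR_CALLS.append(region.name)
    return f"<ocr:{region.name}>"


PAGE = [
    Region("header", has_data=True, embedded_text="Quarterly report"),
    Region("scanned_stamp", has_data=True),
    Region("margin", has_data=False),
]

RESULT = recognize_regions(PAGE, fake_ocr)
print(RESULT)
print(OCR_CALLS)  # only the region without embedded text was sent to OCR
```

Because the comparatively expensive OCR step runs only for regions that lack embedded text, a sketch like this shows where the stated speed benefit would come from.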
Documents in the following formats can be processed after they are converted to PDF:
- Microsoft Excel
- Microsoft Word
- HTML
- RTF
- TXT
For example, you can convert a Microsoft Excel document to PDF by calling the ExcelWorkbookToPdf action. After the PDF document is created, you can process it with the Recognize action.