DocumentAnalytics actions

Use the DocumentAnalytics actions to identify the content type of text blocks and extract data elements from the page layout file.

The DocumentAnalytics actions use the document’s layout and font attributes to extract data. These actions depend on a page-level or document-level layout XML file, such as tm000001._layout.xml. You use the OCR_A or OCR_S Recognize action to create the layout file, then call DocumentAnalytics actions to identify the type of each block (AnalyzeLayout) and to extract data elements from the text.

The layout XML file groups text into blocks, as a person looking at the document would. Each block has either the default type of block or a specific type such as title or table. Locate actions are available to navigate the block structure (for example, GoSiblingBlockNext). In contrast, the CCO file produced by other actions groups text into lines that span the width of the page. The layout XML file also retains the font and color attributes of the text, saved in CSS format. These retained attributes are used for extracting data and for reconstructing the document in a new format.
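
The block model described above can be illustrated with a short Python sketch. It is not the actual layout XML schema: the element name Block, the attributes type and style, and the sample content are all assumptions made for illustration, and the helper next_sibling only imitates the idea behind a Locate action such as GoSiblingBlockNext. Check a real _layout.xml file for the actual element and attribute names.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout fragment. The element and attribute names here are
# assumptions for illustration, not the real layout XML schema.
SAMPLE = """
<Page>
  <Block type="title" style="font-family:Arial;font-size:14pt;color:#000000">Invoice</Block>
  <Block type="block" style="font-family:Arial;font-size:10pt">Acme Corp, 1 Main St</Block>
  <Block type="table" style="font-family:Courier;font-size:9pt">Item Qty Price</Block>
</Page>
"""


def parse_css(style):
    """Split a 'name:value;name:value' CSS string into a dictionary."""
    pairs = (item.split(":", 1) for item in style.split(";") if ":" in item)
    return {name.strip(): value.strip() for name, value in pairs}


def load_blocks(xml_text):
    """Return the blocks in document order with type, text, and parsed style."""
    root = ET.fromstring(xml_text.strip())
    return [
        {
            "type": block.get("type", "block"),  # fall back to the default type
            "text": (block.text or "").strip(),
            "style": parse_css(block.get("style", "")),
        }
        for block in root.iter("Block")
    ]


def next_sibling(blocks, index):
    """Return the block after blocks[index], or None at the end of the page."""
    return blocks[index + 1] if index + 1 < len(blocks) else None


blocks = load_blocks(SAMPLE)
for block in blocks:
    print(block["type"], block["style"].get("font-size"), block["text"])
```

Because each block keeps its own type and style, a rule can select text by role (for example, the title block) or by font attributes, which is not possible with line-oriented data alone.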

Actions that can produce the layout XML include OCR_SR.Recognize and OCR_A.Recognize, both of which can process color images and PDF files. To use the Locate actions and perform click ‘n’ key during verification, use the CreateCcoFromLayout action from the SharedRecognitionTools library to create a CCO file for the page after producing the layout XML file.