RecognizeToALTOOCR_A

Converts scanned images (.tif) to a file in the ALTO electronic document format (XML).

Member of namespace

ocr_a

Syntax

bool RecognizeToALTOOCR_A()

Parameters

None.

Returns

False if called at an invalid level. Otherwise, True.

Level

Document or Page Level.

Details

Converts scanned images (.tif) to a file in the ALTO document format (XML).

When placed at Page level, the action recognizes the current .tif page and converts it to an ALTO file.

When placed at Document level, the action recognizes all .tif pages in the document and converts them into one ALTO file.

Document Format

The following variables can be used to set the properties of the ALTO XML file.

y_AltoFontFormattingMode

Specifies the character attributes that are saved to the ALTO XML file.

Valid values are:
  • 0 - The only saved attribute is whether characters are subscript or superscript. This value is the default.
  • 1 - The following attributes are saved: whether characters are subscript, superscript, bold, italic, underlined, or strikeout. Font size and font name are not saved.
  • 2 - All font attributes are saved.

y_AltoWriteNondeskewedCoordinates

Specifies whether the character, word, and block coordinates that are written to files in ALTO format are defined on the original image or on the image that is used for recognition, to which modifications such as deskewing were applied. This property is True by default, which means that the coordinates are defined on the original page.

If you set this property to False, export to ALTO runs faster because the coordinates do not need to be converted between the modified image and the original image, which takes a long time. The setting also determines whether the baseline position is written. ALTO format requires the baseline position to be defined by a single number. In the original coordinates, the baseline might not be strictly horizontal or vertical, so its position cannot be defined by a single number. Therefore, if this property is set to the default True value, the baseline position is not written during export. If it is set to False, the baseline position is written into the resulting ALTO file.
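The baseline restriction can be illustrated with a small geometric sketch. It is not part of the action and does not reflect how the recognition engine maps coordinates. It assumes a simplified model in which deskewing is a plain rotation about the image origin, and the function names are hypothetical.

```python
import math


def rotate(point, angle_degrees):
    """Rotate a point about the origin (assumed, simplified deskew model)."""
    x, y = point
    a = math.radians(angle_degrees)
    return (x * math.cos(a) - y * math.sin(a),
            x * math.sin(a) + y * math.cos(a))


def baseline_endpoints_on_original(x_start, x_end, baseline_y, skew_degrees):
    """Map a horizontal baseline on the deskewed image back to the original image."""
    return (rotate((x_start, baseline_y), skew_degrees),
            rotate((x_end, baseline_y), skew_degrees))


def single_number_baseline(endpoints, tolerance=1e-9):
    """Return the baseline as one number, or None if the two ends differ in height."""
    (_, y_start), (_, y_end) = endpoints
    return y_start if abs(y_start - y_end) <= tolerance else None


# No skew: both ends share one vertical position, so one number is enough.
print(single_number_baseline(baseline_endpoints_on_original(100, 900, 500, 0)))
# Two degrees of skew: the ends differ, so no single number describes the line.
print(single_number_baseline(baseline_endpoints_on_original(100, 900, 500, 2)))
```

Under this model, any nonzero skew gives the two ends of the line different vertical positions in the original coordinates, which is why the baseline can be written only when the coordinates refer to the deskewed image.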

Document Contents

To exclude specific page types, set the variable typesToExclude to a comma-delimited list of page types to exclude from the ALTO XML file.

To include specific page types, set the variable typesToInclude to a comma-delimited list of page types to include in the ALTO XML file.

To exclude specific page statuses, set the variable statusToExclude to a comma-delimited list of page statuses to exclude from the ALTO XML file.

When more than one filter is specified, the following order of precedence applies:
  • statusToExclude overrides typesToInclude
  • typesToInclude overrides typesToExclude
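The order of precedence can be modeled with the following Python sketch. It is an illustration of one reading of the rules above, not the action's implementation. The function names are hypothetical, and the sketch assumes that list entries are trimmed of surrounding spaces, that matching is exact and case-sensitive, and that typesToExclude is ignored whenever typesToInclude is set.

```python
def parse_list(value):
    """Split a comma-delimited variable value into a set of trimmed entries."""
    return {item.strip() for item in value.split(",") if item.strip()}


def page_is_exported(page_type, page_status,
                     types_to_include="", types_to_exclude="",
                     status_to_exclude=""):
    """Return True if a page passes the filters (assumed interpretation)."""
    # statusToExclude overrides typesToInclude: an excluded status always loses.
    if str(page_status) in parse_list(status_to_exclude):
        return False
    include = parse_list(types_to_include)
    # typesToInclude overrides typesToExclude: when it is set, only it decides.
    if include:
        return page_type in include
    # Otherwise, fall back to the exclusion list (empty means keep everything).
    return page_type not in parse_list(types_to_exclude)


# A page whose type is included is still dropped when its status is excluded.
print(page_is_exported("Invoice", 75,
                       types_to_include="Invoice",
                       status_to_exclude="75"))
```

With no filters set, every page passes, which matches the behavior of calling the action without setting any of the three variables under this reading.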

If you are calling the action at the Document level, the types and status filters apply to both the documents and their child pages.

If you are calling the action at the Page level, the types and status filters apply to the page only.

The variables that are described in the previous sections must be set before you call the RecognizeToALTOOCR_A action.

Example
rrSet("75","@D.statusToExclude")
rrSet("Blank","@D.typesToExclude")
RecognizeToALTOOCR_A()

This example creates an ALTO XML document that contains all of the pages in the DCO Document object except pages of type "Blank" and pages with status "75".