RecognizeToALTOOCR_A
Converts a scanned image (.tif) to an ALTO electronic document format (XML) file.
Member of namespace
ocr_a
Syntax
bool RecognizeToALTOOCR_A()
Parameters
None.
Returns
False if called at an invalid level. Otherwise, True.
Level
Document or Page level.
Details
Converts a scanned image (.tif) to an ALTO document format (XML) file.
When placed at Page level, the action recognizes and converts the current .tif page to an ALTO file.
When placed at Document level, the action recognizes and converts all .tif pages in the existing document into one ALTO file.
Document Format
The following variables can be used to set the properties of the ALTO XML file.
y_AltoFontFormattingMode
Specifies the character attributes that are saved to the ALTO xml file.
- 0 - The only saved attribute is whether characters are subscript or superscript. This value is the default.
- 1 - The following attributes are saved: whether characters are subscript, superscript, bold, italic, underlined, or strikeout. Font size and font name are not saved.
- 2 - All font attributes are saved.
y_AltoWriteNondeskewedCoordinates
Specifies whether the character, word, and block coordinates that are written to the ALTO file are defined on the original image or on the image that is used for recognition, to which modifications such as deskewing were applied. This property is True by default, which means that the coordinates are defined on the original page.
If you set this property to False, export to ALTO runs faster because the coordinates do not need to be converted between the modified image and the original image, which takes a long time. The setting also controls whether the baseline position is written. The ALTO format requires the baseline position to be defined by a single number. In the original coordinates, the baseline might not be strictly horizontal or vertical, so its position cannot be defined by a single number. Therefore, when the property is True (the default), the baseline position is not written during export. When it is False, the baseline position is written to the resulting ALTO file.
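One way to confirm which setting produced a given file is to count the text lines that carry a baseline. The following Python sketch is illustrative only: it assumes that the baseline is stored in the BASELINE attribute of TextLine elements, as the ALTO schema defines it, and it uses a small hand-written sample rather than real output from the action. The names SAMPLE_ALTO, local_name, and baseline_report are hypothetical helpers, not part of the ocr_a namespace.

```python
import xml.etree.ElementTree as ET

# Hand-written sample for illustration; not real action output and not
# validated against the ALTO schema.
SAMPLE_ALTO = """
<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout>
    <Page ID="P1" WIDTH="2480" HEIGHT="3508">
      <PrintSpace>
        <TextBlock ID="B1" HPOS="100" VPOS="100" WIDTH="800" HEIGHT="140">
          <TextLine HPOS="100" VPOS="100" WIDTH="800" HEIGHT="60" BASELINE="150">
            <String CONTENT="Hello" HPOS="100" VPOS="100" WIDTH="200" HEIGHT="60"/>
          </TextLine>
          <TextLine HPOS="100" VPOS="180" WIDTH="800" HEIGHT="60" BASELINE="230">
            <String CONTENT="World" HPOS="100" VPOS="180" WIDTH="200" HEIGHT="60"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix so the check works for any ALTO version."""
    return tag.rsplit("}", 1)[-1]


def baseline_report(alto_xml):
    """Return (number of TextLine elements, number that have a BASELINE attribute)."""
    root = ET.fromstring(alto_xml)
    lines = [
        el for el in root.iter()
        if isinstance(el.tag, str) and local_name(el.tag) == "TextLine"
    ]
    with_baseline = sum(1 for el in lines if el.get("BASELINE") is not None)
    return len(lines), with_baseline


if __name__ == "__main__":
    total, with_baseline = baseline_report(SAMPLE_ALTO)
    print(f"{with_baseline} of {total} text lines carry a BASELINE attribute")
```

If the description above holds, a file that was exported with y_AltoWriteNondeskewedCoordinates set to True would be expected to report zero text lines with a baseline.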
Document Contents
To exclude specific page types, set the variable typesToExclude to a comma-delimited list of page types to exclude from the ALTO XML file.
To include specific page types, set the variable typesToInclude to a comma delimited list of page types to include in the ALTO xml file.
To exclude pages with a specific status, set the variable statusToExclude to a comma-delimited list of page statuses to exclude from the ALTO XML file.
- statusToExclude overrides typesToInclude
- typesToInclude overrides typesToExclude
If you are calling the action at the Document level, the types and status filters apply to both the documents and their child pages.
If you are calling the action at the Page level, the types and status filters apply to the page only.
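The precedence of the three filters can be modeled in a few lines of Python. This is a sketch of one reading of the rules above, not the action's actual implementation. It assumes that a non-empty typesToInclude list acts as a whitelist, that an empty variable means no filter, and that comparisons are case-sensitive. The function names parse_list and page_is_exported are hypothetical.

```python
def parse_list(value):
    """Split a comma-delimited variable value into a set of trimmed items."""
    return {item.strip() for item in value.split(",") if item.strip()}


def page_is_exported(page_type, page_status, types_to_include="",
                     types_to_exclude="", status_to_exclude=""):
    """Return True if a page would be written to the ALTO file under the assumed rules."""
    include = parse_list(types_to_include)
    exclude = parse_list(types_to_exclude)
    excluded_statuses = parse_list(status_to_exclude)

    # statusToExclude overrides typesToInclude: an excluded status always wins.
    if str(page_status) in excluded_statuses:
        return False

    # typesToInclude overrides typesToExclude.
    # Assumption: a non-empty include list means only those types are exported.
    if include:
        return page_type in include

    # Otherwise only the exclude list applies.
    return page_type not in exclude


if __name__ == "__main__":
    # A "Blank" page is dropped when typesToExclude is "Blank".
    print(page_is_exported("Blank", "0", types_to_exclude="Blank"))
    # An included type is still dropped when its status is excluded.
    print(page_is_exported("Invoice", "75", types_to_include="Invoice",
                           status_to_exclude="75"))
```

Under this model, a page type that appears in both typesToInclude and typesToExclude is exported, unless its status appears in statusToExclude.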
The variables in the preceding sections must be set before you call the RecognizeToALTOOCR_A action.
Example
rrSet("75","@D.statusToExclude")
rrSet("Blank","@D.typesToExclude")
RecognizeToALTOOCR_A()
This example creates an ALTO XML document that contains all of the pages in the DCO Document object except pages with type "Blank" or status "75".