RecognizeToFileOCR_S

Performs full-page recognition and writes the recognition results to one of several available output file types, such as .doc, .rtf, or .html.

Member of namespace

OCR_SR

Syntax

bool RecognizeToFileOCR_S (int FileType)

Parameters

FileType
Type int

FileType - A numeric value from 1 to 22 that specifies a combination of recognition target and output format. Values 1 through 19 are described below.
Important: "Image" refers to the image of the bound Page object of the Document Hierarchy. "Filename" is the portion of a file's name that precedes its extension.
For every value, the output file has the same filename as the original file, with the extension specified for that value.
  1. A PDF document with the original image in the foreground and the recognized text hidden in the background (but in the correct position). Well suited to archiving and indexing documents.
  2. A general PDF document in which the text in the original image is replaced by the corresponding text recognized by the engine.
  3. A special type of PDF document in which suspect words are covered by their images, cut out from the original image.
  4. A non-searchable PDF document.
  5. Recognize the image into an HTML file. Output .html (HTML 140).
  6. Recognize the image into an Excel file. Output .xls (Excel 2000).
  7. Recognize the image into a WordML file. Output .doc (Word ML).
  8. Recognize the image into an RTF 2000 file. Output .rtf (RTF 2000).
  9. Recognize the image into a text file in the RTF6 format. Output .rtf (Rich Text).
  10. Recognize the image into a text file in the RTF6 format. Output .rtf (Rich Text).
  11. Recognize the image into a text file in the Text format. Output .txt (Text).
  12. Recognize the image into a text file in the CSV format. Output .txt (CSV, comma-separated values).
  13. Recognize the image into a text file in the FormattedTxt format. Output .txt (Formatted Text).
  14. Recognize the image into a text file in the UText format. Output .txt (Text).
  15. Recognize the image into a text file in the UCSV format. Output .csv (comma-separated values). Deprecated.
  16. Recognize the image into a text file in the UFormattedText format. Output .txt (Text).
  17. Recognize the image into a text file in the Audio format. Output .aud (Text).
  18. Recognize the image into a text file in the WordPad format. Output .rtf (Rich Text for WordPad).
  19. Recognize the image into a text file in the XML format. Output .xml (XML).
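As a reading aid, the FileType values above and the output naming rule can be modeled as a lookup table. This is a hypothetical Python sketch, not part of the Datacap API: the names `OUTPUT_EXTENSIONS` and `output_file_name` are invented for illustration, and the `.pdf` extension for values 1 to 4 is an assumption, because the list does not state it.

```python
import os

# Hypothetical model of the documented FileType values (not a Datacap API).
# Assumption: values 1-4 produce ".pdf"; the reference list does not say so.
OUTPUT_EXTENSIONS = {
    1: ".pdf", 2: ".pdf", 3: ".pdf", 4: ".pdf",
    5: ".html", 6: ".xls", 7: ".doc", 8: ".rtf",
    9: ".rtf", 10: ".rtf", 11: ".txt", 12: ".txt",
    13: ".txt", 14: ".txt", 15: ".csv", 16: ".txt",
    17: ".aud", 18: ".rtf", 19: ".xml",
}


def output_file_name(image_path: str, file_type: int) -> str:
    """Return the image's filename with the extension for the given FileType."""
    if file_type not in OUTPUT_EXTENSIONS:
        raise ValueError(f"FileType {file_type} is not described in this reference")
    # "Filename" is the part of the name that precedes the extension.
    filename, _old_ext = os.path.splitext(os.path.basename(image_path))
    return filename + OUTPUT_EXTENSIONS[file_type]


print(output_file_name("batch01/TM000001.tif", 8))  # TM000001.rtf
```

Only the extension changes; the filename portion is kept as-is, so a name containing extra dots such as `scan.page.tif` becomes `scan.page.xml` for value 19.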

Returns

False if a ruleset with this action is bound to a Field object of the Document Hierarchy, or if the parameter is not numeric. Otherwise, True.
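The return rule can be restated as a small predicate. This is a hypothetical Python sketch of the documented behavior only; the function name and the level strings "Page", "Document", and "Field" are assumptions, and "numeric" is modeled here as a string of decimal digits.

```python
# Hypothetical model of the documented return rule (not a Datacap API).
def expected_return(bound_level: str, file_type_param: str) -> bool:
    # A ruleset containing this action must not be bound to a Field object.
    if bound_level == "Field":
        return False
    # Assumption: "numeric" means a string made only of decimal digits.
    if not file_type_param.strip().isdigit():
        return False
    return True


print(expected_return("Page", "1"))   # True
print(expected_return("Field", "1"))  # False
```

Note that, as documented, the rule does not check whether the number falls within the supported range.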

Level

Page or Document.

Details

Performs OCR on the image of a source page and stores the output of the OCR/S recognition engine in a file, in one of 22 alternative formats. Because the output file is not processed further, this action is useful primarily for debugging the engine, or when you need raw (unverified) OCR output in one of these formats.

By default, PDF documents created by this action are compatible with PDF Version 1.6.

However, it is possible to change the default compatibility by setting the s_pdfVersion variable to one of the following values:

  2 = PDF Version 1.5
  3 = PDF Version 1.4
  4 = PDF Version 1.3
  5 = PDF Version 1.2
  6 = PDF Version 1.1
  7 = PDF Version 1.0
  8 = PDF Version A
  9 = PDF Version 1.6
  10 = PDF Version 1.7
  11 = PDF Version A2B
  12 = PDF Version A2U
  13 = PDF Version A1A
  14 = PDF Version A2A

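The s_pdfVersion table amounts to a lookup with a default. The hypothetical Python sketch below (the names `PDF_VERSIONS` and `effective_pdf_version` are invented for illustration) shows how a given setting maps to the version label listed above, and that leaving the variable unset gives the same result as value 9.

```python
from typing import Optional

# Hypothetical model of the documented s_pdfVersion values (not a Datacap API).
PDF_VERSIONS = {
    2: "1.5", 3: "1.4", 4: "1.3", 5: "1.2", 6: "1.1", 7: "1.0",
    8: "A", 9: "1.6", 10: "1.7", 11: "A2B", 12: "A2U", 13: "A1A", 14: "A2A",
}
DEFAULT_PDF_VERSION = "1.6"  # documented default when s_pdfVersion is not set


def effective_pdf_version(s_pdf_version: Optional[int] = None) -> str:
    """Map an s_pdfVersion value to the version label listed in this reference."""
    if s_pdf_version is None:
        return DEFAULT_PDF_VERSION
    if s_pdf_version not in PDF_VERSIONS:
        raise ValueError(f"s_pdfVersion {s_pdf_version} is not listed in this reference")
    return PDF_VERSIONS[s_pdf_version]


print(effective_pdf_version())    # 1.6
print(effective_pdf_version(10))  # 1.7
```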
To exclude specific page types, set the typesToExclude variable to a comma-delimited list of the page types to exclude from the PDF.

To exclude pages with specific statuses, set the statusToExclude variable to a comma-delimited list of the page statuses to exclude from the PDF.

These variables must be set before this action is called.
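The effect of the two exclusion variables can be sketched as a page filter. This is hypothetical Python, not Datacap code: the `Page` class and `pages_for_pdf` function are invented for illustration, and it assumes that list items are trimmed of surrounding spaces, that matching is exact and case-sensitive, and that each list excludes pages independently of the other.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Page:
    """Hypothetical stand-in for a page in the document object."""
    page_id: str
    page_type: str
    status: str


def parse_list(value: str) -> Set[str]:
    # Split a comma-delimited list, trimming spaces and dropping empty items.
    return {item.strip() for item in value.split(",") if item.strip()}


def pages_for_pdf(pages: List[Page], types_to_exclude: str = "",
                  status_to_exclude: str = "") -> List[Page]:
    # Assumption: a page is dropped if its type OR its status is listed.
    excluded_types = parse_list(types_to_exclude)
    excluded_statuses = parse_list(status_to_exclude)
    return [p for p in pages
            if p.page_type not in excluded_types
            and p.status not in excluded_statuses]


pages = [Page("p1", "Main", "70"), Page("p2", "Blank", "70"), Page("p3", "Main", "75")]
kept = pages_for_pdf(pages, types_to_exclude="Blank", status_to_exclude="75")
print([p.page_id for p in kept])  # ['p1']
```

The usage line mirrors the settings in the example that follows: pages of type Blank and pages with status 75 are both left out.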

This action supports the automatic retry mechanism.

Example
The following example creates a PDF document (FileType 1) that contains all pages in the DCO document object except pages of type Blank and pages with status 75:
rrSet("75","@D.statusToExclude")
rrSet("Blank","@D.typesToExclude")
RecognizeToFileOCR_S(1)