RecognizeToFileOCR_S
Performs full-page recognition and writes the recognition results to one of several available output file types, such as .doc, .rtf, and .html.
Member of namespace
OCR_SR
Syntax
bool RecognizeToFileOCR_S (int FileType)
Parameters
- FileType
- Type int
fileType - A numeric value from 1 to 22 that specifies a combination of recognition target and output format.
Important: Image refers to the image of the bound Page object of the Document Hierarchy. Filename is the portion of a file's name that precedes its extension. For every value, the output file name is identical to the original file name and has the extension specified for that value.
- A PDF document with the original image in the foreground and the recognized text hidden behind it, in the correct position. Well suited to archiving and indexing documents.
- A general PDF document where the text in the original image is replaced by the corresponding text recognized by the engine.
- A special type of PDF document, where the suspect words are covered by their images cut out from the original image.
- A non-searchable PDF document.
- Recognize the image of the bound Page object of the Document Hierarchy in an HTML file. Output .html (HTML 140).
- Recognize the image of the bound Page object of the Document Hierarchy in an Excel file. Output .xls (Excel 2000).
- Recognize the image of the bound Page object of the Document Hierarchy in a WordML file. Output .doc (Word ML).
- Recognize the image of the bound Page object of the Document Hierarchy in an RTF2000 file. Output .rtf (RTF 2000).
- Recognize the image of the bound Page object of the Document Hierarchy in a Text file with an .RTF6 extension. Output .rtf (Rich Text).
- Recognize the image of the Page object of the Document Hierarchy in a Text file with an .RTF6 extension. Output .rtf (Rich Text).
- Recognize the image of the Page object of the Document Hierarchy in a Text file with a .Text extension. Output .txt (Text).
- Recognize the image of the Page object of the Document Hierarchy in a Text file with a .Csv extension. Output .txt (CSV - Comma-Separated Values).
- Recognize the image of the Page object of the Document Hierarchy in a Text file with a .FormattedTxt extension. Output .txt (Formatted Text).
- Recognize the image of the Page object of the Document Hierarchy in a Text file with a .UText extension. Output .txt (Text).
- Recognize the image of the Page object of the Document Hierarchy in a Text file with a .UCSV extension. Output .CSV (Comma-Separated Values). Deprecated.
- Recognize the image of the Page object of the Document Hierarchy in a Text file with a .UFormattedText extension. Output .txt (Text).
- Recognize the image of the Page object of the Document Hierarchy in a Text file with an .Audio extension. Output .aud (Text).
- Recognize the image of the Page object of the Document Hierarchy in a Text file with a .WordPad extension. Output .rtf (Rich Text for WordPad).
- Recognize the image of the Page object of the Document Hierarchy in a Text file with an .XML extension. Output .xml (XML).
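The file-naming rule in the parameter description (same file name as the original, with the extension of the chosen format) can be sketched in Python. This is a minimal illustration only: the helper name output_path_for and the sample file names are hypothetical, and the extension for a given FileType value still has to be taken from the list above.

```python
from pathlib import Path


def output_path_for(image_path: str, extension: str) -> Path:
    """Return the name the output file would receive under the documented rule:
    the part of the original name before its extension, plus the new extension.

    Hypothetical helper for illustration; it is not part of the OCR_SR library.
    """
    if not extension.startswith("."):
        extension = "." + extension
    # with_suffix keeps everything before the last extension and swaps the extension
    return Path(image_path).with_suffix(extension)


print(output_path_for("tm000001.tif", ".pdf"))  # tm000001.pdf
print(output_path_for("tm000001.tif", "rtf"))   # tm000001.rtf
```

Only the last extension is replaced, so a name such as scan.page1.tif would become scan.page1.pdf.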
Returns
False if a ruleset with this action is bound to a Field object of the Document Hierarchy, or if the parameter is not numeric. Otherwise, True.
Level
Page or Document.
Details
Performs OCR recognition on the image of a source page and stores the output of the OCR/S recognition engine in a file. The output file is in one of 22 alternative formats. Because the files are not actually processed in the format you specify, this action is useful primarily for debugging the engine, or when you need raw (unverified) OCR output in that format.
By default, PDF documents created by this action are compatible with PDF Version 1.6. You can change the default compatibility by setting the s_pdfVersion variable to one of the following values:
2 = PDF Version 1.5
3 = PDF Version 1.4
4 = PDF Version 1.3
5 = PDF Version 1.2
6 = PDF Version 1.1
7 = PDF Version 1.0
8 = PDF Version A
9 = PDF Version 1.6
10 = PDF Version 1.7
11 = PDF Version A2B
12 = PDF Version A2U
13 = PDF Version A1A
14 = PDF Version A2A
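When a script has to choose a value for s_pdfVersion, the table above can be kept as a lookup. The Python sketch below is illustrative only: the names PDF_VERSION_CODES, CODE_FOR_VERSION, and pdf_version_code are hypothetical, and the sketch only converts between version labels and codes; it does not set the variable.

```python
# s_pdfVersion values, transcribed from the table above.
# When s_pdfVersion is not set, the action writes PDF 1.6 output.
PDF_VERSION_CODES = {
    2: "1.5",
    3: "1.4",
    4: "1.3",
    5: "1.2",
    6: "1.1",
    7: "1.0",
    8: "A",
    9: "1.6",
    10: "1.7",
    11: "A2B",
    12: "A2U",
    13: "A1A",
    14: "A2A",
}

# Reverse lookup: version label -> s_pdfVersion value.
CODE_FOR_VERSION = {label: code for code, label in PDF_VERSION_CODES.items()}


def pdf_version_code(version: str) -> int:
    """Return the s_pdfVersion value for a label such as "1.4" or "A2B".

    Hypothetical helper: it only looks the label up in the documented table.
    """
    try:
        return CODE_FOR_VERSION[version]
    except KeyError:
        raise ValueError(f"unsupported PDF version: {version!r}") from None


print(pdf_version_code("1.4"))  # 3
print(pdf_version_code("A2B"))  # 11
```

Raising an error for unknown labels keeps a typo from silently falling back to the default PDF 1.6 output.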
To exclude specific page types, set the variable typesToExclude to a comma-delimited list of the page types to exclude from the PDF. To exclude specific page statuses, set the variable statusToExclude to a comma-delimited list of the page statuses to exclude from the PDF.
These variables must be set before this action is called.
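The effect of the two exclusion variables can be modeled in Python. This is a sketch only: the Page class, the helper names, and the sample page types are hypothetical, and it assumes that a page is left out when either its type or its status matches, because the two variables are described independently. It does not reproduce how Datacap itself reads the variables.

```python
from collections.abc import Iterable
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical stand-in for a page: only its type and status matter here."""
    page_id: str
    page_type: str
    status: str


def to_comma_list(values: Iterable) -> str:
    """Build a comma-delimited string such as the one typesToExclude expects."""
    return ",".join(str(v) for v in values)


def pages_for_pdf(pages: list[Page], types_to_exclude: str = "",
                  status_to_exclude: str = "") -> list[Page]:
    """Keep pages whose type and status are both absent from the exclusion lists.

    Assumption: the two lists are combined with OR (either match excludes a page).
    """
    excluded_types = {t.strip() for t in types_to_exclude.split(",") if t.strip()}
    excluded_status = {s.strip() for s in status_to_exclude.split(",") if s.strip()}
    return [p for p in pages
            if p.page_type not in excluded_types and p.status not in excluded_status]


pages = [
    Page("p1", "Main_Page", "0"),
    Page("p2", "Blank", "0"),
    Page("p3", "Main_Page", "75"),
]
kept = pages_for_pdf(pages, to_comma_list(["Blank"]), to_comma_list([75]))
print([p.page_id for p in kept])  # ['p1']
```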
This action supports the automatic retry mechanism.
- Example
- The following example creates a PDF document with all pages contained in the dco document object except for pages with type Blank and a status of 75:
rrSet("75","@D.statusToExclude")
rrSet("Blank","@D.typesToExclude")
RecognizeToFileOCR_S(1)