RecognizeToPDFOCR_S

Performs full page recognition and saves the current page as a PDF file. You can also create the file in PDF/A format.

Member of namespace

OCR_SR

Syntax

bool RecognizeToPDFOCR_S(int OutputPDFType)

Parameters

OutputPDFType: Type int

A numeric value that indicates the PDF output type:

A PDF document with the original image in the foreground with the recognized text hidden in the background (but in the correct position). Perfect for archiving and indexing documents.
A general PDF document where the text in the original image is replaced by the corresponding text that is recognized by the engine.
A special type of PDF document, where the suspect words are covered by their images cut out from the original image.
A non-searchable PDF document.

Returns

False if the rule with this action is not applied to a document or page object or if the parameters are not in the valid range. Otherwise, True.

Level

Document and Page only.

Details

This action converts a scanned image file (.tif) to an Adobe Portable Document Format (PDF) file.

By default, PDF documents created by this action are compatible with PDF Version 1.6.

However, it is possible to change the default compatibility by setting the s_pdfVersion variable to one of the following values:

2 = PDF Version 1.5 
3 = PDF Version 1.4 
4 = PDF Version 1.3 
5 = PDF Version 1.2 
6 = PDF Version 1.1 
7 = PDF Version 1.0 
8 = PDF Version A 
9 = PDF Version 1.6 
10 = PDF Version 1.7
11 = PDF Version A2B 
12 = PDF Version A2U 
13 = PDF Version A1A 
14 = PDF Version A2A
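
For illustration, a rule that requests PDF Version 1.7 output might look like the following sketch. It assumes that the variable is set on the Document object with rrSet, in the same way as the filter variables in Example 1, and it reuses the output type value from that example. Adjust both to match your application.

rrSet("10","@D.s_pdfVersion")
RecognizeToPDFOCR_S(3)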

To exclude specific page types, set the variable typesToExclude to a comma-delimited list of page types to exclude from the PDF.

To include specific page types, set the variable typesToInclude to a comma-delimited list of page types to include in the PDF.

To exclude pages with specific statuses, set the variable statusToExclude to a comma-delimited list of page statuses to exclude from the PDF.

When more than one filter is specified, the following order of precedence takes place:

statusToExclude overrides typesToInclude
typesToInclude overrides typesToExclude

If you are calling the action at the Document level, the types and status filters apply to both the documents and their child pages.

If you are calling the action at the Page level, the types and status filters apply to the page only.

These variables must be set before calling the action RecognizeToPDFOCR_S.
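
As a sketch of how the filters combine, the following lines include only two page types and exclude one status. The page types "Invoice" and "Attachment" are hypothetical names used for illustration; the status value and the output type value are taken from Example 1.

rrSet("Invoice,Attachment","@D.typesToInclude")
rrSet("75","@D.statusToExclude")
RecognizeToPDFOCR_S(3)

Following the order of precedence described above, the resulting PDF would contain the pages of type "Invoice" or "Attachment", except for any of those pages that have status "75".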

This action does not support converting or combining PDFs into a new PDF.

Automatic Rotation and Deskew

When combining images into a new PDF document, this action automatically attempts to rotate and deskew the images as they are placed into the PDF. The input images are not changed; the adjustments apply only to how the images appear in the PDF document.

Automatic rotation can be disabled by setting the DCO variable "s_autorotate" to "0".

Automatic deskew can be disabled by setting the DCO variable "s_deskew" to "0".
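
A minimal sketch of disabling both adjustments before the conversion follows. It assumes that the variables are set on the Document object that the action runs on; the output type value is reused from Example 1.

rrSet("0","@D.s_autorotate")
rrSet("0","@D.s_deskew")
RecognizeToPDFOCR_S(3)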

If an image contains text in more than one orientation, for example vertical and horizontal, the image might be rotated undesirably. The automatic rotation algorithm relies on, and works best with, images that contain good-quality machine-printed text. The page should include at least one line of machine-printed text that is at least 30 characters long. The algorithm does not work reliably with images that contain 9-pin dot-matrix text or text that is not machine printed.

Automatic deskew is effective only on images with a skew of less than 15 degrees. If the image is skewed by more than that amount, the image may or may not be adjusted.

Automatic Retry

Automatic retry is supported if the operation timeout is reached. The automatic retry timeout may need to be adjusted by calling the action SetupAutomaticRetry or, if you are using the legacy timeout procedure, the actions SetOutOfProcessRecogTimeout and EngineTimeoutOCR_S. The intent of the timeout is to prevent a recognition action from hanging indefinitely if something goes wrong in the recognition engine and it never completes.

The default timeout is typically sufficient for single-page recognition. However, when building multiple pages into a single PDF, this action requires more time than the default value. The amount of time needed depends directly on the number of pages being converted and on the current machine load.

The required time must be extended further if the batch directory resides on a network directory, because this also increases the action's processing time. It is not possible to provide a specific default that works in every case. If the action is taking longer than the timeout and the conversion stops prematurely, it is recommended to set a very large value. The action should eventually complete when all of the pages have been processed.

Other timeouts may also affect this action. For example, Rulerunner has timeouts that may prematurely terminate a thread that is taking longer to run because of the large number of pages being processed. These additional timeouts may also need to be adjusted to allow the action to complete successfully.

For more information about configuring the automatic retry and its associated timeout, see OCR_SR actions.

Example 1

rrSet("75","@D.statusToExclude")
rrSet("Blank","@D.typesToExclude")
RecognizeToPDFOCR_S(3)

This example creates a PDF document with all of the pages that are contained in the DCO Document object, except pages of type "Blank" and pages with status "75".