RecognizeToPDFOCR_A

Converts scanned images (.tif) to an Adobe Portable Document Format (PDF) file.

Member of namespace

ocr_a

Syntax

bool RecognizeToPDFOCR_A()

Parameters

None.

Returns

False if called at an invalid level. Otherwise, True.

Level

Document or Page Level.

Details

Converts one or more scanned images (.tif) to an Adobe Portable Document Format (PDF) file. The PDF is searchable because it also contains the text recognized by the recognition engine.

When placed at Page level, the action recognizes and converts the current TIF page to a PDF file.

When placed at the Document level, the action recognizes all TIF pages in the existing document and converts them into one PDF file. If the pages are PDF files, the action builds a new PDF that combines the page-level PDFs into a single PDF.

Document Format

To create PDF/A1A documents, set the y_pdfA variable to "1" before you call RecognizeToPDFOCR_A.

To create PDF/A1B documents, set the y_pdfA variable to "1" and the y_pdfA1B variable to "1" before you call RecognizeToPDFOCR_A.

To set the MRC (Mixed Raster Content) Mode for conversion to PDF/A, set the y_pdfMRCMode variable to one of the following values:

0 - Engine decides whether MRC is to be used. Default.
1 - MRC is always used.
2 - MRC is never used.

MRC technology uses a lossy compression algorithm, so some unimportant information from the source image (background texture, noise, and so on) can be lost. Disable MRC (value 2) if even insignificant information from the source image must be preserved. Setting the value to 2 can also help when the text in the PDF document is too dark.
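The variable combinations above can be summarized as a small decision function. The following Python sketch only models the documented settings and is not Datacap code: the DCO variables are represented as a plain dictionary of string values, and the names `MrcMode`, `pdf_format`, and `mrc_mode` are hypothetical. It assumes that y_pdfA1B has no effect unless y_pdfA is also "1".

```python
from enum import IntEnum


class MrcMode(IntEnum):
    """Documented values of y_pdfMRCMode (member names are hypothetical)."""
    ENGINE_DECIDES = 0  # Default: the engine decides whether to use MRC.
    ALWAYS = 1          # MRC is always used.
    NEVER = 2           # MRC is never used, avoiding its lossy compression.


def pdf_format(variables: dict) -> str:
    """Return the output format implied by y_pdfA and y_pdfA1B."""
    if variables.get("y_pdfA") != "1":
        # Assumption: without y_pdfA = "1", a regular PDF is produced.
        return "PDF"
    if variables.get("y_pdfA1B") == "1":
        return "PDF/A1B"
    return "PDF/A1A"


def mrc_mode(variables: dict) -> MrcMode:
    """Read y_pdfMRCMode, falling back to the default of 0."""
    # MrcMode(...) raises ValueError for values other than 0, 1, or 2.
    return MrcMode(int(variables.get("y_pdfMRCMode", "0")))


settings = {"y_pdfA": "1", "y_pdfA1B": "1", "y_pdfMRCMode": "2"}
print(pdf_format(settings), mrc_mode(settings).name)  # PDF/A1B NEVER
```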

Document Contents

To exclude specific page types, set the variable typesToExclude to a comma-delimited list of page types to exclude from the PDF.

To include specific page types, set the variable typesToInclude to a comma-delimited list of page types to include in the PDF.

To exclude specific page statuses, set the variable statusToExclude to a comma-delimited list of page statuses to exclude from the PDF.

When more than one filter is specified, the following order of precedence applies:

statusToExclude overrides typesToInclude.
typesToInclude overrides typesToExclude.
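The precedence rules can be expressed as a single predicate. The following Python sketch models the filtering for one page; the function names are hypothetical, and pages are represented as simple (type, status) pairs rather than DCO objects. It assumes that, when typesToInclude is set, a page whose type is not listed is left out and typesToExclude is ignored.

```python
def parse_list(value: str) -> set:
    """Split a comma-delimited variable value into a set of trimmed entries."""
    return {item.strip() for item in value.split(",") if item.strip()}


def is_page_included(page_type: str, page_status: str,
                     types_to_include: str = "",
                     types_to_exclude: str = "",
                     status_to_exclude: str = "") -> bool:
    """Apply statusToExclude, typesToInclude, and typesToExclude in precedence order."""
    # statusToExclude has the highest precedence: it overrides typesToInclude.
    if page_status in parse_list(status_to_exclude):
        return False
    include = parse_list(types_to_include)
    if include:
        # Assumption: when typesToInclude is set, only the listed types are kept
        # and typesToExclude is ignored, because typesToInclude overrides it.
        return page_type in include
    # Otherwise, only typesToExclude applies.
    return page_type not in parse_list(types_to_exclude)


pages = [("Main", "0"), ("Blank", "0"), ("Main", "75"), ("Trailer", "1")]
kept = [p for p in pages
        if is_page_included(*p, types_to_exclude="Blank", status_to_exclude="75")]
print(kept)  # [('Main', '0'), ('Trailer', '1')]
```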

If you are calling the action at the Document level, the types and status filters apply to both the documents and their child pages.

If you are calling the action at the Page level, the types and status filters apply to the page only.

By default, recognition is performed on images, creating searchable text in the PDF. To skip recognition and create an image-only PDF, set the variable y_PDFImageOnly to 1 in the current DCO object.

If you are creating a searchable PDF, refer to the OCR_A Recognize action for information on the supported languages and how to configure them.

Document Attributes

The following variables can be used to set the corresponding PDF document attributes:

y_PDFKeys
y_PDFAuthor
y_PDFTitle
y_PDFSubject
y_PDFProducer
y_pdfCreator
y_PDFQuality
y_pdfDelTmp

Memory or Disk Processing

By default, the conversion is performed in memory. If you are creating a PDF with many pages, the conversion can run out of memory. To use the disk for processing instead, set the DCO variable y_maxPagesForInMemoryProcessing to the maximum number of pages to process in memory. If the document contains more pages than this value, the disk is used instead of memory.
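The threshold check reduces to a single comparison. The sketch below is a hypothetical Python model of that decision, not part of the action; the function name `uses_disk` is illustrative, and the variable value is passed as a string because rrSet sets DCO variables as strings.

```python
from typing import Optional


def uses_disk(page_count: int, max_pages_for_in_memory: Optional[str] = None) -> bool:
    """Decide between in-memory and disk processing for a document."""
    if max_pages_for_in_memory is None:
        # Variable not set: the conversion is performed in memory (the default).
        return False
    # Disk is used only when the document has more pages than the limit.
    return page_count > int(max_pages_for_in_memory)


print(uses_disk(40, "50"), uses_disk(120, "50"))  # False True
```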

The variables in the previous three sections must be set before you call the RecognizeToPDFOCR_A action.

Including PDF Annotations

By default, when you convert PDF to PDF, annotations in the source PDF file are not included in the output PDF. "Free Text" annotations in the source PDF can be included in the output PDF by setting the page DCO variable y_IncludeAnnotation to "1". Other annotation types, such as popup and ink annotations, are not supported. This setting does not cause the text of a "Sticky note" annotation to be displayed on the image, and a sticky note icon might appear on the final image regardless of this setting.
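For PDF-to-PDF conversion, the annotation rule above can be modeled as a filter over the annotation types found on a page. This is a hypothetical Python illustration of the documented behavior only; the function name is illustrative, and the sketch does not model how sticky note icons are rendered.

```python
def annotations_to_keep(annotation_types: list, include_annotation: str = "0") -> list:
    """Return the source annotations that would be carried into the output PDF."""
    if include_annotation != "1":
        # Default: no annotations from the source PDF are included.
        return []
    # Only "Free Text" annotations are supported; popup, ink, and others are dropped.
    return [kind for kind in annotation_types if kind == "Free Text"]


source = ["Free Text", "Ink", "Popup", "Free Text"]
print(annotations_to_keep(source, "1"))  # ['Free Text', 'Free Text']
```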

PDF Export Optimization

The variable y_pdfExportScenario sets the export scenario for PDF (PDF/A) output, which optimizes the export for a particular goal. The scenario affects the size and quality of the output PDF, as well as the processing time.

It takes the following values:

0 - Optimize the PDF (PDF/A) export for the best quality of the resulting file. Default.
1 - Balance the PDF (PDF/A) export between the quality of the resulting file, its size, and the processing time.
2 - Optimize the PDF (PDF/A) export for the minimum size of the resulting file.
3 - Optimize the PDF (PDF/A) export for the highest processing speed.
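When validating configuration before the action runs, it can help to map the numeric values to names. The following Python sketch is a hypothetical lookup; the enum and function names are illustrative and are not defined by Datacap, and the DCO variables are modeled as a dictionary of string values.

```python
from enum import IntEnum


class PdfExportScenario(IntEnum):
    """Documented values of y_pdfExportScenario (member names are hypothetical)."""
    BEST_QUALITY = 0   # Default: best quality of the resulting file.
    BALANCED = 1       # Balance quality, file size, and processing time.
    MINIMUM_SIZE = 2   # Smallest resulting file.
    HIGHEST_SPEED = 3  # Fastest processing.


def export_scenario(variables: dict) -> PdfExportScenario:
    """Read y_pdfExportScenario, defaulting to 0 when it is not set."""
    # Raises ValueError for values outside 0-3.
    return PdfExportScenario(int(variables.get("y_pdfExportScenario", "0")))


print(export_scenario({"y_pdfExportScenario": "2"}).name)  # MINIMUM_SIZE
```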

Example

rrSet("IBM","@D.y_PDFProducer")
rrSet("75","@D.statusToExclude")
rrSet("Blank","@D.typesToExclude")
RecognizeToPDFOCR_A()

This example sets the PDF Producer attribute to "IBM" and creates a PDF document that contains all of the pages in the DCO Document object except pages with type "Blank" or status "75".