Excel actions
Use the Excel actions to convert a Microsoft Excel document file into TIFF image files.
Excel Overview
The Excel conversion converts electronic XLS and XLSX documents to TIF files. One possible use of the TIF files is to perform recognition and subsequent rules processing on them. Use the available Excel conversion actions to configure the conversion as desired, then use the ExcelWorkbookToImage action to convert the Excel documents that have been input to the batch.
If you are converting to TIF images so that recognition can be performed, the suggested output format is one-bit black and white with fax Group 4 compression. Images of this specification work best for recognition.
Conversion Limits
When using the conversion actions, the maximum number of initial input files in a single batch using the default Alpha-decimal file name pattern is 1296. These files can then expand into more files within the batch, creating a batch with far more than 1296 total documents in the end.
For example, an input batch containing one 6-page Word document will result in a batch of 7 files: the initial DOC file and the 6 TIFFs that were generated from its pages. For each file that is to be converted within the batch, there is a maximum of 1296 output pages. For example, a Word document consisting of 2000 pages will only have the first 1296 pages converted to TIFF. Likewise, a single ZIP file can contain a maximum of 1296 files and an MSG file can have a maximum of 1296 attachments. The 1296 limit applies to each input file; it is not a total limit for all files combined.
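The per-file page limit can be illustrated with a little arithmetic. The following is a minimal sketch, not part of the product; the function name files_after_conversion is hypothetical and only restates the two examples above.

```python
MAX_PAGES_PER_FILE = 1296  # per-input-file output limit described above


def files_after_conversion(page_count: int) -> int:
    """Files in the batch for one input document: the original plus its converted pages."""
    converted_pages = min(page_count, MAX_PAGES_PER_FILE)
    return 1 + converted_pages


print(files_after_conversion(6))     # 6-page document: 1 DOC + 6 TIFFs
print(files_after_conversion(2000))  # only the first 1296 pages are converted
```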
When using the convert actions, there is a 3-level limit for embedded files. For example, an MSG file can contain a ZIP file, which can contain a PDF file. At the end of the conversion, all files will be extracted and all of the pages in the PDF will be converted to TIFF. An example that is not allowed is an MSG file that contains a ZIP file that contains another ZIP file that contains a PDF, because this has a 4-level hierarchy, which is not permitted.
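To check a planned input against the 3-level limit before scanning, the hierarchy can be modeled and measured. This is a sketch under assumptions: the EmbeddedFile class, the helper functions, and the file names are all hypothetical and are not part of the conversion actions.

```python
from dataclasses import dataclass, field

MAX_LEVELS = 3  # embedded-file limit described above


@dataclass
class EmbeddedFile:
    """Hypothetical model of a file and the files embedded inside it."""
    name: str
    children: list["EmbeddedFile"] = field(default_factory=list)


def nesting_levels(item: EmbeddedFile) -> int:
    """Count levels from this file down to its deepest embedded file."""
    return 1 + max((nesting_levels(child) for child in item.children), default=0)


def within_limit(item: EmbeddedFile) -> bool:
    """True when the hierarchy does not exceed the 3-level limit."""
    return nesting_levels(item) <= MAX_LEVELS


# MSG -> ZIP -> PDF: three levels, allowed
allowed = EmbeddedFile("mail.msg", [EmbeddedFile("docs.zip", [EmbeddedFile("report.pdf")])])

# MSG -> ZIP -> ZIP -> PDF: four levels, not permitted
too_deep = EmbeddedFile(
    "mail.msg",
    [EmbeddedFile("outer.zip", [EmbeddedFile("inner.zip", [EmbeddedFile("report.pdf")])])],
)

print(nesting_levels(allowed), within_limit(allowed))
print(nesting_levels(too_deep), within_limit(too_deep))
```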
To scan more than 1296 files into a single batch, use the SetNamePattern action with the parameter '2', which selects the alternate file naming pattern of TMxxxxxx. In this case, every file that is scanned, or expanded from an original scanned file, is assigned the next available TMxxxxxx name, where xxxxxx ranges from 1 to 999999. This allows up to a total of 999999 files in a single batch after all scanned files have been expanded.
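The alternate pattern can be pictured as a zero-padded sequence number after the TM prefix. The sketch below is illustrative only: the tm_name helper is hypothetical, and the six-digit zero padding is an assumption inferred from example names such as TM000001.zip.

```python
MAX_SEQUENCE = 999_999  # upper bound of the xxxxxx range described above


def tm_name(sequence: int, extension: str) -> str:
    """Build a TMxxxxxx style file name (zero padding is an assumption)."""
    if not 1 <= sequence <= MAX_SEQUENCE:
        raise ValueError(f"sequence must be between 1 and {MAX_SEQUENCE}")
    return f"TM{sequence:06d}.{extension}"


print(tm_name(1, "zip"))
print(tm_name(1297, "tif"))
```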
The Rules
Because rules are flexible, there are multiple ways that your applications can use the electronic document conversion actions. The following is one recommended way of using the actions within your Application.
Perform the virtual scanning of electronic documents in its own task profile, to create the batch of input documents to be processed. Use the electronic document conversion actions in their own task profile.
In a single ruleset, you can perform all of the actions to convert your electronic documents to TIF files prior to recognition. Create a function for each type of electronic document that you expect to convert with the electronic document conversion actions. For example, one function to operate on ZIP files, another to operate on Word files, and so on, for as many different types as you need. If the default values are not adequate, use the actions to configure your output format, then use the action to convert the document to a TIF file for each page within the document. If the page is not of the expected type, the action returns false and processing proceeds to the function for the next type, which attempts the conversion again. This continues until all of the desired types have been tried. Types that are not expected are ignored by this process. You can set up additional rules to handle them, if required by your specific application.
Example Ruleset
The following shows a specific example ruleset and functions configured to process PDF, Word and Excel documents using the convert actions. You can add more functions to process other file formats supported by convert, such as splitting TIFF files, expanding ZIP files, and so on.
Execution runs through each of the functions until a function completes successfully. Each function first checks the page status and then attempts to convert the page. If the page is converted successfully, the page status is set to 75, meaning it is deleted. The function completes and the ruleset performs no further processing on that page. If the document cannot be converted because the type does not match, control passes to the next function, which again attempts to convert the page.
Ruleset Convert Files
- Function Process PDF
- - ChkDCOStatus("49")
- - PDFDocumentToImage()
- - SetDCOStatus("75")
- Function Process Word
- - ChkDCOStatus("49")
- - WordDocumentToImage()
- - SetDCOStatus("75")
- Function Process Excel
- - ChkDCOStatus("49")
- - ExcelWorkbookToImage()
- - SetDCOStatus("75")
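The fall-through behavior of the ruleset above can be simulated outside the product to make the control flow concrete. This is only a sketch: the page dictionary, the extension check that stands in for the real conversion actions, and every function name here are hypothetical.

```python
STATUS_CHECKED = 49  # status the example ruleset checks for with ChkDCOStatus("49")
STATUS_DELETED = 75  # status set after a successful conversion


def make_function(extensions):
    """Build a stand-in for one ruleset function handling the given file extensions."""
    def run(page):
        if page["status"] != STATUS_CHECKED:                 # ChkDCOStatus("49")
            return False
        if not page["file"].lower().endswith(extensions):    # conversion fails on a type mismatch
            return False
        page["status"] = STATUS_DELETED                      # SetDCOStatus("75")
        return True
    return run


RULESET = [
    ("Process PDF", make_function((".pdf",))),
    ("Process Word", make_function((".doc", ".docx"))),
    ("Process Excel", make_function((".xls", ".xlsx"))),
]


def run_ruleset(page):
    """Try each function in order; stop at the first one that completes successfully."""
    for name, function in RULESET:
        if function(page):
            return name
    return None  # unexpected type: ignored by this ruleset


page = {"file": "budget.xlsx", "status": STATUS_CHECKED}
print(run_ruleset(page), page["status"])
```

A page with an unexpected type, or one that no longer has the checked status, falls through every function and is left unchanged.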
Setting the page to a deleted status allows that page to be skipped by subsequent processing. For example, if you use virtual scan to create a batch containing a PDF document, then convert the PDF to a series of image files using the convert actions, it is likely that you no longer need to perform any further processing on the input PDF file. An application can skip processing of these pages, as needed, by first calling ChkDCOStatus("49"); the actions that follow it in the function will then run only on pages that have that status.
As a general rule, subsequent processing and recognition is performed on the TIFF files that were created at run time, such as the TIFFs created from a DOC file, not on the parent DOC file. Note that setting the page to deleted does not remove the page reference from the DCO, and it does not delete the file from the batch directory. Because the original file still exists, it is possible at export time to include the original document when exporting to an external repository.
Variables Created During Conversion
The conversion actions store data in variables that may be useful. These are the variables created:
"IMAGEFILE" : The name of the TIF file that is associated with the converted page. The value would typically look like "01010000.tif". This variable is on the page level.
"ParentImage" : The name of the document that was used to create this page. If the TIF was created by the conversion of a Word file, then the value would typically look like "02000000.doc". This variable is on the page level. If the page has multiple parents, such as when it was extracted from a ZIP file that was inside a ZIP file, then the parents will be separated by a colon. For example, TM000001.zip:01020000.zip.
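Once a ParentImage value has been read from a page, its colon-separated form is straightforward to split. The helper below is a hypothetical sketch that works on a plain string; how the value is obtained from the page is outside its scope.

```python
def split_parents(parent_image: str) -> list[str]:
    """Split a ParentImage value into its individual parent file names."""
    return [name for name in parent_image.split(":") if name]


print(split_parents("02000000.doc"))
print(split_parents("TM000001.zip:01020000.zip"))
```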