Txt actions
The Txt conversion will convert electronic TXT documents to a TIF file.
Txt Overview
The Txt conversion converts electronic TXT documents to TIF files. One possible use of the TIF files is recognition and subsequent rules processing. Use the available Txt conversion actions to configure the conversion as desired, then use the TxtToImage action to convert the documents that have been input to the batch.
If you are converting to TIF images so that recognition can be performed, the suggested output format is 1-bit black-and-white with fax Group 4 compression. Images of this specification work best for recognition.
Conversion Limits
When using the conversion actions, the maximum number of initial input files in a single batch using the default Alpha-decimal file name pattern is 1296. These files can then expand into more files within the batch, creating a batch with far more than 1296 total documents in the end.
For example, an input batch consisting of one Word document with 6 pages results in a batch of 7 files: the initial DOC file and the 6 TIFFs generated from its pages. For each file that is converted within the batch, there is a maximum of 1296 output pages. For example, a Word document consisting of 2000 pages will have only its first 1296 pages converted to TIFF. Likewise, a single ZIP file can contain a maximum of 1296 files, and an MSG file can have a maximum of 1296 attachments. The 1296 limit applies to each input file separately; it is not a total limit for all files combined.
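The arithmetic in the two examples above can be sketched in a few lines of Python. This is only an illustrative model of the stated limit, not part of the product: the function name and the idea of passing page counts as a list are assumptions made for the example.

```python
# Illustrative model of the per-file page limit described above.
# The function name and its input shape are hypothetical.
MAX_PAGES_PER_FILE = 1296  # per-input-file limit stated in the text


def files_after_conversion(page_counts):
    """Return the number of files in the batch after conversion.

    Each input file stays in the batch, and up to MAX_PAGES_PER_FILE
    of its pages are added as TIFF files.
    """
    total = 0
    for pages in page_counts:
        converted = min(pages, MAX_PAGES_PER_FILE)
        total += 1 + converted  # the original file plus its converted pages
    return total


print(files_after_conversion([6]))     # 7: one DOC file plus 6 TIFFs
print(files_after_conversion([2000]))  # 1297: one DOC file plus the first 1296 pages
```

Because the cap is applied inside the loop, it acts on each input file separately, which mirrors the statement that the limit is per file and not a total for the batch.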
When using the conversion actions, embedded files are limited to 3 levels. For example, an MSG file can contain a ZIP file, which can contain a PDF file. At the end of the conversion, all files will have been extracted and all of the pages in the PDF converted to TIFF. An example that is not allowed is an MSG file that contains a ZIP file that contains another ZIP file that contains a PDF, because this is a 4-level hierarchy, which is not permitted.
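As a rough illustration of how the 3-level rule applies to the two examples above, the following Python sketch counts nesting levels in a small tree. The tuple representation and the function names are assumptions made for this example; they do not correspond to any conversion action.

```python
# Illustrative check of the 3-level embedding limit described above.
# A file is modeled as (name, [embedded files]); this shape is hypothetical.
MAX_LEVELS = 3  # limit stated in the text


def nesting_levels(node):
    """Count the levels in a file hierarchy, counting the outer file as level 1."""
    _name, children = node
    if not children:
        return 1
    return 1 + max(nesting_levels(child) for child in children)


def is_allowed(node):
    """Return True when the hierarchy fits within MAX_LEVELS."""
    return nesting_levels(node) <= MAX_LEVELS


# MSG -> ZIP -> PDF: 3 levels, permitted.
allowed = ("mail.msg", [("archive.zip", [("report.pdf", [])])])

# MSG -> ZIP -> ZIP -> PDF: 4 levels, not permitted.
too_deep = ("mail.msg", [("outer.zip", [("inner.zip", [("report.pdf", [])])])])

print(nesting_levels(allowed), is_allowed(allowed))    # 3 True
print(nesting_levels(too_deep), is_allowed(too_deep))  # 4 False
```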
To scan more than 1296 files into a single batch, use the SetNamePattern action with the parameter '2', which selects the alternate file naming pattern of TMxxxxxx. In this use case, every file that is scanned, or expanded from an original scanned file, is assigned the next available TMxxxxxx name, where xxxxxx ranges from 1 to 999999. This allows up to a total of 999999 files in a single batch after all scanned files have been expanded.
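To make the TMxxxxxx pattern concrete, here is a small Python sketch that formats names of that shape. It assumes the number is zero-padded to six digits, which is consistent with the sample name TM000001.zip that appears elsewhere in this topic; the function itself is hypothetical and only imitates the naming, it does not call SetNamePattern.

```python
# Illustrative formatter for the alternate TMxxxxxx naming pattern.
# Zero-padding to six digits is an assumption based on the sample
# name "TM000001.zip"; the function name is hypothetical.
MAX_INDEX = 999999  # upper bound of the range stated in the text


def tm_file_name(index, extension):
    """Build a TMxxxxxx file name for the given 1-based index."""
    if not 1 <= index <= MAX_INDEX:
        raise ValueError(f"index must be between 1 and {MAX_INDEX}")
    return f"TM{index:06d}.{extension}"


print(tm_file_name(1, "zip"))       # TM000001.zip
print(tm_file_name(999999, "tif"))  # TM999999.tif
```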
The Rules
Because rules are flexible, there are multiple ways that your applications can use the electronic document conversion actions. The following is one recommended way of using the actions within your application.
Perform the virtual scan of the electronic documents in its own task profile, to create the batch of input documents to be processed. Use the electronic document conversion actions in their own, separate task profile.
In a single ruleset, you can perform all of the actions needed to convert your electronic documents to TIF files prior to recognition. Create a function for each type of electronic document that you expect to convert with the electronic document conversion actions: for example, one function to operate on ZIP files, another to operate on Word files, and so on, for as many different types as you need. Within each function, if the default values are not adequate, use the configuration actions to set your output format, then use the conversion action to create a TIF file for each page within the document. If the file is not of the type that the function expects, the conversion action returns false, and processing proceeds to the function for the next type and attempts the conversion again. This continues until all of the desired types have been converted. Types that are not expected are ignored by this process. You can set up additional rules to handle them if your specific application requires it.
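The "try each function until one succeeds" flow described above can be sketched in Python as follows. This models only the control flow. The converter functions are hypothetical stand-ins that look at the file extension and return True or False; they are not the product's actions, and the file extensions chosen are assumptions for the example.

```python
# Illustrative model of a ruleset with one function per document type.
# All names here are hypothetical stand-ins, not real conversion actions.
from pathlib import PurePath


def convert_zip(name):
    """Stand-in for a ZIP function: succeeds only for .zip files."""
    return PurePath(name).suffix.lower() == ".zip"


def convert_word(name):
    """Stand-in for a Word function: succeeds only for .doc/.docx files."""
    return PurePath(name).suffix.lower() in {".doc", ".docx"}


def convert_txt(name):
    """Stand-in for a Txt function: succeeds only for .txt files."""
    return PurePath(name).suffix.lower() == ".txt"


CONVERTERS = [convert_zip, convert_word, convert_txt]


def first_matching_converter(name, converters=CONVERTERS):
    """Try each function in order and return the name of the first that succeeds.

    Returns None when no function accepts the file, which corresponds to
    an unexpected type being ignored.
    """
    for convert in converters:
        if convert(name):
            return convert.__name__
    return None


print(first_matching_converter("notes.TXT"))   # convert_txt
print(first_matching_converter("letter.doc"))  # convert_word
print(first_matching_converter("image.xyz"))   # None
```

Keeping one function per type means that supporting a new document type only requires adding another entry to the list, and unexpected types fall through without stopping the batch.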
Variables Created During Conversion
The conversion actions store data in variables that may be useful. These are the variables created:
"IMAGEFILE" : The name of the TIF file that is associated with the converted page. The value would typically look like "01010000.tif". This variable is on the page level.
"ParentImage" : The name of the document that was used to create this page. If the TIF was created by the conversion of a WORD file, then the value would typically look like "02000000.doc". This variable is on the page level. If the page has multiple parents, such as when it was extracted from a ZIP file that was inside a ZIP file, then the parents will be separated by a colon. For example, TM000001.zip:01020000.zip.