IBM Datacap, Version 8.1

Fingerprint matching

With fingerprint matching, Taskmaster generates a fingerprint that describes each incoming page. The fingerprint can include information about the relative densities of different regions of the page or the location of text on the page.

After Taskmaster generates the fingerprint, it compares the fingerprint to a library of fingerprints for known page types. When it finds a match, it assigns the corresponding page type to the page.

For example, assume that the incoming page matches the Hotel #1 room receipt. Taskmaster assigns the page type called Room_Receipt. It then records the ID of the matching fingerprint in the runtime batch hierarchy. The match is not exact because the data on each page is most likely different. The goal is only to find the best possible match.
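
The best-match idea can be illustrated with a short Python sketch. This is not how Taskmaster is implemented: the library contents, the fingerprint IDs, the feature vectors, and the use of Euclidean distance are all assumptions made for illustration only.

```python
from math import sqrt

# Hypothetical fingerprint library: fingerprint ID -> (page type, feature vector).
# The IDs and vectors are invented for this example.
LIBRARY = {
    101: ("Room_Receipt", [0.10, 0.40, 0.05, 0.30]),
    102: ("Car_Rental", [0.50, 0.10, 0.45, 0.05]),
}


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_match(page_vector, library):
    """Return (fingerprint_id, page_type, distance) for the closest library entry."""
    fp_id, (page_type, vector) = min(
        library.items(), key=lambda item: distance(page_vector, item[1][1])
    )
    return fp_id, page_type, distance(page_vector, vector)


# The incoming page differs slightly from every library entry,
# so the match is the closest one, not an exact one.
incoming = [0.12, 0.38, 0.07, 0.29]
fp_id, page_type, dist = best_match(incoming, LIBRARY)
print(fp_id, page_type)  # 101 Room_Receipt
```

In this sketch, the returned fingerprint ID plays the role of the ID that is recorded in the runtime batch hierarchy.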

Selecting the fingerprint creation mode

Taskmaster provides two primary methods for generating page fingerprints.

Image analysis
This method scans the page image to identify the composite blackness of different regions of the page. This method provides fast page identification, but it requires that you run recognition later in the workflow.
Full page recognition
This method uses optical character recognition to identify the locations of text within the page. This method takes longer, especially with pages that include handwritten text. However, it reduces the time that subsequent workflow tasks require because the full page recognition results are already available for use.

Both of these methods write the resulting information to a CCO file that is stored with the original TIFF image file in the fingerprint folder for the application.

Remember: The method that you use for creating library fingerprints must be the same as the method that you use to generate runtime fingerprints during page identification.
For example, if you decide to use image analysis, you must use image analysis in both the FingerprintAdd and PageID rulesets.
Important: Do not combine these methods, because the results are unlikely to be accurate.
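
One way to picture this rule is a guard that refuses to compare fingerprints that were created by different methods. The following Python sketch is purely illustrative: the Fingerprint class, its fields, and the method labels are hypothetical and are not part of Taskmaster.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fingerprint:
    """Hypothetical fingerprint record used only for this illustration."""
    fingerprint_id: int
    method: str       # assumed labels: "image_analysis" or "full_page_recognition"
    features: tuple


def require_same_method(library_fp, runtime_fp):
    """Raise ValueError if the two fingerprints were created by different methods."""
    if library_fp.method != runtime_fp.method:
        raise ValueError(
            f"library fingerprint {library_fp.fingerprint_id} uses "
            f"{library_fp.method!r}, but the runtime fingerprint uses "
            f"{runtime_fp.method!r}"
        )


library_fp = Fingerprint(101, "image_analysis", (0.10, 0.40))
runtime_fp = Fingerprint(0, "full_page_recognition", (0.12, 0.38))

try:
    require_same_method(library_fp, runtime_fp)
except ValueError as err:
    print("Cannot compare:", err)
```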

Image analysis

Image analysis uses a pixel-based algorithm to generate a CCO fingerprint file that represents the relative blackness of different regions of the page.

The AnalyzeImage action in the Recog_Shared actions library does image analysis on an image file.

Library | Action | Description
Recog_Shared | AnalyzeImage | Converts the TIFF image file that represents the current page to a CCO fingerprint file.
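
To make the idea of relative blackness by region concrete, the following Python sketch computes the fraction of black pixels in each cell of a grid. It is an illustration only: it does not reproduce the algorithm behind AnalyzeImage or the CCO file format, neither of which is described here. The function name, the grid size, and the in-memory image representation are assumptions.

```python
def region_densities(image, rows=2, cols=2):
    """Split a binary image (1 = black pixel, 0 = white pixel) into
    rows x cols regions and return the fraction of black pixels in each
    region, row by row. Assumes the image has at least `rows` lines and
    `cols` columns so that no region is empty."""
    height, width = len(image), len(image[0])
    densities = []
    for r in range(rows):
        top, bottom = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            left, right = c * width // cols, (c + 1) * width // cols
            black = sum(
                image[y][x] for y in range(top, bottom) for x in range(left, right)
            )
            area = (bottom - top) * (right - left)
            densities.append(black / area)
    return densities


# A tiny 4 x 4 "page": a solid block in the top-left corner
# and a single black pixel in the bottom-right quadrant.
page = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
print(region_densities(page))  # [1.0, 0.0, 0.0, 0.25]
```

Because the result depends only on pixel counts, a computation of this kind needs no character recognition, which is consistent with image analysis being the faster of the two methods.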

Full page recognition

Full page recognition, as the name suggests, uses the text and location of text on the page to generate the CCO fingerprint file. Taskmaster includes three optical character recognition (OCR) engines, plus one intelligent character recognition (ICR) engine that you can use to do full page recognition:

OCR_A
ABBYY FineReader OCR engine.
OCR_S
Nuance (formerly ScanSoft) OmniPage OCR engine.
OCR_SR
Newer implementation of the Nuance OmniPage OCR engine.
ICR_C
Open Text RecoStar ICR engine.

Other ICR engines are available as options. As a rule, the OCR engines work well with machine-printed text, whereas the ICR engine works well with hand-printed and machine-printed text.

Taskmaster includes actions libraries for each recognition engine (OCR_A, OCR_S, OCR_SR, and ICR_C). Each library includes its own version of the full page recognition action.

Library | Action | Description
OCR_A | RecognizePageOCR_A | Recognizes all characters on the current page and populates the CCO fingerprint file for the page with the recognition results.
OCR_S | RecognizePageOCR_S | Recognizes all characters on the current page and populates the CCO fingerprint file for the page with the recognition results.
OCR_SR | RecognizePageOCR_S | Recognizes all characters on the current page and populates the CCO fingerprint file for the page with the recognition results.
ICR_C | RecognizePageICR_C | Recognizes all characters on the current page and populates the CCO fingerprint file for the page with the recognition results.
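
A fingerprint that is based on text locations can be pictured as the set of grid cells that contain recognized words. The following Python sketch assumes that recognition has already produced a list of words with pixel coordinates. The function names, the grid size, and the Jaccard similarity measure are hypothetical choices for illustration and do not describe the recognition engines or the CCO file contents.

```python
def text_cells(words, page_width, page_height, grid=8):
    """Map the top-left corner of each word to a cell of a grid x grid layout.
    `words` is a list of (text, x, y) tuples with coordinates in pixels."""
    cells = set()
    for _text, x, y in words:
        col = min(int(x * grid / page_width), grid - 1)
        row = min(int(y * grid / page_height), grid - 1)
        cells.add((row, col))
    return cells


def overlap(cells_a, cells_b):
    """Jaccard similarity of two cell sets; 1.0 means identical layouts."""
    if not cells_a and not cells_b:
        return 1.0
    return len(cells_a & cells_b) / len(cells_a | cells_b)


# Invented word positions on an 800 x 1000 pixel page.
library_page = [("Hotel", 100, 50), ("Room", 100, 400), ("Total", 600, 900)]
incoming_page = [
    ("Hotel", 105, 55),
    ("Room", 102, 410),
    ("Total", 610, 905),
    ("Tax", 600, 700),
]

a = text_cells(library_page, 800, 1000)
b = text_cells(incoming_page, 800, 1000)
print(overlap(a, b))  # 0.75
```

Note that the word text is ignored in this sketch and only positions are compared, which is why pages with the same layout but different data can still score highly.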

Fingerprint matching

The action that is used for all fingerprint matching, regardless of the creation method, is called FindFingerprint.

Library | Action | Description
AutoDoc | FindFingerprint | Tries to match the current page fingerprint to a fingerprint in the application fingerprint library.



Last updated: November 2013

© Copyright IBM Corporation 2013.