Fingerprint matching

Fingerprint matching identifies the type of a page by comparing the page’s fingerprint with the fingerprints in a database, where each database fingerprint has an associated page type. The page is then assigned the page type of the most closely matching database fingerprint.

A fingerprint is a representation of either the relative densities of different regions of the page (an image-based fingerprint) or the location of text on the page (an OCR-based fingerprint). For more information, see the Selecting the fingerprint creation mode section in this topic.

For example, assume that the fingerprint for an incoming page matches the Hotel #1 room receipt fingerprint. Datacap assigns the page type called Room_Receipt and records the ID of the matching fingerprint in the runtime batch hierarchy. The match is not expected to be exact because the data on each page is most likely different. The goal is to find the closest available match.
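The best-match idea can be illustrated with a short Python sketch. This is not the Datacap implementation: the library contents, the fingerprint IDs, the representation of a fingerprint as a list of region densities, and the use of Euclidean distance are all assumptions made for illustration.

```python
from math import sqrt

# Hypothetical fingerprint library: fingerprint ID -> (page type, region densities).
# The IDs, the Car_Rental page type, and all density values are invented.
FINGERPRINT_LIBRARY = {
    101: ("Room_Receipt", [0.10, 0.40, 0.35, 0.05]),
    102: ("Car_Rental", [0.60, 0.10, 0.05, 0.30]),
}


def distance(a, b):
    """Euclidean distance between two equal-length density vectors."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same number of regions")
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_match(page_fingerprint, library):
    """Return (fingerprint_id, page_type) of the closest library entry."""
    if not library:
        raise ValueError("fingerprint library is empty")
    fp_id = min(library, key=lambda k: distance(page_fingerprint, library[k][1]))
    return fp_id, library[fp_id][0]


# The incoming page is close to, but not identical to, fingerprint 101.
incoming = [0.12, 0.38, 0.30, 0.07]
fingerprint_id, page_type = best_match(incoming, FINGERPRINT_LIBRARY)
print(fingerprint_id, page_type)
```

In this sketch the incoming page is assigned the page type of the nearest library entry even though no entry matches it exactly, which mirrors the Room_Receipt example.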

Selecting the fingerprint creation mode

Datacap provides two primary methods for generating page fingerprints.

Image analysis: This method scans the page image to identify the composite blackness of different regions of the page. This method provides fast page identification, but recognition must still be done in a later task.

Full page recognition: This method does optical character recognition to identify the locations of text within the page. This method takes longer, especially with pages that include handwritten text. However, it reduces the time that subsequent workflow tasks need because the full page recognition results are already available for use.

Both of these methods write the resulting information to a CCO file that is stored with the original TIFF image file in the fingerprint folder for the application.

Remember: The method that you use for creating library fingerprints must be the same as the method that you use to generate runtime fingerprints during page identification.

For example, if you decide to use image analysis, you must use image analysis in both the FingerprintAdd and PageID rulesets.

Important: Do not combine these methods, because the results are unlikely to be accurate.

Image analysis

Image analysis uses a pixel-based algorithm to generate a CCO fingerprint file that represents the relative blackness of different regions of the page.

The AnalyzeImage action in the Recog_Shared actions library does image analysis on an image file.

Library	Action	Description
Recog_Shared	AnalyzeImage	Converts the TIFF image file that represents the current page to a CCO fingerprint file.
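The notion of a fingerprint that records the relative blackness of page regions can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name region_blackness, the grid layout, and the in-memory pixel representation are hypothetical, and the sketch does not reproduce the AnalyzeImage algorithm or the CCO file format.

```python
def region_blackness(pixels, rows, cols):
    """Split a bilevel image into rows x cols regions and return the
    fraction of black pixels in each region, listed row by row.

    pixels: list of equal-length lists, where 1 is black and 0 is white.
    """
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    if height == 0 or width == 0:
        raise ValueError("image is empty")
    if rows < 1 or cols < 1 or rows > height or cols > width:
        raise ValueError("grid does not fit the image")

    densities = []
    for r in range(rows):
        top = r * height // rows
        bottom = (r + 1) * height // rows
        for c in range(cols):
            left = c * width // cols
            right = (c + 1) * width // cols
            black = sum(
                pixels[y][x]
                for y in range(top, bottom)
                for x in range(left, right)
            )
            area = (bottom - top) * (right - left)
            densities.append(black / area)
    return densities


# A tiny 4 x 4 "page" divided into a 2 x 2 grid of regions.
page = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
print(region_blackness(page, 2, 2))  # [1.0, 0.0, 0.0, 0.25]
```

Because only pixel counts are involved, a representation like this is quick to compute, but it carries no text, which is why recognition is still needed later.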

Full page recognition

Full page recognition, as the name suggests, uses the text and location of text on the page to generate the CCO fingerprint file. Datacap includes three optical character recognition (OCR) engines, plus one intelligent character recognition (ICR) engine that you can use to do full page recognition:

OCR_A: FineReader OCR engine.

OCR_S: Nuance (formerly ScanSoft) OmniPage OCR engine.

OCR_SR: Newer implementation of the Nuance OmniPage OCR engine.

ICR_C: Open Text RecoStar ICR engine.

Other ICR engines are available as options. As a rule, the OCR engines work well with machine-printed text, whereas the ICR engine works well with hand-printed and machine-printed text.

Datacap includes an actions library for each recognition engine (OCR_A, OCR_S, OCR_SR, and ICR_C). Each library includes its own version of the full page recognition action.

Library	Action	Description
ocr_a	RecognizePageOCR_A	Recognizes all characters on the current page and populates the CCO fingerprint file for the page with the recognition results.
ocr_s	RecognizePageOCR_S	Recognizes all characters on the current page and populates the CCO fingerprint file for the page with the recognition results.
ocr_sr	RecognizePageOCR_S	Recognizes all characters on the current page and populates the CCO fingerprint file for the page with the recognition results.
icr_c	RecognizePageICR_C	Recognizes all characters on the current page and populates the CCO fingerprint file for the page with the recognition results.
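To show how the location of text alone could serve as a fingerprint, the following Python sketch reduces word bounding boxes to a coarse grid and compares two pages by the overlap of their occupied cells. The helper names, the 8 x 8 grid, the page dimensions, the sample boxes, and the Jaccard similarity measure are all assumptions for illustration. The sketch does not call any recognition engine and does not describe how Datacap compares OCR-based fingerprints.

```python
def occupied_cells(word_boxes, page_width, page_height, rows=8, cols=8):
    """Return the set of (row, col) grid cells that contain the center
    of at least one word box.

    word_boxes: iterable of (left, top, right, bottom) in page units.
    """
    cells = set()
    for left, top, right, bottom in word_boxes:
        cx = (left + right) / 2
        cy = (top + bottom) / 2
        # Clamp so that a box on the far edge stays inside the grid.
        col = min(int(cx / page_width * cols), cols - 1)
        row = min(int(cy / page_height * rows), rows - 1)
        cells.add((row, col))
    return cells


def layout_similarity(cells_a, cells_b):
    """Jaccard similarity of two cell sets: 1.0 is identical, 0.0 is disjoint."""
    if not cells_a and not cells_b:
        return 1.0
    return len(cells_a & cells_b) / len(cells_a | cells_b)


# Hypothetical word boxes, as if taken from recognition results.
library_boxes = [(50, 40, 250, 70), (50, 700, 180, 730), (600, 700, 760, 730)]
incoming_boxes = [(55, 42, 230, 72), (52, 702, 200, 731), (400, 300, 500, 330)]

lib_cells = occupied_cells(library_boxes, 850, 1100)
inc_cells = occupied_cells(incoming_boxes, 850, 1100)
print(layout_similarity(lib_cells, inc_cells))  # 0.5
```

Two of the three text positions fall in the same cells on both pages, so the pages score 0.5 even though the recognized words themselves are never compared.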

Fingerprint matching actions

Here are some actions that are involved in fingerprint matching:

Library	Action	Description
AutoDoc	FindFingerprint	Tries to match the current page fingerprint to a fingerprint in the application fingerprint library.