Fingerprint matching
Fingerprint matching identifies the page type of a page by comparing the page’s fingerprint with the fingerprints in a database, where each database fingerprint has an associated page type. The page is assigned the page type of the most closely matching database fingerprint.
A fingerprint is a representation of either the relative densities of different regions of the page (an image-based fingerprint) or the location of text on the page (an OCR-based fingerprint). For more information, see the Selecting the fingerprint creation mode section in this topic.
For example, assume that the fingerprint for an incoming page matches the Hotel #1 room receipt fingerprint. Datacap assigns the page type called Room_Receipt. It then records the ID of the matching fingerprint in the runtime batch hierarchy. The match is not expected to be exact, because the data on each page is most likely different. The goal is to find the best possible match.
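The best-match idea can be illustrated with a minimal Python sketch. It is not Datacap's matching algorithm, which this topic does not describe. It assumes that a fingerprint can be reduced to a short vector of numbers and that closeness is measured by mean absolute difference. The `StoredFingerprint` class, the `Car_Rental` page type, the fingerprint IDs, and all feature values are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class StoredFingerprint:
    """Hypothetical database entry: a fingerprint and its page type."""
    fingerprint_id: int
    page_type: str
    features: Tuple[float, ...]


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def best_match(page_features: Sequence[float],
               database: Sequence[StoredFingerprint]) -> StoredFingerprint:
    """Return the stored fingerprint that is closest to the page."""
    if not database:
        raise ValueError("fingerprint database is empty")
    return min(database, key=lambda fp: distance(page_features, fp.features))


# Assumed example data: two stored fingerprints and one incoming page.
database = [
    StoredFingerprint(1, "Room_Receipt", (0.10, 0.40, 0.05, 0.30)),
    StoredFingerprint(2, "Car_Rental", (0.50, 0.10, 0.45, 0.05)),
]
incoming = (0.12, 0.35, 0.07, 0.28)

match = best_match(incoming, database)
# The page takes the page type of the closest fingerprint, and the
# fingerprint ID is kept so that later steps can refer to it.
print(match.page_type, match.fingerprint_id)
```

The incoming values differ slightly from every stored fingerprint, yet the page is still assigned the page type of the closest one, which mirrors the inexact match described above.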
Selecting the fingerprint creation mode
Datacap provides two primary methods for generating page fingerprints.
- Image analysis: This method scans the page image to identify the composite blackness of different regions of the page. It provides fast page identification, but recognition must be done in a later step.
- Full page recognition: This method uses optical character recognition to identify the locations of text within the page. It takes longer, especially with pages that include handwritten text. However, it reduces the time that subsequent workflow tasks require because the full page recognition results are already available for use.
Both of these methods write the resulting information to a CCO file that is stored with the original TIFF image file in the fingerprint folder for the application.
Image analysis
Image analysis uses a pixel-based algorithm to generate a CCO fingerprint file that represents the relative blackness of different regions of the page.
The AnalyzeImage action in the Recog_Shared actions library does image analysis on an image file.
| Library | Action | Description |
|---|---|---|
| Recog_Shared | AnalyzeImage | Converts the TIFF image file that represents the current page to a CCO fingerprint file. |
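
The notion of "relative blackness of different regions" can be sketched as follows. This is an illustrative approximation only. It does not reproduce the AnalyzeImage action or the CCO file format, neither of which is specified in this topic. It assumes that the page is already available as a grid of 0/1 pixel values, and the function name `region_densities` is hypothetical.

```python
from typing import List


def region_densities(pixels: List[List[int]], rows: int, cols: int) -> List[float]:
    """Split a binary image (1 = black pixel) into rows x cols regions and
    return the fraction of black pixels in each region, row by row."""
    height = len(pixels)
    width = len(pixels[0])
    densities = []
    for r in range(rows):
        top, bottom = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            left, right = c * width // cols, (c + 1) * width // cols
            black = sum(pixels[y][x]
                        for y in range(top, bottom)
                        for x in range(left, right))
            area = (bottom - top) * (right - left)
            densities.append(black / area if area else 0.0)
    return densities


# Assumed 4 x 4 test page: a solid block at the top left, one dot lower right.
page = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
grid = region_densities(page, rows=2, cols=2)
print(grid)  # [1.0, 0.0, 0.0, 0.25]
```

Because only pixel counts are involved, a fingerprint of this kind is cheap to compute, which is consistent with image analysis being the faster of the two modes.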
Full page recognition
Full page recognition, as the name suggests, uses the text and location of text on the page to generate the CCO fingerprint file. Datacap includes three optical character recognition (OCR) engines, plus one intelligent character recognition (ICR) engine that you can use to do full page recognition:
- OCR_A: FineReader OCR engine.
- OCR_S: Nuance (formerly ScanSoft) OmniPage OCR engine.
- OCR_SR: Newer implementation of the Nuance OmniPage OCR engine.
- ICR_C: Open Text RecoStar ICR engine.
Other ICR engines are available as options. As a rule, the OCR engines work well with machine-printed text, whereas the ICR engine works well with hand-printed and machine-printed text.
Datacap includes actions libraries for each recognition engine (OCR_A, OCR_S, OCR_SR, and ICR_C). Each library includes its own version of the full page recognition action.
| Library | Action | Description |
|---|---|---|
| ocr_a | RecognizePageOCR_A | Recognizes all characters on the current page and populates the CCO fingerprint file for the page with the recognition results. |
| ocr_s | RecognizePageOCR_S | Recognizes all characters on the current page and populates the CCO fingerprint file for the page with the recognition results. |
| ocr_sr | RecognizePageOCR_S | Recognizes all characters on the current page and populates the CCO fingerprint file for the page with the recognition results. |
| icr_c | RecognizePageICR_C | Recognizes all characters on the current page and populates the CCO fingerprint file for the page with the recognition results. |
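
To make the idea of a fingerprint based on text location more concrete, the following sketch reduces recognized words to the set of coarse grid cells that contain text and compares two pages by the overlap of those sets. It is an assumption-laden illustration, not the output of any Datacap recognition action. The word tuples, the page size, the grid size, and the functions `text_cells` and `jaccard` are all hypothetical.

```python
from typing import Iterable, Set, Tuple

Word = Tuple[str, float, float]  # (text, x, y) of a recognized word


def text_cells(words: Iterable[Word], page_width: float, page_height: float,
               rows: int = 4, cols: int = 4) -> Set[Tuple[int, int]]:
    """Return the (row, col) grid cells that contain at least one word."""
    cells = set()
    for _text, x, y in words:
        col = min(int(x * cols / page_width), cols - 1)
        row = min(int(y * rows / page_height), rows - 1)
        cells.add((row, col))
    return cells


def jaccard(a: Set[Tuple[int, int]], b: Set[Tuple[int, int]]) -> float:
    """Overlap of two cell sets: 1.0 means identical text layout."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Assumed word positions on an 800 x 1000 page.
template = [("Hotel", 100, 50), ("Total", 700, 900)]
incoming = [("Hotel", 110, 60), ("Total", 690, 910), ("Tax", 700, 700)]

score = jaccard(text_cells(template, 800, 1000),
                text_cells(incoming, 800, 1000))
print(round(score, 3))  # 0.667
```

The sketch ignores the recognized text itself and looks only at where the text sits, so small shifts in position leave the score unchanged while an extra text block lowers it.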
Fingerprint matching actions
Here are some actions that are involved in fingerprint matching:
| Library | Action | Description |
|---|---|---|
| AutoDoc | FindFingerprint | Tries to match the current page fingerprint to a fingerprint in the application fingerprint library. |