IBM Datacap, Version 8.1            

Pattern matching overview

Pattern matching uses anchor objects that you define on the page fingerprints. These anchor objects can be geometric patterns, such as page registration marks or vendor logos, or text-based patterns.

The anchor objects can act both as identification markers used during page identification and as reference points used during registration or realignment of the image.

The Taskmaster pattern matching actions analyze the runtime pages and look for geometric or text-based patterns that match the patterns in the page fingerprints. This is similar to standard fingerprint identification, except that only a selected region of the fingerprint image is compared. In addition, you can use the difference between the location of the pattern on the fingerprint and its location on the runtime page to correct registration problems.
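To make the idea of searching a runtime page for an anchor pattern more concrete, the following is a minimal sketch of a brute-force geometric template search. It is an illustrative assumption, not the algorithm that the Taskmaster actions use: the page and the anchor are hypothetical grayscale images stored as lists of rows, and `find_anchor` is a hypothetical helper that scores every candidate position with a sum of squared differences (SSD) and returns the position with the lowest score.

```python
from typing import List, Optional, Tuple

# Hypothetical image representation: rows of grayscale values (0 = black, 255 = white).
Image = List[List[int]]


def find_anchor(page: Image, anchor: Image) -> Optional[Tuple[int, int]]:
    """Return the (x, y) top-left pixel position where `anchor` best matches `page`.

    Illustrative only: an exhaustive sum-of-squared-differences search, where a
    lower score means a closer match. Returns None if the anchor is larger than
    the page.
    """
    ph, pw = len(page), len(page[0])
    ah, aw = len(anchor), len(anchor[0])
    if ah > ph or aw > pw:
        return None

    best_score: Optional[int] = None
    best_pos: Optional[Tuple[int, int]] = None
    # Slide the anchor over every position where it fits entirely on the page.
    for y in range(ph - ah + 1):
        for x in range(pw - aw + 1):
            score = 0
            for dy in range(ah):
                page_row = page[y + dy]
                anchor_row = anchor[dy]
                for dx in range(aw):
                    diff = page_row[x + dx] - anchor_row[dx]
                    score += diff * diff
            # Keep the first position with the lowest SSD score.
            if best_score is None or score < best_score:
                best_score = score
                best_pos = (x, y)
    return best_pos


# A 6 x 6 white page with a 2 x 2 black mark whose top-left corner is at x=3, y=2.
page = [[0 if 2 <= y <= 3 and 3 <= x <= 4 else 255 for x in range(6)] for y in range(6)]
print(find_anchor(page, [[0, 0], [0, 0]]))  # (3, 2)
```

A production implementation would typically restrict the search to a region around the expected anchor location and use a faster, normalized correlation measure; the exhaustive loop here is chosen only for readability.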

Whether you use geometric pattern matching or text-based pattern matching, the basic concept is the same and is illustrated in the following example. Here, the cross-hatched region is the anchor object. In the fingerprint, the anchor is located 1.0 inch from both the top edge and the left edge of the page. In the scanned page, the anchor is located 1.5 inches from the top edge and 1.4 inches from the left edge of the page.

Figure: The fingerprint and the scanned page, shown side by side. Fingerprint anchor: 1.0 inch from the top and left edges. Scanned-page anchor: 1.5 inches from the top edge and 1.4 inches from the left edge.

A misalignment of this magnitude almost certainly causes a fingerprint match to fail if you use one of the standard fingerprint-matching techniques. Pattern matching instead attempts to locate the anchor object and, if it succeeds, computes the offsets that are required to bring the page back into alignment. As a result, it can handle much larger registration problems. In the previous example, the image must be moved 0.4 inches to the left and 0.5 inches up, so the required image offset values are -80 (horizontal) and -100 (vertical).

Attention: Taskmaster processes pages at an effective resolution of 200 x 200 pixels per inch, so 0.4 inches is equivalent to 80 pixels and 0.5 inches is equivalent to 100 pixels.
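The offset arithmetic in the example can be written out as a short calculation. The following is a minimal sketch, assuming anchor positions are given in inches as (left, top) distances from the top-left corner of the page and that offsets follow the sign convention used above (negative moves the image left or up). The function name `registration_offset` is hypothetical and is not a Taskmaster action.

```python
from typing import Tuple

DPI = 200  # effective processing resolution from the Attention note


def registration_offset(
    fingerprint_anchor: Tuple[float, float],
    runtime_anchor: Tuple[float, float],
    dpi: int = DPI,
) -> Tuple[int, int]:
    """Return the (x, y) pixel offset that moves the runtime page back into
    alignment with the fingerprint.

    Hypothetical helper. Both anchors are (left, top) positions in inches.
    A negative x moves the image left; a negative y moves it up.
    """
    fx, fy = fingerprint_anchor
    rx, ry = runtime_anchor
    # Distance the runtime anchor must travel to reach the fingerprint anchor,
    # converted from inches to pixels and rounded to whole pixels.
    dx = round((fx - rx) * dpi)
    dy = round((fy - ry) * dpi)
    return dx, dy


# Fingerprint anchor at (1.0, 1.0) in; scanned anchor at (1.4 left, 1.5 top) in.
print(registration_offset((1.0, 1.0), (1.4, 1.5)))  # (-80, -100)
```

Rounding to whole pixels avoids small floating-point errors; for example, `(1.0 - 1.4) * 200` evaluates to slightly more than -80 before rounding.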

When you use geometric pattern matching with multiple anchor objects, Taskmaster can perform interpolated realignment: each field position is adjusted according to its proximity to each of the anchor objects.
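The documentation does not specify how the anchor offsets are combined, so the following sketch uses inverse-distance weighting purely as an assumption to illustrate proximity-based interpolation: anchors closer to a field contribute more to that field's offset. The function `interpolate_field_offset` and its `power` parameter are hypothetical, and all positions and offsets are assumed to be in pixels.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def interpolate_field_offset(
    field: Point,
    anchors: List[Point],
    offsets: List[Point],
    power: float = 2.0,
) -> Point:
    """Blend per-anchor (dx, dy) offsets into a single offset for one field.

    Assumed scheme (inverse-distance weighting): each anchor's offset is
    weighted by 1 / distance**power, where distance is measured from the
    field position to the anchor position.
    """
    if not anchors or len(anchors) != len(offsets):
        raise ValueError("need at least one anchor and one offset per anchor")

    total_w = 0.0
    sum_dx = 0.0
    sum_dy = 0.0
    for (ax, ay), (dx, dy) in zip(anchors, offsets):
        dist = math.hypot(field[0] - ax, field[1] - ay)
        if dist == 0:
            # The field sits exactly on an anchor: use that anchor's offset.
            return (dx, dy)
        w = 1.0 / dist ** power
        total_w += w
        sum_dx += w * dx
        sum_dy += w * dy
    return (sum_dx / total_w, sum_dy / total_w)


# Two hypothetical anchors 1000 pixels apart, each with its own measured offset.
anchors = [(0.0, 0.0), (1000.0, 0.0)]
offsets = [(-80.0, -100.0), (-40.0, -60.0)]
# A field 250 px from the first anchor and 750 px from the second is weighted
# 9:1 toward the first anchor, giving an offset of about (-76, -96).
print(interpolate_field_offset((250.0, 0.0), anchors, offsets))
```

With a single anchor this reduces to applying that anchor's offset to every field, which matches the single-anchor behavior in the earlier example.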

Although geometric pattern matching and text-based pattern matching are conceptually the same, the implementations are slightly different and use different actions.




Last updated: November 2013

© Copyright IBM Corporation 2013.