Using the pixel threshold evaluation method

The pixel threshold evaluation method uses the RecogOMRThreshold action in the Recog_Shared library.

Specifying the threshold and background levels

The RecogOMRThreshold action takes two parameters:

  • Threshold: Specifies the percentage of black pixels over which the option is considered selected.
  • Background: Used to determine the confidence level and specifies the percentage that can be attributed to the check box outline plus any scanner noise:
    • Any zone with a percentage of black pixels below this value is considered not selected with high confidence. Any zone with a percentage of black pixels between this value and the threshold value is considered not selected with low confidence.
    • Any zone with a percentage of black pixels over (2 * Threshold - Background) is considered selected with high confidence. Any zone with a percentage of black pixels between Threshold and (2 * Threshold - Background) is considered selected with low confidence.

However, if MultiPunch=0 (or is not specified), only the zone with the highest percentage of black pixels is selected.

For example, if the threshold value is 20 and the background value is 15 then the high confidence threshold is (2 * 20 - 15) = 25. If you run RecogOMRThreshold (20,15) on an OMR group field with MultiPunch=1, then the following is true.
  • Any zone with more than 25% black pixels is considered selected with high confidence.
  • Any zone with between 20% and 25% black pixels is considered selected with low confidence.
  • Any zone with between 15% and 20% black pixels is considered not selected with low confidence.
  • Any zone with 15% or less black pixels is considered not selected with high confidence.
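
The rules above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Datacap code: the function name classify_zone is hypothetical, the handling of values that fall exactly on a boundary is an assumption, and the MultiPunch=0 case is not modeled.

```python
def classify_zone(percent_black, threshold, background):
    """Return (selected, confidence) for one OMR zone.

    Hypothetical helper that mirrors the documented rules for MultiPunch=1.
    Treating exact boundary values as "not over" is an assumption.
    """
    high_cutoff = 2 * threshold - background  # for example 2 * 20 - 15 = 25
    if percent_black > high_cutoff:
        return True, "high"
    if percent_black > threshold:
        return True, "low"
    if percent_black > background:
        return False, "low"
    return False, "high"


# Values that fall into each of the four bands for RecogOMRThreshold (20,15)
for pct in (26, 22, 18, 10):
    print(pct, classify_zone(pct, threshold=20, background=15))
```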

Determining the appropriate threshold and background values

To determine appropriate values for the threshold and background parameters, you must determine the percentage of pixels within the OMR zone that can be attributed to the check box outline plus any scanner noise. The easiest way to make this determination is to run a page that contains both checked and cleared option boxes through the workflow, and then get the pixel counts from the page data file.

When Datacap runs a RecogOMRThreshold action, it counts the number of black pixels within each OMR zone. Datacap then writes the resulting values to the page data file as a density string.
</F>
<F id="Options">
        <V n="Type">Options</V>
        <V n="Position">1171,327,1518,622</V>
        <V n="STATUS">0</V>
        <V n="DensityString">FBG</V>
        <C cn="10" cr="1440,405,1490,418">49</C>
        <C cn="10" cr="1440,475,1490,525">48</C>
        <C cn="10" cr="1440,541,1490,591">49</C>
</F>
The DensityString has one character per OMR zone. In this code example, the Options field has three OMR zones and the DensityString value is FBG. Each of the characters corresponds to a percentage value according to the following formula.
Percentage black pixels = character's ASCII code value minus 48.
In this example:
  • The ASCII code for each of the three characters is 70, 66, and 71 respectively.
  • The percentage of black pixels for each of the three zones is 22% (70-48), 18% (66-48), and 23% (71-48), respectively.
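
As a quick check, the conversion can be reproduced with a short Python sketch. The function name decode_density_string is hypothetical; only the "ASCII code minus 48" formula comes from the text above.

```python
def decode_density_string(density):
    """Convert each DensityString character to a percentage of black pixels.

    Applies the documented formula: ASCII code of the character minus 48.
    """
    return [ord(ch) - 48 for ch in density]


print(decode_density_string("FBG"))  # [22, 18, 23]
```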
After you obtain the percentage values, you can then refer to the original page image to see if the corresponding check box is selected. This example was obtained from a page where the first and third options were selected, and the second option was not selected.
Checkbox        Percentage filled
Check mark      22%
Empty square    18%
Check mark      23%

Based on these three check boxes, you would set the threshold and background values somewhere between 18 and 22. (Fractional values are permitted for the threshold and background parameters.) You can test the values by scanning additional pages and checking their density strings before setting final values.
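
One way to find that range from sample pages is to compare the highest percentage seen in a cleared box with the lowest percentage seen in a checked box. The following Python sketch is an assumption about how you might automate that comparison. The function candidate_range is hypothetical and is not part of Datacap.

```python
def candidate_range(cleared, checked):
    """Return (low, high): the gap between cleared and checked zone percentages.

    cleared: percentages measured on boxes known to be empty
    checked: percentages measured on boxes known to be marked
    """
    low = max(cleared)
    high = min(checked)
    if low >= high:
        # The samples overlap, so no single cutoff separates them cleanly.
        raise ValueError("cleared and checked percentages overlap")
    return low, high


# Percentages from the sample page: one empty box, two check marks
print(candidate_range(cleared=[18], checked=[22, 23]))  # (18, 22)
```

Running more sample pages through the same comparison narrows the range and shows whether the cleared and checked populations ever overlap.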

Implications of using RecogOMRThreshold

The RecogOMRThreshold action relies on pixel counts within the OMR zone, so it is important that all OMR zones have the same dimensions, or as close as possible.

Drawing OMR zones on the Datacap Studio Zones tab can sometimes be difficult. You can establish approximate zone boundaries by drawing the bounding boxes on the Image View tab. Then edit the coordinates in the Pos variables field in the Properties pane.

The coordinates correspond to the upper left corner (x1, y1) and the lower right corner (x2, y2) of the bounding box. Enter them as x1,y1,x2,y2 in the Pos variable field.
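
To check whether zones have matching dimensions, you can compute the width and height from each set of coordinates. The Python sketch below is illustrative only: the function zone_size is hypothetical, and it assumes that the cr attribute values in the sample page data file use the same x1,y1,x2,y2 order as the Pos variable.

```python
def zone_size(pos):
    """Return (width, height) for a coordinate string in x1,y1,x2,y2 order."""
    x1, y1, x2, y2 = (int(v) for v in pos.split(","))
    return x2 - x1, y2 - y1


# cr values copied from the sample page data file (assumed x1,y1,x2,y2 order)
zones = ["1440,405,1490,418", "1440,475,1490,525", "1440,541,1490,591"]
sizes = [zone_size(pos) for pos in zones]

for pos, size in zip(zones, sizes):
    print(pos, "->", size)

if len(set(sizes)) > 1:
    print("Zone dimensions differ; consider adjusting the coordinates.")
```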