Using the Frequency/Weight histogram
The Match Designer histogram displays the distribution of the composite weights.
About this task
When you move the Current Data handle on the histogram to a new weight, the data grid automatically scrolls to the records that have the selected weight. Likewise, when you select a different record in the data display, the Current Data handle moves to that record's weight in the histogram.
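The two-way link between the handle and the grid can be modeled as a lookup in a list of rows sorted by composite weight. The following is a minimal sketch under that assumption only. The function names `handle_to_row` and `row_to_handle` are hypothetical, and this is not how Match Designer is implemented internally.

```python
from bisect import bisect_left


def handle_to_row(sorted_weights: list[float], selected_weight: float) -> int:
    """Return the index of the first row whose weight is >= selected_weight.

    Assumes (hypothetically) that grid rows are sorted by ascending composite weight.
    """
    if not sorted_weights:
        raise ValueError("the data grid has no rows")
    index = bisect_left(sorted_weights, selected_weight)
    # A weight above the largest value maps to the last row.
    return min(index, len(sorted_weights) - 1)


def row_to_handle(sorted_weights: list[float], row: int) -> float:
    """Return the weight the handle should point at for a selected grid row."""
    return sorted_weights[row]


weights = [-12.5, -3.0, 0.0, 4.2, 4.2, 11.8, 19.3]
print(handle_to_row(weights, 4.0))  # 3
print(row_to_handle(weights, 5))    # 11.8
```

A binary search fits here because the rows are assumed to be ordered by weight. Finding the row for any handle position therefore stays fast even for large match results.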
Use either of the following actions to move a handle on the histogram:
- To display records of a certain weight, move the Current Data handle along the Weight axis. Sorting the data display with Ascending by Weight Sort moves the Current Data handle to the lowest detail weight value.
  Note: For one-source specifications, the Current Data handle is available only when the data display is in match pair order. To display the data in match pair order, right-click the data display and click Group by Match Pairs.
- To adjust the Clerical Cutoff or Match Cutoff setting, move the corresponding cutoff handle along the Weight axis. The updated cutoff values are shown in the Cutoff Values pane.
Keep the following points in mind about cutoffs:
- The clerical cutoff is a composite weight above which record pairs are considered possible matches. Record pairs with weights between the clerical cutoff and the match cutoff are known as clericals. They are typically reviewed to determine whether they are matches or nonmatches. If you do not want to review clericals, set the match cutoff weight equal to the clerical cutoff weight.
- Cutoff weights can be negative. However, setting cutoff weights to negative values produces extremely inclusive sets of matched records. If you use negative cutoff weights, keep in mind that the histogram tends to have many values at highly negative weights, because most record pairs are nonmatches. Record pairs that obviously disagree are not a large part of the matching process, however, so highly negative weights are not often shown.
- Matched record pairs form another large group of values at highly positive weights. You can set the cutoff values for the match run by inspecting this histogram. Set the clerical review cutoff at the weight where the spike in the histogram drops close to the axis, and set the other cutoff weight where the nonmatched cases start to dominate. Experiment, and use the test results as a guide for setting the cutoffs.
- For a two-source many-to-one duplicate match type, there is an additional cutoff weight called a duplicate cutoff. This cutoff is optional. If you use the duplicate cutoff, set it higher than the match cutoff weight. If more than one record pair receives a composite weight that is higher than the match cutoff, these records are declared duplicates if their composite weight is equal to or greater than the duplicate cutoff.
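The cutoff rules in the list above can be expressed as a small classification routine. This is a minimal sketch, not Match Designer's implementation. The `Cutoffs`, `classify`, and `flag_duplicates` names are hypothetical, and treating each boundary as inclusive (a weight equal to a cutoff falls into the higher band) is an assumption. `flag_duplicates` follows the duplicate-cutoff sentence literally. It does not model how the product decides which pair in the group remains the match.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cutoffs:
    """Hypothetical container for the cutoff weights described above."""
    match: float
    clerical: float
    duplicate: Optional[float] = None  # optional; two-source many-to-one duplicate only

    def __post_init__(self) -> None:
        # clerical == match is allowed: it leaves the clerical band empty.
        if self.clerical > self.match:
            raise ValueError("clerical cutoff must not exceed the match cutoff")
        if self.duplicate is not None and self.duplicate <= self.match:
            raise ValueError("duplicate cutoff must be higher than the match cutoff")


def classify(weight: float, c: Cutoffs) -> str:
    """Assign a record pair to a band by composite weight (inclusive boundaries assumed)."""
    if weight >= c.match:
        return "match"
    if weight >= c.clerical:
        return "clerical"
    return "nonmatch"


def flag_duplicates(pair_weights: list[float], c: Cutoffs) -> list[bool]:
    """Literal reading of the duplicate rule for pairs that share one record."""
    if c.duplicate is None:
        return [False] * len(pair_weights)
    above_match = sum(1 for w in pair_weights if w > c.match)
    if above_match <= 1:
        # Only one pair (or none) beats the match cutoff: nothing is a duplicate.
        return [False] * len(pair_weights)
    return [w >= c.duplicate for w in pair_weights]


cutoffs = Cutoffs(match=10.0, clerical=5.0, duplicate=15.0)
print([classify(w, cutoffs) for w in (12.0, 7.0, -2.0)])
# ['match', 'clerical', 'nonmatch']
print(flag_duplicates([16.0, 11.0, 3.0], cutoffs))
# [True, False, False]
```

Setting `clerical` equal to `match`, as suggested above for skipping clerical review, leaves the clerical band empty. For example, `classify(8.0, Cutoffs(match=8.0, clerical=8.0))` returns `'match'`, and any lower weight returns `'nonmatch'`.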
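To see the shape the cutoff advice relies on, you can bin composite weights into a text histogram. The weights below are made-up illustrative values, not real match output. They show a cluster of nonmatches at highly negative weights and a cluster of matches at highly positive weights. The `weight_histogram` function and its `bin_width` parameter are hypothetical.

```python
import math
from collections import Counter


def weight_histogram(weights: list[float], bin_width: float = 1.0) -> list[tuple[float, int]]:
    """Count composite weights per bin; each bin is labeled by its lower edge."""
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    counts = Counter(math.floor(w / bin_width) * bin_width for w in weights)
    # Only non-empty bins are returned, in ascending order of weight.
    return sorted(counts.items())


# Illustrative values only: three low nonmatches, two middle values, three high matches.
weights = [-14.2, -13.7, -13.1, -2.4, 6.5, 17.9, 18.3, 18.8]
for lower, count in weight_histogram(weights, bin_width=5.0):
    print(f"[{lower:6.1f}, {lower + 5.0:6.1f}) {'#' * count}")
```

Bin width matters when reading the chart. Bins that are too wide can hide the point where a spike falls toward the axis, and bins that are too narrow make the two groups look noisy.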