CNT_DIFF comparison
Compares two strings of numbers and assigns agreement or disagreement weights based on the number of differences between the numbers in the strings. Weights are prorated according to the magnitude of the disagreement.
Required Columns
The following data source and reference source columns are required:
- Data. The column that contains the number from the data source.
- Reference. The column that contains the number from the reference source (only applies to a two-source match).
You can use this comparison with vectors and reverse matching. If you want to create vectors to use in the Match Designer, see Make Vector stage in DataStage.
Required Parameter
The following parameter is required:
Param 1. Indicates the number of differences that will be tolerated before the entire disagreement weight is assigned.
Example 1
You can use the CNT_DIFF comparison to count keying errors in columns such as dates, telephone numbers, file or record numbers, and national identity numbers. For example, the following birth dates appear in both files, and you suspect that they represent the same birth date with a data entry error in the sixth digit:
19670301
19670801
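The count of differences for the two dates above can be sketched in Python. This is only an illustration, not the product's implementation: the function name `count_differences` is hypothetical, and it assumes that values are compared position by position and that each extra character in the longer value counts as one difference.

```python
def count_differences(data: str, reference: str) -> int:
    """Count the positions at which two digit strings differ.

    Assumption (not confirmed by the documentation): the values are
    compared position by position, and every extra character in the
    longer value counts as one additional difference.
    """
    # zip() stops at the end of the shorter string.
    mismatches = sum(1 for a, b in zip(data, reference) if a != b)
    # Add one difference for each unmatched trailing character.
    return mismatches + abs(len(data) - len(reference))


# The two birth dates from the example differ only in the sixth digit.
print(count_differences("19670301", "19670801"))  # 1
```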
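Example 2
If you specify 1 for Param 1, one error results in the following weight:
agreement weight - 1/2 (agreement weight - disagreement weight)
Because the disagreement weight is always a negative number, the quantity in parentheses is the full weight range from agreement to disagreement. Thus, one error yields a partial weight, and two or more errors result in the disagreement weight.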
If you specify 2, the weight range is divided into thirds. One error results in the agreement weight minus 1/3 of the weight range from agreement to disagreement. Two errors result in the agreement weight minus 2/3 of the weight range, and three or more errors result in the full disagreement weight. Thus, the weights are prorated according to the seriousness of the disagreement.
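The proration described above can be sketched as a small Python function. This is a minimal sketch rather than the product's actual calculation. The function name `cnt_diff_weight` and the sample weights (10 and -5) are hypothetical. The sketch assumes that the weight range is the agreement weight minus the (negative) disagreement weight, and that the range is split into Param 1 + 1 equal steps.

```python
def cnt_diff_weight(differences: int, tolerated: int,
                    agreement: float, disagreement: float) -> float:
    """Prorate a match weight by the number of differences found.

    Assumptions (illustrative only): `tolerated` plays the role of
    Param 1, and the weight range (agreement - disagreement) is split
    into tolerated + 1 equal steps.
    """
    if differences < 0 or tolerated < 0:
        raise ValueError("differences and tolerated must be non-negative")
    steps = tolerated + 1
    # Beyond the tolerated count, the full disagreement weight applies.
    if differences >= steps:
        return float(disagreement)
    weight_range = agreement - disagreement  # disagreement is negative
    return agreement - differences * weight_range / steps


# Hypothetical weights: agreement = 10, disagreement = -5 (range = 15).
print(cnt_diff_weight(1, 1, 10, -5))  # 2.5  (halfway through the range)
print(cnt_diff_weight(2, 1, 10, -5))  # -5.0 (full disagreement weight)
print(cnt_diff_weight(1, 2, 10, -5))  # 5.0  (agreement minus 1/3 of range)
print(cnt_diff_weight(2, 2, 10, -5))  # 0.0  (agreement minus 2/3 of range)
```

With no differences the function returns the full agreement weight, so an exact match is never penalized.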