NAME_UNCERT comparison
Compares two strings. First, it right-truncates the longer string so that it contains the same number of characters as the shorter string. If that comparison is not an exact match, it evaluates the similarity of the strings by doing an UNCERT comparison. You can use NAME_UNCERT to compare given names, where one of the name strings is shorter than the other.
NAME_UNCERT is a two-part comparison. First, it compares two strings on a character-by-character basis after truncating the longer string so that it contains the same number of characters as the shorter string. Second, if those strings are not an exact match, it evaluates the similarity of the strings by using an algorithm that is based on information theory principles.
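The two-part flow described above can be sketched roughly as follows. This is only an illustration: the function name `name_uncert_score` is hypothetical, and because the actual UNCERT algorithm is not specified here, Python's `difflib.SequenceMatcher` is used as a stand-in similarity measure scaled to the 0 - 900 range. Applying the second part to the truncated strings is also an assumption.

```python
from difflib import SequenceMatcher

MAX_SCORE = 900  # top of the 0 - 900 scale used by Param 1


def name_uncert_score(data: str, reference: str) -> int:
    """Return an illustrative 0 - 900 score for two given names."""
    # Length is computed without trailing blanks; embedded blanks still count.
    a = data.rstrip(" ")
    b = reference.rstrip(" ")

    # Part 1: right-truncate the longer string to the length of the shorter one
    # and compare character by character.
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]
    if a == b:
        return MAX_SCORE  # exact match after truncation

    # Part 2: fall back to a similarity measure.
    # ASSUMPTION: SequenceMatcher is a stand-in, not the real UNCERT algorithm.
    ratio = SequenceMatcher(None, a, b).ratio()
    return round(ratio * MAX_SCORE)
```

With this sketch, `name_uncert_score("AL", "ALBERT")` returns 900 because the strings agree over the shorter length, while a pair such as `"JON"` and `"JAN"` falls through to the similarity step and scores below 900.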
Required Columns
The following data source and reference source columns are required:
- Data. The given name from the data source.
- Reference. The given name from the reference source (only applies for a two-source match).
You can use this comparison with vectors and reverse matching. If you want to create vectors to use in the Match Designer, see Make Vector stage in DataStage.
Required Parameter
The following parameter is required:
Param 1. The minimum threshold, which is a number from 0 to 900. A higher value for Param 1 causes the match to tolerate fewer differences than a lower value does. The following guidelines explain how scores are interpreted:
- 900. The two strings are identical.
- 850. The two strings can be considered the same.
- 800. The two strings are probably the same.
- 750. The two strings are probably different.
- 700. The two strings are different.
Example 1
The assigned weight is proportioned linearly between the agreement and disagreement weights. For example, if you specify 700 and the score is 700 or less, then the full disagreement weight is assigned. If the strings agree exactly, the full agreement weight is assigned.
Suppose you specify 850 for the MatchParm, which means that the tolerance is relatively low. A score of 800 would get the full disagreement weight because it is lower than the parameter that you specified. Even though a score of 800 means that the strings are probably the same, you require a low tolerance.
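The linear proportioning of the weight can be sketched as below. The function name `assigned_weight` and its parameters are hypothetical, and the interpolation formula is an assumption based on the description above (full disagreement weight at or below the threshold, full agreement weight at 900, a straight line in between). The product may compute the weight differently.

```python
def assigned_weight(score: float, threshold: float,
                    agreement_weight: float,
                    disagreement_weight: float) -> float:
    """Interpolate a weight from a 0 - 900 score (illustrative sketch)."""
    if not 0 <= threshold <= 900:
        raise ValueError("threshold (Param 1) must be between 0 and 900")

    # Exact agreement: full agreement weight.
    if score >= 900:
        return agreement_weight

    # At or below the threshold: full disagreement weight.
    if score <= threshold:
        return disagreement_weight

    # ASSUMPTION: straight-line interpolation between the two weights.
    fraction = (score - threshold) / (900 - threshold)
    return disagreement_weight + fraction * (agreement_weight - disagreement_weight)
```

For instance, with a threshold of 850, `assigned_weight(800, 850, 10.0, -5.0)` returns -5.0, the full disagreement weight, as in the example above. With a threshold of 700, the same score of 800 lies halfway between 700 and 900 and yields 2.5.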
Example 2
NAME_UNCERT uses the shorter length of the two names for the comparison and does not compare any characters after that length.
For example, the following two pairs of given names would be considered exact matches:
AL ALBERT
W WILLIAM
This result differs from the CHAR comparison, where these two pairs of names would not match. The pairs would not match for the UNCERT comparison either, because UNCERT factors in variables such as the number of deletions between the strings, which would probably result in the assignment of the full disagreement weight.
With NAME_UNCERT, length is computed by ignoring trailing blanks (spaces). Embedded blanks are not ignored.
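The blank-handling rule for NAME_UNCERT can be illustrated with a small sketch of the truncated comparison alone. The helper name `truncated_match` is hypothetical, and the code covers only the first (exact-match) part of the comparison, not the UNCERT fallback.

```python
def truncated_match(data: str, reference: str) -> bool:
    """Check whether two names agree over the shorter effective length."""
    # Trailing blanks do not count toward the length...
    a = data.rstrip(" ")
    b = reference.rstrip(" ")
    # ...but embedded blanks remain and are compared like any other character.
    n = min(len(a), len(b))
    return a[:n] == b[:n]
```

Under these assumptions, `truncated_match("AL   ", "ALBERT")` is true because the trailing blanks are dropped before the length is taken, whereas `truncated_match("A L", "ALBERT")` is false because the embedded blank is compared against the `L` in `ALBERT`.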