Attribute weight assignment is an important part of the implementation process and directly affects comparison accuracy. Weights are derived from two factors:
- The frequency of occurrence of each attribute value in the data, and
- The frequency with which attributes differ when a pair of records represents the same member.
Attribute weight assignment supports accurate and efficient comparison. The comparison weight that is assigned to each attribute provides a measure of evidence that two or more records represent the same member. The final score that is given when records are compared is the combination of the individual attribute weight scores.
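As a rough illustration of how individual attribute weights can combine into a final score, the following sketch adds up the weights of the attributes on which two records agree. The weight values, the attribute names, and the `comparison_score` function are hypothetical, and simple addition is an assumption made for the example rather than a description of the actual scoring algorithm.

```python
# Hypothetical per-attribute weights; the values are illustrative only.
ATTRIBUTE_WEIGHTS = {
    "surname": 2.5,
    "given_name": 2.0,
    "gender": 0.3,
    "id_number": 8.0,
}


def comparison_score(record_a, record_b, weights=ATTRIBUTE_WEIGHTS):
    """Sum the weights of attributes that are present and equal in both records.

    Assumption: agreement adds the full weight, and disagreement or a
    missing value adds nothing.
    """
    score = 0.0
    for attribute, weight in weights.items():
        value_a = record_a.get(attribute)
        value_b = record_b.get(attribute)
        if value_a is not None and value_a == value_b:
            score += weight
    return score


record_1 = {"surname": "Smith", "given_name": "John", "gender": "M", "id_number": "123"}
record_2 = {"surname": "Smith", "given_name": "Janna", "gender": "F", "id_number": "456"}

# Only the surname agrees, so only the surname weight contributes.
surname_only_score = comparison_score(record_1, record_2)
# Every attribute agrees when a record is compared with itself.
full_agreement_score = comparison_score(record_1, record_1)
```

In this sketch, agreement on the surname alone produces a much lower score than agreement on every attribute, because each attribute contributes only its own weight.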
Using a standard Person member type implementation as an example, weights are computed and assigned for the attributes of surname, given name, gender, personal identification number, address, phone, birth day, birth month, and birth year.
If the algorithm is comparing all of these attributes, why is individual weighting important? Because the frequency of an attribute value affects how accurately members are identified, some attribute values are more important than others. Take the surname attribute, for example. “Smith” is considered a fairly common name in the United States. If you randomly compare records with the surname Smith, two records can compare favorably without being the same person (John Smith versus Janna Smith). However, when you compare two records with the less common name of Zukowski, a match on the surname is a stronger indication that the records might be the same person. In this instance, the surname value Smith receives a lower weight (and thus has a smaller effect on the total comparison score), while the surname value Zukowski gets a higher weight. In some locations, however, the name Zukowski might be more common, and analysis of data frequency might result in assigning a lower weight to the name.
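The effect of value frequency on weight can be sketched as follows. The example assumes a simple inverse-frequency formula, the base-10 logarithm of one over the relative frequency of the value, which is chosen for illustration and is not confirmed as the formula that the product uses. The `frequency_weights` function and the sample data are hypothetical.

```python
import math
from collections import Counter


def frequency_weights(values):
    """Map each distinct value to log10(total / count).

    Assumption: rarer values earn higher agreement weights. This is an
    illustrative formula, not the product's actual weight computation.
    """
    counts = Counter(values)
    total = len(values)
    return {value: math.log10(total / count) for value, count in counts.items()}


# Hypothetical data set: Smith is common, Zukowski is rare.
surnames = ["Smith"] * 90 + ["Zukowski"] * 10
weights = frequency_weights(surnames)

smith_weight = weights["Smith"]        # log10(100 / 90), close to zero
zukowski_weight = weights["Zukowski"]  # log10(100 / 10), which is 1.0
```

Because the weights come from the data itself, running the same function on a data set where Zukowski is common would give that name a lower weight, as the paragraph above describes.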
Gender and identification numbers (such as a Social Security or national ID number) are two more examples of weight importance. It is common for two records to have the same gender, so that agreement provides little evidence of a match. However, if those records have the same identification number, which is typically unique to an individual, that agreement provides strong evidence of a match. Therefore, gender receives a lower weight than the identification number.
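One common way to quantify this difference in record linkage is a log-likelihood ratio in the style of the Fellegi-Sunter model, sketched below. This technique is substituted here for illustration; the source does not state that weights are computed this way. The probabilities are invented: `m` stands for the chance that an attribute agrees when two records represent the same member (one minus the rate at which it differs), and `u` stands for the chance that it agrees between records of different members.

```python
import math


def agreement_weight(m, u):
    """Return log2(m / u), the evidence that an agreement provides.

    m: assumed probability of agreement when records are the same member.
    u: assumed probability of agreement when records are different members.
    """
    return math.log2(m / u)


# Hypothetical probabilities, chosen only to illustrate the contrast.
# Gender agrees by chance about half the time between different people.
gender_weight = agreement_weight(m=0.98, u=0.5)
# An identification number almost never agrees between different people.
id_weight = agreement_weight(m=0.95, u=1e-6)
```

Under these assumed probabilities, the gender weight is slightly below 1 while the identification number weight is close to 20, which matches the ordering described above.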