Record linkage and the matching process

Record linkage is the methodology of identifying records that correspond to the same entity, such as a person, household, or product.

In practice, you compare pairs of records and classify each pair into one of two sets: matched pairs or nonmatched pairs.
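
The compare-and-classify step can be sketched in a few lines of Python. This is a minimal illustration only, not the matching algorithm of any particular product: the sample records, the field names, the neutral score of 0.5 for missing values, and the 0.85 cutoff are all assumptions made for the example, and the similarity measure is the standard library's difflib.SequenceMatcher rather than a statistical model.

```python
from difflib import SequenceMatcher
from itertools import product

# Hypothetical sample records; the field names are illustrative only.
left = [
    {"id": "a1", "name": "Jon Smith", "city": "Boston"},
    {"id": "a2", "name": "Maria Garcia", "city": "Austin"},
]
right = [
    {"id": "b1", "name": "John Smith", "city": "Boston"},
    {"id": "b2", "name": "Peter Chan", "city": "Denver"},
]


def field_similarity(x, y):
    """Return a similarity in [0, 1] for two field values.

    A missing value on either side yields a neutral 0.5 (an assumption),
    so it neither supports nor contradicts a match.
    """
    if not x or not y:
        return 0.5
    return SequenceMatcher(None, x.lower(), y.lower()).ratio()


def pair_score(rec_a, rec_b, fields=("name", "city")):
    """Average the per-field similarities for one record pair."""
    total = sum(field_similarity(rec_a.get(f), rec_b.get(f)) for f in fields)
    return total / len(fields)


THRESHOLD = 0.85  # assumed cutoff; in practice this is tuned to the data

matched, nonmatched = [], []
for a, b in product(left, right):  # compare every left/right record pair
    target = matched if pair_score(a, b) >= THRESHOLD else nonmatched
    target.append((a["id"], b["id"]))

print("matched:", matched)
print("nonmatched:", nonmatched)
```

With this sample data, only the pair (a1, b1) clears the threshold, even though the names differ by one letter. A fixed cutoff like this gives no measure of how confident the classification is.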

Statistical methods of record linkage are required for the following reasons:
  • Columns contain errors or missing values.
  • Data can be unreliable.
  • You want to find the matches with reasonable statistical assurance.