DATE2 (comparison function)

DATE2 is intended for single date comparison, such as birthdays or any event that occurs on a specific day (for example, marriage or death).

DATE2 handles Date-Date, Date-Age, and Age-Age comparisons. If both input strings are eight digits, a Date-Date comparison is performed. If either string is a year only, an age-difference (Age-Age) comparison is performed. If one string is a date and the other string is a year-only age, a Date-Age comparison is performed. This function provides stronger standardization for data from sources that do not perform validation.
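The selection rules above can be sketched as a small classifier. This is only an illustration, not the product's implementation: the function name comparison_kind is hypothetical, a "year only" value is assumed to be exactly four digits, and the Age-Age rule is read here as applying when both inputs are year-only values.

```python
def comparison_kind(a: str, b: str) -> str:
    """Classify a pair of inputs by shape (hypothetical helper)."""

    def shape(s: str) -> str:
        s = s.strip()
        if s.isascii() and s.isdigit():
            if len(s) == 8:
                return "date"   # eight digits: treated as a full date
            if len(s) == 4:
                return "year"   # assumption: a year-only value is four digits
        return "other"

    sa, sb = shape(a), shape(b)
    if sa == "date" and sb == "date":
        return "Date-Date"
    if sa == "year" and sb == "year":
        return "Age-Age"
    if {sa, sb} == {"date", "year"}:
        return "Date-Age"
    return "unsupported"
```

For example, comparison_kind("19800105", "1975") classifies the pair as a Date-Age comparison under these assumptions.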

DATE2 uses six weight tables (four for date and two for ages).

Number of roles: 1
Number of dimensions (in each role): 1
Optional cmpargs: 1) granularity, in percentage (default 5), and 2) maximum index in the age table (default 8)
Weight table: mpi_wgtnval, mpi_wgthead, mpi_wgt1dim

Date-Date comparison:

Date-Date comparisons use four weight tables:

  1. Year – This table has exact match weights for the most common years, a default exact match weight for other years, and a year disagreement weight.
  2. MonthDay – This table is used when both month and day match. It has four-digit month-day entries (for example, 0101, 0324, 0229).
  3. Month – This table is used when months—but not days—agree. It has an exact match weight for all months and a month disagreement weight.
  4. Day – This table is used when days—but not months—agree. It has an exact match for all days of the month (1-31) and a day disagreement weight.

The year, month, and day are compared separately, and the final date weight is computed by adding the year, month, and day weights.

For Year weights:

  1. If either year is invalid, the year weight is 0.
  2. If the years are equal, an agreement weight is used for that year in the weight table.
  3. Otherwise, the year disagreement weight is used.
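The three year rules can be sketched as follows. The weights are made-up illustrative numbers, not values from a real weight table, and the names YEAR_AGREE, DEFAULT_AGREE, YEAR_DISAGREE, and year_weight are hypothetical. An invalid year is represented as None, which is also an assumption.

```python
from typing import Optional

# Illustrative values only; real weights come from the Year weight table.
YEAR_AGREE = {1970: 310, 1980: 295}  # exact-match weights for common years
DEFAULT_AGREE = 350                  # default exact-match weight for other years
YEAR_DISAGREE = -400                 # year disagreement weight


def year_weight(y1: Optional[int], y2: Optional[int]) -> int:
    """Return the year weight; None stands for an invalid year."""
    if y1 is None or y2 is None:
        return 0                     # rule 1: either year invalid
    if y1 == y2:
        # rule 2: agreement weight for that year, or the default
        return YEAR_AGREE.get(y1, DEFAULT_AGREE)
    return YEAR_DISAGREE             # rule 3: years differ
```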

For Month and Day:

  1. If both the month and the day match, the weight is derived from the MonthDay weights. Otherwise, the month and day weight is derived from the Month and Day weight tables.
  2. If neither the month nor the day matches and a valid date can be formed by inverting the month and day of the first date, the date is inverted and compared to the second date.
  3. The inversion process is repeated for the second date.
  4. If either of the inverted total weights is greater than the original month and day weight, this score, less a small penalty, is used. By default, the penalty is 50. To adjust the default penalty, add a numval of -3 with the desired penalty to the mpi_wgtnval table.
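The month and day steps above can be sketched as follows. This is a rough illustration under several assumptions: the agreement weights are copied from the sample rows later in this topic, the negative weight in each sample table is treated as the disagreement weight, DEFAULT_AGREE is a hypothetical fallback for entries not shown in the samples, the month and day weights are summed, and when both inversions are possible the larger score is used. All function and variable names are hypothetical.

```python
from datetime import date

MONTH = {"agree": {1: 102, 2: 105, 3: 101, 4: 103, 5: 101}, "disagree": -84}
DAY = {"agree": {1: 136, 2: 138}, "disagree": -70}
MONTHDAY = {101: 242, 102: 259, 103: 259, 104: 258, 105: 260}  # key = month*100 + day
DEFAULT_AGREE = 100      # hypothetical fallback for entries elided in the samples
INVERSION_PENALTY = 50   # default penalty stated in the text


def plain_weight(m1, d1, m2, d2):
    """Month and day weight without any inversion."""
    if m1 == m2 and d1 == d2:
        return MONTHDAY.get(m1 * 100 + d1, DEFAULT_AGREE)
    month_w = MONTH["agree"].get(m1, DEFAULT_AGREE) if m1 == m2 else MONTH["disagree"]
    day_w = DAY["agree"].get(d1, DEFAULT_AGREE) if d1 == d2 else DAY["disagree"]
    return month_w + day_w


def is_valid(year, month, day):
    """True if (year, month, day) forms a real calendar date."""
    try:
        date(year, month, day)
        return True
    except ValueError:
        return False


def month_day_weight(first, second):
    """first and second are (year, month, day) tuples."""
    y1, m1, d1 = first
    y2, m2, d2 = second
    original = plain_weight(m1, d1, m2, d2)
    if m1 != m2 and d1 != d2:
        inverted = []
        if is_valid(y1, d1, m1):   # swap month and day of the first date
            inverted.append(plain_weight(d1, m1, m2, d2))
        if is_valid(y2, d2, m2):   # repeat for the second date
            inverted.append(plain_weight(m1, d1, d2, m2))
        if inverted and max(inverted) > original:
            return max(inverted) - INVERSION_PENALTY
    return original
```

With these sample values, comparing March 1 with January 3 of the same year scores -154 without inversion; inverting the first date yields a month-day match on 0103 (259), so the result is 259 less the penalty, or 209.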

Age-Age comparison:

The weight tables used are ACIRCA and AGE, which are populated in the mpi_wgt1dim table.

The age difference weight table uses the percentage of age difference. For example, if the ages are 20 and 25, the age difference of 5 is computed as a percentage of the smaller age (5/20, or 25 percent).

Two optional cmpargs can be used: 1) granularity, in percentage (default is 5), and 2) maximum index in the age table (default is 8). Granularity is configurable. For example, if the granularity is 5, the weight table has an entry for age differences between 0 and 5 percent.

As mentioned previously, there are two age difference weight tables—circa age and precise age.

The circa age table is used when one or both of the ages is a circa age. When computed, a function is applied to the matched set age distribution to replicate the difference between real and guessed ages.

The precise age table is used when both ages are non-circa years.

The age comparison weight is computed by taking the difference in the birth years, computing the percent difference, and then looking up the difference in the correct weight table.
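The age lookup can be sketched as follows. The percent difference follows the description above, but the mapping from percentage to table index is an assumption: here index 1 is taken to cover the first granularity step, because the sample rows later in this topic give index 1 the largest weight. The tables hold only those sample rows, and every function name is hypothetical.

```python
# Sample rows only (index -> weight); further rows are elided in the source.
AGE = {0: 0, 1: 550, 2: 288, 3: 163, 4: 51}      # precise age table
ACIRCA = {0: 0, 1: 100, 2: 50, 3: 25, 4: 0}      # circa age table


def percent_difference(age1, age2):
    """Age difference as a percentage of the smaller age."""
    smaller = min(age1, age2)
    if smaller <= 0:
        raise ValueError("ages must be positive")
    return abs(age1 - age2) * 100 / smaller


def bucket_index(pct, granularity=5, max_index=8):
    """Assumed mapping: index 1 is the first step, capped at max_index."""
    return min(int(pct // granularity) + 1, max_index)


def age_weight(age1, age2, circa=False):
    """Look up the weight; returns None if the row is not among the samples."""
    table = ACIRCA if circa else AGE   # circa table if either age is circa
    return table.get(bucket_index(percent_difference(age1, age2)))
```

Under these assumptions, ages 20 and 25 differ by 25 percent, which maps to index 6 at the default granularity of 5.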

Weight tables for DATE2:

  • mpi_wgthead
    1|1|A|CMPID-DOB-ACIRCA|1DIM|CMPID-DOB-ACIRCA|9|0|0|0|
    1|1|A|CMPID-DOB-AGE|1DIM|CMPID-DOB-AGE|9|0|0|0|
    1|1|A|CMPID-DOB-DAY|NVAL|CMPID-DOB-DAY|0|0|0|0|
    1|1|A|CMPID-DOB-MONTH|NVAL|CMPID-DOB-MONTH|0|0|0|0|
    1|1|A|CMPID-DOB-MONTHDAY|NVAL|CMPID-DOB-MONTHDAY|0|0|0|0|
    1|1|A|CMPID-DOB-YEAR|NVAL|CMPID-DOB-YEAR|0|0|0|0|
    
  • mpi_wgt1dim – There are two tables here for DATE2.
    1|1|A|CMPID-DOB-ACIRCA|0|0|
    1|1|A|CMPID-DOB-ACIRCA|1|100|
    1|1|A|CMPID-DOB-ACIRCA|2|50|
    1|1|A|CMPID-DOB-ACIRCA|3|25|
    1|1|A|CMPID-DOB-ACIRCA|4|0|
    …..
    1|1|A|CMPID-DOB-AGE|0|0|
    1|1|A|CMPID-DOB-AGE|1|550|
    1|1|A|CMPID-DOB-AGE|2|288|
    1|1|A|CMPID-DOB-AGE|3|163|
    1|1|A|CMPID-DOB-AGE|4|51|
    
  • mpi_wgtnval – There are three tables for DATE2 here.
    1|1|A|CMPID-DOB-DAY|-2|-70|
    1|1|A|CMPID-DOB-DAY|-1|0|
    1|1|A|CMPID-DOB-DAY|1|136|
    1|1|A|CMPID-DOB-DAY|2|138|
    …
    1|1|A|CMPID-DOB-MONTH|-2|-84|
    1|1|A|CMPID-DOB-MONTH|-1|0|
    1|1|A|CMPID-DOB-MONTH|1|102|
    1|1|A|CMPID-DOB-MONTH|2|105|
    1|1|A|CMPID-DOB-MONTH|3|101|
    1|1|A|CMPID-DOB-MONTH|4|103|
    1|1|A|CMPID-DOB-MONTH|5|101|
    ….
    1|1|A|CMPID-DOB-MONTHDAY|-2|-57|
    1|1|A|CMPID-DOB-MONTHDAY|-1|0|
    1|1|A|CMPID-DOB-MONTHDAY|101|242|
    1|1|A|CMPID-DOB-MONTHDAY|102|259|
    1|1|A|CMPID-DOB-MONTHDAY|103|259|
    1|1|A|CMPID-DOB-MONTHDAY|104|258|
    1|1|A|CMPID-DOB-MONTHDAY|105|260|
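
Rows like the mpi_wgt1dim and mpi_wgtnval samples above can be loaded into lookup dictionaries for experimentation. The sketch below assumes that the last three populated fields of each row are the table name, the key (index or numval), and the weight. That reading is inferred from the samples, not from a documented schema, and parse_weight_rows is a hypothetical helper.

```python
from collections import defaultdict


def parse_weight_rows(lines):
    """Build {table_name: {key: weight}} from pipe-delimited rows."""
    tables = defaultdict(dict)
    for line in lines:
        parts = line.strip().split("|")
        if len(parts) != 7:          # six fields plus the trailing delimiter
            continue                 # skips elision markers and header rows
        name, key, weight = parts[3], parts[4], parts[5]
        try:
            k, w = int(key), int(weight)
        except ValueError:
            continue                 # not a key/weight row
        tables[name][k] = w
    return dict(tables)


rows = [
    "1|1|A|CMPID-DOB-DAY|-2|-70|",
    "…",
    "1|1|A|CMPID-DOB-MONTHDAY|101|242|",
]
weights = parse_weight_rows(rows)
```

Rows from mpi_wgthead have more fields and are skipped by the length check.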