USPS_DINT comparison
Compares an interval from a data source to two intervals from a reference source for columns that contain an address primary number.
This match comparison can be used to compare information from a USPS ZIP+4 file to geographic reference files such as the Census Bureau TIGER file, GDT Dynamap files, or Etak MapBase files. Odd-even parity control information such as the USPS ZIP+4 control column is required.
Frequency information is not taken into account when this match comparison is used. However, a two-source match requires four input streams, so if you use this match comparison in a Two-source Match stage job, create two dummy file inputs in place of the files that would contain frequency information.
Required Columns
The data source requires an address primary low number, an address primary high number, and an address primary odd/even control. The USPS ZIP+4 file contains this information. The reference source requires two primary low numbers, two primary high numbers, and two primary odd/even controls, one set for each side of the street.
The following data source and reference source columns are required:
- Data. (1) The beginning of the street address range from the data source.
- Data. (2) The ending of the street address range from the data source.
- Reference. (3) The beginning of the street address range for one side of the street (such as from left) from the reference source.
- Reference. (4) The ending of the street address range for one side of the street (such as to left) from the reference source.
- Reference. (5) The beginning of the street address range for the other side of the street (such as from right) from the reference source.
- Reference. (6) The ending of the street address range for the other side of the street (such as to right) from the reference source.
- Data. (Control) The odd/even parity for the range defined with (1) and (2).
- Reference. (Control) The odd/even parity for the range defined with (3) and (4).
- Reference. (Control) The odd/even parity for the range defined with (5) and (6).
The control information from the USPS ZIP+4 code is one of the following values:
- O. The range represents only odd house numbers.
- E. The range represents only even house numbers.
- B. The range represents all numbers (both odd and even) in the interval.
- U. The parity of the range is unknown.
How It Works
Agreement weight is assigned when:
- The odd/even control is set to the same value (E, O, or B) on both the data source and the reference source
- The odd/even control is set to E or O on one source and to B on the other source (such as E on the data source and B on the reference source)
Disagreement weight is assigned when the parity on one source is set to E or O and the parity on the other source is set to the opposite value; that is, either the data source is set to E and the reference source to O, or the data source is set to O and the reference source to E.
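The parity rules above can be sketched as a small predicate. This is an illustrative sketch only, not the product's implementation: the function name `parity_compatible` is hypothetical, and because the text does not say how the U (unknown) control is scored, the sketch assumes that U is never compatible.

```python
VALID_CONTROLS = {"O", "E", "B", "U"}


def parity_compatible(data_control: str, ref_control: str) -> bool:
    """Return True when two odd/even controls can describe the same house numbers."""
    data_control = data_control.strip().upper()
    ref_control = ref_control.strip().upper()
    if data_control not in VALID_CONTROLS or ref_control not in VALID_CONTROLS:
        raise ValueError("control must be one of O, E, B, or U")
    if "U" in (data_control, ref_control):
        # Assumption: the source text does not define scoring for U,
        # so this sketch treats unknown parity as not compatible.
        return False
    # Same value on both sources, or B (both) on either source.
    return data_control == ref_control or "B" in (data_control, ref_control)
```

Under these assumptions, `parity_compatible("O", "B")` is true and `parity_compatible("E", "O")` is false. Parity compatibility alone does not produce a match; the intervals must also overlap.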
If all strings are numeric, the comparison performs an integer interval comparison; otherwise, the comparison performs an alphanumeric interval comparison.
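A minimal sketch of how the choice between the two interval comparisons might look. The function name `intervals_overlap` is hypothetical, the overlap test assumes inclusive endpoints, and plain string ordering stands in for the alphanumeric comparison, whose exact rules are not described here.

```python
def intervals_overlap(data_low, data_high, ref_low, ref_high) -> bool:
    """Return True when the data interval and the reference interval overlap."""
    values = [str(v).strip() for v in (data_low, data_high, ref_low, ref_high)]
    if all(v.isascii() and v.isdigit() for v in values):
        # All strings are numeric: compare as integers.
        d_low, d_high, r_low, r_high = (int(v) for v in values)
    else:
        # Assumption: ordinary string ordering approximates the
        # alphanumeric interval comparison.
        d_low, d_high, r_low, r_high = values
    # Two closed intervals overlap when each one starts before the other ends.
    return d_low <= r_high and r_low <= d_high
```

For example, `intervals_overlap("101", "199", "123", "299")` is true, while `intervals_overlap("101", "199", "201", "299")` is false.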
The interval on the data source is first compared to the first reference interval, which is defined with reference (3) and reference (4). If the odd/even parity agrees, that is, if the data source control is compatible with the control for that reference interval, and the intervals overlap, the intervals are considered a match.
In the following table, the data source interval matches the interval on the reference source defined by reference (3) and reference (4), because the odd/even parity is compatible (odd on the data source and both on the reference source) and the interval 101-199 overlaps with 123-299.
Source | Begin range | End range | Odd/even control |
---|---|---|---|
data interval (1) and (2) | 101 | 199 | O |
reference interval (3) and (4) | 123 | 299 | B |
reference interval (5) and (6) | 124 | 298 | B |
If the interval on the data source does not match the first interval on the reference source, the data source interval is compared with the interval on the reference source defined by reference (5) and reference (6) for a match.
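The overall flow, first reference interval and then the second, can be sketched as follows using the values from the example table. This is a simplified illustration with integer ranges only: the names `Interval` and `usps_dint_match` are hypothetical, and the treatment of the U control as never matching is an assumption.

```python
from typing import NamedTuple, Optional


class Interval(NamedTuple):
    low: int
    high: int
    control: str  # O, E, B, or U


def _matches(data: Interval, ref: Interval) -> bool:
    """Parity must be compatible and the ranges must overlap."""
    controls = (data.control, ref.control)
    # Assumption: U (unknown parity) never counts as compatible.
    parity_ok = "U" not in controls and (
        data.control == ref.control or "B" in controls
    )
    overlap = data.low <= ref.high and ref.low <= data.high
    return parity_ok and overlap


def usps_dint_match(
    data: Interval, ref_first: Interval, ref_second: Interval
) -> Optional[int]:
    """Return 1 or 2 for the reference interval that matches, else None."""
    if _matches(data, ref_first):
        return 1
    # Only consulted when the first reference interval does not match.
    if _matches(data, ref_second):
        return 2
    return None


data = Interval(101, 199, "O")
first_side = Interval(123, 299, "B")
second_side = Interval(124, 298, "B")
result = usps_dint_match(data, first_side, second_side)  # 1
```

With the table values the first reference interval already matches, so the second is never consulted. If the first interval were 201-299 instead, the function would fall through and return 2.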