LR_UNCERT comparison
Compares place information from a data source with geocoding reference files by using a left-right string comparison algorithm based on information theory principles.
Census files and other geographic reference sources contain a left postal code and a right postal code, a left city code and a right city code, and so forth.
Required Columns
The following data source and reference source columns are required.
- Data. The column from the data source.
- Reference. The left column (for example, the left city code) from the reference source.
- Reference. The right column (for example, the right city code) from the reference source.
Required Parameter
The following parameter is required: Param 1, the minimum threshold score. Scores are interpreted as follows:
- 900. The two strings are identical.
- 850. The two strings can be considered the same.
- 800. The two strings are probably the same.
- 750. The two strings are probably different.
- 700. The two strings are different.
A higher value for the Param 1 parameter causes the match to tolerate fewer differences than a lower value does.
The assigned weight is proportioned linearly between the agreement and disagreement weights. For example, if you specify 700 and the score is 700 or less, then the full disagreement weight is assigned. If the strings agree exactly, the full agreement weight is assigned.
For another example, suppose you specify a value of 850 for the Param 1 parameter, which means that the tolerance is relatively low. A score of 800 would get the full disagreement weight because it is lower than the parameter that you specified. Even though a score of 800 means that the strings are probably the same, you have established a low tolerance.
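The weighting rule described above can be sketched in Python. This is a minimal illustration, not the product's implementation: the function name `lr_uncert_weight`, its argument names, and the exact interpolation formula (a straight line from the Param 1 threshold up to 900) are assumptions made for the example. Only the endpoints, full disagreement weight at or below the threshold and full agreement weight for identical strings, come from the text.

```python
MAX_SCORE = 900  # score for identical strings, per the list above


def lr_uncert_weight(score, param1, agreement_weight, disagreement_weight):
    """Map a comparison score to a weight (assumed linear interpolation)."""
    if score >= MAX_SCORE:
        # Strings agree exactly: full agreement weight.
        return agreement_weight
    if score <= param1:
        # Score at or below the threshold: full disagreement weight.
        return disagreement_weight
    # Assumption: the weight rises in a straight line between the
    # threshold (Param 1) and the maximum score.
    fraction = (score - param1) / (MAX_SCORE - param1)
    return disagreement_weight + fraction * (agreement_weight - disagreement_weight)


# Threshold 850, score 800: below the threshold, so full disagreement weight.
print(lr_uncert_weight(800, 850, 10.0, -10.0))  # -10.0
# Threshold 700, score 800: halfway between 700 and 900.
print(lr_uncert_weight(800, 700, 10.0, -10.0))  # 0.0
```

The second call shows why the threshold matters: the same score of 800 yields a neutral weight under a threshold of 700 but the full disagreement weight under a threshold of 850.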
Required Mode
A mode is required. Choose one of the following modes:
- EITHER. The contents of the data source column must match either of the reference source columns specified (or both) to receive the full agreement weight.
- BASED_PREV. Use the result of a previous D_INT comparison to decide which column to compare.
If you specify the EITHER mode, the data source column must match either of the reference source columns to receive an agreement weight. If you specify the BASED_PREV mode, the column to compare depends on the result of a previous D_INT comparison or of a similar double interval comparison: if the data source matched the left interval, the data source column must match the first (left) reference source column; if the data source matched the right interval, the data source column must match the second (right) reference source column. If neither the left nor the right interval agrees, the missing weight for the column is assigned.
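The mode selection can be sketched as follows. Everything named here is hypothetical: `lr_compare`, `exact_score`, and the `prev_side` argument are invented for illustration. The sketch assumes that a left-interval result selects the left reference column and a right-interval result selects the right reference column, and that EITHER keeps the better of the two scores. The scoring function is a placeholder that only detects exact equality; it does not reproduce the information-theoretic algorithm that the comparison actually uses.

```python
def exact_score(a, b):
    """Placeholder scorer: 900 for identical strings, 0 otherwise."""
    return 900 if a == b else 0


def lr_compare(data_value, ref_left, ref_right, mode, score_fn, prev_side=None):
    """Return the score to weight, or None when the missing weight applies.

    prev_side is the assumed outcome of a previous D_INT comparison:
    "left", "right", or None if neither interval agreed.
    """
    if mode == "EITHER":
        # Assumption: the better of the two column scores is used.
        return max(score_fn(data_value, ref_left), score_fn(data_value, ref_right))
    if mode == "BASED_PREV":
        if prev_side == "left":
            return score_fn(data_value, ref_left)
        if prev_side == "right":
            return score_fn(data_value, ref_right)
        return None  # neither interval agreed: assign the missing weight
    raise ValueError(f"unknown mode: {mode}")


# The data value matches only the right reference column.
print(lr_compare("60601", "60602", "60601", "EITHER", exact_score))              # 900
print(lr_compare("60601", "60602", "60601", "BASED_PREV", exact_score, "left"))  # 0
print(lr_compare("60601", "60602", "60601", "BASED_PREV", exact_score, None))    # None
```

Under BASED_PREV, the same data value can score differently depending on which side the earlier interval comparison selected, which is the point of that mode.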