Weight generation parameters

The parameters listed in the following tables can be set through the Configuration Editor and should be specified before generating weights. See Jobs and job sets for detailed information about the weight generation job. See Weight generation for a list of weight generation steps.

Table 1. Matched pair threshold
Parameter Value/Description
Database table mpi_cmphead
Used by Compare Members in Bulk (mpxcomp) – during matched pair generation
Description Generates a set of matched pairs that are used to calculate weights throughout the process. The weight generation process runs Compare Members in Bulk (mpxcomp) to find any pairs that score above wgtMAT. All pairs that score below wgtMAT are filtered out.
Scale One implied decimal point; 120 is actually 12.0
Valid range/ Default and Recommended setting 10..200
Default: 100
Recommended setting: 100
Property location Weights tab in the Configuration Editor
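As an illustration of the scale and the filter described in Table 1, the following Python sketch decodes a stored value with one implied decimal point and keeps only the pairs that score above the threshold. The function names and the sample pairs are hypothetical and are not part of mpxcomp. The sketch also assumes that a pair scoring exactly at the threshold is filtered out, which the description does not state.

```python
def decode_implied_decimal(raw: int, places: int = 1) -> float:
    """Convert a stored integer with implied decimal places to its real value."""
    return raw / (10 ** places)


def filter_pairs(scored_pairs, raw_threshold):
    """Keep (member_a, member_b, score) tuples whose score is above the threshold.

    Assumption: a score equal to the threshold is dropped.
    """
    threshold = decode_implied_decimal(raw_threshold)
    return [(a, b, s) for a, b, s in scored_pairs if s > threshold]


# Hypothetical scored pairs; wgtMAT = 100 is read as 10.0.
pairs = [("m1", "m2", 12.5), ("m1", "m3", 9.9), ("m4", "m5", 10.0)]
kept = filter_pairs(pairs, 100)
print(kept)  # [('m1', 'm2', 12.5)]
```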
Table 2. Attribute matched pair threshold
Parameter Value/Description
Database table mpi_cmphead
Used by Compare Members in Bulk (mpxcomp) – during matched pair generation
Description This is the threshold for the second filter. As weight generation calculates weights for each attribute, it refines the list of matched pairs using all but the current attribute. The pairs must score above wgtABS. If the score is below wgtABS, the pair is filtered out.
Scale One implied decimal point; 120 is actually 12.0
Valid range/ Default and Recommended setting 10..180
Default: 80
Recommended setting: 80
Property location Weights tab in the Configuration Editor
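The second filter can be pictured with a short Python sketch. It assumes, for illustration only, that a pair score is the sum of per-attribute scores, so that "all but the current attribute" means leaving one term out of the sum. The function names, attribute names, and scores are hypothetical.

```python
def score_without(attribute_scores: dict, current: str) -> float:
    """Sum the per-attribute scores, leaving out the attribute being weighted."""
    return sum(s for name, s in attribute_scores.items() if name != current)


def passes_wgt_abs(attribute_scores: dict, current: str, raw_wgt_abs: int = 80) -> bool:
    """Return True when the leave-one-out score is above wgtABS.

    raw_wgt_abs uses one implied decimal point, so 80 is read as 8.0.
    """
    threshold = raw_wgt_abs / 10
    return score_without(attribute_scores, current) > threshold


# Hypothetical per-attribute scores for one matched pair.
scores = {"name": 4.0, "dob": 3.5, "ssn": 5.0}
print(passes_wgt_abs(scores, "ssn"))  # False: 4.0 + 3.5 = 7.5, not above 8.0
print(passes_wgt_abs(scores, "dob"))  # True: 4.0 + 5.0 = 9.0, above 8.0
```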
Table 3. Attribute matched pair percentage threshold
Parameter Value/Description
Database table mpi_cmphead
Used by Compare Members in Bulk (mpxcomp) – during matched pair generation
Description This filter is applied after wgtABS has been used to find matched pairs. The wgtNRM filter expresses the score of each matched pair as a percentage of the exact match score. If this percentage falls below wgtNRM, the pair is filtered out.
Scale A percent; a value of 95 is actually 95%
Valid range/ Default and Recommended setting 60..100
Default: 95
Recommended setting: 95
Property location Weights tab in the Configuration Editor
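The percentage test in Table 3 can be sketched in Python as follows. The function name and the sample scores are hypothetical; how the exact match score is obtained is outside the scope of this sketch.

```python
def passes_wgt_nrm(score: float, exact_match_score: float, wgt_nrm: int = 95) -> bool:
    """Return True when score is at least wgt_nrm percent of the exact match score."""
    if exact_match_score <= 0:
        raise ValueError("exact_match_score must be positive")
    percent = 100.0 * score / exact_match_score
    # A pair is filtered out only when the percentage falls below wgtNRM.
    return percent >= wgt_nrm


print(passes_wgt_nrm(9.75, 10.0))  # True: 97.5% of the exact match score
print(passes_wgt_nrm(9.0, 10.0))   # False: 90% is below 95%
```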
Table 4. Data quality percentage for initial weight estimates
Parameter Value/Description
Database table mpi_cmphead
Used by mpxwgt – during initial weight generation (without matched set)
Description When computing the initial weights using unmatched weights, this parameter defines the matched set error rate, or the percentage of matched attributes that disagree.
Scale A percent; a value of 5 is actually 5%
Valid range/ Default and Recommended setting 1..20
Default: 5
Recommended setting: 5
Property location Weights tab in the Configuration Editor
Table 5. Minimum attribute count
Parameter Value/Description
Database table mpi_cmphead
Used by mpxwgt
Description This property (wgtFLR) defines a lower bound on attribute value frequency counts. When the count is less than the minimum attribute count, it is raised to equal the minimum attribute count.
Scale No scaling; a value of 20 means 20
Valid range/ Default and Recommended setting >0
Default: 5
Recommended setting: 5
Property location Weights tab in the Configuration Editor
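The lower bound described in Table 5 amounts to raising small counts to the minimum. A minimal Python sketch, using a hypothetical function name and hypothetical frequency counts:

```python
def apply_minimum_count(counts: dict, wgt_flr: int = 5) -> dict:
    """Raise every frequency count that is below wgt_flr up to wgt_flr."""
    return {value: max(count, wgt_flr) for value, count in counts.items()}


# Hypothetical attribute value frequencies.
raw_counts = {"SMITH": 120, "ZYLBERSTEIN": 2, "QUIX": 1}
print(apply_minimum_count(raw_counts))
# {'SMITH': 120, 'ZYLBERSTEIN': 5, 'QUIX': 5}
```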
Table 6. Convergence threshold
Parameter Value/Description
Database table mpi_cmphead
Used by mpxconv
Description Provides the tolerance (wgtCNV) for weight generation convergence. The weight generation process iterates until the weights from the latest run match the weights from the previous run to within the convergence threshold. For example, if a value of 50 is supplied, the weights have converged when no weight differs between the two most recent iterations by more than 0.50. Iteration stops when convergence is reached.
Scale Two implied decimal points; a value of 50 is actually 0.50
Valid range/ Default and Recommended setting 1..100
Default: 20
Recommended setting: 20
Property location Weights tab in the Configuration Editor
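The convergence test in Table 6 can be sketched in Python as a comparison of two consecutive sets of weights. The function name and the sample weights are hypothetical, and the sketch assumes that both iterations contain the same weight keys.

```python
def has_converged(previous: dict, latest: dict, raw_wgt_cnv: int = 20) -> bool:
    """Return True when no weight changed by more than the tolerance.

    raw_wgt_cnv uses two implied decimal points, so 20 is read as 0.20.
    """
    tolerance = raw_wgt_cnv / 100
    return all(abs(latest[key] - previous[key]) <= tolerance for key in previous)


# Hypothetical weights from two consecutive iterations.
previous = {"name": 1.00, "dob": 2.00}
latest = {"name": 1.10, "dob": 2.30}
print(has_converged(previous, latest, 50))  # True: largest change is about 0.30
print(has_converged(previous, latest, 20))  # False: dob changed by about 0.30
```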
Table 7. False negative rate
Parameter Value/Description
Database table mpi_cmphead
Used by mpxdist
Description False Negative Rate (wgtFNR). Used by the mpxdist utility to compute the Clerical Review and Auto-link thresholds based on the desired false negative rate.
Scale  
Valid range/ Default and Recommended setting 1..100
Recommended setting: 100
Property location Weights tab in the Configuration Editor
Table 8. False positive rate
Parameter Value/Description
Database table mpi_cmphead
Used by mpxdist
Description False Positive Rate (wgtFPR). Used by the mpxdist utility to compute the Auto-link thresholds based on the desired false positive rate.
Scale  
Valid range/ Default and Recommended setting >0
Recommended setting: 100000
Property location Weights tab in the Configuration Editor
Table 9. Weight table percentage cut-off
Parameter Value/Description
Database table mpi_cmpspec
Used by mpxwgt
Description Defines a cut-off for the weight table. When generating the sval or nval weight table, typically only the most common values are listed. wgtCUT defines the cumulative percentage of the listed values that should be contained in the weight tables. This is available as a property (on the Properties tab) when you select a comparison function.
Scale Percentage; a value of 80 means 80%
Valid range/ Default and Recommended setting 1..100
Default: 80
Recommended setting: 80
Property location Algorithm tab in Properties view when you have selected a comparison function.
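One way to read the cut-off in Table 9 is that values are listed from most to least common until their cumulative share of all occurrences reaches wgtCUT percent. The following Python sketch illustrates that reading only; it is an assumption about the behavior, not a description of how mpxwgt builds the sval or nval tables, and the function name and counts are hypothetical.

```python
def values_within_cutoff(counts: dict, wgt_cut: int = 80) -> list:
    """List the most common values until they cover wgt_cut percent of occurrences."""
    total = sum(counts.values())
    listed, running = [], 0
    # Walk the values from most to least frequent.
    for value, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        # Stop once the listed values already reach the cumulative percentage.
        if running * 100 >= wgt_cut * total:
            break
        listed.append(value)
        running += count
    return listed


# Hypothetical value frequencies that total 100 occurrences.
name_counts = {"SMITH": 50, "JONES": 30, "LEE": 15, "ZED": 5}
print(values_within_cutoff(name_counts, 80))  # ['SMITH', 'JONES']
```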
Table 10. Minimum weight frequency
Parameter Value/Description
Database table mpi_dvdxcmp
Used by Generate Frequency Stats (mpxfreq) – in wgtmode
Description Defines the minimum frequency of values to be listed in the strfreq table. If minWgtFreq is 10, only values that occur 10 or more times are listed in the table. In general, the larger the attribute population, the larger the minWgtFreq number should be. This is available as a property (on the Properties tab) when you select the connection between a standardization function and a comparison role.
Scale No scaling; a value of 10 means 10
Valid range/ Default and Recommended setting >=0
Default: 20
Recommended setting: 20 for all attributes other than dates. Dates should be set to 1.
Property location Algorithm tab in Properties view when you have selected the connection between a standardization function and a comparison role.
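The effect of minWgtFreq described in Table 10 can be sketched in Python as a frequency filter. The function name and sample data are hypothetical, and the sketch does not reproduce the layout of the strfreq table.

```python
from collections import Counter


def frequent_values(values, min_wgt_freq: int = 20) -> dict:
    """Count each value and keep those that occur min_wgt_freq or more times."""
    counts = Counter(values)
    return {value: count for value, count in counts.items() if count >= min_wgt_freq}


# Hypothetical attribute values: "A" occurs 10 times, "B" occurs 9 times.
sample = ["A"] * 10 + ["B"] * 9
print(frequent_values(sample, 10))  # {'A': 10}
```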

The weight generation utility uses the binary files mpi_membktd.NNN and mpi_memcmpd.NNN. These files can be generated using a number of utilities, depending on the state of your system.

Table 11. Weight generation utilities used for various system states
Utility System state
Derive Data and Create UNLs (mpxdata) Your member data is not loaded in the database, but exists in a flat load file.
Prepare Binary Files (mpxprep) All of the following are true: A) Your member data, including your comparison data (cmpd) and bucket data (bktd), is loaded in the database. B) Your mpi_memcmpd and mpi_membktd tables are current. C) No changes have been made to the algorithm since mpi_memcmpd and mpi_membktd were generated.
Derive Data from Server (mpxredvd) Your member data is loaded in the InfoSphere MDM database. Because this utility generates the cmpd and bktd data, it can be used if mpi_memcmpd and mpi_membktd are not populated or are out of date.
Derive Data from UNLs (mpxfsdvd) Your member data exists as .unl files.

The Jobs feature contains options for running any of the utilities listed here or for using mpi_membktd.NNN and mpi_memcmpd.NNN files that have already been generated. For more information about these utilities, see Jobs and job sets.