Weight generation parameters
The parameters listed in the following tables can be set through the Configuration Editor and should be specified before generating weights. See Jobs and job sets for detailed information about the weight generation job. See Weight generation for a list of weight generation steps.
Parameter | Value/Description |
---|---|
Database table | mpi_cmphead |
Used by | Compare Members in Bulk (mpxcomp) – during matched pair generation |
Description | Threshold (wgtMAT) for generating the set of matched pairs that are used to calculate weights throughout the process. The weight generation process runs Compare Members in Bulk (mpxcomp) to find pairs that score above wgtMAT; pairs that score below wgtMAT are filtered out. |
Scale | One implied decimal point; 120 is actually 12.0 |
Valid range/ Default and Recommended setting | 10..200. Default: 100. Recommended setting: 100 |
Property location | Weights tab in the Configuration Editor |
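
The implied decimal scale and the wgtMAT filter can be illustrated with a short Python sketch. The function names and sample scores below are hypothetical and exist only for illustration; the real comparison is performed by mpxcomp, and how a score exactly equal to the threshold is treated is an assumption here.

```python
def decode_implied_decimal(stored_value, places=1):
    """Convert a stored integer setting to its actual value (120 -> 12.0)."""
    return stored_value / (10 ** places)


def filter_by_wgtmat(pair_scores, wgt_mat=100):
    """Keep only pairs whose score is above the decoded wgtMAT threshold.

    pair_scores maps a hypothetical (member_a, member_b) pair to its score.
    Assumption: a score exactly equal to the threshold is filtered out.
    """
    threshold = decode_implied_decimal(wgt_mat)
    return {pair: score for pair, score in pair_scores.items() if score > threshold}


# Illustrative scores only; real scores come from Compare Members in Bulk.
sample_scores = {("A", "B"): 14.2, ("A", "C"): 9.7, ("B", "D"): 10.5}
kept = filter_by_wgtmat(sample_scores, wgt_mat=100)
```

With the default of 100 (actually 10.0), the pair scoring 9.7 is dropped and the other two are kept.
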
Parameter | Value/Description |
---|---|
Database table | mpi_cmphead |
Used by | Compare Members in Bulk (mpxcomp) – during matched pair generation |
Description | This is the threshold for the second filter. As weight generation calculates weights for each attribute, it refines the list of matched pairs using all but the current attribute. The pairs must score above wgtABS. If the score is below wgtABS, the pair is filtered out. |
Scale | One implied decimal point; 120 is actually 12.0 |
Valid range/ Default and Recommended setting | 10..180. Default: 80. Recommended setting: 80 |
Property location | Weights tab in the Configuration Editor |
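
The "all but the current attribute" idea behind wgtABS can be sketched as follows. This assumes a pair's score is the sum of its per-attribute scores, which is a simplification made for the example; the attribute names and values are hypothetical.

```python
def score_without_attribute(attribute_scores, current_attribute):
    """Sum a pair's per-attribute scores, leaving out the attribute being weighted.

    Assumption: the pair score is a simple sum of per-attribute scores.
    """
    return sum(
        score for name, score in attribute_scores.items() if name != current_attribute
    )


def passes_wgtabs(attribute_scores, current_attribute, wgt_abs=80):
    """Return True when the leave-one-out score is above the wgtABS threshold."""
    threshold = wgt_abs / 10  # one implied decimal point: 80 means 8.0
    return score_without_attribute(attribute_scores, current_attribute) > threshold


# Hypothetical per-attribute scores for one matched pair.
pair = {"name": 5.0, "birth_date": 3.5, "address": 1.0}
keep_for_address = passes_wgtabs(pair, "address")  # scored on name + birth_date
keep_for_name = passes_wgtabs(pair, "name")        # scored on birth_date + address
```

The same pair can therefore be kept while one attribute is being weighted and filtered out while another is.
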
Parameter | Value/Description |
---|---|
Database table | mpi_cmphead |
Used by | Compare Members in Bulk (mpxcomp) – during matched pair generation |
Description | After using wgtABS to find matched pairs, this filter is applied. The wgtNRM filter only considers matched pairs where the score is a certain percentage of the exact match score. If this percent falls below wgtNRM, the pair is filtered out. |
Scale | A percent; a value of 95 is actually 95% |
Valid range/ Default and Recommended setting | 60..100. Default: 95. Recommended setting: 95 |
Property location | Weights tab in the Configuration Editor |
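
A minimal sketch of the wgtNRM check, assuming the exact match score for the pair is already known. The function name and the handling of a non-positive exact match score are assumptions made for this example.

```python
def passes_wgtnrm(pair_score, exact_match_score, wgt_nrm=95):
    """Return True when the pair reaches wgt_nrm percent of the exact match score."""
    if exact_match_score <= 0:
        # Assumption: without a positive exact match score there is no usable ratio.
        return False
    percent_of_exact = 100.0 * pair_score / exact_match_score
    # The pair is filtered out only when the percentage falls below wgtNRM.
    return percent_of_exact >= wgt_nrm


kept = passes_wgtnrm(19.0, 20.0)      # 95 percent of the exact match score
filtered = passes_wgtnrm(18.0, 20.0)  # 90 percent of the exact match score
```
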
Parameter | Value/Description |
---|---|
Database table | mpi_cmphead |
Used by | mpxwgt – during initial weight generation (without matched set) |
Description | When computing the initial weights using unmatched weights, this parameter defines the matched set error rate, or the percentage of matched attributes that disagree. |
Scale | A percent; a value of 5 is actually 5% |
Valid range/ Default and Recommended setting | 1..20. Default: 5. Recommended setting: 5 |
Property location | Weights tab in the Configuration Editor |
Parameter | Value/Description |
---|---|
Database table | mpi_cmphead |
Used by | mpxwgt |
Description | This property (wgtFLR) defines a lower bound on attribute value frequency counts. When the count is less than the minimum attribute count, it is raised to equal the minimum attribute count. |
Scale | No scaling; a value of 20 means 20 |
Valid range/ Default and Recommended setting | >0. Default: 5. Recommended setting: 5 |
Property location | Weights tab in the Configuration Editor |
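
The lower bound that wgtFLR places on frequency counts can be sketched like this. The function name and sample values are hypothetical; mpxwgt applies the floor internally.

```python
from collections import Counter


def floor_frequency_counts(values, wgt_flr=5):
    """Count attribute values and raise any count below wgt_flr up to wgt_flr."""
    counts = Counter(values)
    return {value: max(count, wgt_flr) for value, count in counts.items()}


# Hypothetical last-name values.
names = ["SMITH"] * 12 + ["JONES"] * 7 + ["ZELENKA"] * 2
floored = floor_frequency_counts(names)
```

Here the rare value, which occurs only twice, is treated as if it occurred five times, while the more common values keep their actual counts.
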
Parameter | Value/Description |
---|---|
Database table | mpi_cmphead |
Used by | mpxconv |
Description | Tolerance (wgtCNV) for weight generation convergence. The weight generation process iterates until the weights from the latest run match the weights from the previous run to within the convergence threshold. For example, if the value is 50, the weights have converged when no weight differs between the two most recent iterations by more than 0.50. Iteration stops when convergence is reached. |
Scale | Two implied decimal points; a value of 50 is actually 0.50 |
Valid range/ Default and Recommended setting | 1..100. Default: 20. Recommended setting: 20 |
Property location | Weights tab in the Configuration Editor |
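
The convergence test can be sketched as a comparison of two weight sets. This is an illustration under the assumption that both iterations produce weights for the same keys; the function name and sample weights are hypothetical.

```python
def has_converged(previous_weights, latest_weights, wgt_cnv=20):
    """Return True when no weight changed by more than the wgtCNV tolerance.

    Assumption: both dictionaries contain the same keys.
    """
    tolerance = wgt_cnv / 100  # two implied decimal points: 20 means 0.20
    return all(
        abs(latest_weights[key] - previous_weights[key]) <= tolerance
        for key in latest_weights
    )


# Hypothetical weights from two consecutive iterations.
previous = {"name": 4.10, "birth_date": 3.00}
latest = {"name": 4.25, "birth_date": 3.05}
converged_default = has_converged(previous, latest)            # tolerance 0.20
converged_strict = has_converged(previous, latest, wgt_cnv=10)  # tolerance 0.10
```

A smaller wgtCNV value demands closer agreement between iterations and can therefore require more iterations.
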
Parameter | Value/Description |
---|---|
Database table | mpi_cmphead |
Used by | mpxdist |
Description | False Negative Rate (wgtFNR). Used by the mpxdist utility to compute the Clerical Review and Auto-link thresholds based on the desired false negative rate. |
Scale | |
Valid range/ Default and Recommended setting | 1..100. Recommended setting: 100 |
Property location | Weights tab in the Configuration Editor |
Parameter | Value/Description |
---|---|
Database table | mpi_cmphead |
Used by | mpxdist |
Description | False Positive Rate (wgtFPR). Used by the mpxdist utility to compute Auto-link thresholds based on the desired false positive rate. |
Scale | |
Valid range/ Default and Recommended setting | >0. Recommended setting: 100000 |
Property location | Weights tab in the Configuration Editor |
Parameter | Value/Description |
---|---|
Database table | mpi_cmpspec |
Used by | mpxwgt |
Description | Defines a cut-off for the weight table. When generating the sval or nval weight table, typically only the most common values are listed. wgtCUT defines the cumulative percentage of the listed values that should be contained in the weight tables. This is available as a property (on the Properties tab) when you select a comparison function. |
Scale | Percentage; a value of 80 means 80% |
Valid range/ Default and Recommended setting | 1..100. Default: 80. Recommended setting: 80 |
Property location | Algorithm tab in Properties view when you have selected a comparison function. |
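
One way to picture the wgtCUT cut-off is to take values in order of decreasing frequency until their cumulative share reaches the configured percentage. This is a sketch of that reading, not the mpxwgt implementation; the function name and counts are hypothetical.

```python
def select_values_for_weight_table(frequency_counts, wgt_cut=80):
    """Pick the most common values until they cover wgt_cut percent of all occurrences."""
    total = sum(frequency_counts.values())
    selected = []
    cumulative = 0
    ordered = sorted(frequency_counts.items(), key=lambda item: item[1], reverse=True)
    for value, count in ordered:
        # Integer comparison avoids rounding issues: cumulative/total >= wgt_cut/100.
        if total == 0 or 100 * cumulative >= wgt_cut * total:
            break
        selected.append(value)
        cumulative += count
    return selected


# Hypothetical frequency counts for one attribute.
counts = {"SMITH": 50, "JONES": 30, "BROWN": 15, "ZELENKA": 5}
top_80 = select_values_for_weight_table(counts)              # default wgtCUT of 80
top_90 = select_values_for_weight_table(counts, wgt_cut=90)
```
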
Parameter | Value/Description |
---|---|
Database table | mpi_dvdxcmp |
Used by | Generate Frequency Stats (mpxfreq) – in wgtmode |
Description | Defines the minimum frequency of values to be listed in the strfreq table. If minWgtFreq is 10, only values that occur 10 or more times are listed in the table. In general, the larger the attribute population, the larger the minWgtFreq number should be. This is available as a property (on the Properties tab) when you select the connection between a standardization function and a comparison role. |
Scale | No scaling; a value of 10 means 10 |
Valid range/ Default and Recommended setting | >=0. Default: 20. Recommended setting: 20 for all attributes other than dates. |
Property location | Algorithm tab in Properties view when you have selected the connection between a standardization function and a comparison role. |
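
The effect of minWgtFreq can be sketched as a simple frequency filter. The function name and sample data are hypothetical; mpxfreq builds the actual strfreq table.

```python
from collections import Counter


def frequent_values(values, min_wgt_freq=20):
    """Return the values that occur at least min_wgt_freq times, with their counts."""
    counts = Counter(values)
    return {value: count for value, count in counts.items() if count >= min_wgt_freq}


# Hypothetical attribute values: SMITH occurs 10 times, JONES 9 times.
sample = ["SMITH"] * 10 + ["JONES"] * 9
listed = frequent_values(sample, min_wgt_freq=10)
```

With minWgtFreq set to 10, the value that occurs exactly 10 times is listed and the value that occurs 9 times is not.
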
The weight generation utility uses the binary files mpi_membktd.NNN and mpi_memcmpd.NNN. These files can be generated using a number of utilities, depending on the state of your system.
Utility | System state |
---|---|
Derive Data and Create UNLs (mpxdata) | Your member data is not loaded in the database, but exists in a flat load file. |
Prepare Binary Files (mpxprep) | All of the following are true: (A) your member data, including your comparison data (cmpd) and bucket data (bktd), is loaded in the database; (B) your mpi_memcmpd and mpi_membktd tables are current; and (C) no changes have been made to the algorithm since mpi_memcmpd and mpi_membktd were generated. |
Derive Data from Server (mpxredvd) | Your member data is loaded in the InfoSphere MDM database. Because this utility generates the cmpd and bktd data, it can be used if mpi_memcmpd and mpi_membktd are not populated or are out of date. |
Derive Data from UNLs (mpxfsdvd) | Your member data exists as .unl files. |
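
The table above can be read as a small decision procedure. The helper below is a hypothetical illustration of that reading; its name and parameters are not part of the product, and it only returns the utility name rather than running anything.

```python
def choose_binary_file_utility(data_in_database, derived_tables_current=False,
                               algorithm_unchanged=False, source_is_unl=False):
    """Suggest a utility for generating the binary files, based on system state.

    All parameter names are assumptions made for this sketch.
    """
    if data_in_database:
        # mpxprep needs current mpi_memcmpd/mpi_membktd tables and an unchanged algorithm.
        if derived_tables_current and algorithm_unchanged:
            return "mpxprep"
        # Otherwise regenerate the cmpd and bktd data from the server.
        return "mpxredvd"
    if source_is_unl:
        return "mpxfsdvd"
    # Member data exists only in a flat load file.
    return "mpxdata"


suggestion = choose_binary_file_utility(
    data_in_database=True, derived_tables_current=True, algorithm_unchanged=True
)
```
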
The Jobs feature contains options for running any of the utilities listed here, or for using mpi_membktd.NNN and mpi_memcmpd.NNN files that have already been generated. For more information about these utilities, see Jobs and job sets.