Optimizing your algorithm

The optimization process generates weights, calculates the matching thresholds, and creates sample pairs. Optimization is performed on all data in the MDM database. This option is typically used for Probabilistic Matching Engine (PME) configurations or for configurations that are managed through the Master Data Management configuration editor.

About this task

You can run optimization on a local data file, an MDM database, or a PME database. If you optimize a local data file, the data and format files must conform to the mpxdata input specification and use the pipe character (|) as the delimiter. MDM and PME databases must already contain stored derived data.
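
Before you optimize a local data file, it can help to confirm that the file is consistently pipe-delimited. The following Python sketch is not part of the product and does not validate against the mpxdata input specification itself. It only checks that every non-empty row has the same number of pipe-separated fields as the first row. The function name `check_pipe_delimited` and the sample rows are hypothetical and used for illustration only.

```python
def check_pipe_delimited(lines, delimiter="|"):
    """Return (expected_field_count, problems) for an iterable of text lines.

    problems lists (line_number, field_count) for every non-empty row whose
    field count differs from that of the first non-empty row.
    """
    expected = None
    problems = []
    for line_number, line in enumerate(lines, start=1):
        row = line.rstrip("\r\n")
        if not row:
            continue  # ignore blank lines
        count = len(row.split(delimiter))
        if expected is None:
            expected = count  # the first non-empty row sets the expected width
        elif count != expected:
            problems.append((line_number, count))
    return expected, problems


# Made-up rows for illustration; real files follow the mpxdata specification.
sample = [
    "1001|SMITH|JOHN|1970-01-01\n",
    "1002|DOE|JANE\n",
    "1003|LEE|ANN|1982-05-17\n",
]
expected, problems = check_pipe_delimited(sample)
print(expected, problems)
```

To check a real file, pass an open file object instead of the list, for example `check_pipe_delimited(open(path, encoding="utf-8"))`, where `path` is the location of your data file.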

Algorithms that use the False Positive Filter (FPF) are not supported for optimization. If your custom configuration includes the FPF, you must remove the filter through the algorithm editor before optimizing.

Attention: If you are optimizing a Probabilistic Matching Engine (PME) configuration, you must first export the configuration archive and deploy it to the MDM operational server. After you deploy your PME configuration, load the source data into the MDM database and perform the evergreening process if necessary. Evergreening must take place before you can run the optimization process.

Attention: If a patch release of the MDM software includes changes to any algorithm functions, you must run the optimization process to implement the changes to your matching algorithm.

Procedure

Select Master Data Management > Optimize Algorithm.
On the Project and Data Selection window, select a Project and Optimization type from the lists.
If you selected local data file as your optimization type, provide the following information.
1. Select your Data file. The data file must comply with the mpxdata input specification.
2. Select your Config file. This configuration file must comply with the mpxdata input specification.
If you selected Probabilistic matching engine or Master data service as your optimization type, select the Server where the PME or MDM operational server is running. Click Next.
On the Optimization Customization window, provide information for the following fields or accept the defaults.
- False positive rate - specify the false positive rate that is used to compute the Clerical Review and Auto-Link thresholds.
- False negative rate - specify the false negative rate that is used to compute the Clerical Review and Auto-Link thresholds.
- Minimum distributed sample count - enter the minimum distributed sample count to include in the optimization.
- Minimum matched sample count - enter the minimum matched sample count to include in the optimization.
- Minimum sample pair score - specify the minimum comparison score to use for the sample pairs. The value must be between 0.0 and 99.9 and must be less than or equal to the maximum score.
- Maximum sample pair score - specify the maximum comparison score to use for the sample pairs. The value must be between 0.0 and 99.9 and must be greater than or equal to the minimum score.
- Maximum sample pair count - specify the maximum number of pairs for each one-tenth score range in the final sample pair list.
Click Finish.
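
The constraints on the sample pair fields can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not the product's implementation. It assumes that "one-tenth score range" means score buckets that are 0.1 wide, and the names `validate_score_range` and `cap_pairs_per_bucket` are hypothetical. The first function enforces the documented 0.0 to 99.9 limits and the ordering of the minimum and maximum scores. The second function keeps at most a fixed number of pairs in each bucket.

```python
import math
from collections import defaultdict

MIN_ALLOWED = 0.0   # lower limit stated for both score fields
MAX_ALLOWED = 99.9  # upper limit stated for both score fields


def validate_score_range(min_score, max_score):
    """Raise ValueError if the scores break the documented constraints."""
    for name, value in (("minimum", min_score), ("maximum", max_score)):
        if not MIN_ALLOWED <= value <= MAX_ALLOWED:
            raise ValueError(f"{name} score {value} is outside 0.0 to 99.9")
    if min_score > max_score:
        raise ValueError("minimum score must not exceed maximum score")


def cap_pairs_per_bucket(scores, min_score, max_score, max_per_bucket):
    """Keep at most max_per_bucket scores in each 0.1-wide bucket.

    Assumption: a "one-tenth score range" is a bucket of width 0.1.
    Scores outside [min_score, max_score] are dropped.
    """
    validate_score_range(min_score, max_score)
    counts = defaultdict(int)
    kept = []
    for score in scores:
        if score < min_score or score > max_score:
            continue
        # Round before flooring to avoid floating-point edge effects.
        bucket = math.floor(round(score * 10, 6))
        if counts[bucket] < max_per_bucket:
            counts[bucket] += 1
            kept.append(score)
    return kept


# Made-up scores for illustration only.
scores = [5.01, 5.02, 5.09, 5.15, 120.0, 7.3]
print(cap_pairs_per_bucket(scores, 0.0, 99.9, max_per_bucket=2))
```

With a cap of 2, the third score in the 5.0 bucket (5.09) is dropped, and 120.0 is dropped because it exceeds the maximum score.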

What to do next

Log in to the server where your project is located and review your validation samples. After the optimization process is complete, download your sample pairs file. The sample pairs file shows the member score and raw comparison data. Use this file to validate the accuracy of your matching algorithm configuration.

If you optimized a PME algorithm, export your configuration to the MDM operational server after optimization.