Information Management IBM InfoSphere Master Data Management, Version 10.1

Using the Probabilistic Matching Engine Console

The PME Console is a web-based application used to configure the Probabilistic Matching Engine (PME).

The configuration process involves these steps.
  1. Importing a configuration.

    The first stage of the process is selecting a configuration to import. A configuration consists of a defined data model and a matching algorithm. The data model must be an IBM® InfoSphere® Master Data Management Server (MDM Server) data model, and it must be a version 10.1 IBM Initiate® Workbench configuration. If you are using an older data model, you can upgrade it in IBM Initiate Workbench. All algorithm configurations are managed through IBM Initiate Workbench.

  2. Adding or updating attributes.

    Attributes are the major identifying component of a record and can include name, address, phone number, or other identification numbers.

  3. Adding or updating anonymous values.

    Anonymous values are input values that the PME filters out as invalid, for example, a name of "BABY BOY" or a phone number of "0123456789".

  4. Exporting your configuration.

    After modifying your attributes, anonymous values, and algorithms, you must export the configuration archive and deploy it in MDM Server. After deployment, load the source data into MDM Server and perform the evergreening process if necessary. Evergreening is the MDM Server process that creates the derived data index. The derived data index must be created before you perform optimization in the PME Console. MDM Server includes an Evergreen Console user interface to help perform evergreening tasks. See the MDM Server documentation for details.

  5. Optimizing the configuration.

    The optimization process involves these steps and must be completed before the PME can use your configuration.
    • Exporting derived data. Derived data is information that has been configured for matching and scoring. Not all core member or party data goes through the derivation process; only the data that is used by the PME is derived. The process standardizes the data, creates bucket data, creates comparison data, and finally creates the binary files used for weight generation and matching. Derived data does not include attribute tokens that have been identified as anonymous values.
    • Generating weights. A weight is a measure of the evidence that a comparison result provides for a match or non-match between members or parties. The weight generation process assigns weight values to attributes.
    • Creating a sample pairs file for each algorithm type in the configuration (for example, one file for mdmsperson and one for mdmsorg).
  6. Validating your sample pairs file.

    After optimizing, you should download the sample pairs file. The sample pairs file shows the member score and raw comparison data. This file can be used to validate the accuracy of your matching algorithm configuration.

  7. Exporting your configuration and redeploying the MDM Server enterprise application.
  8. Running the evergreening process in MDM Server again to identify suspects.
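
The following Python sketch illustrates two ideas from the steps above: filtering anonymous values out of a record, and expressing a weight as a measure of evidence for a match or non-match. It is not PME code. The `ANONYMOUS_VALUES` table, the function names, and the probabilities are hypothetical, and the log-likelihood-ratio formula is an assumption borrowed from common probabilistic matching practice; this topic does not state how the PME computes weights.

```python
import math

# Hypothetical anonymous-value lists keyed by attribute name.
# In practice these lists are maintained in the PME Console.
ANONYMOUS_VALUES = {
    "name": {"BABY BOY"},
    "phone": {"0123456789"},
}


def filter_anonymous(record):
    """Return a copy of the record without values flagged as anonymous."""
    kept = {}
    for attribute, value in record.items():
        # Normalize case and whitespace before the lookup (an assumption).
        normalized = " ".join(value.upper().split())
        if normalized in ANONYMOUS_VALUES.get(attribute, set()):
            continue  # placeholder value, so it is excluded from matching
        kept[attribute] = value
    return kept


def agreement_weight(m_probability, u_probability):
    """Assumed weight: log10 of P(result | match) / P(result | non-match)."""
    return math.log10(m_probability / u_probability)


record = {"name": "Baby  Boy", "phone": "5551234567"}
filtered = filter_anonymous(record)

# A comparison result seen far more often among matches than non-matches
# yields a positive weight; the reverse yields a negative weight.
weight = agreement_weight(0.9, 0.01)
```

In this sketch the placeholder name is dropped and only the phone number remains available for comparison. A positive weight counts as evidence for a match and a negative weight as evidence against one.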


Last updated: 16 Aug 2012
