Customizing and strengthening your matching algorithm in IBM Master Data Management

Tune and customize the matching algorithm to control how IBM Master Data Management matches records to create master data entities.

Required permissions
To configure a master data instance, you must be a member of the DataEngineer user group for the IBM Master Data Management service.

There are four key parts of configuring and tuning the matching algorithm:

  • Selecting matching attributes. By choosing the attributes that are compared during the matching process, you specify which data points matter most to the matching algorithm. Choose attributes that are strong differentiators. Unique identifiers such as driver's license numbers are excellent matching attributes. You must select matching attributes before you run matching for the first time. For details, see Selecting matching attributes.

  • Requesting and completing pair reviews. Request a pair review to generate intelligent tuning recommendations that optimize your matching algorithm's weights and matching thresholds. During a pair review, a data steward compares pairs of records to determine if they are a match, maybe a match, or not a match. The data steward's answers inform the resulting tuning recommendations. For details, see Requesting pair reviews.

  • Applying tuning recommendations. After one or more pair review tasks are completed, a data engineer can generate tuning recommendations, review them and their predicted results, and then decide whether to apply them or not. For details, see Applying tuning recommendations.

  • Defining autolink and clerical review thresholds. If you accept tuning recommendations from pair reviews, the autolink and clerical thresholds are automatically determined, but you can always override the thresholds manually if necessary. For details, see Manually changing the autolink and clerical review thresholds.

For information about advanced algorithm tuning procedures that use the IBM Master Data Management REST API, see Advanced matching algorithm tuning.

Understanding matching algorithm thresholds

Each record-to-record matching comparison that IBM Master Data Management completes generates a matching score. This score is a percentage value from 0 to 100, with 0 being a definite non-match and 100 being a definite match. As part of configuring the matching algorithm, a data engineer can define two threshold values:

  • The autolink threshold defines the minimum matching score for the algorithm to make an automatic match decision between any two records.

    • If the autolink threshold is low, you will have more overall matches, with likely more false positive matches.
    • If the autolink threshold is high, you will have fewer overall matches and more singleton entities (made up of only a single member record), with likely more false negative non-matches.
  • The clerical review threshold defines the minimum matching score for a potential match. Scores below the clerical review threshold are considered non-matches. If configured, the system sends scores that fall between the clerical review threshold and the autolink threshold through the potential matches workflow for data steward remediation.

Figure: Matching algorithm thresholds. A horizontal scale of matching score ranges: below the clerical review threshold is a non-match, between the clerical review and autolink thresholds is a potential match for review, and at or above the autolink threshold is an automatic match.
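
The three score ranges can be sketched in a few lines of Python. This is a minimal illustration of the threshold logic described above, not the product's implementation: the function name, the `Decision` enum, and the threshold values 60 and 90 are assumptions chosen for the example, and the sketch assumes that the clerical range is enabled.

```python
from enum import Enum


class Decision(Enum):
    NON_MATCH = "non-match"
    POTENTIAL_MATCH = "potential match"
    AUTOMATIC_MATCH = "automatic match"


def classify_score(score: float, clerical_threshold: float, autolink_threshold: float) -> Decision:
    """Map a matching score (0 to 100) to one of the three ranges."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if clerical_threshold > autolink_threshold:
        raise ValueError("clerical threshold must not exceed the autolink threshold")
    # Each threshold is a minimum score, so a score equal to it is included.
    if score >= autolink_threshold:
        return Decision.AUTOMATIC_MATCH
    if score >= clerical_threshold:
        return Decision.POTENTIAL_MATCH
    return Decision.NON_MATCH


# Hypothetical thresholds for illustration only; these are not product defaults.
for s in (35, 72, 96):
    print(s, classify_score(s, clerical_threshold=60, autolink_threshold=90).value)
# 35 non-match
# 72 potential match
# 96 automatic match
```

Raising `autolink_threshold` in this sketch moves more scores into the potential match range, which mirrors the trade-off between false positives and singleton entities described in the list above.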

Important: If the clerical range is not enabled in the matching settings, the potential matches workflow cannot generate any tasks. For information about the potential matches workflow, see [Configuring master data workflows](m360-config-workflow.html).

Preparing to tune your matching algorithm

If you have not yet run matching on your data, you must select matching attributes before you run matching. You can change your selections later if needed.

You cannot change the autolink threshold sensitivity or request pair reviews until you have run matching at least once. This restriction ensures that you have a basis of comparison for changing the threshold from the default sensitivity. For example, if you notice too many false positive matches in your data, you can increase the sensitivity. If there are too many singleton records, you can decrease the sensitivity.

Tip: Before modifying the matching algorithm settings, consider creating a new configuration snapshot to save the current settings. Having a snapshot will make it easier to revert to the previous configuration later if you're unhappy with the results of your changes. For information about creating snapshots, see Saving and loading master data configuration settings by using snapshots.

Selecting matching attributes

To select the attributes that IBM Master Data Management uses in the matching algorithm:

  1. From the Master data navigation menu, click Data types.

  2. Click Entity types, select the entity type whose matching algorithm you want to tune, and then click the Edit icon.

  3. Go to the Match settings tab and select Attribute selection in the sidebar to select the attributes to use in matching data. The first time you open this tab, IBM Master Data Management automatically generates suggested matching attributes from your data types.

  4. Review the list of matching attributes and their component fields. The matching algorithm uses these attributes and fields to compare records and create master data entities. To add or remove attributes from the list, click Edit attributes then select or clear attributes and their component fields as needed.

    When you select matching attributes, use the Match strength indicator to see an estimate of how your changes affect the matching algorithm.

    If you have added any custom attributes to the data type definitions, they are not selected for consideration in matching by default. If you want to use a custom attribute type in matching, you must select it and then specify which of its fields to consider. If you do not specify any fields, the matching algorithm cannot use the attribute.

    For non-custom (predefined) attribute types, if you do not specify which fields to consider, the matching algorithm uses a default set of fields.

  5. When you are satisfied with the selected matching attributes, click Save.

  6. Regenerate your matched entities based on your updated settings. Click the run matching icon in the action bar.

The matching process takes a while to complete. It runs in the background so that you can continue working. You'll be notified when it's complete, and then you can review details of the results on the Match results tab.

Requesting pair reviews to train the matching algorithm

Use pair reviews to tune and train the matching algorithm. Data engineers can request pair reviews to be completed by a data steward, generate tuning recommendations based on the pair reviews, and then decide whether to accept the recommendations.

During a pair review task, a data steward reviews pairs of records to determine if they are a match. With each additional pair review, IBM Master Data Management gets more data for its algorithm tuning recommendations. The more pairs that get reviewed, the better the tuning recommendations will be.

Each organization has different levels of risk tolerance for false matches. Pair reviews can help determine the most appropriate match settings.

In addition to pair review results, you can also choose to include real-world actions taken by your data stewards for consideration when IBM Master Data Management generates recommended settings for your algorithm.

To request a pair review:

  1. From the Master data navigation menu, click Data types.

  2. Click Entity types, select the entity type whose matching algorithm you want to tune, and then click the Edit icon.

  3. Select Algorithm tuning in the sidebar to access the algorithm tuning tools. Scroll down to the Pair analysis section.

  4. In the Pair analysis section, click Request pair review.

  5. Choose the number of record pairs that should be reviewed as part of this task. Reviewing more pairs will result in better tuning recommendations. IBM Master Data Management cannot display false positive and false negative rates until you complete some pair reviews and run matching.

    Note: The actual number of generated pairs might not exactly match the number defined in this step. The number of generated record pairs depends on the available amount of data in the system and other factors.
  6. Select the record sources from which IBM Master Data Management will pull sample record pairs for the review. The record source attribute is included in all records in your master data.

  7. Click Send request.

IBM Master Data Management starts generating the record pairs and creating the pair review task. The Pair analysis section of the page keeps you notified of the status of the review (Pending, In progress, or Completed), and also tracks the progress of the current review task.

For information about completing a pair review task as a data steward user, see Completing pair reviews.

Generating and applying tuning recommendations

You can use the results of pair review tasks to generate tuning recommendations. Optionally, you can also choose to include real-world actions that data steward users have taken to maintain master data in your system, such as manual links, manual unlinks, and potential match remediation tasks.

To manage pair review results and generate tuning recommendations:

  1. From the Master data navigation menu, click Data types.

  2. Click Entity types, select the entity type whose matching algorithm you want to tune, and then click the Edit icon.

  3. Select Algorithm tuning in the sidebar to access the algorithm tuning tools. Scroll down to the Pair analysis section.

  4. In the Pair analysis section, review the status of the pending, ongoing, or completed pair review tasks in the system.

  5. Select one or more pair review tasks in the table to view information such as the total number of pairs reviewed and the numbers of pairs that were determined to be matches, not matches, or uncertain matches.

    Tip: To delete a pair review task that is no longer needed or not valid, select it in the table and click Delete.

  6. Generate a new tuning recommendation:

    1. In the table, select one or more pair review tasks for IBM Master Data Management to consider while generating its algorithm tuning recommendations.
    2. If you want the recommendations to consider real-world stewardship decisions, select Consider stewardship decisions when generating recommendations.
    3. Click Start tuning. IBM Master Data Management takes some time to generate its recommendations.
  7. When the recommendations are ready, they open in a panel beside the current settings. You can compare the predicted results of the recommendations against the results of the current settings.

    For additional predictions and statistics about the matching results for the current and recommended settings, scroll down to view the confusion matrix. Use the confusion matrix to visualize and evaluate the accuracy of your matching algorithm's predicted results against the following performance metrics:

    • True positives
    • False negatives
    • Sensitivity
    • False positives
    • True negatives
    • Specificity
    • Precision
    • Negative precision
    • Accuracy

    Refer to the onscreen glossary for definitions of each metric in the confusion matrix.
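
As a rough illustration of how the ratio metrics in the confusion matrix relate to the four counts, the following Python sketch computes them by using the standard textbook formulas. The class and function names are hypothetical, and treating "negative precision" as the negative predictive value, TN / (TN + FN), is an assumption; the onscreen glossary remains the authority for the product's definitions.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ConfusionCounts:
    true_positives: int
    false_negatives: int
    false_positives: int
    true_negatives: int


def _ratio(numerator: int, denominator: int) -> Optional[float]:
    # Return None instead of raising when a metric is undefined (empty denominator).
    return numerator / denominator if denominator else None


def summarize(c: ConfusionCounts) -> Dict[str, Optional[float]]:
    tp, fn, fp, tn = c.true_positives, c.false_negatives, c.false_positives, c.true_negatives
    return {
        "sensitivity": _ratio(tp, tp + fn),         # share of true matches that were linked
        "specificity": _ratio(tn, tn + fp),         # share of true non-matches kept apart
        "precision": _ratio(tp, tp + fp),           # share of linked pairs that are true matches
        "negative_precision": _ratio(tn, tn + fn),  # assumed: negative predictive value
        "accuracy": _ratio(tp + tn, tp + fn + fp + tn),
    }


# Made-up counts for illustration only.
for name, value in summarize(ConfusionCounts(80, 20, 10, 90)).items():
    print(f"{name}: {value:.3f}")
```

Comparing these values for the current and recommended settings side by side shows whether a recommendation trades false positives for false negatives or improves both.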

To apply generated tuning recommendations:

  1. In the Active threshold settings section of the Algorithm tuning page, review the current matching algorithm settings, as well as estimates of the current false positive and false negative rates.

    The system cannot display false positive and false negative rates until you complete a sufficient number of pair reviews and run matching.

  2. In the Recommended settings section, review the recommended updates to the matching algorithm settings. The recommendation represents the threshold with the lowest false positive and false negative rates, based on your reviewed pairs and steward actions.

  3. To use the recommended settings, click Apply settings. Applying the recommendation will change the autolink sensitivity and the associated matching weights of each attribute.

  4. Optional: To save a copy of the recommended settings, click Export recommended settings.

  5. Regenerate your matched entities based on your updated settings. Go to the Match results tab, then click the run matching icon in the action bar.

The matching process takes a while to complete. It runs in the background so that you can continue working. You'll be notified when it's complete, and then you can review details of the results on the Match results tab.
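
To build intuition for how reviewed pairs can point to a threshold with few false positives and false negatives, consider the following simplified Python sketch. It is not the tuning method that IBM Master Data Management uses: the `ReviewedPair` type, the function names, the sample scores, and the candidate thresholds are all hypothetical, and the sketch ignores attribute weights, the clerical range, and "maybe a match" answers.

```python
from typing import Iterable, NamedTuple, Sequence, Tuple


class ReviewedPair(NamedTuple):
    score: float    # matching score from 0 to 100
    is_match: bool  # the data steward's decision for the pair


def count_errors(pairs: Sequence[ReviewedPair], threshold: float) -> Tuple[int, int]:
    """Return (false positives, false negatives) if pairs at or above threshold are linked."""
    false_positives = sum(1 for p in pairs if p.score >= threshold and not p.is_match)
    false_negatives = sum(1 for p in pairs if p.score < threshold and p.is_match)
    return false_positives, false_negatives


def best_threshold(pairs: Sequence[ReviewedPair], candidates: Iterable[float]) -> float:
    """Pick the candidate with the fewest total errors; ties go to the lower threshold."""
    return min(candidates, key=lambda t: (sum(count_errors(pairs, t)), t))


# Made-up review results for illustration only.
reviewed = [
    ReviewedPair(95, True),
    ReviewedPair(88, True),
    ReviewedPair(82, False),
    ReviewedPair(75, True),
    ReviewedPair(64, False),
    ReviewedPair(40, False),
]

print(best_threshold(reviewed, range(50, 100, 10)))  # 70
print(count_errors(reviewed, 70))                    # (1, 0)
```

With so few reviewed pairs, a single answer can shift the selected threshold, which is one reason that reviewing more pairs leads to better recommendations.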
