Customizing and strengthening your matching algorithm (IBM Match 360 with Watson)

IBM Match 360 with Watson includes tools that data engineers can use to tune and customize the matching algorithm. By tuning the algorithm, you can control how IBM Match 360 matches your data to create master data entities.

There are four key parts of configuring and tuning your algorithm:

Selecting your matching attributes. By choosing which data model attributes are compared during the matching process, you tell IBM Match 360 with Watson which data points are the most important considerations for your algorithm. Choose attributes that are strong differentiators: unique identifiers, such as driver's license numbers, are excellent matching attributes. You must select matching attributes before you run matching for the first time.
Requesting and completing pair reviews. Request a pair review to generate intelligent tuning recommendations that optimize your matching algorithm's weights and matching thresholds. During a pair review, a data steward compares pairs of records to determine whether they are a match, maybe a match, or not a match. The data steward's answers inform the resulting tuning recommendations.
Applying tuning recommendations. After a pair review task is completed, a data engineer can decide whether to apply the tuning recommendations.
Manually defining your autolink threshold sensitivity. The autolink threshold determines how likely IBM Match 360 is to combine records into entities.
- If the sensitivity is low, you get more matches overall, likely including more false positives (records that are linked but should not be).
- If the sensitivity is high, you get fewer matches overall and more singleton entities (entities made up of only a single member record), likely including more false negatives (records that should be linked but are not).
  
  If you accept tuning recommendations from pair reviews, the autolink threshold sensitivity is automatically determined. You can always override the autolink threshold manually if necessary.
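To make the relationship between the threshold and the number of matches concrete, the following Python sketch links every record pair whose comparison score meets a threshold. The pair scores and the simple score-versus-threshold rule are illustrative assumptions, not IBM Match 360's internal implementation; they only show why a higher threshold (higher sensitivity) produces fewer links.

```python
# Illustrative sketch only: the scores below are made-up values, and the
# ">= threshold" rule is an assumed simplification of autolinking.


def autolinked_pairs(pair_scores, threshold):
    """Return the record pairs whose comparison score meets or exceeds the threshold."""
    return sorted(pair for pair, score in pair_scores.items() if score >= threshold)


# Hypothetical comparison scores for three record pairs.
scores = {
    ("rec1", "rec2"): 9.0,
    ("rec1", "rec3"): 5.5,
    ("rec2", "rec3"): 2.0,
}

low_threshold_links = autolinked_pairs(scores, 5.0)   # lower sensitivity
high_threshold_links = autolinked_pairs(scores, 8.0)  # higher sensitivity

print(low_threshold_links)
print(high_threshold_links)
```

With the higher threshold, rec3 is no longer linked to any other record, so it would remain a singleton entity.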

In this topic:

Preparing to tune your matching algorithm
Selecting matching attributes
Requesting pair reviews and applying tuning recommendations
Manually changing the autolink sensitivity
Advanced matching algorithm tuning using the IBM Match 360 REST API
Configuring multi-dimensional comparison filters
Switching the edit distance function

Preparing to tune your matching algorithm

If you have not yet run matching on your data, you must select your matching attributes before you run it. You can change your selections later if needed.

You cannot change your autolink threshold sensitivity or request pair reviews until after you run matching at least one time. This restriction ensures that you have some basis of comparison for changing your threshold from the default sensitivity. For example, if you notice too many false positive matches in your data, you can increase the sensitivity. If there are too many singleton records, you can decrease the sensitivity.

Before you modify your matching algorithm settings, consider creating a configuration snapshot to save your current settings. A snapshot makes it easier to revert to the previous configuration if you're unhappy with the results of your changes. For information about creating snapshots, see Saving and loading master data configuration settings by using snapshots.

Selecting matching attributes

To select the attributes that IBM Match 360 uses in the matching algorithm:

Click the navigation menu and select Matching setup to open the matching setup page.
Go to the Match settings tab to select the attributes to use in matching data. The first time that you go to this tab, IBM Match 360 automatically generates some suggested attributes from your data model to use in matching.
Review the list of matching attributes and their component fields. These attributes and fields are used as the basis of comparison to match records and create master data entities. To add or remove attributes, click Edit attributes, and then select or clear attributes and their component fields as needed.

As you choose your matching attributes, use the Match strength indicator to see an estimate of how your changes affect the matching algorithm.

If you have added any custom attributes to the data model, they are not selected for matching by default. To use a custom attribute type in matching, you must select it and then specify which of its fields to consider. If you do not specify any fields, the matching algorithm cannot use the attribute.

For non-custom (predefined) attribute types, if you do not specify which fields to consider, the matching algorithm uses a default set of fields.
When you are satisfied with your matching attribute changes, click Save.
Regenerate your matched entities based on your updated settings. Click the run matching icon in the action bar.

The matching process takes a while to complete. It runs in the background so that you can continue working. You'll be notified when it's complete, and then you can review details of the results on the Match results tab.

Requesting pair reviews and applying tuning recommendations

Use pair reviews to tune your matching algorithm. Each organization has different levels of risk tolerance for false matches, and pair reviews can help determine the right match settings for you.

Data engineers can request pair reviews to be completed by a data steward, and then decide whether to accept the resulting tuning recommendations.

To request a pair review:

Click the navigation menu and select Matching setup to open the matching setup page.
Go to the Algorithm tuning tab to access the algorithm tuning tools.
Ensure that the correct matching algorithm is selected. The default matching algorithm names are Person - Person entity and Organization - Organization entity.
In the Pair review section, click Request pair review.
Choose the number of record pairs that should be reviewed as part of this task. Reviewing more pairs will result in better tuning recommendations. If too few pairs are reviewed, then IBM Match 360 will not be able to generate recommendations.

Note: The actual number of generated pairs might not match the number defined in this step. The number of generated record pairs depends on the available amount of data in the system and other factors.
Click Send request.

IBM Match 360 starts generating the record pairs and creating the pair review task. The Pair review section keeps you notified of the status of the review (Generating pairs or Review in progress) and tracks the progress of the current review task.

For information about completing a pair review task as a data steward user, see Completing pair reviews.

To review and apply the tuning recommendations generated by a pair review:

Click the navigation menu and select Matching setup to open the matching setup page.
Go to the Algorithm tuning tab to access the algorithm tuning tools.
Ensure that the correct matching algorithm is selected. The default matching algorithm names are Person - Person entity and Organization - Organization entity.
In the Pair review section, review the progress of the latest pair review task. You can see the total number of pairs reviewed and the numbers of pairs that were determined to be matches, not matches, or uncertain matches.
In the Algorithm settings section, review the current matching algorithm settings, as well as estimates of the current false positive and false negative rates.

If too few pair reviews have been completed or if matching has not yet been run, the false positive and false negative rates cannot be displayed.
Expand the Recommended settings section.
Review the recommended updates to the matching algorithm settings. The recommendation represents the threshold with the lowest false positive and false negative rates, based on your reviewed pairs.
If you want to use the recommended settings, click Apply recommendation. Applying the recommendation will change the autolink sensitivity and the associated matching weights of each attribute.
Regenerate your matched entities based on your updated settings. Go to the Match results tab, then click the run matching icon in the action bar.
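IBM Match 360 does not document exactly how it derives its recommendation, but the idea described in the steps above can be sketched in Python: given steward decisions for reviewed pairs, evaluate candidate thresholds and pick the one with the lowest combined false positive and false negative rates. The ReviewedPair structure, the candidate thresholds, and the choice to minimize the simple sum of the two rates are assumptions made for illustration; pairs marked as uncertain are assumed to be excluded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewedPair:
    """Hypothetical record of one reviewed pair (uncertain pairs are assumed to be excluded)."""
    score: float    # comparison score that the matching engine assigned to the pair
    is_match: bool  # the data steward's decision


def error_rates(pairs, threshold):
    """Return (false_positive_rate, false_negative_rate) if pairs scoring >= threshold are linked."""
    matches = [p for p in pairs if p.is_match]
    non_matches = [p for p in pairs if not p.is_match]
    false_positives = sum(1 for p in non_matches if p.score >= threshold)
    false_negatives = sum(1 for p in matches if p.score < threshold)
    fp_rate = false_positives / len(non_matches) if non_matches else 0.0
    fn_rate = false_negatives / len(matches) if matches else 0.0
    return fp_rate, fn_rate


def recommend_threshold(pairs, candidates):
    """Pick the candidate threshold with the lowest combined error rate (the first one wins ties)."""
    return min(candidates, key=lambda t: sum(error_rates(pairs, t)))


# Made-up review results for six pairs.
reviewed = [
    ReviewedPair(9.0, True),
    ReviewedPair(8.0, True),
    ReviewedPair(7.0, True),
    ReviewedPair(6.0, False),
    ReviewedPair(5.0, False),
    ReviewedPair(3.0, False),
]

best = recommend_threshold(reviewed, [4.0, 6.5, 8.5])
print(best, error_rates(reviewed, best))
```

A threshold that is too low links the 6.0 and 5.0 non-matches (false positives), and one that is too high misses the 8.0 and 7.0 matches (false negatives); this is the trade-off that reviewing more pairs helps resolve.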

Manually changing the autolink sensitivity

If you don't use pair reviews to generate recommendations, finding the correct autolink sensitivity for your data set might take some trial and error. Depending on the particular requirements of your organization, you might need to repeat the process of adjusting the sensitivity and re-matching your data more than once.

The total autolink threshold is calculated by multiplying the autolink sensitivity (0-100) by the maximum possible matching score, which is determined based on the selected match attributes and their maximum weights in the algorithm.
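As a worked example of this calculation, the Python sketch below assumes that the maximum possible matching score is the sum of the maximum weights of the selected matching attributes, and that the 0-100 sensitivity is applied as a percentage of that score. Both are assumptions for illustration, and the attribute weights are made-up values.

```python
def max_matching_score(max_weights):
    """Assumed: the maximum score is the sum of each selected attribute's maximum weight."""
    return sum(max_weights.values())


def total_autolink_threshold(sensitivity, max_weights):
    """Scale the maximum matching score by the autolink sensitivity (0-100, treated as a percentage)."""
    if not 0 <= sensitivity <= 100:
        raise ValueError("autolink sensitivity must be between 0 and 100")
    return sensitivity * max_matching_score(max_weights) / 100


# Hypothetical maximum weights for three selected matching attributes.
weights = {"legal_name": 8.0, "birth_date": 5.0, "primary_residence": 7.0}

print(total_autolink_threshold(40, weights))
```

In this example, raising the sensitivity from 40 to 60 raises the threshold from 8.0 to 12.0, so fewer record pairs score high enough to be linked.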

Tip: If you are uncertain about the right autolink sensitivity for your organization's needs, complete some pair reviews to get intelligent tuning recommendations. For details, see Requesting pair reviews and applying tuning recommendations.

To manually change the sensitivity of the matching algorithm's autolink threshold:

Click the navigation menu and select Matching setup to open the matching setup page.
Go to the Algorithm tuning tab to access the algorithm tuning tools.
Ensure that the correct matching algorithm is selected. The default matching algorithm names are Person - Person entity and Organization - Organization entity.
Expand the Autolink threshold section.
Use the slider to define your autolink sensitivity, then click Apply threshold.
Regenerate your matched entities based on your updated settings. Go to the Match results tab, then click the run matching icon in the action bar.

Advanced matching algorithm tuning using the IBM Match 360 REST API

To achieve an advanced level of customization, you can use the IBM Match 360 REST API to configure and tune your matching algorithm.

When you work with the API, you must explicitly deploy the algorithm before you run your matching jobs. Within the api-model microservice API, the POST /mdm/v1/algorithms/{record_type} method generates a matching algorithm based on the supplied attributes and fields.

You can further customize the matching algorithm by using the PUT /mdm/v1/algorithms/{record_type} method, which enables you to provide a fully defined matching algorithm in the method's payload.

Here is a sample payload for POST /mdm/v1/algorithms/{record_type} that defines the autolink threshold and a set of matching attributes and fields:

```
{
  "person_entity": {
    "auto_link_threshold": 0.4,
    "matching_attributes": [
      {"attributes": ["legal_name"]},
      {"attributes": ["primary_residence"]},
      {"attributes": ["mobile_telephone"]},
      {"attributes": ["birth_date"]},
      {"attributes": ["gender"]},
      {"attributes": ["personal_email"]}
    ]
  }
}
```
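The following Python sketch builds the sample payload for POST /mdm/v1/algorithms/{record_type} and prepares a request to send it, using only the standard library. The base URL (https://example.com/api is a placeholder), the bearer-token Authorization header, and the person record type in the usage example are assumptions; check the IBM Match 360 API reference for the actual host, any required query parameters, and the authentication scheme for your instance. The same helper can prepare a PUT request to deploy a fully defined algorithm.

```python
import json
import urllib.request


def build_algorithm_payload(entity_type, auto_link_threshold, attributes):
    """Build a POST /mdm/v1/algorithms/{record_type} payload with one attribute per matching entry."""
    return {
        entity_type: {
            "auto_link_threshold": auto_link_threshold,
            "matching_attributes": [{"attributes": [name]} for name in attributes],
        }
    }


def algorithm_request(base_url, token, record_type, payload, method="POST"):
    """Prepare a request to /mdm/v1/algorithms/{record_type}.

    base_url and the bearer-token header are assumptions for illustration.
    Use method="PUT" to deploy a fully defined algorithm.
    """
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/mdm/v1/algorithms/{record_type}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method=method,
    )


def send(request):
    """Send the prepared request and return the decoded JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


payload = build_algorithm_payload(
    "person_entity",
    0.4,
    ["legal_name", "primary_residence", "mobile_telephone",
     "birth_date", "gender", "personal_email"],
)
request = algorithm_request("https://example.com/api", "YOUR_TOKEN", "person", payload)
# send(request) would contact the (hypothetical) service, so it is not called here.
```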

For more information about the IBM Match 360 REST API and the corresponding SDKs, including authentication instructions and full documentation of each method, see the IBM Match 360 API reference.

Remember: Any time you update the matching algorithm, even through the API, you must run matching afterward to see the changes reflected in your match results.

Configuring multi-dimensional comparison filters

Fine-tune your matching algorithm even further by defining multi-dimensional comparison filters. These filters can compare attributes across records and adjust matching scores and weights up or down based on criteria that you define, which can reduce the number of false positive or false negative matches in your results.

To generate a multi-dimensional comparison filter in your matching algorithm, update the matching engine configuration by using REST API commands:

Access and authenticate to the IBM Match 360 API.

Specify a POST /mdm/v1/algorithms/{record_type} payload that defines a filter, as in the following example:

```
{
  "person_entity": {
    "auto_link_threshold": 0.4,
    "matching_attributes": [
      {"attributes": ["legal_name"], "post_filter_methods": ["false_positive_filter"]},
      {"attributes": ["primary_residence"], "post_filter_methods": ["false_positive_filter"]},
      {"attributes": ["mobile_telephone"]},
      {"attributes": ["birth_date"], "post_filter_methods": ["false_positive_filter"]},
      {"attributes": ["gender"]},
      {"attributes": ["personal_email"]}
    ]
  }
}
```

In the sample payload, false_positive_filter is the name of the custom filter. It applies to each attribute in the payload that includes the filter name.

The sample payload generates an algorithm that contains a false_positive_filter whose weights and penalties are set to the default value, 0.

Optionally, you can customize the weights and penalties to meet your organization's requirements, and then deploy the updated algorithm by using the PUT /mdm/v1/algorithms/{record_type} method.
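If you maintain your attribute list in code, a payload like the filter example can be generated rather than written by hand. This Python sketch adds the post_filter_methods entry only to the attributes you choose; the function and parameter names are hypothetical, and the output mirrors the sample payload rather than any additional filter options such as custom weights or penalties.

```python
import json


def build_filtered_payload(entity_type, auto_link_threshold, attributes,
                           filtered_attributes, filter_name):
    """Build an algorithm payload that applies filter_name to the chosen attributes only."""
    matching_attributes = []
    for name in attributes:
        entry = {"attributes": [name]}
        if name in filtered_attributes:
            # The filter applies to each attribute entry that lists its name.
            entry["post_filter_methods"] = [filter_name]
        matching_attributes.append(entry)
    return {
        entity_type: {
            "auto_link_threshold": auto_link_threshold,
            "matching_attributes": matching_attributes,
        }
    }


payload = build_filtered_payload(
    "person_entity",
    0.4,
    ["legal_name", "primary_residence", "mobile_telephone",
     "birth_date", "gender", "personal_email"],
    filtered_attributes={"legal_name", "primary_residence", "birth_date"},
    filter_name="false_positive_filter",
)
print(json.dumps(payload, indent=2))
```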

Switching the edit distance function

The IBM Match 360 matching engine calculates edit distance as one of the internal functions during comparison and matching of various attributes. Edit distance is a measurement of how dissimilar two strings are from each other. It is calculated by counting the number of changes required to transform one string into the other.

You can choose between the standard edit distance function and a specialized one. The standard function is the default configuration because it provides faster performance during matching. For more information about edit distance, see IBM Match 360 matching algorithms.
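For reference, the classic edit distance (Levenshtein distance) counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. The Python sketch below implements that textbook definition; IBM Match 360's standard and specialized edit distance functions may weight changes differently, so treat this as an illustration of the concept only.

```python
def edit_distance(a: str, b: str) -> int:
    """Return the Levenshtein distance between a and b (each insert, delete, or substitute costs 1)."""
    # previous[j] holds the distance between the prefix of a processed so far and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute (or keep) a character
            ))
        previous = current
    return previous[-1]


print(edit_distance("Smith", "Smyth"))
print(edit_distance("kitten", "sitting"))
```

A small distance such as 1 between "Smith" and "Smyth" is the kind of near-match that edit distance lets the matching engine score as similar rather than simply different.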

To change the active edit distance function, update the matching engine configuration by using REST API commands:

Access and authenticate to the IBM Match 360 API.
Retrieve the existing configuration JSON file for the comparison function, compare_spec_resource:
```
 GET /mdm/v1/compare_spec_resources/{resource_name}
```
On your local machine, edit the JSON to add the line "similar_characters_enabled": true (or remove it if you want to switch back to the default edit distance setting).
Update the IBM Match 360 configuration by uploading your edited JSON:
```
 PUT /mdm/v1/compare_spec_resources/{resource_name}
```
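The retrieve, edit, and upload steps above can be scripted. In this Python sketch, the base URL and bearer-token header are assumptions, as is placing similar_characters_enabled as a top-level key of the compare spec JSON; confirm where the setting belongs in your resource before uploading.

```python
import copy
import json
import urllib.request


def set_similar_characters(spec, enabled):
    """Return a copy of the compare spec with similar_characters_enabled added (True) or removed (False).

    Assumption: the setting is a top-level key in the compare_spec_resource JSON.
    """
    updated = copy.deepcopy(spec)
    if enabled:
        updated["similar_characters_enabled"] = True
    else:
        updated.pop("similar_characters_enabled", None)  # back to the default edit distance
    return updated


def compare_spec_request(base_url, token, resource_name, method, body=None):
    """Prepare a GET or PUT request for /mdm/v1/compare_spec_resources/{resource_name}."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/mdm/v1/compare_spec_resources/{resource_name}",
        data=data,
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method=method,
    )


def switch_edit_distance(base_url, token, resource_name, enabled):
    """Download the compare spec, toggle the setting, and upload the edited JSON."""
    get_request = compare_spec_request(base_url, token, resource_name, "GET")
    with urllib.request.urlopen(get_request) as response:
        spec = json.load(response)
    updated = set_similar_characters(spec, enabled)
    put_request = compare_spec_request(base_url, token, resource_name, "PUT", updated)
    with urllib.request.urlopen(put_request) as response:
        return response.status
```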
