Customizing and strengthening your matching algorithm (IBM Match 360 with Watson)
IBM Match 360 with Watson includes tools that you can use to tune and customize your matching algorithm. By tuning your algorithm, you can control the way IBM Match 360 matches your data to create master data entities.
There are two key parts of configuring and tuning your algorithm:
- Selecting your matching attributes. By choosing which data model attributes are compared during the matching process, you tell IBM Match 360 with Watson which data points matter most to your algorithm. Choose attributes that are strong differentiators. Unique identifiers such as driver's license numbers are excellent matching attributes.
- Defining your autolink threshold sensitivity. Each organization has a different level of risk tolerance for false matches. The autolink threshold defines how readily IBM Match 360 combines records into entities.
  - If the sensitivity is low, you will have more overall matches, with likely more false positive matches.
  - If the sensitivity is high, you will have fewer overall matches and more singleton entities (made up of only a single member record), with likely more false negative non-matches.
The total autolink threshold is calculated by multiplying the autolink sensitivity (0-100) by the maximum possible matching score, which is determined based on the selected match attributes and their maximum weights in the algorithm.
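The calculation described above can be sketched in a few lines of Python. This is an illustration only, not the product's implementation. It assumes that the sensitivity is treated as a percentage (divided by 100) and that the maximum possible matching score is the sum of the maximum weights of the selected attributes. The weight values shown are hypothetical.

```python
def autolink_threshold(sensitivity: float, max_weights: dict[str, float]) -> float:
    """Scale the maximum possible score by a sensitivity on a 0-100 scale.

    Assumption: the maximum possible score is the sum of the per-attribute
    maximum weights, and the sensitivity is applied as a percentage.
    """
    if not 0 <= sensitivity <= 100:
        raise ValueError("sensitivity must be between 0 and 100")
    max_score = sum(max_weights.values())
    return (sensitivity / 100) * max_score


# Hypothetical maximum weights for three selected matching attributes.
weights = {"legal_name": 8.0, "birth_date": 4.0, "personal_email": 4.0}
print(autolink_threshold(50, weights))  # 8.0
```

Under these assumptions, raising the sensitivity raises the score that two records must reach before they are linked, which is why a higher sensitivity produces fewer matches.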
In this topic:
- Preparing to tune your matching algorithm
- Selecting matching attributes
- Defining the autolink sensitivity
- Advanced match algorithm tuning using the IBM Match 360 REST API
Preparing to tune your matching algorithm
If you have not yet run matching on your data, you must select your matching attributes before you run matching. You can change your selections later if needed.
You cannot change your autolink threshold sensitivity until after you run matching at least one time. This restriction ensures that you have some basis of comparison for changing your threshold from the default sensitivity. For example, if you notice too many false positive matches in your data, you can increase the sensitivity. If there are too many singleton records, you can decrease the sensitivity.
Selecting matching attributes
To select the attributes that IBM Match 360 uses in the matching algorithm:
- Click the navigation menu and select Matching setup to open the matching setup page.
- Go to the Match settings tab to select the attributes to use in matching data. The first time that you go to this tab, IBM Match 360 automatically generates some suggested attributes from your data model to use in matching.
- Review the list of matching attributes and their component fields. These attributes and fields will be used as the basis of comparison to match records and create master data entities. To add or remove attributes from the list, click Edit attributes then select or clear attributes and their component fields as needed.
Tip: As you choose your matching attributes, use the Match strength indicator to see an estimate of how your changes affect the matching algorithm.
If you have added any custom attributes to the data model, they are not selected for consideration in matching by default. If you want to use a custom attribute type in matching, you must select it and then specify which of its fields to consider. If you do not specify any fields, then the matching algorithm cannot use the attribute.
For non-custom (predefined) attribute types, if you do not specify which fields to consider, the matching algorithm uses a default set of fields.
- When you are satisfied with your matching attribute changes, click Save.
- Regenerate your matched entities based on your updated settings. Click the run matching icon in the action bar.
The matching process takes a while to complete. It runs in the background so that you can continue working. You'll be notified when it's complete, and then you can review details of the results on the Match results tab.
Defining the autolink sensitivity
Finding the correct autolink sensitivity for your data set might take some trial and error. Depending on the particular requirements of your organization, you might need to repeat the process of adjusting the sensitivity and re-matching your data more than once.
To change the sensitivity of the matching algorithm's autolink threshold:
- Click the navigation menu and select Matching setup to open the matching setup page.
- Go to the Match settings tab.
- In the navigation panel, choose Autolink threshold.
- Use the Autolink threshold slider to define your autolink threshold sensitivity, then click Apply.
- Regenerate your matched entities based on your updated settings. Click the run matching icon in the action bar.
The matching process takes a while to complete. It runs in the background so that you can continue working. You'll be notified when it's complete, and then you can review details of the results on the Match results tab.
Advanced match algorithm tuning using the IBM Match 360 REST API
To achieve an advanced level of customization, you can use the IBM Match 360 REST API to configure and tune your matching algorithm.
When working with the API, you must explicitly deploy the algorithm before running your matching jobs. Within the api-model microservice API, the POST /mdm/v1/algorithms/{record_type} method generates a matching algorithm based on the supplied attributes and fields.
You can further customize the matching algorithm by using the PUT /mdm/v1/algorithms/{record_type} method, which enables you to provide a fully defined matching algorithm in the method's payload.
Here is a sample payload for POST /mdm/v1/algorithms/{record_type} that defines the autolink threshold and a set of matching attributes and fields:
{
  "person_entity": {
    "auto_link_threshold": 0.4,
    "matching_attributes": [
      {"attributes": ["legal_name"]},
      {"attributes": ["primary_residence"]},
      {"attributes": ["mobile_telephone"]},
      {"attributes": ["birth_date"]},
      {"attributes": ["gender"]},
      {"attributes": ["personal_email"]}
    ]
  }
}
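As a rough sketch, a payload of this shape can be assembled and wrapped in a request by using only the Python standard library. The base URL, the record type value `person`, the bearer token header, and the helper name `build_algorithm_request` are all assumptions made for illustration. The example builds the request without sending it. Check the API reference for the actual host, authentication scheme, and any required query parameters.

```python
import json
import urllib.request


def build_algorithm_request(base_url, record_type, entity_type, token,
                            threshold, attributes):
    """Build (but do not send) a POST request that generates an algorithm.

    The payload mirrors the sample above: one entry per matching attribute.
    """
    payload = {
        entity_type: {
            "auto_link_threshold": threshold,
            "matching_attributes": [{"attributes": [name]} for name in attributes],
        }
    }
    return urllib.request.Request(
        url=f"{base_url}/mdm/v1/algorithms/{record_type}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: bearer-token authentication.
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# Hypothetical host, record type, and token placeholders.
request = build_algorithm_request(
    "https://example.invalid", "person", "person_entity", "<token>",
    0.4, ["legal_name", "birth_date", "personal_email"],
)
print(request.get_method(), request.full_url)
# To send the request: urllib.request.urlopen(request)
```

Keeping the request construction separate from the network call makes it easy to inspect the payload before it is submitted.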
For more information about the IBM Match 360 REST API and the corresponding SDKs, including authentication instructions and full documentation of each method, see the IBM Match 360 API reference.
Configuring multi-dimensional comparison filters
Fine-tune your matching algorithm even further by defining multi-dimensional comparison filters. Multi-dimensional filters can compare attributes across records and adjust matching scores and weights up or down based on criteria that you define. Multi-dimensional comparison filters can reduce the number of false positive or false negative matches in your matching results.
To generate a multi-dimensional comparison filter in your matching algorithm, update the matching engine configuration by using REST API commands:
- Access and authenticate to the IBM Match 360 API interface.
- Specify a POST /mdm/v1/algorithms/{record_type} payload that defines a filter, as in the following example:
  {
    "person_entity": {
      "auto_link_threshold": 0.4,
      "matching_attributes": [
        {"attributes": ["legal_name"], "post_filter_methods": ["false_positive_filter"]},
        {"attributes": ["primary_residence"], "post_filter_methods": ["false_positive_filter"]},
        {"attributes": ["mobile_telephone"]},
        {"attributes": ["birth_date"], "post_filter_methods": ["false_positive_filter"]},
        {"attributes": ["gender"]},
        {"attributes": ["personal_email"]}
      ]
    }
  }
  In the sample payload, false_positive_filter is the name of the custom filter. It applies to each attribute in the payload that includes the filter name.
The sample API payload generates an algorithm containing a false_positive_filter whose weights and penalties are set to the default value of 0.
Optionally, you can customize the weights and penalties to meet your organization's requirements, and then deploy your updated algorithm using the PUT API.
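If you assemble payloads programmatically, attaching a filter name to selected attributes might look like the following sketch. The helper name `add_post_filter` is hypothetical. The payload keys (`matching_attributes`, `attributes`, `post_filter_methods`) are taken from the sample payload in this topic.

```python
import copy


def add_post_filter(payload, entity_type, filter_name, attribute_names):
    """Return a copy of the payload with filter_name attached to each
    matching-attribute entry that lists one of attribute_names."""
    updated = copy.deepcopy(payload)  # leave the caller's payload untouched
    wanted = set(attribute_names)
    for entry in updated[entity_type]["matching_attributes"]:
        if wanted & set(entry["attributes"]):
            methods = entry.setdefault("post_filter_methods", [])
            if filter_name not in methods:
                methods.append(filter_name)
    return updated


base = {
    "person_entity": {
        "auto_link_threshold": 0.4,
        "matching_attributes": [
            {"attributes": ["legal_name"]},
            {"attributes": ["mobile_telephone"]},
            {"attributes": ["birth_date"]},
        ],
    }
}
filtered = add_post_filter(
    base, "person_entity", "false_positive_filter", ["legal_name", "birth_date"]
)
for entry in filtered["person_entity"]["matching_attributes"]:
    print(entry)
```

The function copies the payload before changing it, so the unfiltered version stays available for comparison.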
Switching the edit distance function
The IBM Match 360 matching engine calculates edit distance as one of the internal functions during comparison and matching of various attributes. Edit distance is a measurement of how dissimilar two strings are from each other. It is calculated by counting the number of changes required to transform one string into the other.
You can choose between the standard edit distance function and a specialized one. The standard edit distance is the default configuration to ensure faster performance during matching. For more information about the edit distance, see IBM Match 360 matching algorithms.
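To make the idea of counting changes concrete, here is a minimal sketch of the classic Levenshtein edit distance, which counts single-character insertions, deletions, and substitutions. It is a textbook illustration only and does not represent how the IBM Match 360 matching engine implements either its standard or its specialized function.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into an empty string takes i deletions
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


print(edit_distance("kitten", "sitting"))  # 3
print(edit_distance("smith", "smyth"))     # 1
```

A smaller distance means that the two strings are more similar, so near-identical values such as misspelled names score as closer than unrelated ones.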
To change the active edit distance function, update the matching engine configuration by using REST API commands:
- Access and authenticate to the IBM Match 360 API interface.
- Retrieve the existing configuration JSON file for the comparison function, compare_spec_resource: GET /mdm/v1/compare_spec_resources/{resource_name}
- On your local machine, edit the JSON to add the line "similar_characters_enabled": true (or remove it if you want to switch back to the default edit distance setting).
- Update the IBM Match 360 configuration by uploading your edited JSON: PUT /mdm/v1/compare_spec_resources/{resource_name}
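The local edit in the steps above could be scripted along the following lines. This sketch assumes that the flag belongs at the top level of the retrieved JSON document, which this topic does not confirm, so check the structure of your own configuration first. The helper name `set_similar_characters` and the placeholder configuration content are hypothetical.

```python
import copy
import json


def set_similar_characters(config: dict, enabled: bool) -> dict:
    """Return a copy of the configuration with the flag added or removed.

    Assumption: "similar_characters_enabled" sits at the top level.
    """
    updated = copy.deepcopy(config)
    if enabled:
        updated["similar_characters_enabled"] = True
    else:
        # Removing the line switches back to the default edit distance.
        updated.pop("similar_characters_enabled", None)
    return updated


# Placeholder for the JSON returned by the GET request (hypothetical content).
retrieved = {"example_key": "example_value"}

specialized = set_similar_characters(retrieved, True)
print(json.dumps(specialized, indent=2))  # body to upload with the PUT request

default = set_similar_characters(specialized, False)
print(json.dumps(default, indent=2))
```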
Next steps
Learn more
- IBM Match 360 with Watson matching algorithms
- API services available in IBM Match 360
- Exploring master data
- Managing master data
- Tutorial: Onboarding and matching data