IBM Match 360 with Watson matching algorithms

IBM Match 360 with Watson uses matching algorithms to resolve data records into master data entities. Data engineers can define a different matching algorithm for each entity type in their data. Each matching algorithm analyzes the data to evaluate and compare records, and then collects matched records into entities.

There are two common reasons to run matching on your data:

Matching to create more than one type of entity

IBM Match 360 matching algorithms are driven by the entity type of the associated data. You can define more than one entity type for each record type in the data model. For each entity type, configure and tune its corresponding matching algorithm to ensure that IBM Match 360 creates entities that meet your organization's requirements.

A single record can be part of more than one separate entity. If your data model includes more than one entity type, you can run different types of matching across the same data set. For example, consider a data set that includes person records from across your enterprise. If the Person record type includes definitions for a Person entity type and a Household entity type, then you can run the Person matching algorithm for entity resolution and deduplication, and also run the Household matching algorithm to create entities made up of person records that belong to the same household.
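As a rough illustration of the idea above, the following sketch groups the same person records in two different ways, once per entity type. It is not IBM Match 360 code: the field names, the `group_by` helper, and the exact-key grouping rules are all assumptions made for the example. A real matching algorithm scores similarity instead of requiring identical keys.

```python
from collections import defaultdict

# Hypothetical person records; the field names are illustrative only.
records = [
    {"id": "r1", "name": "Ann Lee", "birth_date": "1980-02-01", "address": "12 Oak St"},
    {"id": "r2", "name": "ann lee", "birth_date": "1980-02-01", "address": "12 Oak St"},
    {"id": "r3", "name": "Bo Lee", "birth_date": "1978-07-15", "address": "12 Oak St"},
    {"id": "r4", "name": "Cy Diaz", "birth_date": "1990-11-30", "address": "9 Elm Rd"},
]


def group_by(recs, key_fn):
    """Collect record ids that share the same key into one entity."""
    groups = defaultdict(list)
    for rec in recs:
        groups[key_fn(rec)].append(rec["id"])
    return sorted(groups.values())


def person_key(rec):
    # Assumed rule: same normalized name and birth date means same person.
    return (rec["name"].strip().lower(), rec["birth_date"])


def household_key(rec):
    # Assumed rule: same normalized address means same household.
    return rec["address"].strip().lower()


person_entities = group_by(records, person_key)
household_entities = group_by(records, household_key)

print(person_entities)
print(household_entities)
```

In this sketch, records r1 and r2 form one Person entity, and the same two records also belong to a larger Household entity together with r3.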

The matching process

The matching engine goes through a defined process to match records into entities. The matching process includes three major steps:

  1. Standardization. During this step, the algorithm standardizes the format of the data so that it can be processed by the matching engine.

  2. Bucketing. The algorithm sorts data into various categories or "buckets" so that it can compare like-to-like pieces of information.

  3. Comparison. The algorithm compares data to determine a final comparison score. The algorithm then uses the comparison score to determine whether the records are a match.

Each of these steps is defined and configured by the matching algorithm.
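The three steps can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the IBM Match 360 engine: the record fields, the choice of postal code as the bucket key, the use of `difflib.SequenceMatcher` as the comparison function, and the 0.85 threshold are all hypothetical.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def standardize(record):
    """Step 1: lowercase the name, drop punctuation, collapse whitespace."""
    name = re.sub(r"[^a-z ]", "", record["name"].lower())
    return {
        "id": record["id"],
        "name": " ".join(name.split()),
        "zip": record["zip"].strip(),
    }


def bucket(records):
    """Step 2: group records so that only likely candidates are compared."""
    buckets = defaultdict(list)
    for rec in records:
        buckets[rec["zip"]].append(rec)  # assumed bucket key: postal code
    return buckets


def compare(a, b):
    """Step 3: return a similarity score between 0.0 and 1.0."""
    return SequenceMatcher(None, a["name"], b["name"]).ratio()


def match(records, threshold=0.85):
    """Run all three steps and return the pairs of ids judged to match."""
    standardized = [standardize(r) for r in records]
    pairs = []
    for members in bucket(standardized).values():
        for a, b in combinations(members, 2):
            if compare(a, b) >= threshold:
                pairs.append((a["id"], b["id"]))
    return pairs


records = [
    {"id": "r1", "name": "Jon  Smith", "zip": "10001"},
    {"id": "r2", "name": "JON SMITH.", "zip": "10001 "},
    {"id": "r3", "name": "Maria Gomez", "zip": "10001"},
    {"id": "r4", "name": "Jon Smith", "zip": "94105"},
]
matched = match(records)
print(matched)
```

Records r1 and r2 match after standardization. Record r4 has the same name but lands in a different bucket, so it is never compared with them. That trade-off between fewer comparisons and possible missed matches is why bucketing rules need tuning.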

Components of the matching algorithm

Two main types of components define an IBM Match 360 matching algorithm:

Standardizers

As the name suggests, standardizers define how data gets standardized. Standardization enables the matching algorithm to convert the values of different attributes to a standardized representation that can be processed by the matching engine.

The matching algorithm uses multiple standardizers. Each standardizer is suited to process specific attribute types found in record data.

Standardizers are defined by JSON objects. Each standardizer's JSON object definition contains three elements:

Entity types

Within a single matching algorithm, each record type can have multiple entity type definitions (entity_type JSON objects). For example, in an algorithm defined for a person record type, you might need to create more than one entity type definition, such as a person entity, a household entity, or a location entity.

Each entity type can be used to match and link records in different ways. An entity type defines how records are bucketed and compared during the matching process.

Each entity type definition (entity_type) in the matching algorithm has four JSON elements:

Edit distance

The IBM Match 360 matching engine calculates edit distance as one of the internal functions during comparison and matching of various attributes. Edit distance is a measurement of how dissimilar two strings are from each other. It is calculated by counting the number of changes required to transform one string into the other.
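To make the definition concrete, here is a minimal sketch of one widely used edit distance, the Levenshtein distance, which counts single-character insertions, deletions, and substitutions. Whether this is the exact function that IBM Match 360 uses is an assumption. The `levenshtein` function below is an illustration only and is not part of any IBM API.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


print(levenshtein("smith", "smyth"))
print(levenshtein("kitten", "sitting"))
```

A small distance relative to the string lengths, such as the single substitution between "smith" and "smyth", suggests that two values might be variants of the same underlying value.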

There are different ways to define edit distance, depending on which set of string operations is allowed. By default, IBM Match 360 uses a standard edit distance function that is publicly available in the literature. As an alternative, you can choose to use a specialized IBM Match 360 edit distance function.

Note: Prior to IBM Cloud Pak for Data 4.0 refresh 7 (4.0.7), the specialized edit distance function was the only option for calculating edit distance.

For information about customizing your matching algorithm, including using the API to customize the edit distance, see Customizing and strengthening your matching algorithm.
