Begin Fix Pack 11.4.02 information

Entity resolution rules

High-level integration language programs, called flows, are made up of deterministic and probabilistic rules. Each rule controls a specific step in the entity resolution process. The rules consist of matching and filtering criteria.

Deterministic matching and probabilistic matching

In deterministic rules, both matching and filtering are done deterministically by using a combination of exact equalities and various matching functions (for example, a geo-proximity function or a function that assesses how common a name is according to US Census data). In probabilistic rules, matching is done by the Probabilistic Matching Engine (PME), which uses standard PME matching algorithms to drive comparison. The probabilistic matching results are then filtered deterministically.

Creating a comprehensive set of linkages requires both probabilistic and deterministic matching flows. The two types of flows complement each other.

Deterministic matching flows perform entity resolution through strict comparison between entities. Deterministic matching flows:
  • Allow matching on exact equality of values.
  • Can include non-trivial forms of comparison that involve SystemT libraries and others.
  • Are configured by modifying entity resolution rules.
  • Are responsible for both matching and filtering.

Examples of deterministic matching include:
  • Exact equality of first name or its known variation (Will ↔ William, Jim ↔ Jimmy)
  • Exact equality of location or close proximity
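
The deterministic examples above can be sketched in a few lines of Python. This is only an illustration, not the product's rule syntax: the nickname table, the record fields (first_name, lat, lon), the 25 km proximity limit, and the function names are all assumptions made for this sketch.

```python
import math

# Hypothetical variation table: each known variation maps to one canonical name.
NAME_VARIANTS = {
    "will": "william",
    "bill": "william",
    "jim": "james",
    "jimmy": "james",
}


def canonical_name(name: str) -> str:
    """Normalize a first name and replace a known variation with its canonical form."""
    key = name.strip().lower()
    return NAME_VARIANTS.get(key, key)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers between two latitude/longitude points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def deterministic_match(a: dict, b: dict, max_km: float = 25.0) -> bool:
    """Match when first names are equal (or known variations) and locations are close.

    max_km is an assumed proximity limit chosen for this sketch.
    """
    same_name = canonical_name(a["first_name"]) == canonical_name(b["first_name"])
    close_by = haversine_km(a["lat"], a["lon"], b["lat"], b["lon"]) <= max_km
    return same_name and close_by


customer = {"first_name": "William", "lat": 40.7128, "lon": -74.0060}
nearby_profile = {"first_name": "Will", "lat": 40.7306, "lon": -73.9866}
distant_profile = {"first_name": "Will", "lat": 34.0522, "lon": -118.2437}
```

With these sample records, customer and nearby_profile match because "Will" is a known variation of "William" and the two points are only a few kilometers apart, while distant_profile is rejected on proximity alone.
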
Probabilistic matching flows use the PME Jaql library (PME algorithm) to process non-exact comparisons between entities and determine the best possible matches. Probabilistic matching flows:
  • Allow non-exact comparison between entities (for example, typographical errors, odd spellings, nicknames).
  • Use the PME Jaql library to run probabilistic matching.
  • Produce matches that are then further filtered and refined by applying entity resolution rules.
  • Are configured by modifying the PME algorithm and entity resolution rules.
  • Are responsible for matching, while high-level integration language rules are responsible for refining the matching results through filtering.
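
The two-stage pattern described above (non-exact matching first, deterministic filtering second) might look like the following Python sketch. It does not use PME: the standard-library difflib.SequenceMatcher stands in for the probabilistic scorer, and the 0.8 threshold, the record fields, and the function names are assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical threshold; in the product, the auto-link threshold belongs to the PME algorithm.
AUTO_LINK_THRESHOLD = 0.8


def similarity(a: str, b: str) -> float:
    """Stand-in for a probabilistic score: case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def probabilistic_candidates(customers, profiles, threshold=AUTO_LINK_THRESHOLD):
    """Stage 1: non-exact matching. Keep every pair that scores at or above the threshold."""
    candidates = []
    for c in customers:
        for p in profiles:
            score = similarity(c["name"], p["name"])
            if score >= threshold:
                candidates.append((c["id"], p["id"], score))
    return candidates


def filter_candidates(candidates, customers, profiles):
    """Stage 2: deterministic filtering. Keep only pairs whose city is exactly equal."""
    cust_by_id = {c["id"]: c for c in customers}
    prof_by_id = {p["id"]: p for p in profiles}
    return [
        (cid, pid, score)
        for cid, pid, score in candidates
        if cust_by_id[cid]["city"].lower() == prof_by_id[pid]["city"].lower()
    ]


customers = [{"id": "c1", "name": "Jonathan Smith", "city": "Austin"}]
profiles = [
    {"id": "p1", "name": "Jonathon Smith", "city": "austin"},
    {"id": "p2", "name": "Jonathan Smyth", "city": "Boston"},
    {"id": "p3", "name": "Maria Lopez", "city": "Austin"},
]

candidates = probabilistic_candidates(customers, profiles)
links = filter_candidates(candidates, customers, profiles)
```

In this sample, the two misspelled names pass the non-exact stage, and the deterministic city filter then leaves only the c1 to p1 pair.
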

The high-level integration language joins the results of both flows to create linkages between enterprise customer profiles and social profiles. The matching process uses attribute filtering, while the linking and join processes use grouping and cardinality definitions to resolve conflicts. (See the related links to learn more about filtering attributes for better results and about cardinality and its use in conflict resolution policies.)
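
As a rough illustration of joining both result sets and resolving conflicts by cardinality, the Python sketch below assumes a one-to-one cardinality and a "highest score wins" policy. Both choices, the score of 1.0 given to deterministic links, and the function names are assumptions for this example; the actual conflict resolution policies are described in the related topics.

```python
def join_links(deterministic, probabilistic):
    """Union the links from both flows; if a pair appears twice, keep its higher score."""
    best = {}
    for cid, pid, score in list(deterministic) + list(probabilistic):
        key = (cid, pid)
        if key not in best or score > best[key]:
            best[key] = score
    return [(cid, pid, score) for (cid, pid), score in best.items()]


def enforce_one_to_one(links):
    """Assumed policy: visit links from highest to lowest score and keep a link only
    if neither its customer nor its social profile is already linked."""
    used_customers, used_profiles, kept = set(), set(), []
    for cid, pid, score in sorted(links, key=lambda link: link[2], reverse=True):
        if cid in used_customers or pid in used_profiles:
            continue  # conflict: a better-scoring link already claimed this entity
        used_customers.add(cid)
        used_profiles.add(pid)
        kept.append((cid, pid, score))
    return kept


# Hypothetical inputs: (customer id, profile id, score).
deterministic_links = [("c1", "p1", 1.0)]
probabilistic_links = [("c1", "p2", 0.9), ("c2", "p2", 0.85), ("c1", "p1", 0.95)]

joined = join_links(deterministic_links, probabilistic_links)
final_links = enforce_one_to_one(joined)
```

Here the c1 to p2 link is dropped because c1 is already linked to p1 with a higher score, which leaves p2 free to link to c2.
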

Image shows the flow of deterministic matching and probabilistic matching results into deterministic filtering, and then combining the final results.

Thresholds are used by both deterministic and probabilistic flows. Changes to thresholds for deterministic rules are made in the code and should be made only with the guidance of a data analyst. Changes to the probabilistic threshold (the auto-link threshold) are made in the PME algorithm through the Big Match Console or MDM Workbench. See the related topics for specific information about adjusting thresholds.



Last updated: 25 Jun 2015
End Fix Pack 11.4.02 information