IBM InfoSphere Master Data Management Suspect Duplicate Processing Customization

A step by step procedure to implement Suspect Duplicate Processing customization on IBM InfoSphere Master Data Management V10.0

Suspect Duplicate Processing (SDP) is a key feature of IBM® InfoSphere® Master Data Management (MDM). It allows collapsing suspect parties when found to be identical. It also identifies possible suspect matches and provides a matching score, allowing the user to review the results and collapse the parties if necessary. The out-of-the-box processing is based on the attributes of the party (person/organization) entity. This approach does not always fit with customer's business needs, since matching criteria are very specific. Learn how to customize SDP for the exacting requirements of your customer.

Alice Guidotti (Alice.Guidotti@it.ibm.com), Solution Designer, IBM

Alice Guidotti is a solution designer and developer working for the Rome Smart Solutions Lab (RSSL). She joined IBM at the Rome Tivoli Lab in late 2001, working as a developer and Level 3 engineer in several IBM Tivoli products, mainly in the software license management and monitoring areas. She has been Level 3 technical leader for the IBM Tivoli Monitoring OS Agents. In 2010, she moved to RSSL, where she gained experience in different technology contexts, such as IBM Tivoli Maximo and IBM InfoSphere MDM, delivering successful solutions to several Italian customers.



Gaetano Ruggiero (g.ruggiero@it.ibm.com), IT Architect, IBM

Gaetano Ruggiero is an IBM Certified IT Architect working for the Software Group Lab Services Team in Italy. He joined IBM in 2000, starting his career as a software engineer in the Rome Tivoli Lab. His working expertise spreads from code development to architectural design in several applicative contexts, such as security, information management, and pervasive devices. He has been leader of several development teams as part of the Rome Smart Solutions Lab, which he joined in 2004. He is currently a SME for Information Management products, such as MDM Server, MDM for PIM, and Initiate MDM, and delivered product customizations for several Italian customers. His work was recognized twice with the IBM Client Value Award.



09 August 2012

Introduction

IBM InfoSphere Master Data Management (MDM) Server provides out-of-the-box Suspect Duplicate Processing (SDP) functionality. When adding or updating a party (either a person or an organization), the SDP engine looks for suspect duplicates and, based on matching criteria, decides whether to collapse or just mark them. With this functionality, matching criteria and collapsing policies are bound to predefined rules, and the merging party's attributes are fixed.

Since MDM was designed to implement additions and extensions to its data model, it is common for a customer to require SDP based on other criteria than those provided with the shelf product. Customization includes analyzing matching criteria requirements and choosing a matching matrix; creating database scripts to configure the matching matrix; identifying the suspect search; matching, collapsing, and survivorship policies; and and overriding the external rules.

Leveraging the authors' experience on real customer scenarios, this article will show a step-by-step procedure for implementing an SDP model based on customized rules.

This article will enable:

  1. Familiarity with analysis of matching criteria requirements
  2. Ability to create a corresponding matching matrix, identifying matching scores assigned to each combination of attributes
  3. Ability to set up database for SDP
  4. Familiarity with MDM external rules to override for customized processing (suspect search, matching, collapsing, and survivorship policies)
  5. Ability to override external rules

Prerequisites

You should:

  • Have a basic understanding of IBM InfoSphere Master Data Management Server
  • Be familiar with the IBM InfoSphere Master Data Management data model
  • Be familiar with the Java™ programming language and database management systems
  • Have a basic understanding of web service architectures

System requirements

An environment with MDM Workbench 10.0 is required. Please refer to the IBM InfoSphere MDM 10.0 Information Center for installation system requirements.


Use case

As an example, let's consider a data model where InfoSphere Master Data Management is used to perform master data management of library customers. Suppose the business analysis has determined that the person entity must be extended with the attributes badgeNumber and officeId, as shown in Figure 1.

Figure 1. Extended data model
Image shows MDM data model extended to support master data management of library customers

Let's assume you need to collapse persons based on name, badge number, and office ID. A combination of attributes is used for the name: GivenNameOne and LastName (later referred to as DisplayName); BadgeNumber corresponds to the badge number; and officeId represents the office ID. You can be confident that two suspects are the same if all attributes match. If only a subset of the attributes match, you can be reasonably sure that the two suspects are the same or not sure, based on the weight of the matching attributes. If only one or a few low-priority attributes match, chances are that the two suspects are not the same.

As stated in the InfoSphere Master Data Management Server Data Stewardship UI Users Guide, the Suspect Type Categories are:

  • A1 — Confident that two parties are the same
  • A2 — Reasonably sure two parties are the same
  • B — Not sure if two parties are the same
  • C — Confident two parties are not the same

Preparing the development environment

Start from a default MDM Workbench IDE, and execute the following steps:

  1. Click File > New > Hub Module Project
    Figure 2. Creating the new Hub Module project
    Image shows new hub module project
  2. Specify LibraryHub as the project name and its properties:
    • Base Java Package Name: com.it.ibm.mdm.workshop
    • Service Namespace URI: http://workshop.mdm.ibm.it.com
    • EAR Project Name: MDM
    • Hub Base Name: LibraryHub
    • Database Schema Name: DB2ADMIN
    Figure 3. Setting properties of the new hub module project
    Image shows properties of the LibraryHub project
  3. Click Finish and complete the project module creation.
  4. Replace the file LibraryHub/module.mdmxmi with the one provided in the Downloads section. The module.mdmxmi file is now defining all data model extensions and additions shown in the extended data model.
    Figure 4. Extensions and additions of the new hub module project
    Image shows LibraryHub extensions and additions
  5. Click Generate Implementation and wait for the code generation to complete.
    Figure 5. Code generation
    Image shows code generation

    When code generation completes, the workspace will also include the LibraryHubEJB project.

  6. Descend the LibraryHub > resources > db2 folder.
    Figure 6. The LibraryHubEJB project
    Image shows LibraryHubEJB project
  7. Open a DB2 command-line console, perform the connection to the database, and launch the SQL files in Figure 6, with the syntax shown below.
    Listing 1. SQL commands
    db2 –tvf LibraryHub_SETUP_DB2.sql
    db2 –tvf LibraryHub_CONSTRAINTS_DB2.sql
    db2 –v –td@ -f LibraryHub_TRIGGERS_DB2.sql
    db2 –tvf LibraryHub_MetaData_DB2.sql
    db2 –tvf LibraryHub_CODETABLES_DB2.sql
    db2 –tvf CONFIG_XMLSERVICES_RESPONSE_DB2.sql
  8. Clean and rebuild the entire workspace.
  9. Right-click on the MDM.ear project and select J2EE > Prepare for Deployment.
  10. Right-click on the application server and publish the workspace changes.

Defining the Badge Number equivalence key

Each customer will be uniquely identified within the database through the Badge Number key, stored in the ContEquiv table accordingly with the constraint of uniqueness. This key typology must be added to the CdAdminSysTp code table through the Business Admin console:

  1. Open the Business Admin console with the URL http://localhost:9080/CustomerBusinessAdminWeb/.
  2. From the left navigation menu, select Code Tables > Code Tables.
  3. In the code table name combo box, select CdAdminSysTp and click Go.
    Figure 7. CdAdminSysTp code table
    Image shows CDADMINSYSTP code table
  4. A list of defined equivalence keys typologies will be prompted.
    Figure 8. Equivalence keys typologies
    Image shows equivalence keys typologies
  5. Click Add and provide the requested information for the BadgeNumber key typology creation:
    • Code Type: 100
    • Name: BadgeNumber
    Figure 9. Properties of the entry BadgeNumber in the CdAdminSysTp codetable
    Image shows properties of the BadgeNumber
  6. Click Submit and check in the next table that the BadgeNumber equivalence key typology inserted properly.
    Figure 10. BadgeNumber equivalence key
    Image shows BadgeNumber equivalence key

Analyzing matching criteria requirements

Defining matching scores

Non-matching scores

In this example, the combination of attributes that provide positive matches are examined. A more complex scenario could consider attributes having a negative weight. These attributes contribute to a non-matching score, which can be considered when defining Suspect Type Categories. These kinds of scenarios can be implemented with IBM MDM SDP, although they are not considered here.

Your first task will be working with your customer to understand which set of attributes leads to a guaranteed match. For instance, in our example, the matching of GivenNameOne, LastName, BadgeNumber, and officeId leads to 100-percent confidence that the parties are the same. You also need to understand which subsets of matching attributes will fit the other categories and validate your findings with the customer. For instance, we can consider that if GivenNameOne, LastName, and BadgeNumber match, there is a high probability of the two parties being a duplicate. You can start by providing an importance weight to each attribute to determine a matching score for each combination of matching attributes. Choose the attributes' weights so the scores determined by different combinations are always different (for instance, by choosing weights with different orders of magnitude). In our example, we chose 5, 10, and 100 and obtained a table similar Table 1.

Table 1. Matching scores
AttributesWeightCombinations
GivenNameOne, LastName100xxxx
BadgeNumber10xxxx
OfficeId5xxxx
Matching Score11511010510015105N/A

Finalizing the matching matrix

Once you determine the matching scores for each combination of attributes, you need to decide the corresponding suspect type categories. In our example, we will determine the suspect type categories according to the following thresholds.

  • A1: A definite duplicate party has been found. Score > 110.
  • A2: It is reasonably likely that two parties are the same. 100 < score <= 110.
  • B: It is fairly unlikely that the two parties are the same. 10 <= score <= 100.
  • C: It is definite that two parties are not the same. 0 < score < 10.
  • N/A: No suspect duplicate is found. Score = 0.

By merging the information, we obtain the following Matching Matrix, which shows the matching type categories for each combination of matching attributes, as shown in Table 2.

Table 2. Matching matrix
AttributesWeightCombinations
GivenNameOne, LastName100xxxx
BadgeNumber10xxxx
OfficeId5xxxx
Matching Score11511010510015105N/A
Matching type categoriesA1A2A2BBBCN/A

Setting up the database for SDP

To enable SDP, update the CONFIGELEMENT table, as shown below.

Listing 2. CONFIGELEMENT table update
UPDATE DB2ADMIN.CONFIGELEMENT 
    SET VALUE='true',LAST_UPDATE_DT=CURRENT TIMESTAMP,LAST_UPDATE_USER='unknown'
    WHERE NAME = '/IBM/Party/SuspectProcessing/enabled';
                   
UPDATE DB2ADMIN.CONFIGELEMENT 
    SET VALUE='false',LAST_UPDATE_DT=CURRENT TIMESTAMP,LAST_UPDATE_USER='unknown'
    WHERE NAME = '/IBM/Party/SuspectProcessing/AddParty/returnSuspect';
                   
UPDATE DB2ADMIN.CONFIGELEMENT 
    SET VALUE='true',LAST_UPDATE_DT=CURRENT TIMESTAMP,LAST_UPDATE_USER='unknown'
    WHERE NAME = '/IBM/Party/SuspectProcessing/PersistDuplicateParties/enabled';

You can use the Excel file (see Downloads) in order to determine the SQL statements required to set up SDP corresponding to the matching matrix you defined. The first sheet must be used to define matching scores and corresponding non-matching scores, respectively, for the CDMATCHRELEVTP and CDSUSPECTREASONTP codetables, as shown in Figure 11.

Figure 11. Matching scores in the Excel file
Image shows matching scores in the Excel file

The next sheet is automatically generated and includes the Add Action Types to be used accordingly with the previously defined scores. The setup and rollback sheets must be used to extract the insert statements to be launched across the database in order to configure or un-configure the metadata for SDP. For our purposes, the configuration DDLs are displayed below.

Listing 3. CDMATCHRELEVTP inserts
INSERT INTO CDMATCHRELEVTP ( lang_tp_cd, MATCH_RELEV_TP_CD, name, description, 
RELEVENCY_SCORE, last_update_dt ) 
VALUES ( 100, 3000, 'no match', 'no match', 3000, CURRENT TIMESTAMP ); 
                    
INSERT INTO CDMATCHRELEVTP ( lang_tp_cd, MATCH_RELEV_TP_CD, name, description, 
RELEVENCY_SCORE, last_update_dt ) 
VALUES ( 100, 3005, 'Office Id', 'match on Office Id', 3005, CURRENT TIMESTAMP ); 
                    
INSERT INTO CDMATCHRELEVTP ( lang_tp_cd, MATCH_RELEV_TP_CD, name, description, 
RELEVENCY_SCORE, last_update_dt ) 
VALUES ( 100, 3010, 'Badge Number', 'match on Badge Number', 3010, CURRENT TIMESTAMP ); 
                    
INSERT INTO CDMATCHRELEVTP ( lang_tp_cd, MATCH_RELEV_TP_CD, name, description, 
RELEVENCY_SCORE, last_update_dt ) 
VALUES ( 100, 3015, 'Office id, Badge Number', 'match on Office id and Badge Number', 
3015, CURRENT TIMESTAMP ); 
                    
INSERT INTO CDMATCHRELEVTP ( lang_tp_cd, MATCH_RELEV_TP_CD, name, description, 
RELEVENCY_SCORE, last_update_dt ) 
VALUES ( 100, 3100, 'Display Name', 'match on First Name and Last Name (Display Name)', 
3100, CURRENT TIMESTAMP ); 
                    
INSERT INTO CDMATCHRELEVTP ( lang_tp_cd, MATCH_RELEV_TP_CD, name, description, 
RELEVENCY_SCORE, last_update_dt ) 
VALUES ( 100, 3105, 'Display Name, Office Id', 'match on Display Name and Office Id', 
3105, CURRENT TIMESTAMP ); 
                    
INSERT INTO CDMATCHRELEVTP ( lang_tp_cd, MATCH_RELEV_TP_CD, name, description, 
RELEVENCY_SCORE, last_update_dt ) 
VALUES ( 100, 3110, 'Display Name, Badge Number', 'match on Display Name and Office Id', 
3110, CURRENT TIMESTAMP ); 
                    
INSERT INTO CDMATCHRELEVTP ( lang_tp_cd, MATCH_RELEV_TP_CD, name, description, 
RELEVENCY_SCORE, last_update_dt ) 
VALUES ( 100, 3115, 'Display Name, Office Id, Badge Number', 
'match on Display Name, Office Id and Badge Number', 3115, CURRENT TIMESTAMP );
Listing 4. CMSUSPECTREASONTP inserts
INSERT INTO CDSUSPECTREASONTP ( lang_tp_cd, SUSP_REASON_TP_CD, name, description, 
REASON_SCORE, last_update_dt ) 
VALUES ( 100, 3000, 'match', 'all elements match', 3000, CURRENT TIMESTAMP ); 
                    
INSERT INTO CDSUSPECTREASONTP ( lang_tp_cd, SUSP_REASON_TP_CD, name, description, 
REASON_SCORE, last_update_dt ) 
VALUES ( 100, 3005, 'OfficeId', 'no match on Office Id', 3005, CURRENT TIMESTAMP ); 
                    
INSERT INTO CDSUSPECTREASONTP ( lang_tp_cd, SUSP_REASON_TP_CD, name, description, 
REASON_SCORE, last_update_dt ) 
VALUES ( 100, 3010, 'BadgeNumber', 'no match on Badge Number', 3010, CURRENT TIMESTAMP ); 
                    
INSERT INTO CDSUSPECTREASONTP ( lang_tp_cd, SUSP_REASON_TP_CD, name, description, 
REASON_SCORE, last_update_dt ) 
VALUES ( 100, 3015, 'OfficeId,BadgeNumber', 'no match on Office id and Badge Number', 
3015, CURRENT TIMESTAMP ); 
                    
INSERT INTO CDSUSPECTREASONTP ( lang_tp_cd, SUSP_REASON_TP_CD, name, description, 
REASON_SCORE, last_update_dt ) 
VALUES ( 100, 3100, 'DisplayName', 'no match on First Name and Last Name (Display Name)', 
3100, CURRENT TIMESTAMP ); 
                    
INSERT INTO CDSUSPECTREASONTP ( lang_tp_cd, SUSP_REASON_TP_CD, name, description, 
REASON_SCORE, last_update_dt ) 
VALUES ( 100, 3105, 'DisplayName,OfficeId', 'no match on Display Name and Office Id', 
3105, CURRENT TIMESTAMP ); 
                    
INSERT INTO CDSUSPECTREASONTP ( lang_tp_cd, SUSP_REASON_TP_CD, name, description, 
REASON_SCORE, last_update_dt ) 
VALUES ( 100, 3110, 'DisplayName,BadgeNumber', 'no match on Display Name and Office Id', 
3110, CURRENT TIMESTAMP ); 
                    
INSERT INTO CDSUSPECTREASONTP ( lang_tp_cd, SUSP_REASON_TP_CD, name, description, 
REASON_SCORE, last_update_dt ) 
VALUES ( 100, 3115, 'DisplayName,OfficeId,BadgeNumber', 'no match on Display Name, 
Office Id and Badge Number', 3115, CURRENT TIMESTAMP );
Listing 5. ADDACTIONTYPE inserts
INSERT INTO ADDACTIONTYPE (ADD_ACTION_ID, ORG_TP_CD, MATCH_RELEV_TP_CD, 
SUSP_REASON_TP_CD, PERSON_ORG_CODE, ADD_ACTION_CODE, SUSPECT_TP_CD, last_update_dt ) 
VALUES ( 20000000000, 1, 3005, 3000, 'O', 'C', 4, CURRENT TIMESTAMP ); 
                    
INSERT INTO ADDACTIONTYPE (ADD_ACTION_ID, ORG_TP_CD, MATCH_RELEV_TP_CD, 
SUSP_REASON_TP_CD, PERSON_ORG_CODE, ADD_ACTION_CODE, SUSPECT_TP_CD, last_update_dt ) 
VALUES ( 20000000001, 1, 3005, 3005, 'O', 'C', 4, CURRENT TIMESTAMP ); 
                    
INSERT INTO ADDACTIONTYPE (ADD_ACTION_ID, ORG_TP_CD, MATCH_RELEV_TP_CD, 
SUSP_REASON_TP_CD, PERSON_ORG_CODE, ADD_ACTION_CODE, SUSPECT_TP_CD, last_update_dt ) 
VALUES ( 20000000002, 1, 3005, 3010, 'O', 'C', 4, CURRENT TIMESTAMP ); 

INSERT INTO ADDACTIONTYPE (ADD_ACTION_ID, ORG_TP_CD, MATCH_RELEV_TP_CD, 
SUSP_REASON_TP_CD, PERSON_ORG_CODE, ADD_ACTION_CODE, SUSPECT_TP_CD, last_update_dt ) 
VALUES ( 20000000003, 1, 3005, 3015, 'O', 'C', 4, CURRENT TIMESTAMP ); 

INSERT INTO ADDACTIONTYPE (ADD_ACTION_ID, ORG_TP_CD, MATCH_RELEV_TP_CD, 
SUSP_REASON_TP_CD, PERSON_ORG_CODE, ADD_ACTION_CODE, SUSPECT_TP_CD, last_update_dt ) 
VALUES ( 20000000004, 1, 3005, 3100, 'O', 'C', 4, CURRENT TIMESTAMP ); 

INSERT INTO ADDACTIONTYPE (ADD_ACTION_ID, ORG_TP_CD, MATCH_RELEV_TP_CD, 
SUSP_REASON_TP_CD, PERSON_ORG_CODE, ADD_ACTION_CODE, SUSPECT_TP_CD, last_update_dt ) 
VALUES ( 20000000005, 1, 3005, 3105, 'O', 'C', 4, CURRENT TIMESTAMP ); 

INSERT INTO ADDACTIONTYPE (ADD_ACTION_ID, ORG_TP_CD, MATCH_RELEV_TP_CD, 
SUSP_REASON_TP_CD, PERSON_ORG_CODE, ADD_ACTION_CODE, SUSPECT_TP_CD, last_update_dt ) 
VALUES ( 20000000006, 1, 3005, 3110, 'O', 'C', 4, CURRENT TIMESTAMP ); 

INSERT INTO ADDACTIONTYPE (ADD_ACTION_ID, ORG_TP_CD, MATCH_RELEV_TP_CD, 
SUSP_REASON_TP_CD, PERSON_ORG_CODE, ADD_ACTION_CODE, SUSPECT_TP_CD, last_update_dt ) 
VALUES ( 20000000007, 1, 3005, 3115, 'O', 'C', 4, CURRENT TIMESTAMP ); 

INSERT INTO ADDACTIONTYPE (ADD_ACTION_ID, ORG_TP_CD, MATCH_RELEV_TP_CD, 
SUSP_REASON_TP_CD, PERSON_ORG_CODE, ADD_ACTION_CODE, SUSPECT_TP_CD, last_update_dt ) 
VALUES ( 20000000008, 1, 3010, 3000, 'O', 'B', 3, CURRENT TIMESTAMP ); 

INSERT INTO ADDACTIONTYPE (ADD_ACTION_ID, ORG_TP_CD, MATCH_RELEV_TP_CD, 
SUSP_REASON_TP_CD, PERSON_ORG_CODE, ADD_ACTION_CODE, SUSPECT_TP_CD, last_update_dt ) 
VALUES ( 20000000009, 1, 3010, 3005, 'O', 'B', 3, CURRENT TIMESTAMP ); 

INSERT INTO ADDACTIONTYPE (ADD_ACTION_ID, ORG_TP_CD, MATCH_RELEV_TP_CD, 
SUSP_REASON_TP_CD, PERSON_ORG_CODE, ADD_ACTION_CODE, SUSPECT_TP_CD, last_update_dt ) 
VALUES ( 20000000010, 1, 3010, 3010, 'O', 'B', 3, CURRENT TIMESTAMP ); 

INSERT INTO ADDACTIONTYPE (ADD_ACTION_ID, ORG_TP_CD, MATCH_RELEV_TP_CD, 
SUSP_REASON_TP_CD, PERSON_ORG_CODE, ADD_ACTION_CODE, SUSPECT_TP_CD, last_update_dt ) 
VALUES ( 20000000011, 1, 3010, 3015, 'O', 'B', 3, CURRENT TIMESTAMP ); 

INSERT INTO ADDACTIONTYPE (ADD_ACTION_ID, ORG_TP_CD, MATCH_RELEV_TP_CD, 
SUSP_REASON_TP_CD, PERSON_ORG_CODE, ADD_ACTION_CODE, SUSPECT_TP_CD, last_update_dt ) 
VALUES ( 20000000012, 1, 3010, 3100, 'O', 'B', 3, CURRENT TIMESTAMP ); 

INSERT INTO ADDACTIONTYPE (ADD_ACTION_ID, ORG_TP_CD, MATCH_RELEV_TP_CD, 
SUSP_REASON_TP_CD, PERSON_ORG_CODE, ADD_ACTION_CODE, SUSPECT_TP_CD, last_update_dt ) 
VALUES ( 20000000013, 1, 3010, 3105, 'O', 'B', 3, CURRENT TIMESTAMP ); 

INSERT INTO ADDACTIONTYPE (ADD_ACTION_ID, ORG_TP_CD, MATCH_RELEV_TP_CD, 
SUSP_REASON_TP_CD, PERSON_ORG_CODE, ADD_ACTION_CODE, SUSPECT_TP_CD, last_update_dt ) 
VALUES ( 20000000014, 1, 3010, 3110, 'O', 'B', 3, CURRENT TIMESTAMP ); 

INSERT INTO ADDACTIONTYPE (ADD_ACTION_ID, ORG_TP_CD, MATCH_RELEV_TP_CD, 
SUSP_REASON_TP_CD, PERSON_ORG_CODE, ADD_ACTION_CODE, SUSPECT_TP_CD, last_update_dt ) 
VALUES ( 20000000015, 1, 3010, 3115, 'O', 'B', 3, CURRENT TIMESTAMP ); 

INSERT INTO ADDACTIONTYPE (ADD_ACTION_ID, ORG_TP_CD, MATCH_RELEV_TP_CD, 
SUSP_REASON_TP_CD, PERSON_ORG_CODE, ADD_ACTION_CODE, SUSPECT_TP_CD, last_update_dt ) 
VALUES ( 20000000016, 1, 3015, 3000, 'O', 'B', 3, CURRENT TIMESTAMP ); 

INSERT INTO ADDACTIONTYPE (ADD_ACTION_ID, ORG_TP_CD, MATCH_RELEV_TP_CD, 
SUSP_REASON_TP_CD, PERSON_ORG_CODE, ADD_ACTION_CODE, SUSPECT_TP_CD, last_update_dt ) 
VALUES ( 20000000017, 1, 3015, 3005, 'O', 'B', 3, CURRENT TIMESTAMP ); 

INSERT INTO ADDACTIONTYPE (ADD_ACTION_ID, ORG_TP_CD, MATCH_RELEV_TP_CD, 
SUSP_REASON_TP_CD, PERSON_ORG_CODE, ADD_ACTION_CODE, SUSPECT_TP_CD, last_update_dt ) 
VALUES ( 20000000018, 1, 3015, 3010, 'O', 'B', 3, CURRENT TIMESTAMP ); 

INSERT INTO ADDACTIONTYPE (ADD_ACTION_ID, ORG_TP_CD, MATCH_RELEV_TP_CD, 
SUSP_REASON_TP_CD, PERSON_ORG_CODE, ADD_ACTION_CODE, SUSPECT_TP_CD, last_update_dt ) 
VALUES ( 20000000019, 1, 3015, 3015, 'O', 'B', 3, CURRENT TIMESTAMP ); 

INSERT INTO ADDACTIONTYPE (ADD_ACTION_ID, ORG_TP_CD, MATCH_RELEV_TP_CD, 
SUSP_REASON_TP_CD, PERSON_ORG_CODE, ADD_ACTION_CODE, SUSPECT_TP_CD, last_update_dt ) 
VALUES ( 20000000020, 1, 3015, 3100, 'O', 'B', 3, CURRENT TIMESTAMP ); 

INSERT INTO ADDACTIONTYPE (ADD_ACTION_ID, ORG_TP_CD, MATCH_RELEV_TP_CD, 
SUSP_REASON_TP_CD, PERSON_ORG_CODE, ADD_ACTION_CODE, SUSPECT_TP_CD, last_update_dt ) 
VALUES ( 20000000021, 1, 3015, 3105, 'O', 'B', 3, CURRENT TIMESTAMP ); 

INSERT INTO ADDACTIONTYPE (ADD_ACTION_ID, ORG_TP_CD, MATCH_RELEV_TP_CD, 
SUSP_REASON_TP_CD, PERSON_ORG_CODE, ADD_ACTION_CODE, SUSPECT_TP_CD, last_update_dt ) 
VALUES ( 20000000022, 1, 3015, 3110, 'O', 'B', 3, CURRENT TIMESTAMP ); 

INSERT INTO ADDACTIONTYPE (ADD_ACTION_ID, ORG_TP_CD, MATCH_RELEV_TP_CD, 
SUSP_REASON_TP_CD, PERSON_ORG_CODE, ADD_ACTION_CODE, SUSPECT_TP_CD, last_update_dt ) 
VALUES ( 20000000023, 1, 3015, 3115, 'O', 'B', 3, CURRENT TIMESTAMP ); 

INSERT INTO ADDACTIONTYPE (ADD_ACTION_ID, ORG_TP_CD, MATCH_RELEV_TP_CD, 
SUSP_REASON_TP_CD, PERSON_ORG_CODE, ADD_ACTION_CODE, SUSPECT_TP_CD, last_update_dt ) 
VALUES ( 20000000024, 1, 3100, 3000, 'O', 'B', 3, CURRENT TIMESTAMP ); 

INSERT INTO ADDACTIONTYPE (ADD_ACTION_ID, ORG_TP_CD, MATCH_RELEV_TP_CD, 
SUSP_REASON_TP_CD, PERSON_ORG_CODE, ADD_ACTION_CODE, SUSPECT_TP_CD, last_update_dt ) 
VALUES ( 20000000025, 1, 3100, 3005, 'O', 'B', 3, CURRENT TIMESTAMP ); 

INSERT INTO ADDACTIONTYPE (ADD_ACTION_ID, ORG_TP_CD, MATCH_RELEV_TP_CD, 
SUSP_REASON_TP_CD, PERSON_ORG_CODE, ADD_ACTION_CODE, SUSPECT_TP_CD, last_update_dt ) 
VALUES ( 20000000026, 1, 3100, 3010, 'O', 'B', 3, CURRENT TIMESTAMP ); 

INSERT INTO ADDACTIONTYPE (ADD_ACTION_ID, ORG_TP_CD, MATCH_RELEV_TP_CD, 
SUSP_REASON_TP_CD, PERSON_ORG_CODE, ADD_ACTION_CODE, SUSPECT_TP_CD, last_update_dt ) 
VALUES ( 20000000027, 1, 3100, 3015, 'O', 'B', 3, CURRENT TIMESTAMP ); 

INSERT INTO ADDACTIONTYPE (ADD_ACTION_ID, ORG_TP_CD, MATCH_RELEV_TP_CD, 
SUSP_REASON_TP_CD, PERSON_ORG_CODE, ADD_ACTION_CODE, SUSPECT_TP_CD, last_update_dt )
VALUES ( 20000000028, 1, 3100, 3100, 'O', 'B', 3, CURRENT TIMESTAMP ); 

INSERT INTO ADDACTIONTYPE (ADD_ACTION_ID, ORG_TP_CD, MATCH_RELEV_TP_CD, 
SUSP_REASON_TP_CD, PERSON_ORG_CODE, ADD_ACTION_CODE, SUSPECT_TP_CD, last_update_dt ) 
VALUES ( 20000000029, 1, 3100, 3105, 'O', 'B', 3, CURRENT TIMESTAMP ); 

INSERT INTO ADDACTIONTYPE (ADD_ACTION_ID, ORG_TP_CD, MATCH_RELEV_TP_CD, 
SUSP_REASON_TP_CD, PERSON_ORG_CODE, ADD_ACTION_CODE, SUSPECT_TP_CD, last_update_dt ) 
VALUES ( 20000000030, 1, 3100, 3110, 'O', 'B', 3, CURRENT TIMESTAMP ); 

INSERT INTO ADDACTIONTYPE (ADD_ACTION_ID, ORG_TP_CD, MATCH_RELEV_TP_CD, 
SUSP_REASON_TP_CD, PERSON_ORG_CODE, ADD_ACTION_CODE, SUSPECT_TP_CD, last_update_dt ) 
VALUES ( 20000000031, 1, 3100, 3115, 'O', 'B', 3, CURRENT TIMESTAMP ); 

INSERT INTO ADDACTIONTYPE (ADD_ACTION_ID, ORG_TP_CD, MATCH_RELEV_TP_CD, 
SUSP_REASON_TP_CD, PERSON_ORG_CODE, ADD_ACTION_CODE, SUSPECT_TP_CD, last_update_dt ) 
VALUES ( 20000000032, 1, 3105, 3000, 'O', 'A2', 2, CURRENT TIMESTAMP ); 

INSERT INTO ADDACTIONTYPE (ADD_ACTION_ID, ORG_TP_CD, MATCH_RELEV_TP_CD, 
SUSP_REASON_TP_CD, PERSON_ORG_CODE, ADD_ACTION_CODE, SUSPECT_TP_CD, last_update_dt )
VALUES ( 20000000033, 1, 3105, 3005, 'O', 'A2', 2, CURRENT TIMESTAMP ); 

INSERT INTO ADDACTIONTYPE (ADD_ACTION_ID, ORG_TP_CD, MATCH_RELEV_TP_CD, 
SUSP_REASON_TP_CD, PERSON_ORG_CODE, ADD_ACTION_CODE, SUSPECT_TP_CD, last_update_dt ) 
VALUES ( 20000000034, 1, 3105, 3010, 'O', 'A2', 2, CURRENT TIMESTAMP ); 

INSERT INTO ADDACTIONTYPE (ADD_ACTION_ID, ORG_TP_CD, MATCH_RELEV_TP_CD, 
SUSP_REASON_TP_CD, PERSON_ORG_CODE, ADD_ACTION_CODE, SUSPECT_TP_CD, last_update_dt ) 
VALUES ( 20000000035, 1, 3105, 3015, 'O', 'A2', 2, CURRENT TIMESTAMP ); 

INSERT INTO ADDACTIONTYPE (ADD_ACTION_ID, ORG_TP_CD, MATCH_RELEV_TP_CD, 
SUSP_REASON_TP_CD, PERSON_ORG_CODE, ADD_ACTION_CODE, SUSPECT_TP_CD, last_update_dt ) 
VALUES ( 20000000036, 1, 3105, 3100, 'O', 'A2', 2, CURRENT TIMESTAMP ); 

INSERT INTO ADDACTIONTYPE (ADD_ACTION_ID, ORG_TP_CD, MATCH_RELEV_TP_CD, 
SUSP_REASON_TP_CD, PERSON_ORG_CODE, ADD_ACTION_CODE, SUSPECT_TP_CD, last_update_dt ) 
VALUES ( 20000000037, 1, 3105, 3105, 'O', 'A2', 2, CURRENT TIMESTAMP ); 

INSERT INTO ADDACTIONTYPE (ADD_ACTION_ID, ORG_TP_CD, MATCH_RELEV_TP_CD, 
SUSP_REASON_TP_CD, PERSON_ORG_CODE, ADD_ACTION_CODE, SUSPECT_TP_CD, last_update_dt ) 
VALUES ( 20000000038, 1, 3105, 3110, 'O', 'A2', 2, CURRENT TIMESTAMP ); 

INSERT INTO ADDACTIONTYPE (ADD_ACTION_ID, ORG_TP_CD, MATCH_RELEV_TP_CD, 
SUSP_REASON_TP_CD, PERSON_ORG_CODE, ADD_ACTION_CODE, SUSPECT_TP_CD, last_update_dt ) 
VALUES ( 20000000039, 1, 3105, 3115, 'O', 'A2', 2, CURRENT TIMESTAMP ); 

INSERT INTO ADDACTIONTYPE (ADD_ACTION_ID, ORG_TP_CD, MATCH_RELEV_TP_CD, 
SUSP_REASON_TP_CD, PERSON_ORG_CODE, ADD_ACTION_CODE, SUSPECT_TP_CD, last_update_dt ) 
VALUES ( 20000000040, 1, 3110, 3000, 'O', 'A2', 2, CURRENT TIMESTAMP ); 

INSERT INTO ADDACTIONTYPE (ADD_ACTION_ID, ORG_TP_CD, MATCH_RELEV_TP_CD, 
SUSP_REASON_TP_CD, PERSON_ORG_CODE, ADD_ACTION_CODE, SUSPECT_TP_CD, last_update_dt ) 
VALUES ( 20000000041, 1, 3110, 3005, 'O', 'A2', 2, CURRENT TIMESTAMP ); 

INSERT INTO ADDACTIONTYPE (ADD_ACTION_ID, ORG_TP_CD, MATCH_RELEV_TP_CD, 
SUSP_REASON_TP_CD, PERSON_ORG_CODE, ADD_ACTION_CODE, SUSPECT_TP_CD, last_update_dt ) 
VALUES ( 20000000042, 1, 3110, 3010, 'O', 'A2', 2, CURRENT TIMESTAMP ); 

INSERT INTO ADDACTIONTYPE (ADD_ACTION_ID, ORG_TP_CD, MATCH_RELEV_TP_CD, 
SUSP_REASON_TP_CD, PERSON_ORG_CODE, ADD_ACTION_CODE, SUSPECT_TP_CD, last_update_dt ) 
VALUES ( 20000000043, 1, 3110, 3015, 'O', 'A2', 2, CURRENT TIMESTAMP ); 

INSERT INTO ADDACTIONTYPE (ADD_ACTION_ID, ORG_TP_CD, MATCH_RELEV_TP_CD, 
SUSP_REASON_TP_CD, PERSON_ORG_CODE, ADD_ACTION_CODE, SUSPECT_TP_CD, last_update_dt ) 
VALUES ( 20000000044, 1, 3110, 3100, 'O', 'A2', 2, CURRENT TIMESTAMP ); 

INSERT INTO ADDACTIONTYPE (ADD_ACTION_ID, ORG_TP_CD, MATCH_RELEV_TP_CD, 
SUSP_REASON_TP_CD, PERSON_ORG_CODE, ADD_ACTION_CODE, SUSPECT_TP_CD, last_update_dt ) 
VALUES ( 20000000045, 1, 3110, 3105, 'O', 'A2', 2, CURRENT TIMESTAMP ); 

INSERT INTO ADDACTIONTYPE (ADD_ACTION_ID, ORG_TP_CD, MATCH_RELEV_TP_CD, 
SUSP_REASON_TP_CD, PERSON_ORG_CODE, ADD_ACTION_CODE, SUSPECT_TP_CD, last_update_dt ) 
VALUES ( 20000000046, 1, 3110, 3110, 'O', 'A2', 2, CURRENT TIMESTAMP ); 

INSERT INTO ADDACTIONTYPE (ADD_ACTION_ID, ORG_TP_CD, MATCH_RELEV_TP_CD, 
SUSP_REASON_TP_CD, PERSON_ORG_CODE, ADD_ACTION_CODE, SUSPECT_TP_CD, last_update_dt ) 
VALUES ( 20000000047, 1, 3110, 3115, 'O', 'A2', 2, CURRENT TIMESTAMP ); 

INSERT INTO ADDACTIONTYPE (ADD_ACTION_ID, ORG_TP_CD, MATCH_RELEV_TP_CD, 
SUSP_REASON_TP_CD, PERSON_ORG_CODE, ADD_ACTION_CODE, SUSPECT_TP_CD, last_update_dt ) 
VALUES ( 20000000048, 1, 3115, 3000, 'O', 'A1', 1, CURRENT TIMESTAMP ); 

INSERT INTO ADDACTIONTYPE (ADD_ACTION_ID, ORG_TP_CD, MATCH_RELEV_TP_CD, 
SUSP_REASON_TP_CD, PERSON_ORG_CODE, ADD_ACTION_CODE, SUSPECT_TP_CD, last_update_dt ) 
VALUES ( 20000000049, 1, 3115, 3005, 'O', 'A1', 1, CURRENT TIMESTAMP ); 

INSERT INTO ADDACTIONTYPE (ADD_ACTION_ID, ORG_TP_CD, MATCH_RELEV_TP_CD, 
SUSP_REASON_TP_CD, PERSON_ORG_CODE, ADD_ACTION_CODE, SUSPECT_TP_CD, last_update_dt ) 
VALUES ( 20000000050, 1, 3115, 3010, 'O', 'A1', 1, CURRENT TIMESTAMP ); 

INSERT INTO ADDACTIONTYPE (ADD_ACTION_ID, ORG_TP_CD, MATCH_RELEV_TP_CD, 
SUSP_REASON_TP_CD, PERSON_ORG_CODE, ADD_ACTION_CODE, SUSPECT_TP_CD, last_update_dt ) 
VALUES ( 20000000051, 1, 3115, 3015, 'O', 'A1', 1, CURRENT TIMESTAMP ); 

INSERT INTO ADDACTIONTYPE (ADD_ACTION_ID, ORG_TP_CD, MATCH_RELEV_TP_CD, 
SUSP_REASON_TP_CD, PERSON_ORG_CODE, ADD_ACTION_CODE, SUSPECT_TP_CD, last_update_dt ) 
VALUES ( 20000000052, 1, 3115, 3100, 'O', 'A1', 1, CURRENT TIMESTAMP ); 

INSERT INTO ADDACTIONTYPE (ADD_ACTION_ID, ORG_TP_CD, MATCH_RELEV_TP_CD, 
SUSP_REASON_TP_CD, PERSON_ORG_CODE, ADD_ACTION_CODE, SUSPECT_TP_CD, last_update_dt ) 
VALUES ( 20000000053, 1, 3115, 3105, 'O', 'A1', 1, CURRENT TIMESTAMP ); 

INSERT INTO ADDACTIONTYPE (ADD_ACTION_ID, ORG_TP_CD, MATCH_RELEV_TP_CD, 
SUSP_REASON_TP_CD, PERSON_ORG_CODE, ADD_ACTION_CODE, SUSPECT_TP_CD, last_update_dt ) 
VALUES ( 20000000054, 1, 3115, 3110, 'O', 'A1', 1, CURRENT TIMESTAMP ); 

INSERT INTO ADDACTIONTYPE (ADD_ACTION_ID, ORG_TP_CD, MATCH_RELEV_TP_CD, 
SUSP_REASON_TP_CD, PERSON_ORG_CODE, ADD_ACTION_CODE, SUSPECT_TP_CD, last_update_dt )
VALUES ( 20000000055, 1, 3115, 3115, 'O', 'A1', 1, CURRENT TIMESTAMP );

Overriding the external rules

To finalize the SDP customization and obtain the desired behavior, you need to override the default external rules that manage the suspect search, the score match evaluation, and the update and survivorship policies. Figure 12 shows the overall SDP activity diagram. When a party is updated (addParty), the SDP engine searches for suspect candidates. This behavior is achieved through the PartySearch external rule, which can be overridden with customized code to alter the default behavior. PartySearch returns a set of possible matches, stored in the MDM database, with the party being added. Each candidate needs to be evaluated to obtain a matching score (those in Table 1). This is obtained by overriding the PartyMatch external rule. If a candidate obtains a matching score corresponding to an A1 category, additional processing is made through the PartyUpdate external rule, which decides how attributes need to be merged together.

Figure 12. The SDP activity diagram
Image shows overall SDP activity diagram

To override the standard Java implementation for the external rules, you have to create the subclasses of the standard MDM implementations:

  1. Double-click on LibraryHub > ejbModule/META-INF/MANIFEST.MF, wait for the manifest editor to open, then check DefaultExternalRules.jar and DefaultExternalRules_NL.jar in the dependencies pane.
    Figure 13. Dependencies pane with DefaultExternalRules JARs
    Image shows dependencies pane with DefaultExternalRules JARs
  2. In the dependencies pane opened above, also select CommonExternalRules.jar and ComonExternalRules_NL.jar.
    Figure 14. Dependencies pane with ComonExternalRules JARs
    Image shows dependencies pane with ComonExternalRules JARs
  3. Save the MANIFEST.MF file and rebuild the workspace.
  4. Right-click on the LibraryHub > ejbModule folder and create a Java package named com.it.ibm.mdm.workshop.rules.
    Figure 15. New Java package for the external rules
    Image shows new Java package com.it.ibm.mdm.workshop.rules
  5. Right-click on the new package and select New > Class, specifying:
    • Name: CustomPartySuspectSearch
    • Superclass: com.dwl.tcrm.externalrule.PartySuspectSearchRule
    Figure 16. New class CustomPartySuspectSearch
    Image shows new class CustomPartySuspectSearch
  6. Override the execute method of the superclass. See "Customizing the Party Search" for details.
  7. Right-click on the Java package com.it.ibm.mdm.workshop.rules, select New > Class, and specify:
    • Name: CustomPartyMatch
    • Superclass: com.dwl.tcrm.externalrule.PartyMatch
  8. Override the matchPerson method of the superclass. See "Customizing the Party Match" for details.
  9. Create the customized rule for the party update by right-clicking on the Java package com.it.ibm.mdm.workshop.rules, selecting New > Class and specifying the following options:
    • Name: CustomPartyUpdate
    • Superclass: com.dwl.tcrm.externalrule.PartyUpdateExtRule
  10. Optionally, override the mergeAdminContEquivBObj, mergePersonBObj, and mergePersonNameBObj methods of the superclass. See "Customizing the Party Update" for details.

The image below displays the final result.

Figure 17. com.it.ibm.mdm.workshop.rules Java package
Image shows com.it.ibm.mdm.workshop.rules Java package

Customizing the party search

The customized party search is supposed to retrieve the list of all parties satisfying at least one of the matching combinations described in Table 1. These parties will later be evaluated to assign them a matching score. To achieve such behavior, this external rule must leverage a custom search. In our example, the custom search is based on the following SQL statement.

Listing 6. SQL statement for the party search
select 
    PERSON.cont_id 
from 
    PERSON,
    XPERSON,
    PERSONNAME,
    CONTEQUIV
where 
    PERSON.cont_id = XPERSON.xpersonpk_id and 
    PERSON.cont_id = PERSONNAME.cont_id and
    PERSON.cont_id = CONTEQUIV.cont_id and
    CONTEQUIV.admin_sys_tp_cd = 100 and
    (
        ( PERSONNAME.given_name_one = <First Name> and
          PERSONNAME.last_name = <Last Name> )  OR
        ( XPERSON.llibrary_office_code = <Office Id> ) OR
        ( CONTEQUIV.admin_client_id = <Badge Number> )
    )

The input parameters of the select statement are some of the attributes of the incoming master record (i.e., First Name, Last Name, Office Id, and Badge Number). The statement returns the partyIds of all existing database records that match at least one criteria with the incoming master record. MDM will perform this select statement through the SelectCandidatesBObjQuery class extending GenericBObjQuery, and overriding the provideSqlStatement method. The query result set will be collected and presented as business object through the SelectCandidatesBObjResultSet class extending GenericResultSetProcessor. These two Java files are available in the Downloads section and should be copied to the workspace, as described in the picture below.

Figure 18. The com.it.ibm.mdm.workshop.rules Java package
Image shows com.it.ibm.mdm.workshop.rules Java package

The execute method of CustomPartySuspectSearch can be implemented as follows.

Listing 7. execute method of CustomPartySuspectSearch
@Override
public Object execute(Object input, Object componentObject) throws Exception {
                    
    Vector<TCRMPartyBObj> suspectList = new Vector<TCRMPartyBObj>();
                    
    TCRMPartyBObj party = (TCRMPartyBObj) input;
    if (party.getPartyType().equalsIgnoreCase("P")) {
    
    return super.execute(input, componentObject);
    }
                    
    DWLControl control = party.getControl();
    XPersonBObjExt _person = (XPersonBObjExt)party;
    TCRMPersonNameBObj _personName =       
            (TCRMPersonNameBObj)_person.getItemsTCRMPersonNameBObj().get(0);
    TCRMAdminContEquivBObj _personContEquiv = 
            (TCRMAdminContEquivBObj)_person.getItemsTCRMAdminContEquivBObj().get(0);
                    
    SelectCandidatesBObjQuery _candidateSelect = 
        new SelectCandidatesBObjQuery(SelectCandidatesBObjQuery
            .SELECT_CANDIDATES_QUERY_NAME, control);
                    
    _candidateSelect.setParameter(
        SelectCandidatesBObjQuery.SELECT_CANDIDATES_QUERY_FIRSTNAME,
            _personName.getGivenNameOne());
    _candidateSelect.setParameter(
        SelectCandidatesBObjQuery.SELECT_CANDIDATES_QUERY_LASTNAME,
            _personName.getLastName());
    _candidateSelect.setParameter(
        SelectCandidatesBObjQuery.SELECT_CANDIDATES_QUERY_OFFICEID,
            _person.getLlibraryOfficeCode());
    _candidateSelect.setParameter(
        SelectCandidatesBObjQuery.SELECT_CANDIDATES_QUERY_BADGENUMBER,
            _personContEquiv.getAdminPartyId());
                    
    for (String _candidatePartyId : (List<String>)_candidateSelect.getResults()) {
                    
        TCRMPartyBObj _suspectPartyBObj = new TCRMPartyBObj();
        _suspectPartyBObj.setPartyId(_candidatePartyId);
        _suspectPartyBObj.setControl(control);
                    
        suspectList.addElement(_suspectPartyBObj);
    }
                    
    return suspectList;
}

The code listed above is invoking the defined custom query, in order to build a vector of TCRMPartyBObj instances where the partyId is set with the result of the candidate search. The class definition can be found in the CustomPartySuspectSearch.java file in the Downloads section.


Customizing the party match

The customized party match must evaluate the final match score assigned to each suspect found by the search, according to the criteria defined in Table 1. The new Java external rule overrides the matchPerson method of the superclass and checks if the parties correspond according to at least one criteria of those defined in the matching policies. The three methods listed below check whether the attributes match: GivenNameOne and LastName for matchFoundDisplayName, BadgeNumber for matchFoundBadgeNumber, and OfficeId for matchFoundOfficeId, returning a boolean. You can see the detailed logic of these methods in the CustomPartyMatch.java file, which is available in the Downloads section.

Listing 8. Methods called for evaluating matches
boolean _matchFoundDisplayName = matchFoundDisplayName(_partyA, _partyB);
boolean _matchFoundBadgeNumber = matchFoundBadgeNumber(_partyA, _partyB);
boolean _matchFoundOfficeId = matchFoundOfficeId(_partyA, _partyB);

According to the match and non-match criteria you defined, which are those listed in Table 1, each matched attribute contributes to the final score, as follows.

Listing 9. Final score assignment
/*
* display name (first name + last name)
*/
if (_matchFoundDisplayName) {
matchScore += displayNameMatchScore;
} else {			
nonMatchScore += displayNameNonMatchScore;
}
                    
/*
*  badge number
*/
if (_matchFoundBadgeNumber) {			
matchScore += badgeNumberMatchScore;
} else {			
nonMatchScore += badgeNumberNonMatchScore;
}
                    
/*
* office id
*/
if (_matchFoundOfficeId) {			
matchScore += officeIdMatchScore;
} else {			
nonMatchScore += officeIdNonMatchScore;
}

In order to decouple the custom match behavior from the existing score typologies, the matchScore and nonMatchScore variables are incremented with a fixed offset, so to refer to custom score definitions in MDM SDP Code Tables.

Listing 10. Adding the offset
int _customMatchOffset = 3000;
matchScore += _customMatchOffset;
nonMatchScore += _customMatchOffset;

Finally, the matchPerson method returns a TCRMSuspectBObj instance, having match score and non-match scores values as those just evaluated.

Listing 11. Returning a TCRMSuspectBObj
suspect.setMatchRelevencyScore(DWLFunctionUtils.getStringFromInteger(matchScore));
suspect.setNonMatchRelevencyScore(
DWLFunctionUtils.getStringFromInteger(nonMatchScore));
return suspect;

If the final match score corresponds to a suspect type category different from A1, the SDP workflow terminates here and a new row is added to the suspects table, which describes the suspect relationship among two parties.

Figure 19. The MDM Data Model including the Suspects table
Image shows suspects table in the MDM data model

Customizing the party update

If the match score corresponds to the higher suspect type category (A1), the SDP deterministic algorithm has determined that the two parties are definitely the same. The standard behavior of the update policies in this example is not changed, but you are encouraged to apply any customization you wish, by overriding the methods of the PartyUpdateExtRule Java class in the CustomPartyUpdate.java file you can find in the Downloads section. In this article, the CustomPartyUpdate class overrides the standard PartyUpdateExtRule, but does not change anything with respect to the standard behavior.


Registering the custom rules

Finally, in order to register the custom rules you have overridden, you need to perform the following updates to the JAVAIMPL table.

Listing 12. SQL commands to register the custom external rules
UPDATE JAVAIMPL 
    SET JAVA_CLASSNAME='com.it.ibm.mdm.workshop.rules.CustomPartyMatch',
    LAST_UPDATE_DT=CURRENT TIMESTAMP,LAST_UPDATE_USER='dwreader'
    WHERE EXT_RULE_IMPL_ID in (1000);
                    
UPDATE JAVAIMPL 
    SET JAVA_CLASSNAME='com.it.ibm.mdm.workshop.rules.CustomPartySuspectSearch',
    LAST_UPDATE_DT=CURRENT TIMESTAMP,LAST_UPDATE_USER='dwreader'
    WHERE EXT_RULE_IMPL_ID in (1002);
                    
UPDATE JAVAIMPL 
    SET JAVA_CLASSNAME='com.it.ibm.mdm.workshop.rules.CustomPartyUpdate',
    LAST_UPDATE_DT=CURRENT TIMESTAMP,LAST_UPDATE_USER='dwreader'
    WHERE EXT_RULE_IMPL_ID in (1005);

Conclusion

Suspect Duplicate Processing (SDP) is one of the key features of IBM InfoSphere Master Data Management (MDM). Customizing SDP for a customer's needs requires performing a deep analysis of what's involved in the de-duplication process: the attributes in play, their weight, and the combinations that correspond to each Suspect Type Category. This process has been described in "Analyzing the matching criteria requirements." The analysis step ends with the customization of the MDM database with the attributes' weights and combinations that correspond to each category. You also have to implement the external rules that finalize the customization: the PartySearch to identify, within the MDM database, possible duplicates of the party being added, the PartyMatch to provide a matching score to each suspect, and the PartyUpdate to decide how A1 matches can be automatically collapsed. The external rules customization has been described in "Identifying the suspect search, matching, collapsing, and survivorship policies."


Downloads

DescriptionNameSize
Files for Environmnent Configuration1MDMSDPCustomization_ConfigurationFiles.zip24KB
Java source files2MDMSDPCustomization_JavaSourceFiles.zip5KB

Notes

  1. This file includes the module.mdmxmi file, the Excel file that automates the creation of SQL statements for database configuration, and a SQL file that enables SDP and turns on customized external rules.
  2. This file contains the Java source code used here.

Resources

Learn

Get products and technologies

  • Build your next development project with IBM trial software, available for download directly from developerWorks.
  • Now you can use DB2 for free. Download DB2 Express-C, a no-charge version of DB2 Express Edition for the community that offers the same core data features as DB2 Express Edition and provides a solid base to build and deploy applications.

Discuss

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into Information management on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management
ArticleID=829288
ArticleTitle=IBM InfoSphere Master Data Management Suspect Duplicate Processing Customization
publish-date=08092012