Running SDP automation script

To use the SDP feature, you need to create SDP entries for the master catalog. You create these entries by running the SDP automation script. The SDP feature does not support multi-occurrence, grouping, or multi-occurrence grouping attributes.

Procedure

You need to be an Admin user to run this script because the script creates the catalog, hierarchies, specs, and workflows for the operational catalog.

  1. Log in to the Admin UI.
  2. Go to Data Model Manager > Scripting > Script Sandbox.
  3. In the Script Input Pane field, enter the names of the master catalogs (in square brackets), the username, and the password, separated by commas with no spaces.
    Format example
    [Catalog1Name,Catalog2Name],Username,Password
    Note: If you are not going to use the Automated SDP feature, specify an empty string ("").
  4. In the Script Pane field, enter the following automation script Java™ class, and click Run Script.
    //script_execution_mode=java_api="japi://com.ibm.mdm.extensions.sdp.datamodel.ScriptingSandboxGenerateDataModelImpl.class" 
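The value for the Script Input Pane field in step 3 can also be assembled programmatically. The following Python sketch is illustrative only: build_script_input is a hypothetical helper, not part of the product, and it assumes that catalog names contain no commas or square brackets.

```python
def build_script_input(catalogs, username, password):
    """Return the Script Input Pane string: [Cat1,Cat2],Username,Password."""
    if not catalogs:
        raise ValueError("at least one master catalog name is required")
    for name in catalogs:
        # Assumption: these characters would make the value ambiguous to parse.
        if any(ch in name for ch in ",[]"):
            raise ValueError(f"unsupported character in catalog name: {name!r}")
    # Join with commas only; the documented format has no spaces around separators.
    return "[" + ",".join(catalogs) + "]," + username + "," + password


print(build_script_input(["Catalog1Name", "Catalog2Name"], "Username", "Password"))
```

Building the string in one place makes it harder to introduce a stray space after a comma, which the documented format does not allow.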

Results

After the SDP automation script completes successfully, the following entries are created:
Table 1. SDP entries
SDP entry Description
Operational Catalog
Corresponds to the specified Master catalog. The operational catalog has an SDP suffix and is associated with the Post Save script.
Reference Field Attribute
Reference Field grouping attribute for the primary specs of the master catalog. The attribute has the following read-only fields:
MasterID
Contains primary key of the master item for which the current item is a duplicate.
Score
Duplicate score returned by OpenSearch.
User
Name of the user who requested the SDP operation.
Reference Attribute Collection
Contains the Reference Field grouping attribute.
<Container Name> SDP Attribute Collection
The <Master Catalog Name> SDP Attribute Collection. Associates with the primary and secondary specs that are associated with the master catalog.
SDP View
An operational catalog view that is called the SDP View. Associates with the following:
  • <Container Name> SDP Attribute Collection
  • Reference Attribute Collection
<Container Name> SDP Collaboration Area
The SDP Collaboration Area for each Operational catalog. The source container for this collaboration area is the Operational catalog. Associates with the following:
  • SDP Workflow
  • SDP Step
Master Item Creation Script
Creates an item in the Master catalog during SDP processing.

The Master Item Creation Script is associated with the Operational catalog as a Post Save script. During SDP processing, if an item is marked as No Match, this script creates the same item in the Master catalog.

The Master Item Creation Script is also associated with the SDP Step. In this case, the script creates an item in the Master catalog only after the item exits the step.

Associates with the following:
  • SDP Workflow
  • SDP Step
Auto Match Item Script
Performs SDP automation based on the Do_AutoSDP flag and the threshold value in the SDP Container Lookup Table. The script automatically classifies items as Match or No Match based on the score and the threshold value.

The script is attached to the Automated Step. The script is implemented in the IN method and processes items that come out of the “Product Enrichment” step.

Lookup Tables
Two lookup tables that are used for the SDP configuration:
  • SDP Container Lookup Table
  • SDP Workflow Step Mapping Lookup Table
Duplicate Delete Report
Creates a delete report that is used to delete the No Match items that are present in the Operational catalog. You can run the report to delete duplicate items immediately, or you can schedule a job to delete them at a specific time.
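The interplay between the Auto Match Item Script and the Master Item Creation Script can be pictured with a short Python sketch. This is not the product implementation. The function names, the dictionary-based catalog, and the rule that a score at or above the threshold counts as a match are all assumptions made for illustration.

```python
def classify(score, threshold):
    """Assumption: a score at or above the threshold counts as a match."""
    return "Match" if score >= threshold else "No Match"


def auto_match(items, do_auto_sdp, threshold, master_catalog):
    """Classify items automatically when Do_AutoSDP is true.

    items maps a primary key to a tuple of (item data, duplicate score).
    Returns a dict that maps each primary key to its classification.
    """
    results = {}
    if not do_auto_sdp:
        return results  # automation is off; items wait for manual processing
    for key, (data, score) in items.items():
        results[key] = classify(score, threshold)
        if results[key] == "No Match":
            # Mirrors the Master Item Creation Script: a No Match item
            # is created in the master catalog.
            master_catalog[key] = data
    return results


# Hypothetical sample data.
sample_master = {}
sample_items = {
    "P-1": ({"name": "Blue mug"}, 12.5),
    "P-2": ({"name": "Red mug"}, 3.0),
}
print(auto_match(sample_items, True, 10.0, sample_master))
print(sample_master)
```

In this sketch, only the item that scores below the threshold is copied to the master catalog, which matches the described behavior for items that are marked as No Match.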
Table 2. SDP Container Lookup Table
Spec Attribute Description
Master_Catalog The name of the master catalog.
Operational_Catalog The name of the operational catalog.
Enable_SDP Specifies whether SDP is enabled. Possible values are True and False. The default value is none.
Matching_Attributes Attributes that help in finding possible duplicates. The default value is All. Possible values are as follows:
  • All - All attributes are used to identify possible duplicates.

  • Attribute collection name - If duplicate identification is to be based on a limited set of attributes, you can provide an attribute collection name.

Note: Matching_Attributes does not support Currency, Number, or Date attributes.
SDP_Response_Size The maximum number of duplicates that can be displayed on the Suspect Duplicate Processing tab. The default value is 3.
Minimum_Should_Match Controls the number of terms that must match. Possible values are valid integer percentages in the 30 - 90% range. The default value is 30%.
Threshold The minimum value that the score of a duplicate is compared against.
Attribute_Weights The name of the attribute weights lookup table. If you want to assign weights to the attributes, you need to create a lookup table by using the SDP Attributes Weight Lookup Spec. This feature helps improve how entries are matched.
Attributes that have more weight contribute more to deciding a matching entry. Attribute weights are applied only when both of the following conditions are fulfilled:
  • The Matching_Attributes field value must be a valid attribute collection name.
    If the value is set to the default value of All, attribute weights are not applied. For more information, see the Matching_Attributes field description.
  • The Attribute_Weights field value must be provided (a valid lookup table name).
Do_AutoSDP Specifies whether to automate the SDP process. Possible values are true and false. The default value is false.
Username Specify your username.
Password Specify your password in plain text.
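The defaults and constraints in Table 2 can be summarized in a short Python sketch. It is illustrative only: the dictionary layout, the function names, and the choice to treat Master_Catalog, Operational_Catalog, and Enable_SDP as required values are assumptions, not product behavior. The catalog and lookup table names in the usage lines are hypothetical.

```python
# Defaults as documented in Table 2 (Minimum_Should_Match is a percentage).
DEFAULTS = {
    "Matching_Attributes": "All",
    "SDP_Response_Size": 3,
    "Minimum_Should_Match": 30,
    "Do_AutoSDP": False,
}


def normalize_entry(entry):
    """Apply the documented defaults and validate one lookup table entry."""
    result = {**DEFAULTS, **entry}
    # Assumption: these attributes have no default and must be supplied.
    for required in ("Master_Catalog", "Operational_Catalog", "Enable_SDP"):
        if required not in result:
            raise ValueError(f"missing value for {required}")
    if not isinstance(result["Enable_SDP"], bool):
        raise ValueError("Enable_SDP must be True or False")
    percent = result["Minimum_Should_Match"]
    if not isinstance(percent, int) or not 30 <= percent <= 90:
        raise ValueError("Minimum_Should_Match must be an integer from 30 to 90")
    return result


def weights_apply(entry):
    """Attribute weights are used only when both documented conditions hold."""
    uses_collection = entry.get("Matching_Attributes", "All").lower() != "all"
    has_weights = bool(entry.get("Attribute_Weights"))
    return uses_collection and has_weights


entry = normalize_entry({
    "Master_Catalog": "Products",
    "Operational_Catalog": "Products SDP",
    "Enable_SDP": True,
})
print(entry["SDP_Response_Size"], entry["Minimum_Should_Match"])
print(weights_apply(entry))
```

With Matching_Attributes left at All, weights_apply returns False even if a weights lookup table is named, which reflects the first condition in the Attribute_Weights description.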
Table 3. SDP Attributes Weight Lookup Spec
Spec Attribute Description
Attribute_Name The absolute attribute path in the specName/attributeName format. For example, Product Details Spec/Product Id.
Weightage A valid integer value (1 - 100). The lower the value, the lower the weightage of the respective attribute; the higher the value, the higher the weightage.
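An entry in a weights lookup table can be checked against the two rules in Table 3 with a small Python sketch. The helper name parse_weight_entry is hypothetical, and splitting the path at the first slash is an assumption about how spec names and attribute names are separated.

```python
def parse_weight_entry(attribute_name, weightage):
    """Split 'specName/attributeName' and validate the 1 - 100 weightage."""
    # Assumption: the spec name is everything before the first slash.
    spec_name, separator, attribute_path = attribute_name.partition("/")
    if not separator or not spec_name or not attribute_path:
        raise ValueError("Attribute_Name must use the specName/attributeName format")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(weightage, bool) or not isinstance(weightage, int):
        raise ValueError("Weightage must be an integer")
    if not 1 <= weightage <= 100:
        raise ValueError("Weightage must be in the 1 - 100 range")
    return spec_name, attribute_path, weightage


print(parse_weight_entry("Product Details Spec/Product Id", 80))
```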
Table 4. SDP Workflow Step Mapping Lookup Table
Spec Attribute Description
Key A valid workflow name.
Value A valid workflow step name where the SDP is processed.
By default, the SDP Workflow and SDP Step are added to the respective lookup table. If you want to add your own workflow and step, you need to associate the Master Item Creation Script with your step as follows:
  1. Go to Workflows > Workflow Console.
  2. Select your workflow.
  3. Click <step_name> > Script > Edit.
  4. In the Script Pane field, enter the following automation script Java class, and click Run Script.
    //script_execution_mode=java_api="japi://com.ibm.mdm.extensions.sdp.scripts.ScriptCreateMasterItem.class"
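The SDP Workflow Step Mapping Lookup Table behaves like a key-value map from a workflow name to the step where SDP is processed. The following Python sketch models that idea only. The helper names and the custom workflow and step names are hypothetical, and the sketch assumes that each workflow maps to a single step.

```python
# Default entry that the automation script adds (Key -> Value).
step_mapping = {"SDP Workflow": "SDP Step"}


def add_mapping(mapping, workflow, step):
    """Register the step of a workflow where SDP is processed."""
    if not workflow or not step:
        raise ValueError("workflow and step names must be non-empty")
    mapping[workflow] = step


def is_sdp_step(mapping, workflow, step):
    """Return True if SDP is processed at this step of this workflow."""
    return mapping.get(workflow) == step


# Hypothetical custom workflow and step names.
add_mapping(step_mapping, "Supplier Onboarding", "Review Duplicates")
print(is_sdp_step(step_mapping, "SDP Workflow", "SDP Step"))
print(is_sdp_step(step_mapping, "Supplier Onboarding", "Approve"))
```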

What to do next

You need to complete the following tasks after you have successfully deployed and configured SDP.
  • Check the users in the SDP Collaboration Area, SDP Workflow, and SDP Steps. By default, only an Admin user is added as a performer. The Admin user then needs to add the users or roles who need to perform SDP processing.

  • Ensure that the Master Item Creation Script is selected in the Post-save Script list. If Master Item Creation Script is missing, proceed as follows:
    1. Go to Data Model Manager > Scripting > Script Console.
    2. Select Catalog Script from the list.
    3. Click Edit for the Master Item Creation Script, and then click Save.
    Select the Master Item Creation Script in the Post-save Script list.
  • Check the SDP Container Lookup Table for the master catalog lookup entry. Verify all attribute values, such as the master catalog name, operational catalog name, Enable_SDP, and matching attributes.
  • Check the SDP Workflow Step Mapping Lookup Table and verify that it has an entry with Key=SDP Workflow and Value=SDP Step.
  • Check for the Reference Field. The Reference Field should be associated with the primary spec of the catalog.
  • The Reference Field should not be present in the view of the Master catalog. If present, remove the Reference Field from the attribute collection that is associated with the Master catalog view. The Reference Field should be present in the operational catalog view (SDP View). The SDP View should be the default view of the operational catalog.
  • Check the following tabs:
    • The single-edit page for the Master catalog item should have the Duplicate tab.
    • The single-edit page for the Operational catalog item should have the Suspect Duplicate Processing tab.
  • Open the Report console and schedule the report jobs for deletion for each catalog.
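The view checks in the list above can be expressed as a small Python sketch. It is a paper model of the checklist, not a product API: check_views is a hypothetical helper, and each view is represented simply as a collection of attribute names.

```python
REFERENCE_FIELD = "Reference Field"


def check_views(master_view_attributes, sdp_view_attributes):
    """Return a list of problems with the Reference Field placement."""
    problems = []
    if REFERENCE_FIELD in master_view_attributes:
        problems.append("Remove the Reference Field from the Master catalog view.")
    if REFERENCE_FIELD not in sdp_view_attributes:
        problems.append("Add the Reference Field to the SDP View.")
    return problems


# Hypothetical attribute names for the two views.
print(check_views(["Product Id", "Name"], ["Product Id", "Name", "Reference Field"]))
print(check_views(["Product Id", "Reference Field"], ["Product Id"]))
```

An empty list means that both view checks pass.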