Streaming record and entity data changes (IBM Match 360)

Configure a master data event stream to propagate changes in your record and entity data directly to downstream systems through a connected Apache Kafka server.

Stream record and entity data changes to ensure that your users and systems always have the most up-to-date master data. You can achieve near real-time data synchronization between IBM Match 360 and different endpoints by using the master data event streaming capability and an Apache Kafka connection. Typical target endpoints are downstream systems that need to synchronize the trusted golden view of mastered data for different analytical and business use cases.

To achieve event streaming, IBM Match 360 uses IBM Cloud Pak for Data connection services. Specifically, it uses the Apache Kafka connection asset to connect to external Kafka clusters and IBM Event Streams. IBM Match 360 supports all of the Kafka variants that the Apache Kafka connection supports. For more information about the Apache Kafka connection, see Apache Kafka connection.

Restriction: Master data streaming is available only through the IBM Match 360 API. Master data event streaming capabilities do not support bulk data load jobs.

IBM Match 360 supports streaming through Apache Kafka to your target endpoints. You cannot stream data from an external source system to IBM Match 360 through the same mechanism. However, you can achieve a custom inbound data streaming configuration by using DataStage, an Apache Kafka connection, and the IBM Match 360 ongoing synchronization API methods. For more information about this configuration, see the external blog post Real time data ingestion to IBM Match 360 using core platform capabilities.

For more details about master data streaming events, such as message templates and examples, see Master data event streaming message template.

Types of master data events that can be streamed

Downstream systems often need to synchronize their data with the most current master data provided by IBM Match 360. By using the event streaming capability, you can subscribe to data change events at the entity or record level.

IBM Match 360 is a live system. Any time a record is added, updated, or deleted, a record change event is created. When the matching engine runs, record changes are included in the matching process and can affect entities. Any time matching runs, IBM Match 360 creates entity change events to capture newly created entities, changes to the membership of existing entities, or updates to an entity's composite attribute values.

Changes to entity data that trigger streaming events

The member records of a master data entity can change when:

  • Matching runs after a change to record data (add, update, or delete).
  • Matching runs after a data engineer updates the matching algorithm configuration.
  • A data steward manually links or unlinks records.

The attribute values of a master data entity can change when:

  • A data steward manually updates an entity attribute value.
  • There are changes to the entity's member records' attributes that cause different values to be selected by attribute composition rules. For more information about attribute composition rules, see Defining attribute composition rules.

An entity can be deleted when it no longer has any member records.

Changes to record data that trigger streaming events

Record data changes when:

  • Records are added to IBM Match 360.
  • Records are updated.
  • Records are deleted.

If you create a streaming subscription for entity change events, the underlying record change events are also included in the stream. However, in some scenarios you might want to stream only record data, such as when:

  • Your custom data model includes record types that aren't associated with any entity types.
  • Your entity streaming subscription includes a source-level filter that includes or excludes certain entity types.
  • You want to process record change events separately from entity change events by sending them to a different Kafka topic.
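
For example, a record-only subscription sets event_type to record_change and lists record types in the filter. The following payload is a sketch modeled on the entity subscription example later in this topic; the person record type and topic name are assumptions from a hypothetical data model:

```json
{
  "filter": ["person"],
  "event_type": "record_change",
  "stream_connection": {
    "stream_type": "Kafka",
    "asset_scope": "Project",
    "topic": "PersonRecordTopic",
    "asset_id": "<ASSET-ID>",
    "container_id": "<CONTAINER-ID>"
  },
  "subscription_name": "PersonRecordSub",
  "subscription_description": "Stream person record changes only",
  "active": true
}
```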

Configuring master data event streaming

Configure a new streaming subscription to start synchronizing your master data entities and records with downstream systems.

The master data event streaming capability supports the following connection types and security types.

Supported connection types
Connection type Security type
Apache Kafka and other vendor-specific variants of Kafka None, SSL, SASL_SSL, SCRAM-256, SCRAM-512
IBM Event Streams SASL_SSL, SCRAM-256, SCRAM-512

To enable the IBM Match 360 master data event streaming capability:

  1. Create a Cloud Pak for Data Apache Kafka connection asset in a project or catalog:

    a. Create a project or catalog to handle all of your connection assets. For information about creating a project, see Creating a project. For information about creating a catalog, see Creating a catalog.

    b. In your project or catalog, go to the Manage tab. In the General section, copy the project ID (for a project) or the catalog ID (for a catalog). You'll need this ID to use as the container ID when creating your streaming subscription.

    c. Go to the Assets tab, and click New asset > Connection.

    d. From the list of connectors, click Apache Kafka, then click Select.

    e. Enter the Kafka connection information such as a name, description, and target Kafka server host name. Ensure that the following configuration items are set correctly:

    • Set the Credentials option to Shared.
    • Disable the Mask sensitive credentials retrieved through API calls option.
    • Select a supported connection type and security type for your connection.
  2. Get the Apache Kafka connection asset ID and your project or catalog ID (container ID). You need these IDs to use as input when creating a master data event streaming subscription. There are two ways you can get these IDs:

    • From the asset URL: Open the Apache Kafka connection asset that you created in your project or catalog. Copy the asset ID and container ID from the URL bar in your browser. Refer to the following example: https://cpd-namespace.apps.samplecp4d.cp.example.com/connections/<ASSET ID>?project_id=<CONTAINER ID>&context=icp4data

    • By using the connection API: Run the following API curl command:

      curl --location --request GET \
        'https://api.dataplatform.cloud.ibm.com/v2/connections?limit=100&entity.datasource_type=f13bc9b7-4a46-48f4-99c3-01d943334ba7&project_id=xxx&userfs=false' \
        --header 'Accept: application/json' \
        --header 'Authorization: Bearer xxx'
      
  3. Use the IBM Match 360 API to create your master data streaming subscription. Use the following API methods in the model-ms microservice to create, update, or delete a subscription:

  • Create method: POST /mdm/v1/event_subscription
  • Update method: PUT /mdm/v1/event_subscription
  • Delete method: DELETE /mdm/v1/event_subscription

For information about using the IBM Match 360 API, see IBM Match 360 API reference documentation.
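
The JSON response from the connections API call in step 2 can be parsed to pull out the asset and container IDs. The following is a minimal sketch; the response shape and the sample ID values shown are assumptions, so verify them against your cluster's actual output:

```python
# Sketch: extract connection asset IDs from a /v2/connections response.
# The response shape below is an assumption; verify it against real output.
import json

sample_response = json.loads("""
{
  "total_count": 1,
  "resources": [
    {
      "metadata": {
        "asset_id": "1a2b3c4d-sample-asset-id",
        "project_id": "9f8e7d6c-sample-project-id"
      },
      "entity": {"name": "my-kafka-connection"}
    }
  ]
}
""")

def connection_ids(response):
    """Return (asset_id, container_id) pairs for each connection asset."""
    return [
        (
            r["metadata"]["asset_id"],
            r["metadata"].get("project_id") or r["metadata"].get("catalog_id"),
        )
        for r in response["resources"]
    ]

print(connection_ids(sample_response))
```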

Example event_subscription payload:

{
  "filter": ["person_entity"],
  "event_type": "entity_change",
  "created_user": "user123",
  "last_update_user": "user123",
  "stream_connection": {
    "stream_type": "Kafka",
    "asset_scope": "Project",
    "topic": "PersonEntityTopic",
    "asset_id": "<ASSET-ID>",
    "container_id": "<CONTAINER-ID>"
  },
  "subscription_description": "Create PersonEntityRecordSub event subscription SASL SSL EventStream",
  "subscription_name": "PersonEntityRecordSubSASLSSLEventStream",
  "active": true,
  "created_date": "1680297619428",
  "last_update_date": "1680297619428"
}
Details of the event_subscription API payload
Parameter Value
filter The filter for the selected event type. This filters the valid entity types or record types, depending on the value of the event_type parameter.
event_type The type of event being streamed in this subscription. Supported values are record_change or entity_change.
created_user The user who created this subscription.
last_update_user The user who most recently updated this subscription.
stream_connection.stream_type The supported stream type. The valid value is Kafka.
stream_connection.asset_scope Defines whether the Kafka connection is scoped to a Project or Catalog.
stream_connection.topic The name of the Kafka topic to which the events are published.
stream_connection.asset_id The asset ID of the Apache Kafka connection asset.
stream_connection.container_id The container ID of the Apache Kafka connection asset. This is the project ID or catalog ID.
subscription_description A description of this subscription.
subscription_name The name of this subscription.
active The indicator of whether this subscription is active. If set to true, the events are streamed. If set to false, no events are streamed.
created_date The date this subscription was created.
last_update_date The date this subscription was most recently updated.
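
As a sketch, the create call for a payload like the one above can be issued with Python's standard library. The host and bearer token are placeholders, and this assumes that the create operation accepts an HTTP POST to /mdm/v1/event_subscription; verify the method and any required query parameters against the API reference:

```python
# Sketch: build (but do not send) a request to create a streaming subscription.
# <CPD-HOST>, <TOKEN>, <ASSET-ID>, and <CONTAINER-ID> are placeholders.
import json
import urllib.request

payload = {
    "filter": ["person_entity"],
    "event_type": "entity_change",
    "subscription_name": "PersonEntitySub",
    "subscription_description": "Stream person entity changes",
    "active": True,
    "stream_connection": {
        "stream_type": "Kafka",
        "asset_scope": "Project",
        "topic": "PersonEntityTopic",
        "asset_id": "<ASSET-ID>",
        "container_id": "<CONTAINER-ID>",
    },
}

req = urllib.request.Request(
    "https://<CPD-HOST>/mdm/v1/event_subscription",  # placeholder host
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer <TOKEN>",  # placeholder token
    },
    method="POST",
)
# urllib.request.urlopen(req) would send the request; it is shown unsent here.
```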

Enabling logging for master data event streaming

To enable WebSphere Liberty Profile logging for the IBM Match 360 event streaming capability:

  1. From the OpenShift console, go to Administration > CustomResourceDefinition. Search for and select MasterDataManagement (CRD).

  2. Select the Instance tab, then select mdm-cr and open the YAML tab.

  3. To enable logging parameters, add the following lines to the mdm-cr YAML:

    wlp:
      logging:
        trace:
          specification: "com.ibm.mdmx.common.events.*=all:com.ibm.entity.matching.operational.core.streaming.*=all"
    
  4. Click Save, then reload the page to ensure that the parameters were added correctly.

    After you enable logging, it takes some time for the mdm-cr changes to be reconciled. Wait for the Cloud Pak for Data instance to show as enabled.

  5. Review the logs to check whether IBM Match 360 successfully sends event messages to your Kafka endpoint.

Parent topic: Configuring master data