Exporting physical MDM data using Kafka

InfoSphere® MDM includes a data export tool that uses Apache Kafka streaming. You can export data with either a runtime stream processor or a batch stream processor.

Apache Kafka is a distributed streaming platform that enables you to publish, subscribe to, and process streams of records as they occur. Kafka is used to build real-time data pipelines and streaming applications that offload and process data. It is horizontally scalable, fault-tolerant, fast, and widely adopted in the industry.

Note: For more information about Kafka, see https://kafka.apache.org/
Important: If you install InfoSphere MDM 11.6.0.11 or later, Apache Kafka is installed as part of the InfoSphere MDM installation. The Kafka Processor assets are in the <MDM_INSTALL_HOME>/KafkaProcessor folder.

After installation, you can enable InfoSphere MDM 11.6.0.11 to support more recent versions of Kafka than the one included in the base InfoSphere MDM installation. To enable support for Kafka 2.6, 2.7, or 2.8, you must make additional configuration changes. For more information, see Configuring InfoSphere MDM to support Apache Kafka 2.6, 2.7, or 2.8.

The Kafka integration in InfoSphere MDM provides Kafka Channels, runtime stream processors, and batch stream processors for physical MDM implementations.
  • Use a combination of the Kafka Channel and runtime stream processors for real-time data synchronization.
  • Use the batch stream processor to synchronize InfoSphere MDM data in batch mode by connecting directly to the MDM database.
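
To illustrate the subscribe side of the pipeline, the following minimal sketch builds the settings that a downstream Kafka consumer might use to read exported records. It is not part of the InfoSphere MDM tooling. The class name, topic name, broker address, and consumer group are hypothetical, and the use of string deserializers is an assumption about the record format. The property keys are standard Apache Kafka consumer settings.

```java
import java.util.Properties;

public class MdmExportConsumerConfig {

    // Hypothetical topic name; the real name depends on how the Kafka Channel is configured.
    public static final String EXAMPLE_TOPIC = "mdm.export.example";

    // Builds standard Kafka consumer settings for reading exported records.
    public static Properties consumerProperties(String bootstrapServers, String groupId) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("group.id", groupId);
        // Assumption: records are exported as strings. Adjust to the actual format.
        props.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        // Start from the oldest available record when the group has no committed offset.
        props.setProperty("auto.offset.reset", "earliest");
        // Commit offsets explicitly after a record has been processed.
        props.setProperty("enable.auto.commit", "false");
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical broker address and consumer group.
        Properties props = consumerProperties("localhost:9092", "mdm-export-readers");
        props.stringPropertyNames().stream().sorted()
                .forEach(name -> System.out.println(name + "=" + props.getProperty(name)));
    }
}
```

With the Apache Kafka client library on the classpath, these properties could be passed to a KafkaConsumer that subscribes to the topic. The sketch stops at the configuration so that it runs with only the Java standard library.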