Blog post by: Davendra Paltoo, Offering Manager, Data Replication
Follow Davendra on Twitter at: https://twitter.com/Davendr18397388
Real-time analytics can provide real-time insights. When businesses have data at the right time, they can operate more efficiently and make the right tactical and strategic decisions. Apache Kafka® is used for building data hubs or landing zones, building real-time data pipelines, and streaming apps. It is horizontally scalable, fault-tolerant, wicked fast, and runs in production in thousands of companies.
The Apache Kafka (https://kafka.apache.org/) platform now has a vibrant associated ecosystem.
When messages land in a Kafka cluster, a variety of available connectors or consumers can in turn retrieve those messages and deliver them to target destinations such as HDFS and Amazon S3. Users can save time and costs by using one of these available consumers, such as those listed at https://docs.confluent.io/current/connect/connectors.html or https://community.hortonworks.com/topics/Kafka.html.
IIDR (CDC), with the initial version of its 11.4.0 release, gave users the ability to replicate from any supported CDC Replication source to a Kafka cluster by using the IIDR (CDC) target Replication Engine for Kafka. This engine writes Kafka messages that contain the replicated data to Kafka topics. By default, the replicated data in these messages is written in the Avro binary format, so consumers that read them from the Kafka cluster must use an Avro binary deserializer.
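To see why a deserializer is required, here is a minimal sketch of decoding Avro's binary encoding by hand. The two-field record schema used here ({"name": string, "id": long}) is purely illustrative, not an actual IIDR (CDC) payload layout: Avro encodes longs as zigzag varints and strings as length-prefixed UTF-8 bytes, with no field names in the payload, so the bytes are meaningless without the schema.

```python
# Minimal sketch of Avro binary decoding (illustrative schema, not the
# product's actual message layout).

def read_varint(buf, pos):
    """Read a base-128 varint (7 bits per byte, little-endian groups)."""
    result, shift = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not (b & 0x80):
            return result, pos
        shift += 7

def read_long(buf, pos):
    """Avro long: varint, then zigzag-decode so small negatives stay short."""
    n, pos = read_varint(buf, pos)
    return (n >> 1) ^ -(n & 1), pos

def read_string(buf, pos):
    """Avro string: long length prefix followed by UTF-8 bytes."""
    length, pos = read_long(buf, pos)
    return buf[pos:pos + length].decode("utf-8"), pos + length

def decode_record(buf):
    """Decode one record with the illustrative schema {name: string, id: long}."""
    pos = 0
    name, pos = read_string(buf, pos)
    rec_id, pos = read_long(buf, pos)
    return {"name": name, "id": rec_id}

# b"\x04ok\x02" encodes name="ok" (length 2 -> zigzag 4) and id=1 (zigzag 2).
print(decode_record(b"\x04ok\x02"))  # -> {'name': 'ok', 'id': 1}
```

In practice a consumer would of course use an Avro library (and, where a schema registry is involved, its wire format) rather than hand-decoding; the point of the sketch is that, unlike JSON, the payload cannot be interpreted without the schema in hand.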
Users, however, wanted more flexibility in the IIDR (CDC) Kafka apply, so that they could produce the various permutations of message data formats expected by the wide variety of off-the-shelf connectors, custom connectors, and consumer applications.
To help customers solve this challenge, IIDR (CDC) has recently introduced support for "Kafka custom operation processors" (KCOPs) to improve the flexibility of message delivery into Kafka. Customers can use one of several integrated predefined output formats, or adapt these user exits to define their own custom formats, control what data is included in the message payload, and more. Beyond defining the message formats and payloads, users can now also specify the Kafka topic names for their message destinations, and much more.
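To make the idea concrete, here is a small sketch of what a KCOP-style JSON output and a custom per-table topic-naming rule might look like. The field names (table, operation, before, after) and the naming convention are assumptions made for this sketch, not the product's actual payload layout or built-in behavior:

```python
import json

# Illustrative shape of a JSON-formatted change message; the field names
# are assumptions for this sketch, not the product's actual layout.
message_value = json.dumps({
    "table": "INVENTORY.ORDERS",
    "operation": "UPDATE",
    "before": {"ORDER_ID": 42, "STATUS": "NEW"},
    "after": {"ORDER_ID": 42, "STATUS": "SHIPPED"},
})

def topic_for(table, prefix="cdc"):
    """Hypothetical per-table topic-naming rule: <prefix>.<schema>.<table>,
    lower-cased. A KCOP lets users plug in rules like this one."""
    return f"{prefix}.{table.lower()}"

record = json.loads(message_value)
print(topic_for(record["table"]))   # -> cdc.inventory.orders
print(record["after"]["STATUS"])    # -> SHIPPED
```

A downstream consumer can then parse such messages with any ordinary JSON library, which is exactly the kind of flexibility that the predefined Avro format alone did not offer.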
Moreover, as more common customer needs are assessed, IBM will add further predefined output formats.
For more information on KCOPs and the available samples, please see the IIDR (CDC) Knowledge Center.
For demo videos on how to use the sample KCOPs, or to contribute to the IBM Data Replication community, please see the Data Replication developerWorks page: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/W8d78486eafb9_4a06_a482_7e7962f5ac59/page/Replication%20How%20To%20Videos
For more information on how IBM Data Replication can provide near real-time incremental delivery of transactional data to your Hadoop- and Kafka-based data lakes, visit: https://www.ibm.com/analytics/data-replication