Processing Kafka messages
Apache Kafka is an open source project that provides a messaging service based on a distributed commit log, which lets you publish and subscribe to streams of data records (messages). IBM® Integration Bus provides built-in input and output nodes for processing Kafka messages.
The Kafka messaging protocol is a TCP-based protocol that provides a fast, scalable, and durable method for exchanging data between applications. Messaging providers are typically used in IT architectures to decouple the processing of messages from the applications that produce them. IBM Integration Bus already supports many kinds of messaging, including IBM MQ, JMS messaging providers, and the MQTT messaging protocol. The Kafka nodes extend this support so that IBM Integration Bus can interact with applications that use Kafka.
Kafka is a popular choice for cloud-based architectures, where the number of connected clients can change frequently without affecting the scaling characteristics. Common use cases where Kafka is considered as a messaging transport include:
- General messaging: You can decouple producer and consumer applications from each other. Data flowing between multiple processing stages in a distributed application could use Kafka as a means of connecting the steps. The built-in features that Kafka provides for partitioning, replication, and fault tolerance can make it a good choice for this type of large-scale messaging.
- Web site activity tracking: When Kafka was first developed, it was used for helping to track page views, searches, or other actions taken on a web site. This activity was published to a set of central topics for different activity types. Subscribers could then use the data for real-time processing and monitoring, and persisting to other data warehouse systems.
- Logging and metrics: You can use Kafka to aggregate operational data from multiple sources.
IBM Integration Bus provides the following built-in nodes for processing Kafka messages:
- KafkaConsumer node, which subscribes to a Kafka topic and propagates the feed of published messages to nodes connected downstream in the flow.
- KafkaProducer node, which publishes messages to a Kafka topic.
Kafka organizes messages into topics, and each topic is divided into one or more partitions. Each partition is an ordered sequence of messages that cannot be altered after they have been written. Each message in a partition is assigned a sequential ID number, called the offset, which uniquely identifies that message within the partition. All messages are retained for a configurable period of time, regardless of whether they have been consumed by an application, after which they are discarded to free up space. This approach is very different from that of traditional messaging products such as IBM MQ, where a message is typically removed from a queue when it is consumed.
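The partition behavior described above can be illustrated with a small in-memory model. This is only a sketch under simplifying assumptions: the `PartitionLog` and `Record` names are hypothetical, timestamps are passed in explicitly, and retention is applied per message, whereas a real Kafka broker manages retention on disk in a more involved way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: a record cannot be changed after creation
class Record:
    offset: int
    timestamp: float
    value: str


class PartitionLog:
    """Hypothetical, simplified model of one partition (not a Kafka API)."""

    def __init__(self, retention_seconds: float):
        self.retention_seconds = retention_seconds
        self._records = []
        self._next_offset = 0

    def append(self, value: str, now: float) -> int:
        """Add a message at the end of the log and return its offset."""
        record = Record(self._next_offset, now, value)
        self._records.append(record)
        self._next_offset += 1  # offsets are sequential and never reused
        return record.offset

    def read(self, from_offset: int) -> list:
        """Return retained messages from an offset onward. Reading removes nothing."""
        return [r for r in self._records if r.offset >= from_offset]

    def expire(self, now: float) -> None:
        """Discard messages older than the retention period, consumed or not."""
        self._records = [
            r for r in self._records
            if now - r.timestamp < self.retention_seconds
        ]
```

In this model, two consumers that each call `read(0)` see the same messages, because consumption does not delete anything; only `expire` frees space.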
Message ordering is preserved only within a partition, not across all partitions in a topic. Therefore, if message order is important, ensure that you either use a single partition per topic, or associate a message key with each message published. A hash of the message key is used to select the partition to which the message is sent, so all messages published with the same key are stored on the same partition.
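As a rough illustration of key-based partition selection, the following sketch maps each key to a partition by taking a stable hash modulo the partition count. The `select_partition` function and the sample keys are hypothetical, and CRC32 is used here only as a convenient stand-in; the hash function that Kafka clients actually use is different.

```python
import zlib


def select_partition(key: bytes, num_partitions: int) -> int:
    """Pick a partition index from a message key.

    Assumption: CRC32 is a stand-in for illustration only, not the
    hash used by real Kafka clients.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    return zlib.crc32(key) % num_partitions


# Hypothetical messages: (key, event) pairs published in this order.
messages = [
    (b"customer-17", "created"),
    (b"customer-42", "created"),
    (b"customer-17", "paid"),
]

# Group messages by the partition that each key maps to.
partitions = {}
for key, event in messages:
    partitions.setdefault(select_partition(key, 3), []).append((key, event))
```

Because both `customer-17` messages map to the same partition, their relative order ("created" before "paid") is preserved there, while no ordering is implied between messages that land on different partitions.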
For more information about Kafka messaging, see the Apache Kafka documentation.