Processing Kafka messages
Apache Kafka is an open source project that provides a messaging service based on a distributed commit log. You can use it to publish data to, and subscribe to, streams of data records (messages). IBM® App Connect Enterprise provides built-in input and output nodes for processing Kafka messages, and a mid-flow node for reading an individual message on a Kafka topic.
The Kafka messaging protocol is a TCP-based protocol that provides a fast, scalable, and durable method for exchanging data between applications. Messaging providers are typically used in IT architectures to decouple the processing of messages from the applications that produce them. IBM App Connect Enterprise already supports many kinds of messaging, including IBM MQ, JMS messaging providers, and the MQTT messaging protocol. The Kafka nodes extend this support so that IBM App Connect Enterprise can interact with Kafka applications.
For information about the supported versions of Kafka, see IBM App Connect Enterprise system requirements. For more information about Kafka version compatibility, see the Apache Kafka documentation.
Kafka is a popular choice for cloud-based architectures, where the number of connected clients can change frequently without affecting the scaling characteristics. Common use cases in which Kafka is considered as a messaging transport include:
- General messaging: You can decouple producer and consumer applications from each other. Data flowing between multiple processing stages in a distributed application can use Kafka to connect the steps. The built-in features that Kafka provides for partitioning, replication, and fault tolerance can make it a good choice for this type of large-scale messaging.
- Website activity tracking: When Kafka was first developed, it was used to track page views, searches, and other actions taken on a website. This activity was published to a set of central topics, one for each activity type. Subscribers could then use the data for real-time processing and monitoring, and for persisting to other data warehouse systems.
- Logging and metrics: You can use Kafka to aggregate operational data from multiple sources.
IBM App Connect Enterprise provides the following Kafka nodes:
- KafkaProducer node, which publishes messages to a Kafka topic
- KafkaConsumer node, which subscribes to a Kafka topic and propagates the feed of published messages to nodes connected downstream in the flow
- KafkaRead node, which reads a specified message from a Kafka topic
Each Kafka topic is divided into one or more partitions. Each partition is an ordered sequence of messages, which cannot be altered after they have been created. Each message in a partition is assigned a sequential ID number, called the offset, which uniquely identifies the message within that partition. All messages are retained for a configurable period of time, regardless of whether they have been consumed by an application, after which they are discarded to free up space. This approach differs from that of traditional messaging products such as IBM MQ, where a message is typically removed from a queue when it is consumed.
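The offset and retention behavior described above can be illustrated with a toy, in-memory model. This is only a sketch and does not use any Kafka or IBM App Connect Enterprise API: the `Record` and `PartitionLog` names, the `retention_seconds` parameter, and the explicit `now` timestamps are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Record:
    offset: int       # sequential ID within the partition
    timestamp: float  # time the record was appended, in seconds
    value: bytes


@dataclass
class PartitionLog:
    """Toy model of one partition: append-only, offset-addressed, time-retained."""
    retention_seconds: float
    _records: list = field(default_factory=list)
    _next_offset: int = 0

    def append(self, value: bytes, now: float) -> int:
        """Add a record at the end of the log and return its offset."""
        record = Record(self._next_offset, now, value)
        self._records.append(record)
        self._next_offset += 1
        return record.offset

    def read(self, offset: int):
        """Return the record at an offset, or None if it is not in the log."""
        for record in self._records:
            if record.offset == offset:
                return record
        return None

    def expire(self, now: float) -> None:
        """Discard records older than the retention period, consumed or not."""
        self._records = [
            r for r in self._records if now - r.timestamp < self.retention_seconds
        ]


log = PartitionLog(retention_seconds=60)
first = log.append(b"order-created", now=0)
second = log.append(b"order-paid", now=50)

# Reading does not remove a record, so it can be read again.
assert log.read(first).value == b"order-created"
assert log.read(first).value == b"order-created"

log.expire(now=70)  # the first record is now 70 seconds old
third = log.append(b"order-shipped", now=71)
print(first, second, third)  # prints: 0 1 2
print(log.read(first))       # prints: None (discarded by retention)
```

In this model, reading never deletes a record; only the retention check does. Offsets are never reused, so they keep increasing after older records are discarded.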
Message ordering is preserved only within a partition, not across all partitions in a topic. Therefore, if message order is important, ensure that you either use a single partition per topic, or associate a message key with each message that you publish. A hash of the message key is used to select the partition to which the message is sent, so all messages published with the same key are stored on the same partition.
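The following sketch shows why a message key keeps related messages in order. It is a simplified illustration rather than the partitioning logic of any Kafka client: the `choose_partition` function, the partition count, and the sample keys are hypothetical, and CRC32 is used as a stand-in hash only because it is stable across runs and available in the Python standard library.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition index in a stable way.

    Assumption: CRC32 stands in for the hash function here; it is not
    the hash that a real Kafka client uses.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    return zlib.crc32(key) % num_partitions


NUM_PARTITIONS = 3  # hypothetical partition count for the topic
messages = [
    (b"customer-17", "created"),
    (b"customer-42", "created"),
    (b"customer-17", "paid"),
    (b"customer-17", "shipped"),
]

# Append each message to the partition selected by the hash of its key.
partitions = {p: [] for p in range(NUM_PARTITIONS)}
for key, event in messages:
    partitions[choose_partition(key, NUM_PARTITIONS)].append((key, event))

# All events for customer-17 share one partition, in publication order.
target = choose_partition(b"customer-17", NUM_PARTITIONS)
events = [event for key, event in partitions[target] if key == b"customer-17"]
print(events)  # prints: ['created', 'paid', 'shipped']
```

Because the same key always hashes to the same partition index, the relative order of messages for one key is preserved. No ordering is implied between messages that have different keys, because they might be stored on different partitions.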
For more information about Kafka messaging, see the Apache Kafka documentation.
- Using Kafka with IBM App Connect Enterprise
- Producing messages on Kafka topics
- Consuming messages from Kafka topics
- Reading an individual message from a Kafka topic
- Configuring security credentials for connecting to Kafka
- Using local environment variables with Kafka nodes
- Setting and retrieving Kafka custom header properties
- Using Kafka nodes with IBM Event Streams
- Authenticating connections to a Kafka cluster by using SASL/SCRAM
- Resolving problems when using Kafka nodes