Kafka Monthly Digest: June 2019

In this post, I will cover what happened in the Kafka Community in June 2019.

After three Release Candidates, Colin McCabe released Apache Kafka 2.3.0 on June 25. This new minor version brings a number of interesting features:

The new IncrementalAlterConfigs API allows you to update only the desired configurations. The AlterConfigs API, which requires you to always specify all configurations, is now deprecated. (KIP-339)
The fairness of network processors has been improved: they now prioritize existing connections over new ones. This helps in scenarios where a massive number of new connections would otherwise significantly impact broker availability. (KIP-402)
Brokers are now able to start faster due to log loading optimizations. (KAFKA-7283)
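Replication has been hardened: a failure affecting a single partition no longer stops the replica fetcher from replicating its other partitions. (KIP-461)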
The monitoring of under-replicated partitions has been improved via new metrics and a new flag for the kafka-topics tool. (KIP-351, KIP-427)

Clients can now easily find which operations they are authorized to perform via the describe methods of the Admin API. (KIP-430)
All clients can now use external configurations that are resolved automatically at runtime. (KIP-421)
The rebalancing mechanism of Kafka Connect has been made incremental and cooperative. This should reduce the number and duration of rebalances. (KIP-415)
Kafka Streams now has built-in in-memory window and session stores. (KIP-428, KIP-445)
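To make the difference between the deprecated AlterConfigs API and IncrementalAlterConfigs (KIP-339) concrete, here is a toy Python model of the two semantics. It is not the Kafka Admin API. The function names are hypothetical, configs are plain string dictionaries, and only the four operation types described in KIP-339 (SET, DELETE, APPEND, SUBTRACT) are modelled. The key point is that the legacy call replaces the whole set of overrides, so anything you forget to resend is lost.

```python
# Toy model of KIP-339 semantics; NOT the Kafka Admin API.
# Function names are hypothetical; configs are plain {name: value} dicts.
from enum import Enum
from typing import Dict, List, Optional, Tuple


class OpType(Enum):
    # The four operation types described in KIP-339.
    SET = "SET"
    DELETE = "DELETE"
    APPEND = "APPEND"      # add values to a comma-separated list config
    SUBTRACT = "SUBTRACT"  # remove values from a comma-separated list config


def alter_configs(current: Dict[str, str], new_configs: Dict[str, str]) -> Dict[str, str]:
    """Legacy semantics: the request becomes the complete set of overrides,
    so any override in `current` that is not resent is dropped."""
    return dict(new_configs)


def incremental_alter_configs(
    current: Dict[str, str], ops: List[Tuple[str, OpType, Optional[str]]]
) -> Dict[str, str]:
    """Incremental semantics: only the keys named in `ops` are touched."""
    result = dict(current)
    for key, op, value in ops:
        if op is OpType.SET:
            result[key] = value
        elif op is OpType.DELETE:
            result.pop(key, None)
        else:
            items = [v for v in result.get(key, "").split(",") if v]
            changes = value.split(",")
            if op is OpType.APPEND:
                items += [v for v in changes if v not in items]
            else:  # SUBTRACT
                items = [v for v in items if v not in changes]
            result[key] = ",".join(items)
    return result


if __name__ == "__main__":
    topic = {"retention.ms": "86400000", "cleanup.policy": "delete"}
    # Legacy: changing retention silently drops the cleanup.policy override.
    print(alter_configs(topic, {"retention.ms": "3600000"}))
    # {'retention.ms': '3600000'}
    # Incremental: retention.ms is updated and cleanup.policy gets a value appended.
    print(incremental_alter_configs(topic, [
        ("retention.ms", OpType.SET, "3600000"),
        ("cleanup.policy", OpType.APPEND, "compact"),
    ]))
    # {'retention.ms': '3600000', 'cleanup.policy': 'delete,compact'}
```

The incremental variant works on a copy, so the caller's dictionary is never mutated. This loosely mirrors how a broker applies the requested operations to its stored overrides.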
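External configuration values reuse the placeholder syntax from KIP-297, ${provider:path:key}, where the provider is declared in the client configuration (for example config.providers=file and config.providers.file.class=org.apache.kafka.common.config.provider.FileConfigProvider). The Python sketch below only imitates the substitution step. The in-memory SECRETS store, file_like_provider and resolve are hypothetical stand-ins rather than Kafka code, and in this sketch placeholders that cannot be resolved are simply left untouched.

```python
# Hypothetical sketch of ${provider:[path:]key} substitution; NOT Kafka code.
import re
from typing import Callable, Dict, Optional

# Matches ${provider:path:key} or ${provider:key}; paths may not contain ':' here.
PLACEHOLDER = re.compile(r"\$\{(\w+):(?:([^}:]*):)?([^}:]+)\}")

# A provider maps (path, key) to a value, or None if it cannot resolve it.
Provider = Callable[[Optional[str], str], Optional[str]]

# Hypothetical secrets store standing in for a properties file on disk.
SECRETS = {"/etc/kafka/secrets.properties": {"keystore.password": "s3cret"}}


def file_like_provider(path: Optional[str], key: str) -> Optional[str]:
    return SECRETS.get(path or "", {}).get(key)


def resolve(configs: Dict[str, str], providers: Dict[str, Provider]) -> Dict[str, str]:
    def substitute(match) -> str:
        name, path, key = match.group(1), match.group(2), match.group(3)
        provider = providers.get(name)
        value = provider(path, key) if provider else None
        # Unresolvable placeholders are left as-is in this sketch.
        return match.group(0) if value is None else value

    return {k: PLACEHOLDER.sub(substitute, v) for k, v in configs.items()}


if __name__ == "__main__":
    raw = {
        "bootstrap.servers": "localhost:9092",
        "ssl.keystore.password": "${file:/etc/kafka/secrets.properties:keystore.password}",
    }
    print(resolve(raw, {"file": file_like_provider})["ssl.keystore.password"])  # s3cret
```

Keeping secrets out of the configuration itself means the configuration can be logged or shared safely, while the actual values are only looked up when the client starts.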

As always, the full release notes are available on apache.org and the release plan is on the wiki.

Last month, the community submitted nine KIPs (KIP-477 to KIP-485), and these are the ones that caught my eye:

KIP-480: Sticky Partitioner: At the moment, when producing records without keys, producers assign partitions in a round-robin fashion. While this spreads records evenly across partitions, in many cases it does not use the network optimally, because batches are often sent before they are full. The goal of this KIP is to favour filling up batches by changing the target partition only after each batch is sent, instead of after each record.
KIP-482: The Kafka Protocol should Support Optional Fields: The current Kafka protocol specification defines fixed payloads for all protocol messages. Every time a field is added to a message, a new message version has to be defined, and even when a field is not set, it must still be included and takes up bytes. This KIP’s goal is to make the protocol more flexible and efficient by supporting optional fields.
KIP-484: Expose metrics for group and transaction metadata loading duration: When the coordinator for a group changes (due to a broker restarting, for example), the new coordinator has to load the group metadata from the __consumer_offsets partitions. This can take some time, during which groups are idle. This KIP proposes adding metrics to track the duration of such reloads so it’s easy to identify when this happens and how long it takes.
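The effect described in KIP-480 can be shown with a small simulation. The Python sketch below counts how many batches a producer would send for the same stream of keyless records under round-robin and under sticky partitioning. It rests on simplifying assumptions that are not producer internals. Batch capacity is counted in records rather than bytes. Every linger tick flushes all non-empty batches. The sticky partitioner rotates to the next partition rather than picking a random one, as the KIP proposes. All class and function names are hypothetical.

```python
# Toy simulation of round-robin vs sticky partitioning for keyless records.
# Assumptions: capacity counted in records (not bytes), each linger tick
# flushes all partial batches, and Sticky rotates instead of choosing randomly.
from typing import List


class RoundRobin:
    def __init__(self, num_partitions: int) -> None:
        self.num_partitions = num_partitions
        self.counter = 0

    def partition(self) -> int:
        p = self.counter % self.num_partitions  # new partition for every record
        self.counter += 1
        return p

    def on_batch_sent(self, p: int) -> None:
        pass  # round-robin ignores batch boundaries


class Sticky:
    def __init__(self, num_partitions: int) -> None:
        self.num_partitions = num_partitions
        self.current = 0

    def partition(self) -> int:
        return self.current  # stick to one partition...

    def on_batch_sent(self, p: int) -> None:
        if p == self.current:  # ...until its batch has been sent
            self.current = (self.current + 1) % self.num_partitions


def count_batches(partitioner, num_partitions: int, capacity: int, bursts: List[int]) -> int:
    open_batches = [0] * num_partitions  # records waiting in each partition's batch
    sent = 0

    def send(p: int) -> None:
        nonlocal sent
        sent += 1
        open_batches[p] = 0
        partitioner.on_batch_sent(p)

    for burst in bursts:  # records that arrive before each linger tick
        for _ in range(burst):
            p = partitioner.partition()
            open_batches[p] += 1
            if open_batches[p] == capacity:
                send(p)  # a full batch goes out immediately
        for p in range(num_partitions):  # linger expired: flush partial batches
            if open_batches[p]:
                send(p)
    return sent


if __name__ == "__main__":
    bursts = [6] * 10  # 60 keyless records, 4 partitions, 8 records per batch
    rr = count_batches(RoundRobin(4), 4, 8, bursts)
    st = count_batches(Sticky(4), 4, 8, bursts)
    print(f"round-robin: {rr} batches, sticky: {st} batches")
    # round-robin: 40 batches, sticky: 10 batches
```

Both strategies deliver the same 60 records. With round-robin, each burst is spread thinly over all four partitions, so every linger tick ships four nearly empty batches. The sticky approach ships one fuller batch per tick. Over time, records are still spread across partitions, just one batch at a time.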
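To illustrate why optional fields save space, here is a hypothetical Python encoding roughly along the lines discussed for KIP-482. Treat the exact layout as an assumption, since the KIP was still under discussion. The message starts with a count. Each optional field that is actually set is then written as an unsigned-varint tag, an unsigned-varint length and the raw bytes. Unset fields cost nothing. For comparison, encode_fixed mimics a fixed schema where every field, even a null one, carries a 2-byte length prefix (-1 for null). All helpers here are illustrative, not Kafka's serialization code.

```python
# Hypothetical sketch: fixed layout vs tagged optional fields; NOT Kafka code.
from typing import Dict, Optional, Tuple


def write_uvarint(value: int) -> bytes:
    """Unsigned varint: 7 bits per byte, high bit set on all but the last byte."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def read_uvarint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Returns (value, position just after the varint)."""
    result = shift = 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, pos
        shift += 7


def encode_fixed(fields: Dict[int, Optional[bytes]], num_fields: int) -> bytes:
    """Every field is always present: 2-byte signed length (-1 = null) + data."""
    out = bytearray()
    for tag in range(num_fields):
        value = fields.get(tag)
        if value is None:
            out += (-1).to_bytes(2, "big", signed=True)
        else:
            out += len(value).to_bytes(2, "big", signed=True) + value
    return bytes(out)


def encode_tagged(fields: Dict[int, Optional[bytes]]) -> bytes:
    """Only set fields are written, in ascending tag order: count, then tag/length/data."""
    present = sorted((t, v) for t, v in fields.items() if v is not None)
    out = bytearray(write_uvarint(len(present)))
    for tag, value in present:
        out += write_uvarint(tag) + write_uvarint(len(value)) + value
    return bytes(out)


def decode_tagged(buf: bytes) -> Dict[int, bytes]:
    count, pos = read_uvarint(buf, 0)
    fields = {}
    for _ in range(count):
        tag, pos = read_uvarint(buf, pos)
        size, pos = read_uvarint(buf, pos)
        fields[tag] = buf[pos:pos + size]
        pos += size
    return fields


if __name__ == "__main__":
    # Field 0 is set; fields 1 and 2 are optional and unset.
    fields = {0: b"rack-1", 1: None, 2: None}
    print(len(encode_fixed(fields, 3)), len(encode_tagged(fields)))  # 12 9
    print(decode_tagged(encode_tagged(fields)))  # {0: b'rack-1'}
```

Because each tagged field carries its own length, a reader that does not recognize a tag can skip over it. This is what would let new optional fields be added without defining a new message version.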

IBM Event Streams for Cloud is Apache Kafka-as-a-service for IBM Cloud.

Software Engineer