Important happenings in the Kafka community in September 2018
Kafka Summit SF 2018
The next Kafka Summit will happen on October 16 and 17 in San Francisco. The full schedule for all the sessions is already available on the Summit website, and there are still tickets available at the time I’m writing this article.
It will be my third time attending a Kafka Summit, so come say hi if you see me there!
Kafka releases:
On September 10, Dong Lin started the release process for 2.1.0. KIP freeze happened on September 24. The next step, code freeze, should happen mid-October, and, hopefully, there will be a release by the end of October or early November.
As always, the full release plan can be found on the Apache wiki.
KIPs:
In contrast to previous months, the community only submitted 8 KIPs in September (KIP-370 to KIP-378). These are the ones that caught my eye:
KIP-370: Remove Orphan Partitions
Reassigning partitions is a fairly common administrative operation on Kafka clusters. When a broker is down, an administrator can migrate partitions off this broker to ensure availability. In this case, upon restarting, the broker will not delete the logs for partitions it is no longer a replica for. This wastes disk space, and it is hard for administrators to track which brokers hold “orphaned” logs. The goal of this KIP is to enable brokers to automatically detect and delete such logs upon restarting.
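To make the idea of an “orphaned” log concrete, here is a minimal sketch of the detection step. It is a toy model, not the broker's implementation or the mechanism the KIP specifies. It only assumes that partition logs live in directories named `<topic>-<partition>`; the function names and sample data are hypothetical.

```python
from typing import Iterable, List, Optional, Set, Tuple

TopicPartition = Tuple[str, int]


def parse_partition_dir(name: str) -> Optional[TopicPartition]:
    """Parse '<topic>-<partition>' into (topic, partition), or None if it does not match."""
    # Topic names may contain hyphens, so split on the LAST hyphen only.
    topic, sep, partition = name.rpartition("-")
    if not sep or not topic or not partition.isdecimal():
        return None
    return topic, int(partition)


def find_orphans(dir_names: Iterable[str], assigned: Set[TopicPartition]) -> List[str]:
    """Return partition directories on disk that the broker is no longer a replica for."""
    orphans = []
    for name in dir_names:
        tp = parse_partition_dir(name)
        # Entries that are not partition directories (checkpoint files, etc.) are ignored.
        if tp is not None and tp not in assigned:
            orphans.append(name)
    return sorted(orphans)


# Hypothetical state after a restart: "orders-1" was reassigned while the broker was down.
on_disk = ["orders-0", "orders-1", "user-events-3", "recovery-point-offset-checkpoint"]
assigned = {("orders", 0), ("user-events", 3)}
print(find_orphans(on_disk, assigned))
```

In this model, deciding what is orphaned is the easy part. The harder question for a real broker is knowing when its view of the current assignment is complete enough to delete data safely.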
KIP-372: Naming Repartition Topics for Joins and Grouping
Currently, Kafka Streams generates the names of repartition topics. While this ensures names are unique, it also means that topic names can change if the topology is updated. This prevents users from simply doing rolling restarts and can also make it hard to map topics to operators when debugging Streams applications. The goal of this KIP is to allow users to name operators (and by extension repartition topics) so that rolling restarts remain possible even in some cases where the topology has changed.
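The following toy model illustrates why generated names are fragile. It is not Kafka Streams code: the operator kinds, the name format, and the `repartition_topics` helper are simplified assumptions made for illustration. The only behaviour it tries to mirror is that a generated name embeds the operator's position in the topology, while a user-supplied name does not.

```python
from typing import List, Optional, Tuple

# (operator kind, optional user-supplied name); both are hypothetical labels.
Operator = Tuple[str, Optional[str]]


def repartition_topics(app_id: str, operators: List[Operator]) -> List[str]:
    """Derive repartition topic names for a linear topology."""
    topics = []
    for index, (kind, name) in enumerate(operators):
        if kind != "GROUP_BY":
            continue  # in this toy model, only grouping triggers a repartition
        # Without an explicit name, fall back to one derived from the operator's index.
        base = name if name is not None else f"{kind}-{index:010d}"
        topics.append(f"{app_id}-{base}-repartition")
    return topics


# Version 2 of the topology adds a filter before the grouping step.
v1_generated = [("SOURCE", None), ("GROUP_BY", None)]
v2_generated = [("SOURCE", None), ("FILTER", None), ("GROUP_BY", None)]
v1_named = [("SOURCE", None), ("GROUP_BY", "count-by-user")]
v2_named = [("SOURCE", None), ("FILTER", None), ("GROUP_BY", "count-by-user")]

print(repartition_topics("app", v1_generated), repartition_topics("app", v2_generated))
print(repartition_topics("app", v1_named), repartition_topics("app", v2_named))
```

With generated names, inserting the filter shifts the index and the application starts looking for a different topic. With an explicit name, both versions of the topology resolve to the same topic.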
KIP-377: TopicCommand to use AdminClient
This KIP proposes updating the “kafka-topics” tool to allow it to use the AdminClient API. While the functionality will remain the same, it will enable users to run this tool in environments where ZooKeeper is not directly accessible, as is often the case in cloud environments.
Blogs:
- https://multithreaded.stitchfix.com/blog/2018/09/05/datahighway/
- https://blog.florimondmanca.com/building-a-streaming-fraud-detection-system-with-kafka-and-python