Apache Kafka (Kafka) is an open-source, distributed streaming platform that enables (among other things) the development of real-time, event-driven applications and user experiences on the web.
Today, billions of data sources continuously generate streams of data records, including streams of events. An event is a digital record of an action that happened and the time that it happened. Typically, an event is an action that drives another action as part of a process. A customer placing an order, choosing a seat on a flight or submitting a registration form are all examples of events. An event doesn’t have to involve a person—for example, a connected thermostat’s report of the temperature at a given time is also an event.
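To make the definition concrete, here is a minimal sketch of an event as a data record: an action paired with the time it happened. The class name, field names and sample values are illustrative assumptions, not part of any Kafka API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Event:
    """A record of an action and the time it happened (hypothetical shape)."""
    source: str            # who or what produced the event
    action: str            # what happened
    occurred_at: datetime  # when it happened


# An event triggered by a person...
order = Event("customer-42", "order_placed",
              datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc))
# ...and one reported by a device, with no person involved.
reading = Event("thermostat-7", "temperature_reported",
                datetime(2024, 1, 15, 9, 31, tzinfo=timezone.utc))

# A stream is a sequence of such records; here it is ordered by occurrence time.
stream = sorted([reading, order], key=lambda e: e.occurred_at)
```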
These streams offer opportunities for applications that respond to data or events in real time. A streaming platform enables developers to build applications that continuously consume and process these streams at extremely high speeds, preserving the order in which events occurred so that results remain accurate.
LinkedIn developed Kafka in 2011 as a high-throughput message broker for its own use, then open-sourced and donated Kafka to the Apache Software Foundation. Today, Kafka has evolved into the most widely used streaming platform, capable of ingesting and processing trillions of records per day without any perceptible performance lag as volumes scale. Fortune 500 organizations such as Target, Microsoft, Airbnb and Netflix rely on Kafka to deliver real-time, data-driven experiences to their customers.
Kafka is a distributed platform; it runs as a fault-tolerant, highly available cluster that can span multiple servers and even multiple data centers. Kafka topics are partitioned and replicated in such a way that they can scale to serve high volumes of simultaneous consumers without impacting performance. As a result, according to Apache.org, “Kafka will perform the same whether you have 50 KB or 50 TB of persistent storage on the server.”
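The idea of partitioning and replicating a topic can be sketched in a few lines. This is a simplified illustration, not Kafka's implementation: the function names are hypothetical, CRC32 stands in for whatever hash a real client uses, and the round-robin replica placement is an assumption made for clarity.

```python
import zlib


def choose_partition(key: str, num_partitions: int) -> int:
    """Map a record key to a partition with a stable hash.

    Assumption: CRC32 is a stand-in for the hash a real client would use.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign_replicas(partition: int, brokers: list, replication_factor: int) -> list:
    """Place copies of a partition on consecutive brokers (simplified round-robin)."""
    return [brokers[(partition + i) % len(brokers)]
            for i in range(replication_factor)]


brokers = ["broker-a", "broker-b", "broker-c"]
partition = choose_partition("customer-42", num_partitions=6)
replicas = assign_replicas(partition, brokers, replication_factor=2)
```

Because the hash is stable, records with the same key always land in the same partition, which is what lets work be spread across servers while keeping per-key ordering. Storing each partition on more than one broker is what allows the cluster to tolerate the loss of a server.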
RabbitMQ is a very popular open-source message broker, a type of middleware that enables applications, systems and services to communicate with each other by translating messaging protocols between them.
Because Kafka began as a kind of message broker (and can, in theory, still be used as one) and because RabbitMQ supports a publish/subscribe messaging model (among others), Kafka and RabbitMQ are often compared as alternatives. But the comparisons aren’t really practical, and they often dive into technical details that are beside the point when choosing between the two. For example, a Kafka topic can have multiple subscribers, whereas a message in a RabbitMQ queue is delivered to only one consumer; and Kafka topics are durable, whereas RabbitMQ messages are deleted once consumed and acknowledged.
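The two differences mentioned here can be illustrated with a toy model. The sketch below uses neither Kafka nor RabbitMQ; the `Topic` and `Queue` classes are hypothetical in-memory stand-ins that only mimic the retention behavior being contrasted.

```python
from collections import deque


class Topic:
    """Toy append-only log: records stay put; each subscriber tracks an offset."""

    def __init__(self):
        self._log = []
        self._offsets = {}  # subscriber name -> next position to read

    def publish(self, record):
        self._log.append(record)

    def poll(self, subscriber):
        """Return the records this subscriber has not seen yet."""
        start = self._offsets.get(subscriber, 0)
        records = self._log[start:]
        self._offsets[subscriber] = len(self._log)
        return records


class Queue:
    """Toy queue: a message is removed as soon as one consumer takes it."""

    def __init__(self):
        self._messages = deque()

    def publish(self, message):
        self._messages.append(message)

    def consume(self):
        return self._messages.popleft() if self._messages else None


topic = Topic()
topic.publish("order-1")
topic.publish("order-2")
billing = topic.poll("billing")    # reads both records
shipping = topic.poll("shipping")  # reads the same two records independently

queue = Queue()
queue.publish("order-1")
first = queue.consume()   # takes the message off the queue
second = queue.consume()  # nothing left for a second consumer
```

In the log model, reading does not remove anything, so any number of subscribers can replay the same records at their own pace. In the queue model, consumption is destructive.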
Kafka is frequently used with several other Apache technologies as part of a larger streams processing, event-driven architecture or big data analytics solution.
Apache Spark is an analytics engine for large-scale data processing. You can use Spark to perform analytics on streams delivered by Apache Kafka and to produce real-time stream processing applications, such as click-stream analysis.
Apache NiFi is a data flow management system with a visual, drag-and-drop interface. Because NiFi can run as a Kafka producer and a Kafka consumer, it’s an ideal tool for managing data flow challenges that Kafka can’t address.
Apache Flink is an engine for performing computations on event streams at scale, with consistently high speed and low latency. Flink can ingest streams as a Kafka consumer, perform operations based on these streams in real time and publish the results to Kafka or to another application.
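The consume, compute and publish pattern described here can be sketched without any framework. The example below is plain Python rather than the Flink API: a generator stands in for the stream operator, a list stands in for the input topic, and the function name and sample readings are assumptions.

```python
from collections import defaultdict


def running_average(readings):
    """Consume (sensor, value) pairs one at a time and yield an updated
    (sensor, average) result after each one."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for sensor, value in readings:
        totals[sensor] += value
        counts[sensor] += 1
        yield sensor, totals[sensor] / counts[sensor]


# Stand-in for records arriving from an input topic.
input_stream = [("t1", 20.0), ("t2", 30.0), ("t1", 22.0)]

# Stand-in for publishing each result to an output topic.
output_stream = list(running_average(input_stream))
```

The operator keeps a small amount of state per key and emits a result for every incoming record instead of waiting for a complete batch, which is the essential difference between stream and batch processing.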
Apache Hadoop is a distributed software framework that lets you store massive amounts of data in a cluster of computers for use in big data analytics, machine learning, data mining and other data-driven applications that process structured and unstructured data. Kafka is often used to create a real-time streaming data pipeline to a Hadoop cluster.