As enterprises face increasingly complex IT and data environments, they rely on messaging and streaming platforms to ensure fast, reliable data exchange across applications, systems and services. With real-time data analytics becoming a critical driver of actionable insights, accelerating the speed of data streaming and data processing is a top priority. According to 2025 IDC data, surveyed enterprises indicate that 63% of use cases must process data within minutes to be useful.
Pulsar combines the features of traditional messaging systems with those of publish/subscribe systems, making it uniquely suited for use cases such as microservices, instant messaging and data integration. A host of capabilities and advantages enable Pulsar’s versatility, including geo-replication, multi-tenancy and tiered storage.
Originally developed at Yahoo, Pulsar was open-sourced in 2016 and later became an Apache Software Foundation project. Apache Pulsar now manages hundreds of billions of events per day across major organizations.
Understanding the significance of Apache Pulsar begins with a clear view of how messaging and event streaming platforms work.
A message is a packet of data that one application creates for other applications to use. These packets are held in the order in which they were transmitted until the consuming application processes them.
Messaging systems facilitate the exchange of those messages. Traditional messaging systems are middleware solutions (also called message-oriented middleware, or MOM). These solutions commonly support two message distribution patterns: point-to-point messaging and publish/subscribe messaging.
In point-to-point messaging, one application (called the sender) submits a message to what’s known as a message queue, which stores the message. Then, another application (called the receiver or consumer) receives the message from the queue and processes it. Each message should be consumed only once.
In publish/subscribe messaging, or pub/sub messaging, the application that produces the message is called a publisher. The applications that use it are referred to as subscribers. Each message is published to a category known as a topic, and every application that subscribes to that topic receives a copy of all messages that are published to it.
Partitions and partitioned topics can accelerate message processing. Messages published to partitioned topics are distributed among multiple brokers.
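As a rough illustration of how keyed messages might be spread across the partitions of a topic, the Python sketch below hashes each message key and takes the result modulo the partition count. The function name `choose_partition`, the CRC32 hash and the partition count of 4 are assumptions chosen for illustration; this is not Pulsar's own routing code.

```python
import zlib
from collections import defaultdict

def choose_partition(key: str, num_partitions: int) -> int:
    """Map a message key to a partition index with a stable hash (illustrative)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

# Route a handful of keyed messages into 4 hypothetical partitions.
partitions = defaultdict(list)
for key, payload in [("order-1", "created"), ("order-2", "created"), ("order-1", "paid")]:
    partitions[choose_partition(key, 4)].append((key, payload))
```

Because the hash is stable, every message with the same key lands in the same partition, which keeps per-key ordering while different keys can be handled in parallel.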
Pub/sub messaging is designed for broadcast-style, “one to many” communication. Point-to-point messaging—as its name implies—exchanges information between a single sender and a single receiver.
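The difference between the two patterns can be sketched with a small in-memory model. The classes `MessageQueue` and `Topic` below are hypothetical teaching aids, not part of any messaging product's API.

```python
from collections import deque

class MessageQueue:
    """Point-to-point: each message is handed to exactly one receiver."""
    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def receive(self):
        # Removing the message means no other receiver can consume it.
        return self._messages.popleft() if self._messages else None

class Topic:
    """Publish/subscribe: every subscriber gets its own copy of each message."""
    def __init__(self):
        self._inboxes = {}

    def subscribe(self, name):
        self._inboxes[name] = []
        return self._inboxes[name]

    def publish(self, message):
        for inbox in self._inboxes.values():
            inbox.append(message)

queue = MessageQueue()
queue.send("invoice-42")
first = queue.receive()    # the one receiver that gets the message
second = queue.receive()   # nothing is left for a second receiver

topic = Topic()
billing = topic.subscribe("billing")
audit = topic.subscribe("audit")
topic.publish("invoice-42")   # both subscribers now hold a copy
```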
Among traditional messaging systems, RabbitMQ, an open source platform, is often cited as the most popular.
An event streaming platform captures real-time data from applications, databases and IoT devices. It then transports the data to various destinations for immediate processing, analytics or storage.
Known for their scalability, event streaming platforms can order streams of records into topics and store them for a predetermined amount of time. Unlike traditional messaging systems, however, event streaming platforms cannot guarantee message delivery or track which consumers have received messages. They rely on pub/sub messaging rather than point-to-point messaging distribution and offer less flexibility on message routing.
Among event streaming platforms, Apache Kafka is the most widely used.
Apache Pulsar combines the capabilities of platforms like RabbitMQ and Apache Kafka into a single solution. It can stream events and deliver messages to multiple consumers like Kafka; it supports queuing and can send messages to single consumers like RabbitMQ.
But Pulsar is more than just the sum of its predecessors. Yahoo initially developed the platform to address its own organizational needs, so certain competitive advantages were built in from the start. In the years since, other improvements have further enhanced Pulsar as a high-performance messaging and streaming platform.
Today, some of the most compelling features of Apache Pulsar include:
Multi-tenancy was one of the original features that differentiated Apache Pulsar from other platforms. In multi-tenant software architecture, a single instance of a software application (and its underlying database and hardware) serves multiple tenants (or user accounts). The benefits of multi-tenancy include simplified system setup, configuration, maintenance, and application deployment as well as cost savings.
In Apache Pulsar, different teams can safely share the messaging system. Each tenant has its own authentication, authorization and policies. Tenants can be further divided into what are known as namespaces (logical groupings of topics). This division makes it easy to support different environments—such as development, staging and production—within a single tenant.
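The tenant and namespace hierarchy shows up directly in fully qualified topic names, which commonly take the form `persistent://tenant/namespace/topic`. The sketch below splits such a name into its parts. The helper `parse_topic_name`, the tenant name `payments` and the environment namespaces are illustrative assumptions, and the parser handles only this four-part form.

```python
from typing import NamedTuple

class TopicName(NamedTuple):
    domain: str      # "persistent" or "non-persistent"
    tenant: str
    namespace: str
    topic: str

def parse_topic_name(full_name: str) -> TopicName:
    """Split a fully qualified topic name into its parts."""
    domain, sep, rest = full_name.partition("://")
    parts = rest.split("/")
    valid_domain = domain in ("persistent", "non-persistent")
    if not sep or not valid_domain or len(parts) != 3 or not all(parts):
        raise ValueError(f"not a fully qualified topic name: {full_name!r}")
    return TopicName(domain, *parts)

# Hypothetical tenant "payments" with one namespace per environment.
names = [parse_topic_name(f"persistent://payments/{env}/orders")
         for env in ("dev", "staging", "prod")]
```

Here the same logical topic, `orders`, exists three times under one tenant, once per environment namespace.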
Replicating messages to remote locations is important to support disaster recovery or to enable applications to operate on a global scale. Unlike other platforms, Pulsar doesn’t require complex configurations or add-ons to power this capability.
With geo-replication, applications can connect to the local Pulsar cluster and still send to clusters around the world. If a producer publishes a message to a topic in a replicated namespace, that message is automatically replicated to the configured remote geo-location or locations.
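A toy model can make the replication flow concrete. In the sketch below, the `Cluster` class and the region names are hypothetical, and replication happens synchronously inside `publish` for simplicity; a real deployment would replicate over the network and would not be modeled this simply.

```python
class Cluster:
    """A toy stand-in for one Pulsar cluster in a single region."""
    def __init__(self, name):
        self.name = name
        self.topics = {}   # topic name -> list of messages
        self.peers = []    # remote clusters that receive replicated messages

    def publish(self, topic, message, replicated_from=None):
        self.topics.setdefault(topic, []).append(message)
        # Forward only locally produced messages, so replicas do not echo back.
        if replicated_from is None:
            for peer in self.peers:
                peer.publish(topic, message, replicated_from=self.name)

us, eu, asia = Cluster("us-west"), Cluster("eu-central"), Cluster("ap-south")
for cluster in (us, eu, asia):
    cluster.peers = [c for c in (us, eu, asia) if c is not cluster]

us.publish("orders", "order-1")   # producer talks only to its local cluster
eu.publish("orders", "order-2")
```

Each producer writes to its nearest cluster, yet all three clusters end up holding both messages.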
Pulsar architecture separates message delivery components (message brokers) and message storage layers. Messages are stored by Apache BookKeeper, a known leader in durable log storage solutions.
To enhance performance, BookKeeper distributes data across multiple servers known as bookies. (Metadata for BookKeeper ledgers is stored in Apache ZooKeeper.) Bookies can be added as necessary, resulting in horizontal scalability suitable for handling large data volumes. This architecture enables Pulsar to provide low latency while also transferring large amounts of data in a short amount of time—what’s known as high throughput.
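One way to picture how entries might be spread across bookies is round-robin striping: each entry is written to a few bookies, and the starting bookie rotates from one entry to the next. The sketch below is a simplification under that assumption. The function `bookies_for_entry` and the bookie names are hypothetical, and the terms "ensemble" and "write quorum" are used loosely here.

```python
def bookies_for_entry(entry_id: int, ensemble: list, write_quorum: int) -> list:
    """Pick which bookies store an entry, rotating the start position per entry."""
    return [ensemble[(entry_id + i) % len(ensemble)] for i in range(write_quorum)]

ensemble = ["bookie-1", "bookie-2", "bookie-3"]   # hypothetical bookie names
placement = {entry: bookies_for_entry(entry, ensemble, write_quorum=2)
             for entry in range(4)}

# Adding a bookie widens the ensemble, so each node holds a smaller share of entries.
wider = ensemble + ["bookie-4"]
share = sum("bookie-1" in bookies_for_entry(e, wider, 2) for e in range(8)) / 8
```

With three bookies and two copies per entry, each bookie stores two thirds of the entries. With four bookies that share drops to one half, which is the intuition behind scaling storage horizontally by adding bookies.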
Pulsar’s architecture is also considered cloud-native architecture. Both Pulsar and cloud computing separate compute from storage. In addition, Pulsar can be deployed on Kubernetes, an open source container orchestration platform that’s a building block of modern cloud infrastructure.
Apache Pulsar also features tiered storage. This capability allows older backlog data to be moved from Apache BookKeeper to less expensive, long-term storage, while still allowing Pulsar clients to access the backlog.
Pulsar tiered storage uses Apache jclouds (an open source multi-cloud toolkit for the Java platform) to support long-term storage through services such as Amazon S3, Google Cloud Storage (GCS), Azure Blob Storage and Aliyun Object Storage Service (OSS).
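The idea behind tiered storage can be sketched with two dictionaries standing in for the fast and the inexpensive tier. The `TieredTopic` class and its `hot_limit` parameter are assumptions made for this illustration; Pulsar's actual offload policy and thresholds are not modeled here.

```python
class TieredTopic:
    """Toy backlog with a 'hot' tier and a cheaper 'cold' tier (both plain dicts)."""
    def __init__(self, hot_limit):
        self.hot_limit = hot_limit   # max messages kept in the hot tier
        self.hot = {}                # stands in for BookKeeper
        self.cold = {}               # stands in for long-term object storage
        self.next_id = 0

    def append(self, message):
        self.hot[self.next_id] = message
        self.next_id += 1
        # Offload the oldest messages once the hot tier exceeds its limit.
        while len(self.hot) > self.hot_limit:
            oldest = min(self.hot)
            self.cold[oldest] = self.hot.pop(oldest)

    def read(self, message_id):
        # Readers use one call; the tier a message lives in is hidden from them.
        if message_id in self.hot:
            return self.hot[message_id]
        return self.cold[message_id]

topic = TieredTopic(hot_limit=2)
for i in range(5):
    topic.append(f"event-{i}")
```

After five appends, the three oldest events sit in the cold tier, but `topic.read(0)` still returns them through the same interface as recent events.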
Apache Pulsar can be easily used with external systems thanks to Pulsar IO connectors. These connectors act as bridges between Pulsar and other systems, such as stream-processing engines, data pipeline APIs and other messaging platforms.
Pulsar connectors come in two types: source and sink. Source connectors transmit data from external systems to Pulsar, while sink connectors do the opposite, transmitting data from Pulsar to external systems. Commonly used Pulsar connectors include MySQL, MongoDB, Cassandra, RabbitMQ, Kafka, Flume and Redis.
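The two connector directions can be sketched as a pair of small functions. Everything in this example is hypothetical: `run_source` and `run_sink` are not Pulsar IO interfaces, and the external "database" and "search index" are plain Python lists.

```python
from typing import Callable, Iterable, List

def run_source(read_external: Callable[[], Iterable[str]], topic: List[str]) -> None:
    """Source direction: pull records from an external system into a topic."""
    for record in read_external():
        topic.append(record)

def run_sink(topic: List[str], write_external: Callable[[str], None]) -> None:
    """Sink direction: push each message from a topic out to an external system."""
    for message in topic:
        write_external(message)

# Hypothetical external systems, both faked as lists.
database_rows = ["user:1", "user:2"]
search_index = []
topic = []

run_source(lambda: database_rows, topic)   # external system -> topic
run_sink(topic, search_index.append)       # topic -> external system
```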
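Apache Pulsar supports four different subscription types¹ to help users configure messaging patterns:
Exclusive: Only one consumer can attach to the subscription and receive its messages.
Failover: Multiple consumers can attach, but only one receives messages at a time; if it disconnects, another takes over.
Shared: Multiple consumers can attach, and messages are distributed among them so that each message goes to a single consumer.
Key_Shared: Multiple consumers can attach, and messages that have the same key are delivered to the same consumer.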
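Pulsar's documentation names the four subscription types Exclusive, Failover, Shared and Key_Shared. The sketch below models how each type might hand a batch of messages to consumers. The `dispatch` function is a hypothetical simplification: it treats the first consumer in the list as the active one, uses plain round-robin for shared delivery, and uses a CRC32 hash for key-based delivery, none of which is claimed to match Pulsar's internal behavior.

```python
import zlib

def dispatch(messages, consumers, sub_type):
    """Return {consumer: [payloads]} for a list of (key, payload) messages."""
    if sub_type == "exclusive" and len(consumers) > 1:
        raise ValueError("an exclusive subscription allows only one consumer")
    received = {c: [] for c in consumers}
    for index, (key, payload) in enumerate(messages):
        if sub_type in ("exclusive", "failover"):
            target = consumers[0]                        # single active consumer
        elif sub_type == "shared":
            target = consumers[index % len(consumers)]   # round-robin
        elif sub_type == "key_shared":
            # Same key always maps to the same consumer.
            target = consumers[zlib.crc32(key.encode()) % len(consumers)]
        else:
            raise ValueError(f"unknown subscription type: {sub_type}")
        received[target].append(payload)
    return received

messages = [("a", "m1"), ("b", "m2"), ("a", "m3"), ("b", "m4")]
shared = dispatch(messages, ["c1", "c2"], "shared")
failover = dispatch(messages, ["c1", "c2"], "failover")
key_shared = dispatch(messages, ["c1", "c2"], "key_shared")
```

In this model, a failover event is simulated by calling `dispatch` again with the failed consumer removed from the list, so the next consumer becomes the active one.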
Other notable Pulsar features include:
Broker load balancing: Pulsar monitors the CPU, memory and network usage of Pulsar brokers and moves workloads as necessary to optimize balance and avoid overloading individual brokers.
Schema registry: Pulsar’s schema registry enables Pulsar clients to upload data schemas on a per-topic basis to ensure that producers and consumers use compatible message formats.
Client libraries: Client libraries are pre-built functions and procedures that simplify interactions between applications and APIs, databases and services. Pulsar supports programming language-specific libraries (including libraries for Java, C++, Python and Node.js) and language-agnostic libraries (REST and WebSocket).
Message retention: Traditional systems delete messages once they’ve been consumed. Pulsar allows users to set retention policies to store messages even after they’ve been consumed—a feature that can support event-driven architecture models.
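To illustrate the retention idea from the list above, the sketch below keeps acknowledged messages for a fixed time window instead of deleting them on consumption. The `RetainedTopic` class, its time-based window and the rule that unacknowledged messages are never cleaned up are assumptions of this toy model, not a description of Pulsar's retention settings.

```python
class RetainedTopic:
    """Toy topic that can keep acknowledged messages for a fixed time window."""
    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self.log = []   # entries of [publish_time, payload, acknowledged]

    def publish(self, now, payload):
        self.log.append([now, payload, False])

    def acknowledge(self, payload):
        for entry in self.log:
            if entry[1] == payload:
                entry[2] = True

    def cleanup(self, now):
        # Unacknowledged messages stay; acknowledged ones stay until the window expires.
        self.log = [e for e in self.log
                    if not e[2] or now - e[0] < self.retention_seconds]

    def payloads(self):
        return [e[1] for e in self.log]

topic = RetainedTopic(retention_seconds=60)
topic.publish(now=0, payload="signup")
topic.publish(now=10, payload="purchase")
topic.acknowledge("signup")
topic.cleanup(now=30)
after_30s = topic.payloads()   # the consumed message is still available for replay
topic.cleanup(now=90)
after_90s = topic.payloads()   # the retention window for "signup" has passed
```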
The following use cases from the Apache Software Foundation help illustrate Apache Pulsar’s utility and versatility.²
Apache Pulsar is often the platform of choice when enterprises decide to pursue messaging technology consolidation because it supports multiple messaging use cases (including message queuing and streaming) and multi-tenancy, enabling multiple teams to use it in the ways that serve them best.
Enterprises seeking to minimize the chance of data loss for critical applications, such as financial transactions, can use Apache Pulsar for its failure resiliency features. Messages fed through Pulsar are replicated to several BookKeeper nodes, so they are not lost when an individual piece of hardware fails.
Apache Pulsar supports the constant communication required between microservices through indirect API calls. Services can send messages to topics to which other services are subscribed. While other messaging systems can provide this capability, Pulsar’s horizontal scalability sets it apart—the platform can scale up within minutes to accommodate large influxes of requests.
Messaging systems commonly support task queues—systems that organize asynchronous execution of background jobs without hindering an application’s performance. Apache Pulsar supports task queue systems through its shared subscription—which distributes messages to multiple consumers—and message acknowledgment capability, which confirms a task’s completion.
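The interplay of shared delivery and acknowledgment can be sketched as a small in-memory task queue. The `TaskQueue` class, the worker names and the task names are hypothetical, and redelivery here is triggered by an explicit `worker_failed` call rather than by any real timeout or connection handling.

```python
from collections import deque

class TaskQueue:
    """Toy task queue: tasks go to one worker at a time and must be acknowledged."""
    def __init__(self, tasks):
        self.pending = deque(tasks)
        self.in_flight = {}   # task -> worker currently holding it
        self.done = []

    def pull(self, worker):
        if not self.pending:
            return None
        task = self.pending.popleft()
        self.in_flight[task] = worker
        return task

    def ack(self, task):
        # Acknowledgment marks the task complete so it is never delivered again.
        self.in_flight.pop(task)
        self.done.append(task)

    def worker_failed(self, worker):
        # Tasks held by a failed worker return to the queue for redelivery.
        for task in [t for t, w in self.in_flight.items() if w == worker]:
            del self.in_flight[task]
            self.pending.append(task)

queue = TaskQueue(["resize-1", "resize-2", "resize-3"])
t1 = queue.pull("worker-a")
t2 = queue.pull("worker-b")
queue.ack(t1)
queue.worker_failed("worker-b")   # resize-2 was never acknowledged
```

The acknowledged task is finished for good, while the unacknowledged one goes back into the queue behind `resize-3` and can be picked up by another worker.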
1 "Messaging." The Apache Software Foundation. Retrieved 4 August 2025.
2 "Pulsar Use Cases." The Apache Software Foundation. Retrieved 4 August 2025.