What is Apache ZooKeeper?

ZooKeeper is an open source Apache project that provides a centralized service for maintaining configuration information, naming, synchronization, and group services across large clusters in distributed systems. The goal is to make these systems easier to manage, with more reliable propagation of changes.


How does ZooKeeper work?

If you had a Hadoop cluster spanning 500 or more commodity servers, you would need centralized management of the entire cluster in terms of naming, group and synchronization services, configuration management, and more. Other open source projects that use Hadoop clusters also require these cross-cluster services. Embedding ZooKeeper means you don’t have to build synchronization services from scratch. Applications interact with ZooKeeper through its Java™ or C client interfaces.

For applications, ZooKeeper provides an infrastructure for cross-node synchronization by maintaining status information in memory on ZooKeeper servers. Each ZooKeeper server keeps a copy of the state of the entire system and persists this information in local log files. Large Hadoop clusters are supported by an ensemble of several ZooKeeper servers, in which an elected leader server orders all updates and keeps the other servers synchronized.
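To make the idea of in-memory state backed by a local log a little more concrete, here is a toy Python model. It is not ZooKeeper's implementation or API. The class names `ToyServer` and `ToyEnsemble`, the JSON-lines log format, and the simplification that the leader applies every write to all servers synchronously are all assumptions made for illustration. Real ZooKeeper elects its leader and replicates updates with its own atomic broadcast protocol (Zab), committing a write once a majority of servers acknowledge it.

```python
import json
import os
import tempfile


class ToyServer:
    """Hypothetical server: keeps state in memory and appends every update to a local log."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.state = {}     # in-memory copy of the whole system state
        self.last_zxid = 0  # id of the last transaction applied
        self._replay()      # rebuild memory from the log after a restart

    def _replay(self):
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path) as f:
            for line in f:
                entry = json.loads(line)
                self.state[entry["path"]] = entry["data"]
                self.last_zxid = entry["zxid"]

    def apply(self, zxid, path, data):
        # Persist to the local log first, then update memory (write-ahead logging).
        with open(self.log_path, "a") as f:
            f.write(json.dumps({"zxid": zxid, "path": path, "data": data}) + "\n")
        self.state[path] = data
        self.last_zxid = zxid


class ToyEnsemble:
    """Hypothetical ensemble: a single leader (modelled by this object) orders every write."""

    def __init__(self, log_dir, size=3):
        self.servers = [
            ToyServer(os.path.join(log_dir, f"server{i}.log")) for i in range(size)
        ]
        self.next_zxid = 1

    def write(self, path, data):
        zxid = self.next_zxid  # the leader assigns a global order to each write
        self.next_zxid += 1
        for server in self.servers:  # simplification: no quorum, no failure handling
            server.apply(zxid, path, data)
        return zxid

    def read(self, server_index, path):
        # Reads are answered locally by whichever server the client is connected to.
        return self.servers[server_index].state.get(path)


log_dir = tempfile.mkdtemp()
ensemble = ToyEnsemble(log_dir)
ensemble.write("/config/batch_size", "128")
print(ensemble.read(2, "/config/batch_size"))  # 128

# Simulate a restart of server 1: its memory is rebuilt from its local log.
restarted = ToyServer(os.path.join(log_dir, "server1.log"))
print(restarted.state, restarted.last_zxid)  # {'/config/batch_size': '128'} 1
```

The log-then-memory order is the point of the sketch: because every server can replay its own log, a restarted server recovers the full state without asking anyone else.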

Within ZooKeeper, an application can create what is called a znode, a data node that behaves much like a file (or directory) in a hierarchical namespace and is held in memory on the ZooKeeper servers. A znode can be updated by any node in the cluster, and any node in the cluster can register a watch on it to be notified of changes to that znode.
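The following is a toy, single-process model of a znode tree with watches, written in Python purely for illustration. `ToyZnodeTree`, its methods, and the `("CHANGED", path)` event tuple are hypothetical names, not the ZooKeeper client API; real applications would use the Java or C client. The sketch does mirror two documented ZooKeeper behaviors: znodes are addressed by slash-separated paths, much like a file system, and a watch fires at most once, so a client that wants further notifications must register again.

```python
from collections import defaultdict


class ToyZnodeTree:
    """Hypothetical in-memory znode store with one-shot data watches."""

    def __init__(self):
        self.data = {"/": b""}            # path -> bytes, arranged like files in a tree
        self.watches = defaultdict(list)  # path -> callbacks waiting for the next change

    def create(self, path, value=b""):
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self.data:
            raise KeyError(f"parent {parent} does not exist")
        if path in self.data:
            raise KeyError(f"{path} already exists")
        self.data[path] = value

    def get_data(self, path, watch=None):
        if watch is not None:
            self.watches[path].append(watch)  # register interest in the next change
        return self.data[path]

    def set_data(self, path, value):
        if path not in self.data:
            raise KeyError(f"{path} does not exist")
        self.data[path] = value
        # Watches are one-shot: remove them from the table before firing.
        for callback in self.watches.pop(path, []):
            callback(("CHANGED", path))


tree = ToyZnodeTree()
tree.create("/app")
tree.create("/app/leader", b"node-1")

events = []
print(tree.get_data("/app/leader", watch=events.append))  # b'node-1'
tree.set_data("/app/leader", b"node-2")  # fires the watch
tree.set_data("/app/leader", b"node-3")  # no watch is registered any more
print(events)  # [('CHANGED', '/app/leader')]
```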

Put simply, applications can synchronize their tasks across the distributed cluster by updating their status in a ZooKeeper znode. ZooKeeper then notifies the nodes watching that znode of the specific node’s status change. This cluster-wide centralization of status is critical for management and serialization tasks across a large distributed set of servers.
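As a rough illustration of this status pattern, here is a self-contained Python model of a simple barrier. Each worker publishes its status to a hypothetical `/status/<worker>` znode, and a coordinator re-reads all statuses whenever its watch fires. `ToyStatusStore`, `Coordinator`, and the `/status` path layout are assumptions for this sketch, not part of ZooKeeper, although the ZooKeeper documentation describes barrier and queue recipes built from the same ingredients (znodes plus watches).

```python
class ToyStatusStore:
    """Hypothetical stand-in for a ZooKeeper ensemble holding /status/<worker> znodes."""

    def __init__(self):
        self.znodes = {}
        self.watchers = []  # one-shot callbacks for any change under /status

    def publish(self, worker, status):
        # Each worker writes only its own status znode.
        self.znodes[f"/status/{worker}"] = status
        # One-shot semantics: detach the pending watchers before notifying them.
        pending, self.watchers = self.watchers, []
        for callback in pending:
            callback()

    def statuses(self, watch=None):
        if watch is not None:
            self.watchers.append(watch)  # read and register the watch in one call
        return {path.rsplit("/", 1)[1]: s for path, s in self.znodes.items()}


class Coordinator:
    """Hypothetical coordinator that proceeds once every expected worker reports 'done'."""

    def __init__(self, store, workers):
        self.store = store
        self.workers = set(workers)
        self.all_done = False
        self._check()

    def _check(self):
        # Re-read and re-arm the watch on every notification, because watches fire once.
        current = self.store.statuses(watch=self._check)
        self.all_done = all(current.get(w) == "done" for w in self.workers)


store = ToyStatusStore()
coordinator = Coordinator(store, ["worker-a", "worker-b"])
store.publish("worker-a", "running")
store.publish("worker-a", "done")
print(coordinator.all_done)  # False
store.publish("worker-b", "done")
print(coordinator.all_done)  # True
```

The coordinator reads the statuses and re-registers its watch in a single call, loosely mirroring how ZooKeeper's read operations can set a watch as part of the same request, so no status change can slip in unnoticed between the read and the registration.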

