Apache ZooKeeper
ZooKeeper is an open source Apache project that provides a centralized service for configuration information, naming, synchronization and group services across large clusters in distributed systems. The goal is to make these systems easier to manage, with more reliable propagation of changes.
If you had a Hadoop cluster spanning 500 or more commodity servers, you would need centralized management of the entire cluster: naming, group and synchronization services, configuration management, and more. Other open source projects that run on Hadoop clusters also require cross-cluster services. Embedding ZooKeeper means you don’t have to build synchronization services from scratch. Applications interact with ZooKeeper through its Java™ or C interfaces.
For applications, ZooKeeper provides an infrastructure for cross-node synchronization by maintaining status-type information in memory on ZooKeeper servers. Each ZooKeeper server keeps a copy of the state of the entire system and persists this information in local log files. Large Hadoop clusters are supported by multiple ZooKeeper servers, with one server acting as leader and keeping the others synchronized.
Within ZooKeeper, an application can create what is called a znode, which is a file that persists in memory on the ZooKeeper servers. The znode can be updated by any node in the cluster, and any node in the cluster can register to be notified of changes to that znode.
Put simply, applications can synchronize their tasks across the distributed cluster by updating their status in a ZooKeeper znode. ZooKeeper then notifies the nodes that registered interest in that znode of the status change. This cluster-wide status centralization service is critical for management and serialization tasks across a large distributed set of servers.
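The update-and-notify pattern described above can be illustrated with a toy in-memory model. This is only a sketch: `ToyZnodeStore` and its methods are hypothetical names invented for illustration and are not the ZooKeeper client API. It assumes one-shot notifications, which is how ZooKeeper's standard watches behave: a watch fires once and must be registered again to see later changes.

```python
from typing import Callable, Dict, List, Optional

Watcher = Callable[[str, bytes], None]


class ToyZnodeStore:
    """In-memory stand-in for a ZooKeeper server (hypothetical, not the real API)."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}
        self._watches: Dict[str, List[Watcher]] = {}

    def create(self, path: str, value: bytes) -> None:
        """Create a znode; fail if the path is already taken."""
        if path in self._data:
            raise KeyError(f"znode already exists: {path}")
        self._data[path] = value

    def get(self, path: str, watch: Optional[Watcher] = None) -> bytes:
        """Read a znode and optionally register a one-shot watcher on it."""
        value = self._data[path]  # raises KeyError if the znode is missing
        if watch is not None:
            self._watches.setdefault(path, []).append(watch)
        return value

    def set(self, path: str, value: bytes) -> None:
        """Update a znode, then notify and discard its registered watchers."""
        if path not in self._data:
            raise KeyError(f"no such znode: {path}")
        self._data[path] = value
        # Remove the watchers before calling them so each fires at most once.
        for callback in self._watches.pop(path, []):
            callback(path, value)


store = ToyZnodeStore()
store.create("/workers/node-1", b"starting")

seen = []
store.get("/workers/node-1", watch=lambda path, value: seen.append((path, value)))

store.set("/workers/node-1", b"ready")  # the registered watcher fires here
store.set("/workers/node-1", b"busy")   # no watcher is registered any more
```

After these calls, `seen` holds a single entry for the `b"ready"` update. A client that wants to keep following the znode would register a new watcher each time it is notified.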
IBM and Cloudera have partnered to offer an industry-leading, enterprise-grade Hadoop distribution, including an integrated ecosystem of products and services to support faster analytics at scale.
ZooKeeper's simple architecture makes it easier for you to implement typical coordination tasks like electing a master server, managing group membership and managing metadata in distributed environments.
Use ZooKeeper for maintaining centralized configuration information, naming, synchronizing and managing group services in a simple interface without writing them from scratch.
ZooKeeper stores and mediates updates to important configuration information for distributed applications in a reliable, fast and ordered manner.
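As an illustration of the master-election task mentioned above, here is a toy sketch of one commonly described approach: each candidate registers under an increasing sequence number, the lowest number leads, and when the leader drops out the next lowest takes over. With ZooKeeper this is usually built on ephemeral sequential znodes. The class below is a hypothetical in-memory model of that idea, not the ZooKeeper API, and `join`, `leave` and `leader` are names chosen for this example only.

```python
from typing import Dict, Optional


class ToyElection:
    """Toy model of election by lowest sequence number (hypothetical names)."""

    def __init__(self) -> None:
        self._next_seq = 0
        self._candidates: Dict[int, str] = {}  # sequence number -> server name

    def join(self, server: str) -> int:
        """Register a candidate and return its sequence number."""
        seq = self._next_seq
        self._next_seq += 1
        self._candidates[seq] = server
        return seq

    def leave(self, seq: int) -> None:
        """Remove a candidate, as if its session had ended."""
        self._candidates.pop(seq, None)

    def leader(self) -> Optional[str]:
        """Return the candidate with the lowest sequence number, if any."""
        if not self._candidates:
            return None
        return self._candidates[min(self._candidates)]


election = ToyElection()
seq_a = election.join("server-a")
seq_b = election.join("server-b")
seq_c = election.join("server-c")

first_leader = election.leader()   # server-a joined first, so it leads
election.leave(seq_a)              # server-a fails or disconnects
second_leader = election.leader()  # server-b is next in line
```

Sequence numbers are never reused in this sketch, so a server that rejoins goes to the back of the queue rather than reclaiming leadership.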
Hadoop uses ZooKeeper for automatic failover of the HDFS NameNode and for high availability of the YARN ResourceManager.
HBase uses ZooKeeper for main controller election, lease management of region servers and other communication between region servers.
Cloudera Search uses ZooKeeper for centralized configuration management to integrate search functionality with Hadoop through Apache Solr.