Apache Zookeeper

Apache Zookeeper is an open-source project that provides a centralized configuration service and naming registry for large distributed systems. In essence, Zookeeper is a service layer on your cluster that acts as a single point of management for distributed applications. Note, however, that Zookeeper is intended for application developers rather than administrators.

About this task

Ensure that the MapReduce framework in IBM® Spectrum Symphony is set to use Zookeeper. For the supported versions of Hadoop, see Supported distributed files systems for MapReduce or YARN integration. For the versions of Zookeeper that the MapReduce framework in IBM Spectrum Symphony has been qualified with, see Supported third-party applications for MapReduce.

Procedure

  1. Download and install the latest stable version of Zookeeper.

    For information on installing Zookeeper, refer to the Zookeeper documentation.

  2. After you extract the Zookeeper installation file on each host where you want to install Zookeeper, create the configuration file that is required to start Zookeeper in the conf directory; for example, /zookeeper-3.4.6/conf/zoo.cfg.
  3. Edit the zoo.cfg file to provide values for the following properties:
    • tickTime: Specifies the length of a single tick, in milliseconds. The tick is the basic time unit that Zookeeper uses for heartbeats and session timeouts. For example:
      tickTime=2000
      
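    • dataDir: Specifies the directory where Zookeeper stores snapshots of its in-memory database and, unless configured otherwise, its transaction log. If this directory does not exist, create it and ensure that the user that runs Zookeeper has read-write permissions. For example: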
      dataDir=/admin/zookeeper-3.4.6/data
      
    • clientPort: Specifies the port on which the Zookeeper server listens for client connections. For example:
      clientPort=2181
      
    • server.n: (Optional) Specifies the host names and ports of the servers if you have replicated servers, where:

      n is the ID of the server, which must match the number in the myid file in that server's dataDir. The first port is used by followers to connect to the leader, and the second port is used for leader election. For example:

      server.1=dbhost1:2888:3888
      server.2=dbhost2:2888:3888
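
The properties above can be combined into one file. The following is a minimal sketch, not part of the original procedure, that writes the example values to zoo.cfg and creates the data directory. ZK_HOME and ZK_MYID are hypothetical variables introduced here, and the /tmp default is only a placeholder. The initLimit and syncLimit values and the myid file are assumptions that matter only when server.n entries are present; check the Zookeeper documentation for values that suit your cluster.

```shell
# Hypothetical variables: set ZK_HOME to the extracted Zookeeper directory
# and ZK_MYID to the n of this host's server.n entry.
ZK_HOME="${ZK_HOME:-/tmp/zookeeper-3.4.6}"
ZK_MYID="${ZK_MYID:-1}"

# Create the conf and data directories if they do not exist yet.
mkdir -p "$ZK_HOME/conf" "$ZK_HOME/data"

# Write the example properties, one per line.
# initLimit and syncLimit are assumed values, counted in ticks.
printf '%s\n' \
  "tickTime=2000" \
  "dataDir=$ZK_HOME/data" \
  "clientPort=2181" \
  "initLimit=10" \
  "syncLimit=5" \
  "server.1=dbhost1:2888:3888" \
  "server.2=dbhost2:2888:3888" \
  > "$ZK_HOME/conf/zoo.cfg"

# Replicated servers read their own ID from a myid file in dataDir.
printf '%s\n' "$ZK_MYID" > "$ZK_HOME/data/myid"
```

Run the sketch on each host with a different ZK_MYID value so that every myid file matches that host's server.n entry.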
      
  4. Run Zookeeper from its home directory:

    bin/zkServer.sh start
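
To confirm that the server process started, the start command can be followed by a status query. The following is a minimal sketch, assuming that the zkServer.sh script shipped with your Zookeeper version supports the status subcommand. zk_start_and_status is a hypothetical helper name, and its argument is the Zookeeper home directory.

```shell
# Hypothetical helper: start Zookeeper, then report its status.
# $1 is the Zookeeper home directory (for example, /zookeeper-3.4.6).
zk_start_and_status() {
  zk_home="$1"
  # Stop here if the start command reports a failure.
  "$zk_home/bin/zkServer.sh" start || return 1
  # Give the server a moment to bind its port before querying it.
  sleep 2
  "$zk_home/bin/zkServer.sh" status
}

# Usage (not run here): zk_start_and_status /zookeeper-3.4.6
```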

  5. Verify that Zookeeper is running.
    1. Start the command shell by running bin/zkCli.sh.
    2. Enter:

      help

    You should see output similar to the following:

    [zkshell: 0] help 
    ZooKeeper host:port cmd args
        get path [watch]
        ls path [watch]
        set path data [version]
        delquota [-n|-b] path
        quit
        printwatches on|off
        create path data acl
        stat path [watch]
        listquota path
        history
        setAcl path acl
        getAcl path
        sync path
        redo cmdno
        addauth scheme auth
        delete path [version]
        deleteall path
        setquota -n|-b val path
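
For scripted checks, the interactive shell can be replaced by a four-letter command sent to the client port. The sketch below assumes that nc (netcat) is installed and that the ruok command is enabled on the server; on newer Zookeeper releases, four-letter commands may need to be allowed through the 4lw.commands.whitelist property. zk_is_ok is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if the server answers "imok" to "ruok".
# $1 is the host (default localhost); $2 is the client port (default 2181).
zk_is_ok() {
  host="${1:-localhost}"
  port="${2:-2181}"
  # Send the four-letter command and capture the reply.
  reply=$(printf 'ruok' | nc "$host" "$port")
  [ "$reply" = "imok" ]
}

# Usage (not run here): zk_is_ok dbhost1 2181 && echo "Zookeeper is running"
```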
    
  6. Now that Zookeeper is running, try connecting to it with the command shell from another host. Enter:

    bin/zkCli.sh -server {host_name | IP}

    For example:

    bin/zkCli.sh -server dbhost6.test.com
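
The command shell can also be given a single command to run, which is convenient for scripted checks from another host. The following is a minimal sketch, assuming the default client port 2181 and a zkCli.sh that accepts a command after the -server option, as the "host:port cmd args" usage line suggests. zk_remote_ls is a hypothetical helper name.

```shell
# Hypothetical helper: list the root znode on a remote Zookeeper server.
# $1 is the Zookeeper home directory on this host, $2 is the server host,
# $3 is the client port (default 2181).
zk_remote_ls() {
  zk_home="$1"
  host="$2"
  port="${3:-2181}"
  "$zk_home/bin/zkCli.sh" -server "$host:$port" ls /
}

# Usage (not run here): zk_remote_ls /zookeeper-3.4.6 dbhost6.test.com
```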

  7. Integrate other applications with Zookeeper as required. For example, to integrate HBase with this Zookeeper installation, add the following parameters:
    • In the hbase-env.sh file under $HBASE_HOME/conf/, add:
      export HBASE_MANAGES_ZK=false
      
    • In the hbase-site.xml file under $HBASE_HOME/conf/, add:
      <property>
         <name>hbase.zookeeper.quorum</name>
         <value>dbhost1,dbhost2</value>
      </property>
      

      Note that the settings to integrate other applications with Zookeeper vary depending on the application.
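
If server.n entries are defined in zoo.cfg, the host list for hbase.zookeeper.quorum in the HBase example can be derived from them instead of being typed by hand. The following is a minimal sketch, assuming entries of the form server.n=host:port:port. zk_quorum_hosts is a hypothetical helper name.

```shell
# Hypothetical helper: print the hosts from the server.n entries of a
# zoo.cfg file as a comma-separated list.
# $1 is the path to zoo.cfg.
zk_quorum_hosts() {
  # Keep only the host part of each server.n=host:port:port line,
  # then join the lines with commas.
  sed -n 's/^server\.[0-9][0-9]*=\([^:]*\):.*$/\1/p' "$1" | paste -sd, -
}

# Usage (not run here): zk_quorum_hosts /zookeeper-3.4.6/conf/zoo.cfg
```

With the server.1 and server.2 entries from the zoo.cfg example, the helper prints dbhost1,dbhost2, which matches the value in the hbase-site.xml example.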