Apache Zookeeper is an open-source project
providing a centralized configuration service and naming registry
for large distributed systems. In essence, Zookeeper is a service
layer on your cluster that serves as a single point of management
for distributed applications, enabling you to streamline application
management. Note, however, that Zookeeper is meant for use by application
developers, rather than by administrators.
About this task
Ensure that the MapReduce framework in IBM® Spectrum Symphony is set to use Zookeeper. For the supported versions of Hadoop, see Supported distributed file systems for MapReduce or YARN integration. For the versions of Zookeeper that the MapReduce framework in IBM Spectrum Symphony has been qualified with, see Supported third-party applications for MapReduce.
Procedure
- Download and install the latest stable version of Zookeeper.
For information on installing Zookeeper, refer to the Zookeeper documentation.
- After you extract the Zookeeper installation file
on every host where you want to install the application, create
the configuration file required to start Zookeeper in the conf directory of the installation; for example, /zookeeper-3.4.6/conf/zoo.cfg.
- Edit the zoo.cfg file to provide values
for the following properties:
- tickTime—Specifies the duration (in
milliseconds) of a single tick, the basic time unit that Zookeeper uses
for heartbeats and timeouts. The minimum session timeout is twice this
value. For example:
tickTime=2000
- dataDir—Specifies the directory to
store snapshots of the in-memory database and, unless configured otherwise,
the transaction log. If this directory does not exist, create
it and ensure that the user who runs Zookeeper has read-write permissions. For example:
dataDir=/admin/zookeeper-3.4.6/data
- clientPort—Specifies the port that
the Zookeeper server listens on for client connections. For example:
clientPort=2181
- server.n—(Optional)
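Specifies the host name and ports of each server in the ensemble
if you have replicated servers, where:
n is the ID of the server, which must match the number stored in the
myid file in that server's dataDir. The first port is used by followers
to connect to the leader, and the second port is used for leader election.
Zookeeper elects the leader itself, so the order of the entries does not
set a failover priority. For example: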
server.1=dbhost1:2888:3888
server.2=dbhost2:2888:3888
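The properties above can be put together in a short script. The following is a minimal sketch, not a definitive setup: the function name write_zoo_cfg and its arguments are hypothetical, and the initLimit and syncLimit values are assumptions taken from the sample configuration that ships with Zookeeper (replicated servers generally need both set).

```shell
#!/bin/sh
# Hypothetical helper: write a minimal zoo.cfg and prepare the data directory.
# Arguments (all illustrative): Zookeeper home, data directory, this server's ID.
write_zoo_cfg() {
    zk_home=$1
    data_dir=$2
    my_id=$3

    # Create the conf and data directories if they do not exist yet.
    mkdir -p "$zk_home/conf" "$data_dir" || return 1

    # initLimit and syncLimit are assumed values, not taken from the procedure.
    cat > "$zk_home/conf/zoo.cfg" <<EOF
tickTime=2000
dataDir=$data_dir
clientPort=2181
initLimit=10
syncLimit=5
server.1=dbhost1:2888:3888
server.2=dbhost2:2888:3888
EOF

    # Each replicated server identifies itself through dataDir/myid.
    printf '%s\n' "$my_id" > "$data_dir/myid"
}
```

For example, on dbhost1 you might run write_zoo_cfg /zookeeper-3.4.6 /admin/zookeeper-3.4.6/data 1, and use ID 2 on dbhost2.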
- Run Zookeeper from its home directory:
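The command for this step does not appear in the text above. In a standard Zookeeper distribution, the server is controlled with the bin/zkServer.sh script, which accepts actions such as start, stop, and status. The wrapper below is a sketch under that assumption; the function name zk_server and the path you pass to it are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper around bin/zkServer.sh.
# $1: Zookeeper home directory (illustrative), $2: action (start, stop, status).
zk_server() {
    zk_home=$1
    action=${2:-status}

    if [ ! -x "$zk_home/bin/zkServer.sh" ]; then
        echo "zkServer.sh not found or not executable under $zk_home/bin" >&2
        return 1
    fi

    # Run from the home directory, as the step describes; the subshell keeps
    # the caller's working directory unchanged.
    (cd "$zk_home" && bin/zkServer.sh "$action")
}
```

For example, zk_server /zookeeper-3.4.6 start starts the server, and zk_server /zookeeper-3.4.6 status reports whether it is running.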
- Verify that Zookeeper is running.
- Start the command shell by running bin/zkCli.sh.
- Enter:
help
You should see output similar to the following:
[zkshell: 0] help
ZooKeeper host:port cmd args
get path [watch]
ls path [watch]
set path data [version]
delquota [-n|-b] path
quit
printwatches on|off
create path data acl
stat path [watch]
listquota path
history
setAcl path acl
getAcl path
sync path
redo cmdno
addauth scheme auth
delete path [version]
deleteall path
setquota -n|-b val path
- Now that Zookeeper is running, try connecting to the command
shell from another host. Enter:
bin/zkCli.sh -server {host_name | IP}
For example:
bin/zkCli.sh -server dbhost6.test.com
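The remote check can also be scripted, because zkCli.sh runs a single command and exits when the command is given after the connection options. The sketch below assumes that behavior; the function name zk_remote_ls is hypothetical, and port 2181 is used as the default only because it matches the clientPort example earlier.

```shell
#!/bin/sh
# Hypothetical helper: list the root znode on a remote Zookeeper server.
# $1: Zookeeper home (illustrative), $2: host name or IP, $3: client port.
zk_remote_ls() {
    zk_home=$1
    host=$2
    port=${3:-2181}   # assumed default, matching clientPort=2181 above

    # "ls /" is one of the commands shown in the help output.
    "$zk_home/bin/zkCli.sh" -server "$host:$port" ls /
}
```

For example, zk_remote_ls /zookeeper-3.4.6 dbhost6.test.com connects to dbhost6.test.com on port 2181. A non-zero exit status or a connection error in the output suggests that the server is not reachable from that host.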
- Integrate other applications on Zookeeper as required.
For example, to configure HBase to use this Zookeeper installation
instead of managing its own, add the following parameters:
- In the hbase-env.sh file under $HBASE_HOME/conf/, add:
export HBASE_MANAGES_ZK=false
- In the hbase-site.xml file under $HBASE_HOME/conf/, add:
<property>
<name>hbase.zookeeper.quorum</name>
<value>dbhost1,dbhost2</value>
</property>
Note that the settings to integrate other applications
with Zookeeper vary depending on the application.
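If you apply the HBase setting on several hosts, the hbase-env.sh change can be scripted so that running it again does not add duplicate lines. This is a minimal sketch: the function name disable_hbase_managed_zk is hypothetical, and it handles only hbase-env.sh, leaving hbase-site.xml to be edited by hand.

```shell
#!/bin/sh
# Hypothetical helper: make sure hbase-env.sh sets HBASE_MANAGES_ZK=false.
# $1: the HBase conf directory, for example $HBASE_HOME/conf.
disable_hbase_managed_zk() {
    env_file=$1/hbase-env.sh
    touch "$env_file" || return 1

    if grep -q '^export HBASE_MANAGES_ZK=' "$env_file"; then
        # An active setting exists: rewrite it in place through a temp file.
        sed 's/^export HBASE_MANAGES_ZK=.*/export HBASE_MANAGES_ZK=false/' \
            "$env_file" > "$env_file.tmp" && mv "$env_file.tmp" "$env_file"
    else
        # No active setting (commented-out lines are ignored): append one.
        echo 'export HBASE_MANAGES_ZK=false' >> "$env_file"
    fi
}
```

For example, disable_hbase_managed_zk "$HBASE_HOME/conf" leaves exactly one active HBASE_MANAGES_ZK line in the file, however many times it is run.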