HDFS clients configuration

HDFS clients must be configured in the following way to work with the CES IP failover mechanism.

The cluster name is the CES group name without the hdfs prefix.
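The naming rule above can be sketched in shell. The group name hdfscluster is an assumption, inferred from the cluster name used in this example; substitute the CES group defined in your environment.

```shell
# Hypothetical CES group name (assumed for illustration).
ces_group="hdfscluster"

# Remove the leading "hdfs" prefix to get the HDFS cluster name.
cluster_name="${ces_group#hdfs}"

echo "$cluster_name"
```

The resulting value is what the client uses for dfs.nameservices and, with the hdfs:// scheme, for fs.defaultFS.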

Set dfs.nameservices to the cluster name (in this example, cluster) and fs.defaultFS to the matching hdfs:// URI (in this example, hdfs://cluster). The cluster name on the HDFS client must be the same as the one configured on the NameNodes and DataNodes.

For CES HDFS, the HDFS client configuration defines only one NameNode. Use the hostname of the CES IP configured for the CES group as the NameNode value (in this example, cesip.example.com). This applies to both HA and non-HA configurations.

For example, if Apache Hadoop is installed in /usr/hadoop-3.1.3, the Hadoop configuration files are located in /usr/hadoop-3.1.3/etc/hadoop.
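As a minimal sketch, the client shell environment can point at this directory. HADOOP_HOME and HADOOP_CONF_DIR are standard Apache Hadoop environment variables. The installation prefix is the one from the example above and should be adjusted to your system.

```shell
# Installation prefix from the example; adjust to your system.
export HADOOP_HOME=/usr/hadoop-3.1.3

# Directory that holds core-site.xml, hdfs-site.xml and hadoop-env.sh.
export HADOOP_CONF_DIR="$HADOOP_HOME/etc/hadoop"

# Optional: make the hdfs and hadoop commands available in this shell.
export PATH="$HADOOP_HOME/bin:$PATH"
```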

For core-site.xml:

The values must match those in the HDFS Transparency configuration file on the NameNode.

  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://cluster</value>
  </property>

For hdfs-site.xml:

Replace the HDFS Transparency NameNode host name with the host name of the corresponding CES IP.

  <property>
    <name>dfs.nameservices</name>
    <value>cluster</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.cluster</name>
    <value>nn1</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.cluster.nn1</name>
    <value>cesip.example.com:8020</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.cluster.nn1</name>
    <value>cesip.example.com:50070</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.cluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>shell(/bin/true)</value>
  </property>

Note: The NameNode configuration contains properties for both NameNodes, whereas the HDFS client configuration defines only one NameNode property, which contains the CES IP hostname. HDFS clients in a CES HDFS environment know about only one NameNode and communicate with CES HDFS Transparency through this IP. High availability is achieved by failing the IP over to another NameNode. CES handles the failover, and it is transparent to HDFS clients because they always talk to the same IP.
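
One way to confirm that the client picked up these settings is the hdfs getconf command that ships with Apache Hadoop. The sketch below assumes the hdfs command is on PATH (for example, /usr/hadoop-3.1.3/bin) and only reads the local configuration. With the example files above, the two keys should come back as hdfs://cluster and cluster, and the NameNode list should contain only the CES IP hostname.

```shell
# Assumes the hdfs CLI of the client installation is on PATH.
if command -v hdfs >/dev/null 2>&1; then
  # Print the values the client resolves from core-site.xml and hdfs-site.xml.
  hdfs getconf -confKey fs.defaultFS
  hdfs getconf -confKey dfs.nameservices
  # List the NameNode hosts known to this client.
  hdfs getconf -namenodes
  checked=yes
else
  echo "hdfs command not found; add the Hadoop bin directory to PATH" >&2
  checked=no
fi
```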

For hadoop-env.sh:

For Apache Hadoop, configure the properties with values appropriate for your host environment.

Then set the JAVA_HOME value in the hadoop-env.sh file:

  export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk
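
Before relying on this value, it can help to check that the directory actually contains a Java runtime. The following is a small sketch that uses the example path above; the real location differs between distributions and Java versions.

```shell
# Path from the example; it varies by distribution and Java version.
JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk

if [ -x "$JAVA_HOME/bin/java" ]; then
  java_status="found"
  # Show the version of the runtime that Hadoop would use.
  "$JAVA_HOME/bin/java" -version
else
  java_status="missing"
  echo "No java executable under $JAVA_HOME; correct JAVA_HOME in hadoop-env.sh" >&2
fi
```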