Setting CES HDFS configuration files

This section describes the configuration file settings that are changed with the mmhdfs command in the Enable and configure CES HDFS section when you set up a CES HDFS cluster manually.

Before you enable HDFS Transparency, several configuration values must be set. Some of them are set automatically and others must be set manually.

Edit config fields

Use the following command on one CES HDFS Transparency node to edit the fields of one configuration file at a time locally. After modifying the fields, upload the changes to CCR (see the Edit config files and upload section):
mmhdfs config set [config file] -k [key1=value] -k [key2=value] ... -k [keyX=value]
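As an illustration, the following sketch sets the nameservice and NameNode IDs from the example later in this section in a single mmhdfs config set call and then uploads the result to CCR. The values cluster, nn1 and nn2 are the example values used in this section, not required names.

```shell
# Run on one CES HDFS Transparency node. The values follow the "cluster"
# example used in this section; replace them with your own.
mmhdfs config set hdfs-site.xml \
  -k dfs.nameservices=cluster \
  -k dfs.ha.namenodes.cluster=nn1,nn2

# Upload the locally edited configuration to CCR.
mmhdfs config upload
```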

Edit config files and upload

Use the following commands on one CES HDFS Transparency node to export the configuration files to a local directory, edit them, import them back, and then upload the changes to CCR:
mmhdfs config export [a local config dir] [config_file1,config_file2,...]
mmhdfs config import [a local config dir] [config_file1,config_file2,...]

mmhdfs config upload
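A sketch of the export, edit, import and upload cycle. The directory /tmp/hdfsconf is a hypothetical working directory, and the sketch assumes that export copies the current files into that directory and that import copies the edited files back into the local HDFS Transparency configuration.

```shell
# Hypothetical local working directory; any writable path should work.
CONF_DIR=/tmp/hdfsconf
mkdir -p "$CONF_DIR"

# Assumed behavior: export copies the current files into $CONF_DIR.
mmhdfs config export "$CONF_DIR" core-site.xml,hdfs-site.xml

# Edit the exported copies with any text editor.
"${EDITOR:-vi}" "$CONF_DIR/hdfs-site.xml"

# Assumed behavior: import copies the edited files back into the local configuration.
mmhdfs config import "$CONF_DIR" core-site.xml,hdfs-site.xml

# Upload the changes to CCR.
mmhdfs config upload
```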

Configuration file settings

The following properties should be set to appropriate values to support CES IP failover:

For hadoop-env.sh:

JAVA_HOME: Set to the Java home directory on the node.
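The Hadoop environment script (hadoop-env.sh) is a shell script, so JAVA_HOME is set with an ordinary export. The path below is an example only (an assumption, not a required location); use the Java installation that actually exists on each node.

```shell
# hadoop-env.sh (excerpt)
# Example path only: point JAVA_HOME at the Java installation on this node.
export JAVA_HOME=/usr/lib/jvm/jre
```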

For hdfs-site.xml:
  • dfs.nameservices: Set to the logical name of the cluster. This must be the CES group name without the hdfs prefix.
    In the following example, the CES group name is hdfscluster, where hdfs is the prefix and cluster is the cluster name:
    <property>
      <name>dfs.nameservices</name>
      <value>cluster</value>
    </property>
    
  • dfs.ha.namenodes.[nameservice ID]: Set to a list of comma-separated NameNode IDs.
    For example:
    <property>
      <name>dfs.ha.namenodes.cluster</name>
      <value>nn1,nn2</value>
    </property>
    

    If there is only one NameNode (that is, only one CES node and therefore no CES HA), the list must contain a single ID.

    For example:
    <property>
      <name>dfs.ha.namenodes.cluster</name>
      <value>nn1</value>
    </property>
    
  • dfs.namenode.rpc-address.[nameservice ID].[namenode ID]: Set to the fully qualified RPC address for each NameNode to listen on.
    For example:
    <property>
      <name>dfs.namenode.rpc-address.cluster.nn1</name>
      <value>machine1.example.com:8020</value>
    </property>
    <property>
      <name>dfs.namenode.rpc-address.cluster.nn2</name>
      <value>machine2.example.com:8020</value>
    </property>
    
  • dfs.namenode.http-address.[nameservice ID].[namenode ID]: Set to the fully qualified HTTP address for each NameNode to listen on.
    For example:
    <property>
      <name>dfs.namenode.http-address.cluster.nn1</name>
      <value>machine1.example.com:50070</value>
    </property>
    <property>
      <name>dfs.namenode.http-address.cluster.nn2</name>
      <value>machine2.example.com:50070</value>
    </property>
    
  • dfs.namenode.shared.edits.dir: Set to the directory that stores the shared edit logs for this HDFS HA cluster. A name of the form HA-[dfs.nameservices] is recommended.
    For example:
    <property>
      <name>dfs.namenode.shared.edits.dir</name>
      <value>file:///gpfs/HA-cluster</value>
    </property>
    
    Note: If there is only one NameNode (that is, only one CES node and therefore no CES HA), do not set this property; otherwise, the NameNode fails to start. The shared edits directory is used only for HA.
  • dfs.client.failover.proxy.provider.[nameservice ID]: Set to the Java class that HDFS clients use to contact the active NameNode.
    For example:
    <property>
      <name>dfs.client.failover.proxy.provider.cluster</name>
      <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    
  • dfs.namenode.rpc-bind-host: Set to 0.0.0.0 so that the server listens on all network interfaces, including any CES IP address assigned to the node.
    For example:
    <property>
      <name>dfs.namenode.rpc-bind-host</name>
      <value>0.0.0.0</value>
    </property>
    
  • dfs.namenode.servicerpc-bind-host: Set to 0.0.0.0.
    For example:
    <property>
      <name>dfs.namenode.servicerpc-bind-host</name>
      <value>0.0.0.0</value>
    </property>
    
  • dfs.namenode.lifeline.rpc-bind-host: Set to 0.0.0.0.
    For example:
    <property>
      <name>dfs.namenode.lifeline.rpc-bind-host</name>
      <value>0.0.0.0</value>
    </property>
    
  • dfs.namenode.http-bind-host: Set to 0.0.0.0.
    For example:
    <property>
      <name>dfs.namenode.http-bind-host</name>
      <value>0.0.0.0</value>
    </property>
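The properties in this list can also be set without editing the XML by hand. The following sketch applies the two-NameNode example values (nameservice ID cluster, NameNodes nn1 and nn2 on machine1.example.com and machine2.example.com) with one mmhdfs config set call; the host names, ports and directory are the example values from this list, not recommendations.

```shell
# Two-NameNode (HA) example: nameservice ID "cluster", NameNode IDs nn1 and nn2.
mmhdfs config set hdfs-site.xml \
  -k dfs.namenode.rpc-address.cluster.nn1=machine1.example.com:8020 \
  -k dfs.namenode.rpc-address.cluster.nn2=machine2.example.com:8020 \
  -k dfs.namenode.http-address.cluster.nn1=machine1.example.com:50070 \
  -k dfs.namenode.http-address.cluster.nn2=machine2.example.com:50070 \
  -k dfs.namenode.shared.edits.dir=file:///gpfs/HA-cluster \
  -k dfs.client.failover.proxy.provider.cluster=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider \
  -k dfs.namenode.rpc-bind-host=0.0.0.0 \
  -k dfs.namenode.servicerpc-bind-host=0.0.0.0 \
  -k dfs.namenode.lifeline.rpc-bind-host=0.0.0.0 \
  -k dfs.namenode.http-bind-host=0.0.0.0

# Upload the changes to CCR.
mmhdfs config upload
```

For a single-NameNode cluster, keep only the nn1 entries and leave out dfs.namenode.shared.edits.dir, because the NameNode fails to start if that property is set without HA.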
    

For core-site.xml:

fs.defaultFS: Set to hdfs:// followed by the value of dfs.nameservices. For CES HDFS, this is the CES HDFS group name without the hdfs prefix.

For example:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://cluster</value>
</property>
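The same value can be set from the command line. A minimal sketch for the example cluster, where hdfs://cluster is the example value used in this section:

```shell
# fs.defaultFS is hdfs:// followed by the dfs.nameservices value ("cluster").
mmhdfs config set core-site.xml -k fs.defaultFS=hdfs://cluster

# Upload the changes to CCR.
mmhdfs config upload
```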

Follow the Enable and configure CES HDFS section to set the configuration values for non-HA and HA CES HDFS Transparency clusters.