Setting CES HDFS configuration files
This section describes the configuration file settings that are changed in the Enable and configure CES HDFS section when you use the mmhdfs command to manually set up the CES HDFS cluster.
Before enabling HDFS Transparency, several configuration settings must be in place. Some are applied automatically and others must be set manually.
Edit config fields
mmhdfs config set [config file] -k [key1=value] -k [key2=value] ... -k [keyX=value]
Edit config files and upload
mmhdfs config import/export [a local config dir] [config_file1,config_file2,...]
mmhdfs config upload
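For example, a typical offline workflow is to export the current configuration files to a local working directory, edit them, import them back, and then upload them. The following sketch only illustrates the syntax shown above; the local directory path is an example, not a required location:
# Export the current configuration files to a local working directory (illustrative path)
mmhdfs config export /tmp/hdfsconf core-site.xml,hdfs-site.xml
# Edit the exported files locally, then import them back
mmhdfs config import /tmp/hdfsconf core-site.xml,hdfs-site.xml
# Upload the updated configuration
mmhdfs config upload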
Configuration file settings
The following configurations should be set to the proper values to support CES IP failover:
For hadoop_env.sh:
- JAVA_HOME: Set the correct Java home path for the node.
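For example, a line such as the following might be used in hadoop_env.sh (the JDK path is only an illustration; use the Java installation that is present on your node):
# Illustrative JDK path; replace with the Java home on this node
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk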
For hdfs-site.xml:
- dfs.nameservices: Set to the logical name of the cluster. This must be equal to the CES group name without the hdfs prefix. In the following example, hdfscluster is the CES group name, where hdfs is the prefix and cluster is the cluster name:
<property>
  <name>dfs.nameservices</name>
  <value>cluster</value>
</property>
- dfs.ha.namenodes.[nameservice ID]: Set to a list of comma-separated NameNode IDs. For example:
<property>
  <name>dfs.ha.namenodes.cluster</name>
  <value>nn1,nn2</value>
</property>
If there is only one NameNode (only one CES node, which means no CES HA), the list should contain only one ID. For example:
<property>
  <name>dfs.ha.namenodes.cluster</name>
  <value>nn1</value>
</property>
- dfs.namenode.rpc-address.[nameservice ID].[namenode ID]: Set to the fully qualified RPC address for each NameNode to listen on. For example:
<property>
  <name>dfs.namenode.rpc-address.cluster.nn1</name>
  <value>machine1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.cluster.nn2</name>
  <value>machine2.example.com:8020</value>
</property>
- dfs.namenode.http-address.[nameservice ID].[namenode ID]: Set to the fully qualified HTTP address for each NameNode to listen on. For example:
<property>
  <name>dfs.namenode.http-address.cluster.nn1</name>
  <value>machine1.example.com:50070</value>
</property>
<property>
  <name>dfs.namenode.http-address.cluster.nn2</name>
  <value>machine2.example.com:50070</value>
</property>
- dfs.namenode.shared.edits.dir: Set to a directory that is used to store the shared edit logs for this HDFS HA cluster. The recommendation is to use a name like HA-[dfs.nameservices]. For example:
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>file:///gpfs/HA-cluster</value>
</property>
Note: If there is only one NameNode (only one CES node, which means no CES HA), do not set this property; otherwise, the NameNode fails to start. The NameNode shared edits directory is used only for HA.
- dfs.client.failover.proxy.provider.[nameservice ID]: Set to the Java class that HDFS clients use to contact the active NameNode. For example:
<property>
  <name>dfs.client.failover.proxy.provider.cluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
- dfs.namenode.rpc-bind-host: This should be set to 0.0.0.0. For example:
<property>
  <name>dfs.namenode.rpc-bind-host</name>
  <value>0.0.0.0</value>
</property>
- dfs.namenode.servicerpc-bind-host: This should be set to 0.0.0.0. For example:
<property>
  <name>dfs.namenode.servicerpc-bind-host</name>
  <value>0.0.0.0</value>
</property>
- dfs.namenode.lifeline.rpc-bind-host: This should be set to 0.0.0.0. For example:
<property>
  <name>dfs.namenode.lifeline.rpc-bind-host</name>
  <value>0.0.0.0</value>
</property>
- dfs.namenode.http-bind-host: This should be set to 0.0.0.0. For example:
<property>
  <name>dfs.namenode.http-bind-host</name>
  <value>0.0.0.0</value>
</property>
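The values above can also be applied with the mmhdfs config set syntax described earlier instead of editing hdfs-site.xml by hand. The following sketch is illustrative only and reuses the example values (nameservice cluster, NameNodes nn1 and nn2 on machine1.example.com and machine2.example.com):
# Set the nameservice and NameNode IDs (illustrative values from the examples above)
mmhdfs config set hdfs-site.xml -k dfs.nameservices=cluster -k dfs.ha.namenodes.cluster=nn1,nn2
# Set the RPC addresses for both NameNodes
mmhdfs config set hdfs-site.xml -k dfs.namenode.rpc-address.cluster.nn1=machine1.example.com:8020 -k dfs.namenode.rpc-address.cluster.nn2=machine2.example.com:8020
# Bind the NameNode RPC listener to all interfaces so that CES IPs can fail over
mmhdfs config set hdfs-site.xml -k dfs.namenode.rpc-bind-host=0.0.0.0
# Upload the changed configuration
mmhdfs config upload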
For core-site.xml:
- fs.defaultFS: Set to hdfs:// followed by the value of dfs.nameservices. For CES HDFS, this must be the CES HDFS group name without the hdfs prefix. For example:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://cluster</value>
</property>
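The same value can also be set with the mmhdfs config set syntax shown earlier; the following is a sketch that reuses the example nameservice:
# Set fs.defaultFS to the example nameservice and upload the change
mmhdfs config set core-site.xml -k fs.defaultFS=hdfs://cluster
mmhdfs config upload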
Follow the Enable and configure CES HDFS section to set the configuration values for a non-HA or an HA CES HDFS Transparency cluster.