Configure Hadoop nodes

On HortonWorks HDP, you can configure Hadoop through the Ambari GUI. If you are not familiar with HDFS/Hadoop, set up native HDFS first by following the Hadoop cluster setup guide. Replacing native HDFS with HDFS Transparency is easier once HDFS/Hadoop is already set up.

Hadoop and HDFS Transparency must use the same core-site.xml, hdfs-site.xml, slaves (Hadoop 2.7.x) or workers (Hadoop 3.0.x+), hadoop-env.sh, and log4j.properties files. This means that native HDFS in Hadoop and HDFS Transparency must use the same NameNodes and DataNodes.
Note:
  • For HortonWorks HDP, the configuration files above are located under /etc/hadoop/conf. For open source Apache Hadoop, they are located under $YOUR_APACHE_HADOOP_HOME/etc/hadoop. For HDFS Transparency 2.7.3-x, they are located under /usr/lpp/mmfs/hadoop/etc/hadoop.
  • From HDFS Transparency 2.7.3-3, the configurations are located under /usr/lpp/mmfs/hadoop/etc/hadoop. From HDFS Transparency 3.0.0, the configurations are located under /var/mmfs/hadoop/etc/hadoop.
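The requirement that Hadoop and HDFS Transparency use the same files can be checked with a short script. The following is a minimal sketch, assuming that "the same" means byte-identical copies in the two configuration directories. The function name find_mismatched_files and its parameters are hypothetical and are not part of HDFS Transparency.

```python
import filecmp
from pathlib import Path

# File names taken from the text above. The node list file is "slaves"
# for Hadoop 2.7.x and "workers" for Hadoop 3.0.x+.
SHARED_FILES = ["core-site.xml", "hdfs-site.xml", "hadoop-env.sh", "log4j.properties"]


def find_mismatched_files(hadoop_conf, transparency_conf, node_list_file="workers"):
    """Return the shared file names that are missing on either side or differ."""
    hadoop_conf = Path(hadoop_conf)
    transparency_conf = Path(transparency_conf)
    mismatched = []
    for name in SHARED_FILES + [node_list_file]:
        left = hadoop_conf / name
        right = transparency_conf / name
        if not (left.is_file() and right.is_file()):
            # A file that exists on only one side cannot be "the same".
            mismatched.append(name)
        elif not filecmp.cmp(str(left), str(right), shallow=False):
            # shallow=False compares file contents, not only size and mtime.
            mismatched.append(name)
    return mismatched
```

For example, on HortonWorks HDP with HDFS Transparency 3.0.0 or later, find_mismatched_files("/etc/hadoop/conf", "/var/mmfs/hadoop/etc/hadoop") returns an empty list when the two directories agree.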
If your native HDFS NameNodes are different from the HDFS Transparency NameNodes, update fs.defaultFS in your Hadoop configuration. For HortonWorks HDP, the configuration is located under /etc/hadoop/conf. For open source Apache Hadoop, it is located under $YOUR_HADOOP_PREFIX/etc/hadoop/. For example:
<property>
<name>fs.defaultFS</name>
<value>hdfs://hs22n44:8020</value>
</property>
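To confirm which value a configuration file actually carries, the property can be read back programmatically. The following is a minimal sketch that uses the Python standard library XML parser. The helper name get_property is hypothetical, and the sample text assumes the usual Hadoop layout in which properties sit inside a <configuration> root element, as in core-site.xml.

```python
import xml.etree.ElementTree as ET


def get_property(xml_text, name):
    """Return the value of the named property in Hadoop XML config text, or None."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        prop_name = (prop.findtext("name") or "").strip()
        if prop_name == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


# Sample input: the property shown above, wrapped in a <configuration> root.
sample = """
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hs22n44:8020</value>
  </property>
</configuration>
"""

print(get_property(sample, "fs.defaultFS"))  # prints hdfs://hs22n44:8020
```

Running the same function against the core-site.xml text of both Hadoop and HDFS Transparency shows whether the two sides point at the same NameNode.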
For HDFS Transparency 2.7.0-x, 2.7.2-0, and 2.7.2-1, do not export the Hadoop environment variables on the HDFS Transparency nodes. HDFS Transparency maps the Hadoop environment variables to its own environment, so exporting them can cause problems. The following Hadoop environment variables can affect HDFS Transparency:
  • HADOOP_HOME
  • HADOOP_HDFS_HOME
  • HADOOP_MAPRED_HOME
  • HADOOP_COMMON_HOME
  • HADOOP_COMMON_LIB_NATIVE_DIR
  • HADOOP_CONF_DIR
  • HADOOP_SECURITY_CONF_DIR

For HDFS Transparency versions 2.7.2-3+, 2.7.3-x, and 3.0.x+, the environment variables listed above can be exported, except for HADOOP_COMMON_LIB_NATIVE_DIR, because HDFS Transparency uses its own native .so library.
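
The export rules above can be turned into a quick check on an HDFS Transparency node. The following is a minimal sketch. The function name problematic_exports and the legacy flag are hypothetical, and the script only sees variables that are exported to the process that starts it.

```python
import os

# The variables listed above that can affect HDFS Transparency.
HADOOP_ENV_VARS = [
    "HADOOP_HOME",
    "HADOOP_HDFS_HOME",
    "HADOOP_MAPRED_HOME",
    "HADOOP_COMMON_HOME",
    "HADOOP_COMMON_LIB_NATIVE_DIR",
    "HADOOP_CONF_DIR",
    "HADOOP_SECURITY_CONF_DIR",
]


def problematic_exports(legacy, env=None):
    """Return the exported variables that the text above advises against.

    legacy=True  covers HDFS Transparency 2.7.0-x, 2.7.2-0, and 2.7.2-1,
                 where none of the variables should be exported.
    legacy=False covers 2.7.2-3+, 2.7.3-x, and 3.0.x+, where only
                 HADOOP_COMMON_LIB_NATIVE_DIR should stay unexported.
    """
    env = os.environ if env is None else env
    watched = HADOOP_ENV_VARS if legacy else ["HADOOP_COMMON_LIB_NATIVE_DIR"]
    return [name for name in watched if name in env]
```

Calling problematic_exports(legacy=True) with no env argument inspects the current process environment. An empty list means that nothing needs to be unset.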

For HDFS Transparency versions 2.7.2-3+ and 2.7.3-x:

  • If you do not export HADOOP_CONF_DIR, HDFS Transparency reads all the configuration files, such as gpfs-site.xml and hadoop-env.sh, under /usr/lpp/mmfs/hadoop/etc/hadoop.
  • If you export HADOOP_CONF_DIR, HDFS Transparency reads the configuration files under $HADOOP_CONF_DIR. The exception is gpfs-site.xml, which is required by HDFS Transparency and is always read from the /usr/lpp/mmfs/hadoop/etc/hadoop directory.
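
The lookup rule in the two cases above can be sketched as follows. This is only an illustration of the rule as described for versions 2.7.2-3+ and 2.7.3-x, not the code that HDFS Transparency runs. The function name config_locations and the dictionary keys are hypothetical.

```python
import os

# Default directory for HDFS Transparency 2.7.2-3+ and 2.7.3-x, from the text above.
DEFAULT_CONF_DIR = "/usr/lpp/mmfs/hadoop/etc/hadoop"


def config_locations(env=None):
    """Return where the general config files and gpfs-site.xml are read from."""
    env = os.environ if env is None else env
    # General files (core-site.xml, hadoop-env.sh, ...) follow HADOOP_CONF_DIR
    # when it is exported, and fall back to the default directory otherwise.
    if "HADOOP_CONF_DIR" in env:
        config_dir = env["HADOOP_CONF_DIR"]
    else:
        config_dir = DEFAULT_CONF_DIR
    return {
        "config_dir": config_dir,
        # gpfs-site.xml is read from the default directory in both cases.
        "gpfs_site": DEFAULT_CONF_DIR + "/gpfs-site.xml",
    }
```

For example, config_locations({"HADOOP_CONF_DIR": "/etc/hadoop/conf"}) reports /etc/hadoop/conf for the general files while gpfs-site.xml stays under /usr/lpp/mmfs/hadoop/etc/hadoop.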

For questions or issues with HDFS Transparency configuration, send an email to scale@us.ibm.com.