Configure Hadoop nodes
On HortonWorks HDP, you can configure Hadoop through the Ambari GUI. If you are not familiar with HDFS/Hadoop, set up native HDFS first by following the Hadoop cluster setup guide. Replacing native HDFS with HDFS Transparency is easier once you have a working HDFS/Hadoop setup.
Hadoop and HDFS Transparency must use the same core-site.xml,
hdfs-site.xml, slaves (Hadoop 2.7.x) or workers (Hadoop 3.0.x+),
hadoop-env.sh, and log4j.properties files. This means that native HDFS
in Hadoop and HDFS Transparency must use the same
NameNodes and DataNodes.
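One way to check this requirement is to compare the shared files byte for byte. The following Python sketch is illustrative only: the function name find_mismatched_configs and its directory arguments are assumptions made for this example and are not part of Hadoop or HDFS Transparency. Pass the configuration directories that apply to your installation.

```python
import filecmp
from pathlib import Path

# Files that Hadoop and HDFS Transparency are expected to share.
# Replace "slaves" with "workers" on Hadoop 3.0.x and later.
SHARED_FILES = ["core-site.xml", "hdfs-site.xml", "slaves",
                "hadoop-env.sh", "log4j.properties"]


def find_mismatched_configs(hadoop_conf, transparency_conf, names=SHARED_FILES):
    """Return the file names that are missing on either side or differ in content."""
    mismatched = []
    for name in names:
        left = Path(hadoop_conf) / name
        right = Path(transparency_conf) / name
        if not left.is_file() or not right.is_file():
            mismatched.append(name)
        # shallow=False forces a comparison of the file contents,
        # not just the size and modification time.
        elif not filecmp.cmp(left, right, shallow=False):
            mismatched.append(name)
    return mismatched
```

An empty list means that every listed file exists in both directories with identical content.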
Note:
- For HortonWorks HDP, the configuration files above are located under /etc/hadoop/conf. For open source Apache Hadoop, the configuration files are located under $YOUR_APACHE_HADOOP_HOME/etc/hadoop and /usr/lpp/mmfs/hadoop/etc/hadoop for HDFS Transparency 2.7.3-x.
- From HDFS Transparency 2.7.3-3, the configurations are located under /usr/lpp/mmfs/hadoop/etc/hadoop. From HDFS Transparency 3.0.0, the configurations are located under /var/mmfs/hadoop/etc/hadoop.
If your native HDFS NameNodes are different from the HDFS Transparency NameNodes, update
fs.defaultFS in your Hadoop configuration (for HortonWorks HDP, it is
located under /etc/hadoop/conf; for open source Apache Hadoop, it is
located under $YOUR_HADOOP_PREFIX/etc/hadoop/):
<property>
<name>fs.defaultFS</name>
<value>hdfs://hs22n44:8020</value>
</property>
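To confirm which NameNode address a given core-site.xml points to, the property can be read programmatically. The following Python sketch assumes the standard Hadoop layout in which properties are wrapped in a <configuration> root element; the function name read_property is hypothetical, and the sample value is the one shown above.

```python
import xml.etree.ElementTree as ET


def read_property(core_site_xml, key="fs.defaultFS"):
    """Return the value of `key` from core-site.xml text, or None if it is absent."""
    root = ET.fromstring(core_site_xml.strip())
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == key:
            return prop.findtext("value", default="").strip()
    return None


sample = """
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hs22n44:8020</value>
  </property>
</configuration>
"""
print(read_property(sample))  # hdfs://hs22n44:8020
```

Running the same function against the core-site.xml used by Hadoop and the one used by HDFS Transparency should return the same value.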
For HDFS Transparency 2.7.0-x, 2.7.2-0, and 2.7.2-1, do not export the Hadoop environment variables
on the HDFS Transparency nodes. HDFS Transparency uses the Hadoop environment variables to map to
its own environment, so exported values can lead to issues. The following Hadoop environment
variables can affect HDFS Transparency:
- HADOOP_HOME
- HADOOP_HDFS_HOME
- HADOOP_MAPRED_HOME
- HADOOP_COMMON_HOME
- HADOOP_COMMON_LIB_NATIVE_DIR
- HADOOP_CONF_DIR
- HADOOP_SECURITY_CONF_DIR
For HDFS Transparency versions 2.7.2-3+, 2.7.3-x, and 3.0.x+, the environment variables listed above can be exported, except for HADOOP_COMMON_LIB_NATIVE_DIR, because HDFS Transparency uses its own native .so library.
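The two rules above can be turned into a quick check on an HDFS Transparency node. This Python sketch is an illustration, not an IBM-provided tool: the function name conflicting_vars and the legacy flag are assumptions. Set legacy=True for HDFS Transparency 2.7.0-x, 2.7.2-0, and 2.7.2-1, and legacy=False for 2.7.2-3+, 2.7.3-x, and 3.0.x+.

```python
import os

# Hadoop environment variables that can affect HDFS Transparency.
HADOOP_ENV_VARS = [
    "HADOOP_HOME",
    "HADOOP_HDFS_HOME",
    "HADOOP_MAPRED_HOME",
    "HADOOP_COMMON_HOME",
    "HADOOP_COMMON_LIB_NATIVE_DIR",
    "HADOOP_CONF_DIR",
    "HADOOP_SECURITY_CONF_DIR",
]


def conflicting_vars(env, legacy):
    """Return the variables present in `env` that should not be exported.

    legacy=True:  none of the listed variables should be exported.
    legacy=False: only HADOOP_COMMON_LIB_NATIVE_DIR should not be exported.
    """
    watched = HADOOP_ENV_VARS if legacy else ["HADOOP_COMMON_LIB_NATIVE_DIR"]
    return [name for name in watched if name in env]


if __name__ == "__main__":
    # A process inherits only exported variables, so os.environ reflects
    # what the shell that started this script has exported.
    print(conflicting_vars(os.environ, legacy=False))
```

An empty list means that the current environment follows the rule for the selected version range.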
For HDFS Transparency versions 2.7.2-3+ and 2.7.3-x:
- If you do not export HADOOP_CONF_DIR, HDFS Transparency reads all the configuration files, such as the gpfs-site.xml file and the hadoop-env.sh file, from /usr/lpp/mmfs/hadoop/etc/hadoop.
- If you export HADOOP_CONF_DIR, HDFS Transparency reads the configuration files from $HADOOP_CONF_DIR. Because gpfs-site.xml is required for HDFS Transparency, that one file is still read from the /usr/lpp/mmfs/hadoop/etc/hadoop directory.
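The lookup described in the two cases above can be summarized in a few lines. This Python sketch models the documented behavior only; it is not how HDFS Transparency is implemented, and the function name resolve_config_paths and the dictionary keys are hypothetical.

```python
# Default configuration directory for HDFS Transparency 2.7.2-3+ and 2.7.3-x.
DEFAULT_CONF_DIR = "/usr/lpp/mmfs/hadoop/etc/hadoop"


def resolve_config_paths(env):
    """Return where the general configuration files and gpfs-site.xml are read from."""
    # General files follow HADOOP_CONF_DIR when it is exported.
    conf_dir = env.get("HADOOP_CONF_DIR", DEFAULT_CONF_DIR)
    return {
        "conf_dir": conf_dir,
        # gpfs-site.xml comes from the default directory in both cases.
        "gpfs_site": DEFAULT_CONF_DIR + "/gpfs-site.xml",
    }


print(resolve_config_paths({"HADOOP_CONF_DIR": "/etc/hadoop/conf"}))
```

In both cases the gpfs_site entry stays the same; only conf_dir changes when HADOOP_CONF_DIR is exported.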
For questions or issues with HDFS Transparency configuration, send an email to scale@us.ibm.com.