Single viewfs namespace between IBM Storage Scale and native HDFS – Part I

This topic describes the steps to create a single namespace by federating the native HDFS namespace into the HDFS Transparency namespace. If you use Hortonworks HDP, you can change the configurations from the Ambari GUI > HDFS > Configs.

In this mode, nn1-host is the HDFS Transparency NameNode, and you have a separate native HDFS cluster. After you federate native HDFS into HDFS Transparency, your applications can access data from both HDFS Transparency and native HDFS through the viewfs scheme set as fs.defaultFS in the HDFS Transparency cluster. All configuration changes are made on the HDFS Transparency side and on your Hadoop client nodes. No configuration changes are needed on the native HDFS cluster.
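For example (hypothetical names only), if the mount table configured in the steps below maps /scale to the HDFS Transparency NameNode and /hdfsdata to the native HDFS NameNode under the viewfs cluster name myviewfs, a Hadoop client can address both namespaces through the same viewfs scheme:

    # Sketch only: directory and cluster names are placeholders, not part of the procedure
    hadoop fs -ls viewfs://myviewfs/scale      # served by the HDFS Transparency NameNode (nn1-host)
    hadoop fs -ls viewfs://myviewfs/hdfsdata   # served by the native HDFS NameNode (nn2-host)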

  1. Shut down the HDFS Transparency cluster daemons by running the following command from one of the HDFS Transparency nodes in the cluster:
    # mmhadoopctl connector stop
  2. On nn1-host, add the following configuration settings in /usr/lpp/mmfs/hadoop/etc/hadoop/core-site.xml (for HDFS Transparency 2.7.3-x) or /var/mmfs/hadoop/etc/hadoop/core-site.xml (for HDFS Transparency 3.0.x):
    <configuration>
    <property>
    	<name>fs.defaultFS</name>
    	<value>viewfs://<viewfs_clustername></value>
    	<description>The name of the namespace</description>
    </property>
    
    <property>
    	<name>fs.viewfs.mounttable.<viewfs_clustername>.link./<viewfs_dir1></name>
    	<value>hdfs://nn1-host:8020/<mount_dir></value>
    	<description>The name of the Spectrum Scale file system</description>
    </property>
    
    <property>
    	<name>fs.viewfs.mounttable.<viewfs_clustername>.link./<viewfs_dir2></name>
    	<value>hdfs://nn2-host:8020/<mount_dir></value>
    	<description>The name of the native HDFS file system</description>
    </property>
    </configuration>
    Note: Replace <viewfs_clustername>, <viewfs_dir1>, <viewfs_dir2>, and <mount_dir> according to your cluster configuration. In this example, nn1-host refers to the HDFS Transparency NameNode and nn2-host refers to the native HDFS NameNode.

    After the federation configuration changes take effect on a node, that node sees only the directories that are specified as mount points in the core-site.xml file. With the configuration above, only the two directories /<viewfs_dir1> and /<viewfs_dir2> are visible.
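    As an illustration only, assume the hypothetical values <viewfs_clustername>=myviewfs, <viewfs_dir1>=scale, <viewfs_dir2>=hdfsdata, and <mount_dir>=data. The mount table in core-site.xml would then read:

    <property>
    	<name>fs.defaultFS</name>
    	<value>viewfs://myviewfs</value>
    </property>
    <property>
    	<name>fs.viewfs.mounttable.myviewfs.link./scale</name>
    	<value>hdfs://nn1-host:8020/data</value>
    </property>
    <property>
    	<name>fs.viewfs.mounttable.myviewfs.link./hdfsdata</name>
    	<value>hdfs://nn2-host:8020/data</value>
    </property>

    With this mount table, a path such as /scale/file1 resolves to hdfs://nn1-host:8020/data/file1, and /hdfsdata/file1 resolves to hdfs://nn2-host:8020/data/file1.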

  3. On nn1-host, add the following configuration settings in /usr/lpp/mmfs/hadoop/etc/hadoop/hdfs-site.xml (for HDFS Transparency 2.7.3-x) or /var/mmfs/hadoop/etc/hadoop/hdfs-site.xml (for HDFS Transparency 3.0.x):
    <configuration>
    <property>
    	<name>dfs.nameservices</name>
    	<value>nn1,nn2</value>
    </property>
    
    <property>
    	<name>dfs.namenode.rpc-address.nn1</name>
    	<value>nn1-host:8020</value>
    </property>
    
    <property>
    	<name>dfs.namenode.rpc-address.nn2</name>
    	<value>nn2-host:8020</value>
    </property>
    
    <property>
    	<name>dfs.namenode.http-address.nn1</name>
    	<value>nn1-host:50070</value>
    </property>
    
    <property>
    	<name>dfs.namenode.http-address.nn2</name>
    	<value>nn2-host:50070</value>
    </property>
    </configuration>
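    As an optional sanity check (not part of the original procedure), you can have the HDFS client on nn1-host echo back the values it reads from this configuration directory. The HADOOP_CONF_DIR path below assumes HDFS Transparency 3.0.x; use /usr/lpp/mmfs/hadoop/etc/hadoop for 2.7.3-x:

    # Point the hdfs client at the HDFS Transparency configuration directory
    export HADOOP_CONF_DIR=/var/mmfs/hadoop/etc/hadoop
    hdfs getconf -confKey dfs.nameservices    # expected: nn1,nn2
    hdfs getconf -nnRpcAddresses              # expected: nn1-host:8020 and nn2-host:8020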
    
  4. On nn1-host, synchronize the configuration changes with the other HDFS Transparency nodes by running the following command:
    For HDFS Transparency 2.7.3-x:
    # mmhadoopctl connector syncconf /usr/lpp/mmfs/hadoop/etc/hadoop/
    For HDFS Transparency 3.0.x:
    # mmhadoopctl connector syncconf /var/mmfs/hadoop/etc/hadoop/
    Note: The above command might display the following output messages for the native HDFS NameNode, nn2-host:
    scp: /usr/lpp/mmfs/hadoop/etc/hadoop//: No such file or directory
    scp: /usr/lpp/mmfs/hadoop/etc/hadoop//: No such file or directory
    scp: /usr/lpp/mmfs/hadoop/etc/hadoop//: No such file or directory
    scp: /usr/lpp/mmfs/hadoop/etc/hadoop//: No such file or directory
    scp: /usr/lpp/mmfs/hadoop/etc/hadoop//: No such file or directory
    scp: /usr/lpp/mmfs/hadoop/etc/hadoop//: No such file or directory
    scp: /usr/lpp/mmfs/hadoop/etc/hadoop//: No such file or directory
    scp: /usr/lpp/mmfs/hadoop/etc/hadoop//: No such file or directory
    

    The output messages above are seen because, during the synchronization of the configuration to all the nodes in the cluster, the /usr/lpp/mmfs/hadoop/etc/hadoop directory does not exist on the nn2-host native HDFS NameNode. This is because HDFS Transparency is not installed on the native HDFS NameNode. Therefore, these messages for the native HDFS NameNode can be ignored.

    Another way to synchronize the configuration files is to use the scp command to copy the following files from /usr/lpp/mmfs/hadoop/etc/hadoop/ (for HDFS Transparency 2.7.3-x) or /var/mmfs/hadoop/etc/hadoop/ (for HDFS Transparency 3.0.x) to all the other nodes in the HDFS Transparency cluster: workers, log4j.properties, hdfs-site.xml, hadoop-policy.xml, hadoop-metrics.properties, hadoop-metrics2.properties, core-site.xml, and gpfs-site.xml. A minimal sketch of this approach is shown below.
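    The following sketch assumes HDFS Transparency 2.7.3-x and two other HDFS Transparency nodes with the hypothetical names dn1-host and dn2-host:

    # Copy the configuration files listed above to every other HDFS Transparency node
    for node in dn1-host dn2-host; do
      scp /usr/lpp/mmfs/hadoop/etc/hadoop/{workers,log4j.properties,hdfs-site.xml,hadoop-policy.xml,hadoop-metrics.properties,hadoop-metrics2.properties,core-site.xml,gpfs-site.xml} \
          ${node}:/usr/lpp/mmfs/hadoop/etc/hadoop/
    done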
  5. On nn1-host, start HDFS Transparency on all the cluster nodes by running the following command:

    # mmhadoopctl connector start

    Note: The above command might display the following warning messages for the native HDFS NameNode, nn2-host:
    nn2-host: bash: line 0: cd: /usr/lpp/mmfs/hadoop: No such file or directory
    nn2-host: bash: /usr/lpp/mmfs/hadoop/sbin/hadoop-daemon.sh: No such file or directory

    These messages are displayed because HDFS Transparency is not installed on the native HDFS NameNode. Therefore, these messages can be ignored.

    To avoid the above messages, run the following commands:
    1. On nn1-host, run the following command as root to start the HDFS Transparency NameNode:
      # cd /usr/lpp/mmfs/hadoop; /usr/lpp/mmfs/hadoop/sbin/hadoop-daemon.sh \
        --config /usr/lpp/mmfs/hadoop/etc/hadoop \
        --script /usr/lpp/mmfs/hadoop/sbin/gpfs start namenode
    2. On nn1-host, run the following command as root to start the HDFS Transparency DataNodes:
      # cd /usr/lpp/mmfs/hadoop; /usr/lpp/mmfs/hadoop/sbin/hadoop-daemons.sh \
        --config /usr/lpp/mmfs/hadoop/etc/hadoop \
        --script /usr/lpp/mmfs/hadoop/sbin/gpfs start datanode
    Note: If you deployed IBM® BigInsights® IOP, the IBM Storage Scale Ambari integration module (gpfs.hdfs-transparency.ambari-iop_4.1-0) does not support viewfs configuration in Ambari. Therefore, starting the HDFS Transparency service or other services regenerates the core-site.xml and hdfs-site.xml from the Ambari database and overwrites the changes that were made from Step 1 to Step 4. HDFS Transparency and all other services must be started from the command line.
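    As an optional check (not part of the original procedure), you can confirm on each HDFS Transparency node that the expected daemons are running. jps ships with the JDK, and the process names below are the standard Hadoop daemon names:

    # On nn1-host, expect a NameNode process; on the DataNode hosts, expect a DataNode process
    jps | grep -E 'NameNode|DataNode'

    If your HDFS Transparency release provides it, mmhadoopctl connector getstate also reports the connector state across the cluster.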
  6. Apply the configuration changes from Step 2 and Step 3 to your Hadoop client configurations so that Hadoop applications can see all the directories in viewfs.
    Note: If you deployed IBM BigInsights IOP, update the core-site.xml and the hdfs-site.xml from Step 2 and Step 3 accordingly in the /etc/hadoop/conf directory on each of the nodes so that the Hadoop applications can see the directories in viewfs.

    If you deployed open-source Apache Hadoop, update the core-site.xml and the hdfs-site.xml according to the Apache Hadoop configuration location at your site. A minimal sketch of distributing the client configuration is shown below.
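    The following sketch assumes the IOP-style /etc/hadoop/conf directory, HDFS Transparency 3.0.x paths, and hypothetical client nodes client1-host and client2-host. Copying the files wholesale is only appropriate if the clients should use exactly the HDFS Transparency configuration; otherwise, merge the properties from Step 2 and Step 3 into the existing client files:

    # Copy the updated core-site.xml and hdfs-site.xml to each Hadoop client node
    for node in client1-host client2-host; do
      scp /var/mmfs/hadoop/etc/hadoop/core-site.xml /var/mmfs/hadoop/etc/hadoop/hdfs-site.xml \
          ${node}:/etc/hadoop/conf/
    done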

  7. From one of the Hadoop clients, verify that the viewfs directories are available by running the following command:
    hadoop fs -ls /
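    Optionally (not part of the original procedure), you can confirm that both namespaces are reachable through viewfs by writing a small file into each mount point, assuming the user has write permission in those directories. The directory names are the placeholders from Step 2:

    # The first put is served by HDFS Transparency, the second by native HDFS
    hadoop fs -put /etc/hosts /<viewfs_dir1>/
    hadoop fs -put /etc/hosts /<viewfs_dir2>/
    hadoop fs -ls /<viewfs_dir1> /<viewfs_dir2>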