mmhadoopctl supports dual network

The HDFS Transparency mmhadoopctl command now supports dual network configuration.

The mmhadoopctl dual network setup is not used with Ambari. If you are using Ambari, see the Dual-network deployment section for setup instructions.

The HDFS Transparency mmhadoopctl command requires the NameNode and DataNode to have password-less ssh access set up on the network.

The HDFS Transparency dual network setup is for the case where the HDFS Transparency node names are on a private network on which password-less ssh access cannot be configured.

For the HDFS Transparency mmhadoopctl command to work on a network that does not have password-less ssh access configured, set the NODE_HDFS_MAP_GPFS export variable. With this variable set, the HDFS Transparency node names in the HDFS Transparency configuration files are converted to the IBM Storage Scale admin node names, which have password-less ssh set up.

Scenario:

IBM Storage Scale admin network is configured on network 1.

HDFS Transparency NameNode and DataNode and all the Hadoop nodes are configured to use network 2.
Note:
  • IBM Storage Scale requires only the admin network to have password-less ssh access.
  • Set the NODE_HDFS_MAP_GPFS variable in the hadoop-env.sh file with the export command. The variable must be exported for the mapping file to be generated correctly.
  • If the /var/mmfs/hadoop/init/nodemap mapping file needs to be regenerated, delete it on all nodes. The file is re-created when HDFS Transparency restarts.
  • Delete the nodemap file on all the nodes before running syncconf.
  • To run the mmhadoopctl connector start/stop command on a node in a dual network environment, export NODE_HDFS_MAP_GPFS=yes must be set so that the nodemap file is created for that node.
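The nodemap cleanup described in the notes can be sketched as a small shell helper. This is an illustrative sketch, not part of mmhadoopctl: the function name remove_nodemap_on_nodes, the REMOTE_SHELL override, and the node list file are assumptions. It expects a file that lists one IBM Storage Scale admin node name per line, because only the admin network has password-less ssh.

```shell
#!/bin/sh
# Sketch only: remove_nodemap_on_nodes is a hypothetical helper, not an mmhadoopctl command.
# $1 is an assumed file listing one IBM Storage Scale admin node name per line.
# REMOTE_SHELL defaults to ssh; set REMOTE_SHELL=echo for a dry run that only prints the commands.
NODEMAP=/var/mmfs/hadoop/init/nodemap

remove_nodemap_on_nodes() {
    node_list="$1"
    while IFS= read -r node; do
        # Skip blank lines in the node list
        [ -n "$node" ] || continue
        # rm -f succeeds even if the file is already absent;
        # stdin is redirected so ssh does not consume the rest of the node list
        ${REMOTE_SHELL:-ssh} "$node" rm -f "$NODEMAP" < /dev/null
    done < "$node_list"
}
```

Running it first with REMOTE_SHELL=echo shows which commands would be sent to which nodes before anything is deleted.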
Steps:
  1. Edit configuration.
    Manually add the line export NODE_HDFS_MAP_GPFS=yes to the /var/mmfs/hadoop/etc/hadoop/hadoop-env.sh file.
    # tail -2 /var/mmfs/hadoop/etc/hadoop/hadoop-env.sh
    export NODE_HDFS_MAP_GPFS=yes

    This setting requests HDFS Transparency to convert the node names used in the HDFS Transparency configuration files to the IBM Storage Scale admin node names.

    A mapping file, /var/mmfs/hadoop/init/nodemap, is created.

    If hosts are added to or deleted from the Hadoop configuration, delete the mapping file /var/mmfs/hadoop/init/nodemap so that restarting HDFS Transparency re-creates it with the correct host entries.

  2. Sync the configuration.
    • Remove all existing /var/mmfs/hadoop/init/nodemap files from all the nodes.
    • Run mmhadoopctl syncconf to sync the configuration files in the cluster. For syncconf syntax, see Sync HDFS Transparency configurations.
  3. Start HDFS Transparency.

    The mmhadoopctl command now uses the IBM Storage Scale admin node names.
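The edit in step 1 can be made repeatable so that the export line is never added twice. The following is a minimal sketch under stated assumptions: ensure_nodemap_export is a hypothetical helper name, not something shipped with HDFS Transparency, and the file path is passed in as an argument.

```shell
#!/bin/sh
# Sketch only: ensure_nodemap_export is a hypothetical helper, not part of HDFS Transparency.
# $1 is the path to hadoop-env.sh. The export line is appended only if it is not already present.
ensure_nodemap_export() {
    env_file="$1"
    line='export NODE_HDFS_MAP_GPFS=yes'
    # -x matches the whole line, -F treats the pattern as a fixed string, -s hides file errors
    if ! grep -qsxF "$line" "$env_file"; then
        # If the file is non-empty and does not end in a newline, add one first
        if [ -s "$env_file" ] && [ -n "$(tail -c 1 "$env_file")" ]; then
            printf '\n' >> "$env_file"
        fi
        printf '%s\n' "$line" >> "$env_file"
    fi
}
```

For example, ensure_nodemap_export /var/mmfs/hadoop/etc/hadoop/hadoop-env.sh could be run before the configuration is synced; calling it again leaves the file unchanged.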