Passwordless ssh access

Ensure that root password-less ssh access does not prompt the user for a response. If root password-less access cannot be set up, HDFS Transparency fails to start. The mmhadoopctl and mmhdfs commands require password-less ssh to all the nodes, including the local node itself.
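For example, a quick check run as root, where node1 is a placeholder for each node in the cluster (including the local node); the command must return without a password or host-key prompt:

  # Run as root on every node; this must print the date without prompting.
  ssh root@node1 date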

If the IBM Storage Scale® cluster is configured as adminMode=central, HDFS Transparency NameNodes can be configured on the management nodes of the IBM Storage Scale cluster. To check if the IBM Storage Scale cluster is configured as adminMode=central, run mmlsconfig adminMode.
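For example, a sample check with its typical output (your cluster might report allToAll instead):

  # /usr/lpp/mmfs/bin/mmlsconfig adminMode
  adminMode central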

If the IBM Storage Scale cluster is configured in sudo wrapper mode, IBM Storage Scale requires a common (non-root) user with password-less access to all the other nodes. To check whether the IBM Storage Scale cluster is configured in sudo wrapper mode, log in to a node as root and verify that ssh <non-root>@<other-node> succeeds without a password prompt. With IBM Storage Scale in sudo wrapper mode, HDFS Transparency still requires the node to have root access to all the other nodes, including itself, to run the mmhadoopctl and mmhdfs commands.
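For example, assuming gpfsadm is the common user and node2 is another cluster node (both placeholder names), the following must succeed from a root shell without prompting:

  # A password prompt here means password-less access is not in place for this user.
  ssh gpfsadm@node2 true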

HDFS Transparency provides the following options for the root password-less requirement:
  1. Local cluster options
    For the local cluster, use one of the following options for the root password-less requirement (a key-distribution sketch follows this list):
    1. By default, HDFS Transparency requires root password-less access between any two nodes in the HDFS Transparency cluster.
    2. If the above option is not feasible, you need at least one node with root password-less access to all the other HDFS Transparency nodes and to itself. In this case, the mmhadoopctl/mmhdfs commands can be run only on this node, and this node must be configured as the HDFS Transparency NameNode. If NameNode HA is configured, all NameNodes must be configured with root password-less access to all DataNodes.
      Note:
      • If you configure the IBM Storage Scale cluster in admin central mode (mmchconfig adminMode=central), you can configure the HDFS Transparency NameNodes on the IBM Storage Scale management nodes, so that root password-less access is needed only from these management nodes to all the other nodes in the cluster.
      • If the file system is remotely mounted, HDFS Transparency requires two password-less access configurations: one for the local cluster (configure HDFS Transparency according to the local cluster options for password-less access) and one for the remote file system (see the remote cluster options below).
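    The following is a minimal key-distribution sketch for these options, assuming example host names nn1, dn1, and dn2; run it as root on every node that needs outbound password-less access (all nodes for option 1, only the NameNode for option 2):

      # Generate a key pair if root does not have one yet.
      ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
      # Authorize this node's key on every HDFS Transparency node, including itself.
      for node in nn1 dn1 dn2; do
          ssh-copy-id root@${node}
      done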
  2. Remote cluster options
    For the remote file system, use one of the following options for the root password-less requirement:
    1. By default, the HDFS Transparency NameNodes require root password-less access to at least one of the contact nodes from the remote cluster (the first contact node is recommended if you cannot configure password-less access to all contact nodes).
      For example, in the following cluster, ess01-dat.gpfs.net and ess02-dat.gpfs.net are contact nodes. ess01-dat.gpfs.net is the first contact node because it is listed first in the Contact nodes field:
      # /usr/lpp/mmfs/bin/mmremotecluster show all
      Cluster name: test01.gpfs.net
      Contact nodes: ess01-dat.gpfs.net,ess02-dat.gpfs.net
      SHA digest: abe321118158d045f5087c00f3c4b0724ed4cfb8176a05c348ae7d5d19b9150d
      File systems: latestgpfs (gpfs0) 
      Note: HDFS Transparency DataNodes do not require root password-less access to the contact nodes.
    2. Starting with HDFS Transparency 2.7.3-3, HDFS Transparency supports non-root password-less access to one of the contact nodes as a common user (instead of the root user).

      First, on the HDFS Transparency NameNodes, configure password-less access from the root user to a non-privileged user on the contact nodes (at least one contact node; the first contact node is recommended) from the remote cluster. Here, the gpfsadm user is used as an example.
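      For example, a minimal sketch run as root on each HDFS Transparency NameNode, assuming the gpfsadm user already exists on the contact node ess01-dat.gpfs.net (names taken from the examples above):

        # Generate a key pair for root if one does not exist yet.
        ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
        # Authorize root's public key for the gpfsadm user on the contact node.
        ssh-copy-id gpfsadm@ess01-dat.gpfs.net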

      Add the following to the /usr/lpp/mmfs/hadoop/etc/hadoop/gpfs-site.xml file (for HDFS Transparency 2.7.3-x) or the /var/mmfs/hadoop/etc/hadoop/gpfs-site.xml file (for HDFS Transparency 3.1.x) on the HDFS Transparency NameNodes.
      
        <property>
          <name>gpfs.ssh.user</name>
          <value>gpfsadm</value>
        </property>
      
      On one of the contact nodes (the first contact node is recommended), edit /etc/sudoers using visudo and add the following to the sudoers file.
      gpfsadm  ALL=(ALL)       NOPASSWD: /usr/lpp/mmfs/bin/mmlsfs, /usr/lpp/mmfs/bin/mmlscluster, \
      /usr/lpp/mmfs/bin/mmlsnsd, /usr/lpp/mmfs/bin/mmlsfileset, /usr/lpp/mmfs/bin/mmlssnapshot, \
      /usr/lpp/mmfs/bin/mmcrsnapshot, /usr/lpp/mmfs/bin/mmdelsnapshot, /usr/lpp/mmfs/bin/tslsdisk
      With the sudo configuration above, the gpfsadm user can run these IBM Storage Scale commands against any fileset in the file system.
      Note: Comment out Defaults requiretty in the sudoers file, as shown below. Otherwise, the sudo: sorry, you must have a tty to run sudo error occurs.
      #
      # Disable "ssh hostname sudo <cmd>", because it will show the password in clear.
      #         You have to run "ssh -t hostname sudo <cmd>".
      #
      #Defaults    requiretty
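      To verify the rule, you can log in to the contact node as gpfsadm and list the commands that sudo grants (a sample check; the IBM Storage Scale commands above should be listed as NOPASSWD):

        sudo -l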
      
      Note: Before you start HDFS Transparency, log in to the HDFS Transparency NameNodes as root and run ssh gpfsadm@<the configured contact node> sudo /usr/lpp/mmfs/bin/mmlsfs <fs-name> to confirm that it works.
    3. Manually generate the internal configuration files from the contact node and copy them onto the local nodes so that you do not require root or user password-less ssh to the contact nodes.

      Starting with HDFS Transparency 2.7.3-2, you can set gpfs.remotecluster.autorefresh to false in /usr/lpp/mmfs/hadoop/etc/hadoop/gpfs-site.xml (for HDFS Transparency 2.7.3-x) or /var/mmfs/hadoop/etc/hadoop/gpfs-site.xml (for HDFS Transparency 3.1.x).
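      For example, the property can be added in the same format as the gpfs.ssh.user property above:

        <property>
          <name>gpfs.remotecluster.autorefresh</name>
          <value>false</value>
        </property>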

      Manually copy the /usr/lpp/mmfs/hadoop/sbin/initmap.sh script from the NameNode to one of the contact nodes. The script can be copied to any directory.

      Create the /var/mmfs/hadoop/etc/hadoop directory on the contact node and copy the contents of the /var/mmfs/hadoop/etc/hadoop directory from the NameNode to the directory created on the contact node.
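      For example, a copy sketch run as root on the NameNode, assuming ess01-dat.gpfs.net is the contact node and /tmp/initmap is the chosen <savedir>; because this option does not require password-less ssh to the contact node, it is fine if these commands prompt for a password:

        # Create the target directories on the contact node.
        ssh root@ess01-dat.gpfs.net mkdir -p /tmp/initmap /var/mmfs/hadoop/etc/hadoop
        # Copy the script and the HDFS Transparency configuration files.
        scp /usr/lpp/mmfs/hadoop/sbin/initmap.sh root@ess01-dat.gpfs.net:/tmp/initmap/
        scp -r /var/mmfs/hadoop/etc/hadoop/* root@ess01-dat.gpfs.net:/var/mmfs/hadoop/etc/hadoop/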

      Log on to the contact node as root and run the initmap.sh command.

      For example, to get the initmap files for two file systems on the contact node, run the following command:
      /<savedir>/initmap.sh -i all <fs1>,<fs2>
      Note: Do not use the -d option when running on the contact node.

      Copy the generated internal configuration files to all the HDFS Transparency nodes.

      The initmap.sh script must be re-run on the remote system if any of the following changes occur:
      • The dataReplica configuration values for the file system are updated.
      • The GPFS cluster name (from the mmlscluster output) is changed through the mmchcluster command on the remote system.
      • The file system name is changed in either the remote or the local cluster.
      • The contact nodes information from the local cluster to the remote cluster is updated.

      For the initmap.sh script command syntax and generated internal configuration files, see Cluster and file system information configuration.

      Note: If gpfs.remotecluster.autorefresh is set to false, snapshots from the Hadoop interface are not supported against the remotely mounted file system.
