Configuring users, groups and file system access for IBM Storage Scale

This section describes how to create the CDP Private Cloud Base users and groups on the HDFS Transparency nodes, and how to configure IBM Storage Scale file system access.

When you register hosts with Cloudera Manager, Cloudera Manager creates Hadoop users and groups corresponding to the services on all the managed hosts. These users and groups must be created manually on the IBM Storage Scale HDFS Transparency hosts before you register these hosts with Cloudera Manager. Because IBM Storage Scale is a POSIX file system, every common system user and group must have the same UID and GID across all the IBM Storage Scale nodes.

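The consistency requirement can be sketched as a small check, assuming the UID/GID pair for a user has already been collected from each node (for example with ssh <host> id -u <user> and id -g <user>). The find_mismatches helper below is hypothetical, not part of IBM Storage Scale:

```python
# Minimal sketch: given the (UID, GID) pair collected from each node
# for one user, report the conflicting values if any node disagrees.
def find_mismatches(ids_by_host):
    """ids_by_host: dict mapping hostname -> (uid, gid) for one user."""
    values = set(ids_by_host.values())
    return values if len(values) > 1 else set()

# Consistent: the same UID/GID everywhere, so nothing is reported.
assert not find_mismatches({"nn01": (10028, 10030), "dn01": (10028, 10030)})
# Inconsistent: dn02 created the user with a different UID.
assert find_mismatches({"nn01": (10028, 10030), "dn02": (10029, 10030)})
```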
  1. If you are using a Windows AD or LDAP-based network, the Hadoop users and groups on your HDFS Transparency nodes should already have consistent UIDs and GIDs across all the HDFS Transparency nodes. In that case, skip the next step and go to step 3.
  2. If you are using local users, run the following command on one of the HDFS Transparency nodes to create these users and groups. The command dynamically fetches the list of NameNode and DataNode hosts from the cluster configuration and adds the Hadoop users and groups on those hosts.

    Password-less SSH for root must be configured from that host to all the other HDFS Transparency nodes for the command to run.

    /usr/lpp/mmfs/hadoop/scripts/gpfs_create_hadoop_users_dirs.py --create-users-and-groups
    This command performs the following actions:
    • Creates the CDP Private Cloud Base users and groups as defined in the user_grp_dir_metadata.json file.
    • Ensures that all these users and groups have consistent UIDs and GIDs across the hosts.
    • Creates a system group called supergroup to be used as the Hadoop supergroup. The hdfs, mapred, and yarn users are added as members of this supergroup.
    The output of the command is logged to the /var/log/user_group_configuration.log file.

    Note: If you are using HDFS Transparency 3.1.1-5 or earlier, the command should be used with the --hadoop-hosts option as follows:

    /usr/lpp/mmfs/hadoop/scripts/gpfs_create_hadoop_users_dirs.py --create-users-and-groups --hadoop-hosts <comma separated list of HDFS Transparency NameNodes and DataNodes>
    For example:
    /usr/lpp/mmfs/hadoop/scripts/gpfs_create_hadoop_users_dirs.py --create-users-and-groups --hadoop-hosts nn01.gpfs.net,nn02.gpfs.net,dn01.gpfs.net,dn02.gpfs.net
  3. (Optional): If you are adding a NameNode or DataNode to an existing HDFS Transparency cluster, you must pass both the existing hostnames of the HDFS Transparency cluster and the new hostnames to the gpfs_create_hadoop_users_dirs.py script. The script ensures that the UID/GID values for the users and groups remain consistent across the IBM Storage Scale cluster.

    For example:

    Existing HDFS Transparency hosts:
    nn01.gpfs.net,nn02.gpfs.net,dn01.gpfs.net,dn02.gpfs.net
    New DataNode being added:
    dn03.gpfs.net
    Run the following command to create the Hadoop users/groups on dn03.gpfs.net:
    /usr/lpp/mmfs/hadoop/scripts/gpfs_create_hadoop_users_dirs.py --create-users-and-groups --hadoop-hosts nn01.gpfs.net,nn02.gpfs.net,dn01.gpfs.net,dn02.gpfs.net,dn03.gpfs.net
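The --hadoop-hosts value is a single comma-separated string with no embedded spaces. Building it from the existing hosts plus the node being added can be sketched as follows (the hostnames are the ones from the example above):

```python
# Build the --hadoop-hosts argument: existing HDFS Transparency hosts
# plus the DataNode being added, joined with commas and no spaces.
existing_hosts = ["nn01.gpfs.net", "nn02.gpfs.net", "dn01.gpfs.net", "dn02.gpfs.net"]
new_hosts = ["dn03.gpfs.net"]
hadoop_hosts = ",".join(existing_hosts + new_hosts)
print(hadoop_hosts)
# → nn01.gpfs.net,nn02.gpfs.net,dn01.gpfs.net,dn02.gpfs.net,dn03.gpfs.net
```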
  4. Verify the users and groups by running the following command:
    /usr/lpp/mmfs/hadoop/scripts/gpfs_create_hadoop_users_dirs.py --verify-users-and-groups
  5. (Optional): You can also create a custom user/group on the IBM Storage Scale nodes by using the --create-custom-hadoop-user-group user-name[:group1[,group2..]] option of the gpfs_create_hadoop_users_dirs.py script. The script ensures that such a user/group is created with a consistent UID/GID across all the nodes.
    • In the following example, a user called testuser is created across the HDFS Transparency nodes as a member of the hadoop group:
      # /usr/lpp/mmfs/hadoop/scripts/gpfs_create_hadoop_users_dirs.py --create-custom-hadoop-user-group testuser:hadoop
      Checking current state of the system..
      Group: hadoop already present on host nn01.gpfs.net
      Group: hadoop already present on host dn01.gpfs.net
      Group: testuser(10030) added successfully on host nn01.gpfs.net
      Group: testuser(10030) added successfully on host dn01.gpfs.net
      User: testuser(10028) added successfully on host nn01.gpfs.net
      User: testuser(10028) added successfully on host dn01.gpfs.net
    • On every CDP Private Cloud Base node that is not an IBM Storage Scale node, run the following command:
      # /usr/bin/useradd testuser
  6. Configure Hadoop supergroup for HDFS Transparency.
    Check whether HDFS Transparency is running by using the following command:
    /usr/lpp/mmfs/hadoop/sbin/mmhdfs hdfs status
    If HDFS Transparency is still running, stop it by using the following command:
    /usr/lpp/mmfs/hadoop/sbin/mmhdfs hdfs stop
    Set the dfs.permissions.superusergroup parameter to supergroup by running the following command:
    /usr/lpp/mmfs/hadoop/sbin/mmhdfs config set hdfs-site.xml -k dfs.permissions.superusergroup=supergroup
    Upload the configuration by running the following command:
    /usr/lpp/mmfs/hadoop/sbin/mmhdfs config upload
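After the upload, the hdfs-site.xml used by HDFS Transparency should contain an entry equivalent to the following. dfs.permissions.superusergroup is a standard Hadoop HDFS property; the fragment is shown here only for reference:

```xml
<property>
  <name>dfs.permissions.superusergroup</name>
  <value>supergroup</value>
</property>
```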
  7. Set ownership and permissions for the IBM Storage Scale Hadoop root directory.
    It is recommended to set the ownership of the IBM Storage Scale Hadoop root directory to hdfs:supergroup with 755 (rwxr-xr-x) permissions. By default, it is set to root:root.
    # /usr/bin/chown hdfs:supergroup <IBM Storage Scale mount directory>/<IBM Storage Scale Hadoop data directory>
    # /usr/bin/chmod 755 <IBM Storage Scale mount directory>/<IBM Storage Scale Hadoop data directory>
    For example:
    # /usr/bin/chown hdfs:supergroup /ibm/gpfs/datadir1
    # /usr/bin/chmod 755 /ibm/gpfs/datadir1
    You can retrieve the IBM Storage Scale mount directory (gpfs.mnt.dir) and the IBM Storage Scale Hadoop data directory (gpfs.data.dir) by using the following command on a CES HDFS cluster node:
    /usr/lpp/mmfs/hadoop/sbin/mmhdfs config get gpfs-site.xml -k gpfs.mnt.dir -k gpfs.data.dir
    In this example, gpfs.mnt.dir=/ibm/gpfs and gpfs.data.dir=datadir1, so the Hadoop root directory is /ibm/gpfs/datadir1.