Configuring users, groups and file system access for IBM Storage Scale
This section shows how to create CDP Private Cloud Base users and groups on the HDFS Transparency nodes, and also how to configure the IBM Storage Scale file system access.
- Check if you have a Windows AD or LDAP-based network
- Create CDP Private Cloud Base users and groups
- (Optional): Create CDP Private Cloud Base users and groups on a new node being added (only for Add Node)
- Verify the users and groups
- Create any custom user or group (Optional)
- Configure Hadoop supergroup for HDFS Transparency
- Set ownership for the IBM Storage Scale file system Hadoop root directory
When you register hosts to Cloudera Manager, Cloudera Manager creates Hadoop users and groups corresponding to the services on all the managed hosts. These users and groups must be created manually on the IBM Storage Scale HDFS Transparency hosts before those hosts are registered to Cloudera Manager. Because IBM Storage Scale is a POSIX file system, every common system user and group must have the same UID and GID across all the IBM Storage Scale nodes.
- If you are using a Windows AD or LDAP-based network, the Hadoop users and groups on your HDFS Transparency nodes should already have consistent UIDs and GIDs across all the HDFS Transparency nodes. In that case, skip the next step and go to step 3.
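Whether the IDs come from AD/LDAP or from local files, consistency can be spot-checked with standard tools. The following sketch (the account names listed are typical Hadoop service users; adjust the list to your deployment) prints the name, UID, and primary GID for each account so the output can be compared across nodes:

```shell
# Print name:UID:GID for the common Hadoop service accounts.
# Run this on every HDFS Transparency node; the output lines
# must be identical on all nodes.
for u in hdfs yarn mapred hive hbase; do
  getent passwd "$u" | cut -d: -f1,3,4
done
```

On an AD/LDAP-backed node, getent resolves the user through the directory service, so matching output also confirms that the directory is reachable from that node.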
- If you are using local users, run the following command on one of the HDFS Transparency nodes to create these users and groups. The command dynamically fetches the list of NameNode and DataNode hosts from the cluster configuration and adds the Hadoop users and groups to those hosts.
For the command to run, a password-less SSH channel must exist for root from that host to all the other HDFS Transparency nodes.
/usr/lpp/mmfs/hadoop/scripts/gpfs_create_hadoop_users_dirs.py --create-users-and-groups
This command performs the following actions:
- Creates the CDP Private Cloud Base users and groups as defined in the user_grp_dir_metadata.json file.
- Ensures that all such users and groups have consistent UIDs and GIDs across the hosts.
- Creates a system group called supergroup to be used as the Hadoop supergroup. The hdfs, mapred, and yarn users are added as members of this supergroup.
- Logs the output of the command to the /var/log/user_group_configuration.log file.
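After the script finishes, the result can be confirmed with a quick check (this is a verification sketch, not part of the script itself):

```shell
# Confirm the supergroup exists and lists the expected members;
# the entry should have the shape supergroup:x:<GID>:hdfs,mapred,yarn
getent group supergroup

# Review the end of the script's log for any per-host failures.
tail -n 20 /var/log/user_group_configuration.log
```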
Note: If you are using HDFS Transparency 3.1.1-5 or earlier, the command should be used with the --hadoop-hosts option as follows:
/usr/lpp/mmfs/hadoop/scripts/gpfs_create_hadoop_users_dirs.py --create-users-and-groups --hadoop-hosts <comma separated list of HDFS Transparency NameNodes and DataNodes>
For example:
/usr/lpp/mmfs/hadoop/scripts/gpfs_create_hadoop_users_dirs.py --create-users-and-groups --hadoop-hosts nn01.gpfs.net,nn02.gpfs.net,dn01.gpfs.net,dn02.gpfs.net
- (Optional): If you are adding a single NameNode or DataNode to an existing HDFS Transparency cluster, you must pass all the existing hostnames of the HDFS Transparency cluster together with the new hostname to the gpfs_create_hadoop_users_dirs.py script. The script ensures that the UID and GID values for the users and groups remain consistent across the IBM Storage Scale cluster.
For example:
Existing HDFS Transparency hosts: nn01.gpfs.net,nn02.gpfs.net,dn01.gpfs.net,dn02.gpfs.net
New DataNode being added: dn03.gpfs.net
Run the following command to create the Hadoop users and groups on dn03.gpfs.net:
/usr/lpp/mmfs/hadoop/scripts/gpfs_create_hadoop_users_dirs.py --create-users-and-groups --hadoop-hosts nn01.gpfs.net,nn02.gpfs.net,dn01.gpfs.net,dn02.gpfs.net,dn03.gpfs.net
- Verify the users and groups by running the following command:
/usr/lpp/mmfs/hadoop/scripts/gpfs_create_hadoop_users_dirs.py --verify-users-and-groups
- (Optional): You can also create any custom user or group on the IBM Storage Scale nodes by using the --create-custom-hadoop-user-group user-name[:group1[,group2..]] option. The command ensures that such a user or group is created with a consistent UID/GID across all the nodes.
- In the following example, we create a user called testuser across the HDFS Transparency nodes. The user is created as a part of the hadoop group:
# /usr/lpp/mmfs/hadoop/scripts/gpfs_create_hadoop_users_dirs.py --create-custom-hadoop-user-group testuser:hadoop
Checking current state of the system..
Group: hadoop already present on host nn01.gpfs.net
Group: hadoop already present on host dn01.gpfs.net
Group: testuser(10030) added successfully on host nn01.gpfs.net
Group: testuser(10030) added successfully on host dn01.gpfs.net
User: testuser(10028) added successfully on host nn01.gpfs.net
User: testuser(10028) added successfully on host dn01.gpfs.net
- On every CDP Private Cloud Base node that is not an IBM Storage Scale node, run the following command:
# /usr/bin/useradd testuser
- Configure the Hadoop supergroup for HDFS Transparency.
Ensure that HDFS Transparency is stopped by running the following command:
/usr/lpp/mmfs/hadoop/sbin/mmhdfs hdfs status
If HDFS Transparency is still running, stop it by using the following command:
/usr/lpp/mmfs/hadoop/sbin/mmhdfs hdfs stop
Set the dfs.permissions.superusergroup parameter to supergroup by running the following command:
/usr/lpp/mmfs/hadoop/sbin/mmhdfs config set hdfs-site.xml -k dfs.permissions.superusergroup=supergroup
Upload the configuration by running the following command:
/usr/lpp/mmfs/hadoop/sbin/mmhdfs config upload
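Before restarting HDFS Transparency, you can read the value back to confirm that it was stored. This sketch assumes the same mmhdfs config get interface that is used later in this section to retrieve gpfs-site.xml keys:

```
# Read back the supergroup setting; the output should show
# dfs.permissions.superusergroup=supergroup
/usr/lpp/mmfs/hadoop/sbin/mmhdfs config get hdfs-site.xml -k dfs.permissions.superusergroup
```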
- Set ownership for the IBM Storage Scale Hadoop root directory.
It is recommended to set the ownership of the IBM Storage Scale Hadoop root directory to hdfs:supergroup with 755 (rwxr-xr-x) permissions. By default, it is set to root:root.
# /usr/bin/chown hdfs:supergroup <IBM Storage Scale mount directory>/<IBM Storage Scale Hadoop data directory>
For example:
# /usr/bin/chown hdfs:supergroup /ibm/gpfs/datadir1
You can retrieve the IBM Storage Scale mount directory (gpfs.mnt.dir) and the IBM Storage Scale Hadoop data directory (gpfs.data.dir) by using the following command on a CES HDFS cluster node:
/usr/lpp/mmfs/hadoop/sbin/mmhdfs config get gpfs-site.xml -k gpfs.mnt.dir -k gpfs.data.dir
In the preceding example, gpfs.data.dir=datadir1 and gpfs.mnt.dir=/ibm/gpfs.
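To confirm the result of the chown step, the ownership and mode can be checked with stat. The path below is the /ibm/gpfs/datadir1 example from above; substitute your own gpfs.mnt.dir/gpfs.data.dir combination:

```shell
# Report owner:group and the octal mode of the Hadoop root directory.
# A correctly prepared directory reports: hdfs:supergroup 755
stat -c '%U:%G %a' /ibm/gpfs/datadir1
```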