User and group IDs

Ensure that all user IDs and group IDs used in the cluster for running jobs, accessing the IBM Spectrum Scale file system, or running the Hadoop services are created with the same values on all IBM Spectrum Scale nodes. IBM Spectrum Scale requires this.

If you are using LDAP, create the IDs and groups on the LDAP server and ensure that all nodes can authenticate the users.
If you are using local IDs, create each user and group with the same UID and GID values on all nodes.
If you set up remote mount access for IBM Spectrum Scale, the owning cluster does not need the Hadoop UIDs and GIDs configured, because no Hadoop applications run on its nodes. However, if the owning cluster also serves non-Hadoop clients, ensure that the UIDs and GIDs used by the Hadoop cluster are not also used by those non-Hadoop clients.
Hive does not use the anonymous user if hive.server2.authentication is set to LDAP or Kerberos. However, the default value of hive.server2.authentication is NONE, so HiveServer2 does not authenticate Hive's requests (metadata), and all requests are completed as the anonymous user.
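For the remote-mount case, a hypothetical helper `find_uid_conflicts` can compare a passwd-format listing from an owning-cluster node (for example, `getent passwd` output saved to a file) against the planned Hadoop `name:UID` pairs. It reports accounts that use a Hadoop UID under a different name. This is a sketch under those assumptions; GIDs can be checked the same way against `getent group` output, where the GID is also field 3.

```shell
#!/usr/bin/env bash
# Sketch: list accounts in a passwd-format file whose numeric UID is also
# planned for a Hadoop user but whose account name differs. Such an account
# would appear to own files that the Hadoop cluster writes under that UID.
# find_uid_conflicts is a hypothetical helper name, not an existing tool.
set -euo pipefail

# find_uid_conflicts PASSWD_FILE NAME:UID [NAME:UID ...]
find_uid_conflicts() {
  local passwd_file=$1
  shift
  awk -F: -v pairs="$*" '
    BEGIN {
      # Map each planned Hadoop UID to its Hadoop user name.
      n = split(pairs, list, " ")
      for (i = 1; i <= n; i++) {
        split(list[i], kv, ":")
        owner[kv[2]] = kv[1]
      }
    }
    # Field 1 is the account name, field 3 the numeric UID.
    ($3 in owner) && $1 != owner[$3] {
      print $1 ":" $3 " (Hadoop user " owner[$3] ")"
    }
  ' "$passwd_file"
}
```

For example, `find_uid_conflicts owning-node.passwd hdfs:1006 yarn:1014` (file name hypothetical) prints one line per clashing account and nothing when there is no clash.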

For example, run the following commands as root on every node:

groupadd --gid 1000 hadoop
groupadd --gid 1016 rddcached  # Optional: align the rddcached GID with its UID
groupadd --gid 10013 anonymous  # Used by Hive

useradd -g hadoop -u 1001 ams
useradd -g hadoop -u 1002 hive
useradd -g hadoop -u 1003 oozie
useradd -g hadoop -u 1004 ambari-qa
useradd -g hadoop -u 1005 flume
useradd -g hadoop -u 1006 hdfs
useradd -g hadoop -u 1007 solr
useradd -g hadoop -u 1008 knox
useradd -g hadoop -u 1009 spark
useradd -g hadoop -u 1010 mapred
useradd -g hadoop -u 1011 hbase
useradd -g hadoop -u 1012 zookeeper
useradd -g hadoop -u 1013 sqoop
useradd -g hadoop -u 1014 yarn
useradd -g hadoop -u 1015 hcat
useradd -g rddcached -u 1016 rddcached  # Optional: align the rddcached GID with its UID
useradd -g hadoop -u 1017 kafka
useradd -g anonymous -u 10013 anonymous  # Used by Hive
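The commands above fail if a group or user already exists, which is common when rerunning them on a node. The following sketch wraps the same IDs in hypothetical helpers (`ensure_group`, `ensure_user`, `create_hadoop_ids`) that skip accounts already created with the expected ID and report, rather than change, accounts that exist with a different ID. It assumes it runs as root on each node; setting the hypothetical `DRY_RUN=1` variable only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: create the example Hadoop groups and users with the fixed IDs from
# the list above. An account that already exists with the expected ID is
# skipped; one that exists with a different ID is reported, not changed.
# Assumptions: run as root on every node; DRY_RUN=1 only prints the commands.
set -euo pipefail

# Run a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# ensure_group NAME GID
ensure_group() {
  local name=$1 gid=$2 current
  current=$(getent group "$name" | cut -d: -f3 || true)
  if [ -z "$current" ]; then
    run groupadd --gid "$gid" "$name"
  elif [ "$current" != "$gid" ]; then
    echo "ERROR: group $name has GID $current, expected $gid" >&2
    return 1
  fi
}

# ensure_user NAME UID PRIMARY_GROUP
ensure_user() {
  local name=$1 uid=$2 group=$3 current
  current=$(id -u "$name" 2>/dev/null || true)
  if [ -z "$current" ]; then
    run useradd -g "$group" -u "$uid" "$name"
  elif [ "$current" != "$uid" ]; then
    echo "ERROR: user $name has UID $current, expected $uid" >&2
    return 1
  fi
}

# The IDs below are the example values from this section.
create_hadoop_ids() {
  ensure_group hadoop 1000
  ensure_group rddcached 1016   # optional: GID aligned with the rddcached UID
  ensure_group anonymous 10013  # used by Hive
  local entry
  for entry in ams:1001 hive:1002 oozie:1003 ambari-qa:1004 flume:1005 \
      hdfs:1006 solr:1007 knox:1008 spark:1009 mapred:1010 hbase:1011 \
      zookeeper:1012 sqoop:1013 yarn:1014 hcat:1015 kafka:1017; do
    ensure_user "${entry%%:*}" "${entry#*:}" hadoop
  done
  ensure_user rddcached 1016 rddcached
  ensure_user anonymous 10013 anonymous
}
```

To use it, append a call to `create_hadoop_ids` at the end of the script, preview with `DRY_RUN=1`, then run it as root on each node. Reporting a mismatch instead of fixing it keeps the script from silently renumbering an account that already owns files.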

Note: Linux® systems control access by numeric UID and GID, not by user or group name. For example, if the yarn user has UID 100 on node1 and UID 200 on node2, data that yarn writes on node1 is owned by UID 100, so a read by yarn on node2 (UID 200) fails with a permission error.

Keep the UID and GID of every user consistent on all nodes to avoid permission failures like this one.
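One way to check consistency is to query each node over SSH and compare the results. The sketch below uses hypothetical function names `collect_uids` and `check_consistent`; it assumes passwordless SSH from the node where you run it and a host list that you supply. A node where the user does not exist is reported with the ID `missing`.

```shell
#!/usr/bin/env bash
# Sketch: compare one user's numeric UID across nodes.
# Assumptions: passwordless SSH to every listed host; host names are supplied
# by you on the command line (they are not read from Ambari).
set -euo pipefail

# collect_uids USER HOST... -> prints "HOST UID" per host ("missing" if unknown)
collect_uids() {
  local user=$1 host uid
  shift
  for host in "$@"; do
    # -n keeps ssh from reading this script's stdin.
    uid=$(ssh -n -o BatchMode=yes "$host" id -u "$user" 2>/dev/null || echo missing)
    printf '%s %s\n' "$host" "$uid"
  done
}

# check_consistent: reads "HOST ID" lines on stdin. When more than one
# distinct ID is seen, prints the hosts grouped by ID and exits 1.
check_consistent() {
  awk '
    { hosts[$2] = hosts[$2] " " $1 }
    !($2 in seen) { seen[$2] = 1; distinct++ }
    END {
      if (distinct > 1) {
        for (id in hosts) print "ID " id ":" hosts[id]
        exit 1
      }
    }'
}
```

For example, `collect_uids yarn node1 node2 node3 | check_consistent` (host names hypothetical) prints nothing and exits 0 when yarn has the same UID everywhere. To check a group instead, the remote command could be replaced with `getent group NAME | cut -d: -f3`.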

For the initial installation through Ambari, the UIDs and GIDs of the service users are consistent across all nodes. However, if you deploy the cluster a second time, the UIDs and GIDs of these users might be inconsistent across nodes (see issue AMBARI-10186, reported to the Ambari community).

After deployment, check whether the UIDs and GIDs are consistent across all nodes. If they are not, fix them by running the following commands on each affected node, for each user or group that must be fixed:

##### Change UID of one account:

usermod -u <NEWUID> <USER>

##### Change GID of one group:

groupmod -g <NEWGID> <GROUP>

##### Update all files with old UID to new UID:

find / -user <OLDUID> -exec chown -h <NEWUID> {} \;

##### Update all files with old GID to new GID:

find / -group <OLDGID> -exec chgrp -h <NEWGID> {} \;

##### Update GID of one account:

usermod -g <NEWGID> <USER>
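The steps above can be combined into small helpers so that the old ID is looked up once and the `find` always uses the right value. The sketch below uses hypothetical function names (`fix_user_uid`, `fix_group_gid`, `set_primary_gid`) and a hypothetical `DRY_RUN=1` switch that only prints the commands. It assumes it runs as root, after the Hadoop services running as the affected user are stopped, because `usermod` refuses to change the UID of a user that still has running processes.

```shell
#!/usr/bin/env bash
# Sketch: apply the steps above on one node for one user or group.
# Assumptions: run as root on each node whose IDs are wrong, after stopping
# the Hadoop services that run as the affected user (usermod refuses to change
# the UID of a user with running processes). DRY_RUN=1 only prints commands.
set -euo pipefail

# Run a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# fix_user_uid USER NEWUID
fix_user_uid() {
  local user=$1 new_uid=$2 old_uid
  old_uid=$(id -u "$user")
  [ "$old_uid" = "$new_uid" ] && return 0
  run usermod -u "$new_uid" "$user"
  # usermod re-owns only the files under the user's home directory.
  run find / -user "$old_uid" -exec chown -h "$new_uid" {} +
}

# fix_group_gid GROUP NEWGID
fix_group_gid() {
  local group=$1 new_gid=$2 old_gid
  old_gid=$(getent group "$group" | cut -d: -f3 || true)
  if [ -z "$old_gid" ]; then
    echo "ERROR: group $group not found" >&2
    return 1
  fi
  [ "$old_gid" = "$new_gid" ] && return 0
  run groupmod -g "$new_gid" "$group"
  # groupmod does not change the group of existing files.
  run find / -group "$old_gid" -exec chgrp -h "$new_gid" {} +
}

# set_primary_gid USER NEWGID: the "Update GID of one account" step.
set_primary_gid() {
  run usermod -g "$2" "$1"
}
```

For example, `DRY_RUN=1 fix_user_uid yarn 1014` previews the two commands for yarn. As with the commands above, `find /` descends into every mounted file system, so it can take a long time on a node that mounts a large IBM Spectrum Scale file system.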