Permissions for running job processes on Hadoop

The user who runs job processes is either the DataStage and QualityStage Administrator (dsadm) or the YARN administrative user (yarn), depending on your configuration.

The YARN administrative user runs job processes on Hadoop except in the following cases, when the DataStage and QualityStage Administrator runs job processes:
  • When Kerberos is enabled on the cluster.
  • When Kerberos is not enabled and both of the following are true:
    • The property yarn.nodemanager.container-executor.class is set to the value org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.
    • The property yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users is set to false.
    Both properties are set in the yarn-site.xml file.
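The rules above can be sketched as a small check. This is a minimal illustration, not a product utility: the function names read_properties and job_process_user are hypothetical, the user names dsadm and yarn are the defaults named in this topic, and whether Kerberos is enabled is passed in as a flag because this topic does not say how to detect it.

```python
import xml.etree.ElementTree as ET

# Property names and value as given in this topic.
EXECUTOR_PROP = "yarn.nodemanager.container-executor.class"
LIMIT_USERS_PROP = "yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users"
LCE_CLASS = "org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor"


def read_properties(yarn_site_xml: str) -> dict:
    """Parse the text of a yarn-site.xml file into a name -> value dictionary."""
    root = ET.fromstring(yarn_site_xml)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def job_process_user(props: dict, kerberos_enabled: bool) -> str:
    """Return "dsadm" or "yarn" (hypothetical helper following the rules above)."""
    if kerberos_enabled:
        return "dsadm"
    uses_lce = props.get(EXECUTOR_PROP) == LCE_CLASS
    # A missing limit-users property is treated as "not set to false".
    limit_users_off = props.get(LIMIT_USERS_PROP, "").lower() == "false"
    return "dsadm" if uses_lce and limit_users_off else "yarn"


# Illustrative yarn-site.xml fragment with both properties set.
SAMPLE = """<configuration>
  <property>
    <name>yarn.nodemanager.container-executor.class</name>
    <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
  </property>
  <property>
    <name>yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users</name>
    <value>false</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    print(job_process_user(read_properties(SAMPLE), kerberos_enabled=False))
```

With the sample fragment and Kerberos disabled, both conditions hold, so the sketch reports dsadm. With an empty property set and Kerberos disabled, it reports yarn.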

Additional permissions required

When the yarn user runs job processes on Hadoop, that user must be granted additional permissions to access directories and files. For example, job processes try to access the scratch disk that is named in the PX configuration files, which might have been created by, and be owned by, the dsadm user. If the yarn user does not have the required permissions, the access attempt fails and the job fails. To avoid job failures, add the yarn user to the primary group of dsadm, which is typically named dstage, and grant the required permissions to that group.
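Before you run jobs, you can confirm the group membership on a node. The following sketch assumes a Linux or UNIX node and the default names yarn and dstage. The function user_in_group is a hypothetical helper built on Python's standard pwd and grp modules. It only reads the account databases and does not change any membership.

```python
import grp
import pwd


def user_in_group(user: str, group: str) -> bool:
    """Return True if `user` has `group` as a primary or supplementary group."""
    try:
        user_entry = pwd.getpwnam(user)
        group_entry = grp.getgrnam(group)
    except KeyError:
        # Either the user or the group does not exist on this node.
        return False
    is_primary = user_entry.pw_gid == group_entry.gr_gid
    is_supplementary = user in group_entry.gr_mem
    return is_primary or is_supplementary


if __name__ == "__main__":
    # "yarn" and "dstage" are the typical names; adjust for your site.
    if user_in_group("yarn", "dstage"):
        print("yarn is a member of dstage")
    else:
        print("yarn is NOT a member of dstage; jobs that use dsadm-owned paths may fail")
```

If the check fails, an administrator adds the membership with the operating system's own tools. On many Linux systems that is a command such as usermod -a -G dstage yarn, but the exact procedure depends on your platform and on whether accounts are managed locally or in a directory service.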

Additional permissions might also be required when you copy the binaries, depending on the installation topology and the type of binary distribution. These permissions are also granted by adding users to groups. The following table shows the permissions that must be added for each combination of installation type and distribution method.

Type of installation | HDFS distribution | SSH distribution
Edge node | No additional permissions are required. | Add yarn user to the primary group of dsadm, which is typically dstage, on every node except the edge node.
Non-edge-node | Add yarn user to the primary group of dsadm, which is typically dstage, on the Engine tier node only. | Add yarn user to the primary group of dsadm, which is typically dstage, on every node in the cluster, including the Engine tier node.
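The table can also be read as a lookup from installation type and distribution method to the nodes where the yarn user must be added to the group. The sketch below only restates the table. The key strings and the function name nodes_needing_group_membership are illustrative labels, not product settings.

```python
# Where the yarn user must be added to the primary group of dsadm (typically dstage),
# keyed by (type of installation, binary distribution method).
YARN_GROUP_NODES = {
    ("edge", "hdfs"): "no nodes (no additional permissions are required)",
    ("edge", "ssh"): "every node except the edge node",
    ("non-edge", "hdfs"): "the Engine tier node only",
    ("non-edge", "ssh"): "every node in the cluster, including the Engine tier node",
}


def nodes_needing_group_membership(installation: str, distribution: str) -> str:
    """Return a description of the nodes where yarn must join the dsadm group."""
    key = (installation.strip().lower(), distribution.strip().lower())
    if key not in YARN_GROUP_NODES:
        raise ValueError(f"unknown combination: {installation!r}, {distribution!r}")
    return YARN_GROUP_NODES[key]


if __name__ == "__main__":
    print(nodes_needing_group_membership("non-edge", "HDFS"))
```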

See Copying binaries for more information on the permissions required.