Troubleshooting
Problem
You can see any Hadoop file, including files you should not be able to access. Some of this could be explained by permissions of 755 or more permissive, but not all of it. No matter which user is logged in to Watson Studio Local (WSL), every user appears to access the data lake as the service account dsxhi.
Resolving The Problem
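Impersonation and security cannot be reliably enforced in Hadoop without Kerberos.
There are two main reasons why:
A) Any user can easily gain superuser privileges, for example by running "export HADOOP_USER_NAME=hdfs". In the following session, user1 is first denied write access, then sets the variable and succeeds: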
[user1@dicker1 ~]$ whoami
user1
[user1@dicker1 ~]$ hadoop fs -mkdir /user/secured
mkdir: Permission denied: user=user1, access=WRITE, inode="/user/secured":hdfs:hdfs:drwxr-xr-x
[user1@dicker1 ~]$ export HADOOP_USER_NAME=hdfs
[user1@dicker1 ~]$ hadoop fs -mkdir /user/secured
[user1@dicker1 ~]$ hadoop fs -ls /user/
Found 11 items
drwxrwx--- - ambari-qa hdfs 0 2018-04-16 07:44 /user/ambari-qa
drwxr-xr-x - bigsql hdfs 0 2019-01-24 10:41 /user/bigsql
drwxr-xr-x - dsxhi hdfs 0 2018-12-17 20:51 /user/dsxhi
drwxr-xr-x - hbase hdfs 0 2018-04-16 07:41 /user/hbase
drwxr-xr-x - hcat hdfs 0 2018-04-16 07:42 /user/hcat
drwx------ - hdfs hdfs 0 2018-12-14 14:13 /user/hdfs
drwxr-xr-x - hive hdfs 0 2018-12-14 13:01 /user/hive
drwxrwxr-x - livy hdfs 0 2019-01-25 11:43 /user/livy
drwxr-xr-x - hdfs hdfs 0 2019-02-12 18:35 /user/secured
drwxrwxr-x - spark hdfs 0 2018-12-14 12:48 /user/spark
drwxr-xr-x - user1 hadoop 0 2019-01-09 09:44 /user/user1
[user1@dicker1 ~]$ hadoop fs -chown user1 /user/secured
B) Jobs in YARN run as the YARN service user (yarn).
This means that the yarn user needs access to every directory a job tries to write to.
For these reasons, DSXHI/Hadoop should not be used in production without Kerberos, because both HDFS and YARN are inherently insecure without it.
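One way to check whether a cluster still relies on simple (non-Kerberos) authentication is to read the hadoop.security.authentication property from the client configuration. The following is a minimal sketch, not an official procedure: classify_auth is a hypothetical helper name introduced here for illustration, and the script assumes that the hdfs command is on the PATH of an edge node with the cluster's client configuration deployed.

```shell
#!/usr/bin/env bash
# Sketch: report whether the Hadoop client configuration uses "simple"
# authentication (client-supplied user names are trusted) or Kerberos.
# classify_auth is a hypothetical helper, not part of Hadoop or DSXHI.

classify_auth() {
  # $1: value of the hadoop.security.authentication property
  case "$1" in
    kerberos) echo "secure" ;;    # callers must present a Kerberos ticket
    simple)   echo "insecure" ;;  # HADOOP_USER_NAME is accepted as-is
    *)        echo "unknown" ;;   # empty or unrecognized value
  esac
}

# Query the configuration only when the hdfs CLI is available.
if command -v hdfs >/dev/null 2>&1; then
  mode="$(hdfs getconf -confKey hadoop.security.authentication 2>/dev/null)"
  echo "hadoop.security.authentication=${mode:-<not found>} ($(classify_auth "$mode"))"
fi
```

A result of "insecure" means that the HADOOP_USER_NAME override described in reason A is accepted by the cluster. A result of "unknown" means only that the value could not be read or was not recognized, not that the cluster is secure.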
Document Location
Worldwide
Document Information
More support for:
IBM Watson Studio Local
Software version:
1.2.3
Document number:
874380
Modified date:
30 December 2019
UID
ibm10874380