Problem determination

  1. ERROR: Requested user hdfs is banned when running MapReduce jobs as the user hdfs in a native HDFS cluster.

    Solution:

    See https://my.cloudera.com/knowledge/LinuxTaskController-job-fails-with-error-Requested-user-hdfs?id=275909.
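
    Typically (an assumption based on the linked article rather than a verbatim procedure), this error means that the job user is listed in banned.users, or has a UID below min.user.id, in the task controller configuration (taskcontroller.cfg or container-executor.cfg, depending on the Hadoop version). A minimal sketch of the relevant entries:

    # Sketch of the task controller configuration; file name, location, and defaults vary by distribution.
    # To allow the hdfs user, remove it from the banned list:
    banned.users=yarn,mapred,bin
    # If the user's UID is below the threshold, lower min.user.id accordingly:
    min.user.id=1000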

  2. IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] when running any hadoop fs command as a specified user.

    Solution:

    You must obtain a Kerberos ticket with the appropriate principal and keytab for the specified user. For example:
    kinit -k -t /usr/lpp/mmfs/hadoop/tc/hadoop/keytab/hdptestuser.headless.keytab hdp-user1@IBM.COM
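    To confirm that a valid ticket was obtained, you can list the credentials cache:
    klist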
  3. Creating a Hive database with a LOCATION on the remote HDFS Transparency cluster fails with an authorization error:
    hive> CREATE DATABASE remote_db2 COMMENT 'Holds all the tables data in remote HDFS Transparency cluster' LOCATION 'hdfs://c16f1n13.gpfs.net:8020/user/hive/remote_db2';

    FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException (message:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): Unauthorized connection for super-user: hive/c16f1n08.gpfs.net@IBM.COM from IP 192.0.2.1)

    Solution:

    Set the following custom core-site properties on all the nodes of the remote HDFS Transparency cluster:

    hadoop.proxyuser.hive.hosts=*

    hadoop.proxyuser.hive.groups=*
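
    If you edit core-site.xml directly rather than through the management console, the entries take the following form:

    <property>
       <name>hadoop.proxyuser.hive.hosts</name>
       <value>*</value>
    </property>
    <property>
       <name>hadoop.proxyuser.hive.groups</name>
       <value>*</value>
    </property>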

  4. Hive IMPORT and EXPORT operations are not supported with the ViewFS scheme. The following exception is thrown:
    0: jdbc:hive2://c16f1n03.gpfs.net:2181,c16f1n> EXPORT TABLE local_db_hdfs.passwd_int_part 
    TO 'viewfs://federationcluster/gpfs/hive/remote_db_gpfs/passwd_int_part_export';
    Error: Error while compiling statement: FAILED: SemanticException Invalid path only the 
    following file systems accepted for export/import : hdfs,pfile,file,s3,s3a,gs (state=42000,code=40000)
    

    Solution:

    Change the scheme from viewfs://xx to hdfs://xx. For example:
    0: jdbc:hive2://c16f1n03.gpfs.net:2181,c16f1n> EXPORT TABLE local_db_hdfs.passwd_int_part 
    TO 'hdfs://c16f1n10:8020/gpfs/hive/remote_db_gpfs/passwd_int_part_export';
    INFO  : Compiling command(queryId=hive_20190110021038_7f5d37d6-f6e6-488a-b7ee-99261fc946e3): EXPORT 
    TABLE local_db_hdfs.passwd_int_part TO 'hdfs://c16f1n10:8020/gpfs/hive/remote_db_gpfs/passwd_int_part_export'
    INFO  : Semantic Analysis Completed (retrial = false)
    INFO  : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
    INFO  : Completed compiling command(queryId=hive_20190110021038_7f5d37d6-f6e6-488a-b7ee-99261fc946e3); 
    Time taken: 0.125 seconds
    INFO  : Executing command(queryId=hive_20190110021038_7f5d37d6-f6e6-488a-b7ee-99261fc946e3): EXPORT 
    TABLE local_db_hdfs.passwd_int_part TO 'hdfs://c16f1n10:8020/gpfs/hive/remote_db_gpfs/passwd_int_part_export'
    INFO  : Starting task [Stage-0:REPL_DUMP] in serial mode
    INFO  : Completed executing command(queryId=hive_20190110021038_7f5d37d6-f6e6-488a-b7ee-99261fc946e3); 
    Time taken: 0.19 seconds
    INFO  : OK
    No rows affected (0.343 seconds)
    
  5. Hive LOAD DATA INPATH fails with the ViewFS scheme with the following exception:
    0: jdbc:hive2://c16f1n03.gpfs.net:2181,c16f1n> LOAD DATA INPATH '/tmp/2012.txt' 
    INTO TABLE db_bdpbase.Employee PARTITION(year=2012);
    INFO  : Compiling command(queryId=hive_20190110024717_b6a0b5a0-d8a1-42e3-a6bb-40ca297d97dd): 
    LOAD DATA INPATH '/tmp/2012.txt' INTO TABLE db_bdpbase.Employee PARTITION(year=2012)
    INFO  : Semantic Analysis Completed (retrial = false)
    INFO  : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
    INFO  : Completed compiling command(queryId=hive_20190110024717_b6a0b5a0-d8a1-42e3-a6bb-40ca297d97dd); 
    Time taken: 0.168 seconds
    INFO  : Executing command(queryId=hive_20190110024717_b6a0b5a0-d8a1-42e3-a6bb-40ca297d97dd): 
    LOAD DATA INPATH '/tmp/2012.txt' INTO TABLE db_bdpbase.Employee PARTITION(year=2012)
    INFO  : Starting task [Stage-0:MOVE] in serial mode
    INFO  : Loading data to table db_bdpbase.employee partition (year=2012) from 
    viewfs://federationcluster/tmp/2012.txt
    ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. 
    org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
    viewfs://federationcluster/tmp/2012.txt to destination 
    viewfs://federationcluster/warehouse/tablespace/managed/hive/db_bdpbase.db/employee/year=2012/delta_0000011_0000011_0000
    INFO  : Completed executing command(queryId=hive_20190110024717_b6a0b5a0-d8a1-42e3-a6bb-40ca297d97dd); Time taken: 0.117 seconds
    Error: Error while processing statement: FAILED: Execution Error, 
    return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. org.apache.hadoop.hive.ql.metadata.HiveException: 
    Unable to move source viewfs://federationcluster/tmp/2012.txt to destination 
    viewfs://federationcluster/warehouse/tablespace/managed/hive/db_bdpbase.db/employee/year=2012/delta_0000011_0000011_0000 (state=08S01,code=1)
    

    Solution:

    Change the load path to use the same mount point directory as the target table. Here, the table Employee is located at /warehouse/tablespace/managed/hive/db_bdpbase/Employee, so the data inpath must be under the same ViewFS mount point, /warehouse (a sketch of the mount table entry follows the example below).
    0: jdbc:hive2://c16f1n03.gpfs.net:2181,c16f1n> LOAD DATA INPATH 
    '/warehouse/2012.txt' INTO TABLE db_bdpbase.Employee PARTITION(year=2012);
    INFO  : Compiling command(queryId=hive_20190110024734_b5c59f01-9f08-4e47-8884-bb7dd8e131ca):
    LOAD DATA INPATH '/warehouse/2012.txt' INTO TABLE db_bdpbase.Employee PARTITION(year=2012)
    INFO  : Semantic Analysis Completed (retrial = false)
    INFO  : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
    INFO  : Completed compiling command(queryId=hive_20190110024734_b5c59f01-9f08-4e47-8884-bb7dd8e131ca); 
    Time taken: 0.118 seconds
    INFO  : Executing command(queryId=hive_20190110024734_b5c59f01-9f08-4e47-8884-bb7dd8e131ca): 
    LOAD DATA INPATH '/warehouse/2012.txt' INTO TABLE db_bdpbase.Employee PARTITION(year=2012)
    INFO  : Starting task [Stage-0:MOVE] in serial mode
    INFO  : Loading data to table db_bdpbase.employee partition (year=2012) from 
    viewfs://federationcluster/warehouse/2012.txt
    INFO  : Starting task [Stage-1:STATS] in serial mode
    INFO  : Completed executing command(queryId=hive_20190110024734_b5c59f01-9f08-4e47-8884-bb7dd8e131ca); 
    Time taken: 0.358 seconds
    INFO  : OK
    No rows affected (0.501 seconds)
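
    To check which paths share a mount point, inspect the ViewFS mount table in core-site.xml. Entries follow the fs.viewfs.mounttable.<cluster>.link.<path> pattern; the entry below is a sketch, and the target URI is an assumption for illustration:

    <property>
       <name>fs.viewfs.mounttable.federationcluster.link./warehouse</name>
       <value>hdfs://c16f1n10:8020/warehouse</value>
    </property>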
    
  6. The error Permission denied: Principal [name=hive, type=USER] does not have following privileges for operation DFS [ADMIN] is displayed in the Hive Beeline console when running HiveQL:
    0: jdbc:hive2://c16f1n03.gpfs.net:2181,c16f1n> load data local inpath 
    '/tmp/hive/kv2.txt' into table local_db_hdfs.passwd_ext_nonpart;
    Error: Error while compiling statement: FAILED: HiveAccessControlException 
    Permission denied: Principal [name=hive, type=USER] does not have following 
    privileges for operation LOAD [ADMIN] (state=42000,code=40000)
    0: jdbc:hive2://c16f1n03.gpfs.net:2181,c16f1n> dfs -ls /gpfs;
    Error: Error while processing statement: Permission denied: Principal [name=hive, type=USER] 
    does not have following privileges for operation DFS [ADMIN] (state=,code=1)
    0: jdbc:hive2://c16f1n03.gpfs.net:2181,c16f1n>
    

    Solution:

    Go to Ambari > Hive > CONFIGS > ADVANCED > Custom hive-site and set hive.users.in.admin.role to a comma-separated list of the users who require admin role authorization (such as the user hive). Restart the Hive services for the changes to take effect.
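
    If you edit hive-site.xml directly, the property takes the following form:

    <property>
       <name>hive.users.in.admin.role</name>
       <value>hive</value>
    </property>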

    The permission denied errors no longer occur after hive.users.in.admin.role=hive is added:
    0: jdbc:hive2://c16f1n03.gpfs.net:2181,c16f1n> dfs -ls /gpfs;
    +----------------------------------------------------+
    |                     DFS Output                     |
    +----------------------------------------------------+
    | drwxr-xr-x   - nobody root          0 2019-01-08 02:32 /gpfs/passwd_sparkshell |
    | -rw-r--r--   2 hdfs   root         52 2019-01-08 02:37 /gpfs/redhat-release |
    +----------------------------------------------------+
    25 rows selected (0.123 seconds)
    0: jdbc:hive2://c16f1n03.gpfs.net:2181,c16f1n> load data local inpath 
    '/tmp/hive/kv2.txt' into table local_db_hdfs.passwd_ext_nonpart;
    INFO  : Compiling command(queryId=hive_20190111020239_bb71b8c0-1b00-4f96-bec2-e0e899de62df): 
    load data local inpath '/tmp/hive/kv2.txt' into table local_db_hdfs.passwd_ext_nonpart
    INFO  : Semantic Analysis Completed (retrial = false)
    INFO  : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
    INFO  : Completed compiling command(queryId=hive_20190111020239_bb71b8c0-1b00-4f96-bec2-e0e899de62df); 
    Time taken: 0.285 seconds
    INFO  : Executing command(queryId=hive_20190111020239_bb71b8c0-1b00-4f96-bec2-e0e899de62df): 
    load data local inpath '/tmp/hive/kv2.txt' into table local_db_hdfs.passwd_ext_nonpart
    INFO  : Starting task [Stage-0:MOVE] in serial mode
    INFO  : Loading data to table local_db_hdfs.passwd_ext_nonpart from file:/tmp/hive/kv2.txt
    INFO  : Starting task [Stage-1:STATS] in serial mode
    INFO  : Completed executing command(queryId=hive_20190111020239_bb71b8c0-1b00-4f96-bec2-e0e899de62df); 
    Time taken: 0.545 seconds
    INFO  : OK
    No rows affected (0.947 seconds)
    
  7. The Oozie service check fails with the error: Error: E0904: Scheme [viewfs] not supported in uri [viewfs://hdpcluster/user/ambari-qa/examples/apps/no-op]

    Solution:

    Go to Ambari > Oozie > CONFIGS > ADVANCED > Custom oozie-site and add the following property:
    
    <property>
       <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
       <value>hdfs,viewfs</value>
    </property>
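
    Restart the Oozie service for the change to take effect, and then rerun the service check.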