Problem determination

  1. ERROR: Requested user hdfs is banned when running MapReduce jobs as the user hdfs in a native HDFS cluster.

    Solution:

    See https://my.cloudera.com/knowledge/LinuxTaskController-job-fails-with-error-Requested-user-hdfs?id=275909.
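
    Typically (an assumption based on the linked article rather than a verbatim procedure), this error means that the job user is listed in banned.users, or has a UID below min.user.id, in the task controller configuration (taskcontroller.cfg or container-executor.cfg, depending on the Hadoop version). A minimal sketch of the relevant entries:

    # Sketch of the task controller configuration; file name, location, and defaults vary by distribution.
    # To allow the hdfs user, remove it from the banned list:
    banned.users=yarn,mapred,bin
    # If the user's UID is below the threshold, lower min.user.id accordingly:
    min.user.id=1000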

  2. IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] when running any hadoop fs command as a specified user.

    Solution:

    You must obtain a Kerberos ticket with the appropriate principal and keytab for the specified user. For example:
    kinit -k -t /usr/lpp/mmfs/hadoop/tc/hadoop/keytab/hdptestuser.headless.keytab hdp-user1@IBM.COM
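    To confirm that a valid ticket was obtained, you can list the credentials cache:
    klist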
  3. Creating a Hive database with a LOCATION on the remote HDFS Transparency cluster fails with an authorization error:
    hive> CREATE DATABASE remote_db2 COMMENT 'Holds all the tables data in remote HDFS Transparency cluster' LOCATION 'hdfs://c16f1n13.gpfs.net:8020/user/hive/remote_db2';

    FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException (message:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): Unauthorized connection for super-user: hive/c16f1n08.gpfs.net@IBM.COM from IP 192.0.2.1)

    Solution:

    Set the following custom core-site properties on all the nodes of the remote HDFS Transparency cluster:

    hadoop.proxyuser.hive.hosts=*

    hadoop.proxyuser.hive.groups=*
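
    If you edit core-site.xml directly rather than through the management console, the entries take the following form:

    <property>
       <name>hadoop.proxyuser.hive.hosts</name>
       <value>*</value>
    </property>
    <property>
       <name>hadoop.proxyuser.hive.groups</name>
       <value>*</value>
    </property>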

  4. Hive IMPORT and EXPORT operations are not supported with the ViewFS scheme. The following exception is thrown:
    0: jdbc:hive2://c16f1n03.gpfs.net:2181,c16f1n> EXPORT TABLE local_db_hdfs.passwd_int_part 
    TO 'viewfs://federationcluster/gpfs/hive/remote_db_gpfs/passwd_int_part_export';
    Error: Error while compiling statement: FAILED: SemanticException Invalid path only the 
    following file systems accepted for export/import : hdfs,pfile,file,s3,s3a,gs (state=42000,code=40000)
    

    Solution:

    Change the scheme from viewfs://xx to hdfs://xx. For example:
    0: jdbc:hive2://c16f1n03.gpfs.net:2181,c16f1n> EXPORT TABLE local_db_hdfs.passwd_int_part 
    TO 'hdfs://c16f1n10:8020/gpfs/hive/remote_db_gpfs/passwd_int_part_export';
    INFO  : Compiling command(queryId=hive_20190110021038_7f5d37d6-f6e6-488a-b7ee-99261fc946e3): EXPORT 
    TABLE local_db_hdfs.passwd_int_part TO 'hdfs://c16f1n10:8020/gpfs/hive/remote_db_gpfs/passwd_int_part_export'
    INFO  : Semantic Analysis Completed (retrial = false)
    INFO  : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
    INFO  : Completed compiling command(queryId=hive_20190110021038_7f5d37d6-f6e6-488a-b7ee-99261fc946e3); 
    Time taken: 0.125 seconds
    INFO  : Executing command(queryId=hive_20190110021038_7f5d37d6-f6e6-488a-b7ee-99261fc946e3): EXPORT 
    TABLE local_db_hdfs.passwd_int_part TO 'hdfs://c16f1n10:8020/gpfs/hive/remote_db_gpfs/passwd_int_part_export'
    INFO  : Starting task [Stage-0:REPL_DUMP] in serial mode
    INFO  : Completed executing command(queryId=hive_20190110021038_7f5d37d6-f6e6-488a-b7ee-99261fc946e3); 
    Time taken: 0.19 seconds
    INFO  : OK
    No rows affected (0.343 seconds)
    
  5. Hive LOAD DATA INPATH fails with the ViewFS scheme with the following exception:
    0: jdbc:hive2://c16f1n03.gpfs.net:2181,c16f1n> LOAD DATA INPATH '/tmp/2012.txt' 
    INTO TABLE db_bdpbase.Employee PARTITION(year=2012);
    INFO  : Compiling command(queryId=hive_20190110024717_b6a0b5a0-d8a1-42e3-a6bb-40ca297d97dd): 
    LOAD DATA INPATH '/tmp/2012.txt' INTO TABLE db_bdpbase.Employee PARTITION(year=2012)
    INFO  : Semantic Analysis Completed (retrial = false)
    INFO  : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
    INFO  : Completed compiling command(queryId=hive_20190110024717_b6a0b5a0-d8a1-42e3-a6bb-40ca297d97dd); 
    Time taken: 0.168 seconds
    INFO  : Executing command(queryId=hive_20190110024717_b6a0b5a0-d8a1-42e3-a6bb-40ca297d97dd): 
    LOAD DATA INPATH '/tmp/2012.txt' INTO TABLE db_bdpbase.Employee PARTITION(year=2012)
    INFO  : Starting task [Stage-0:MOVE] in serial mode
    INFO  : Loading data to table db_bdpbase.employee partition (year=2012) from 
    viewfs://federationcluster/tmp/2012.txt
    ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. 
    org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
    viewfs://federationcluster/tmp/2012.txt to destination 
    viewfs://federationcluster/warehouse/tablespace/managed/hive/db_bdpbase.db/employee/year=2012/delta_0000011_0000011_0000
    INFO  : Completed executing command(queryId=hive_20190110024717_b6a0b5a0-d8a1-42e3-a6bb-40ca297d97dd); Time taken: 0.117 seconds
    Error: Error while processing statement: FAILED: Execution Error, 
    return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. org.apache.hadoop.hive.ql.metadata.HiveException: 
    Unable to move source viewfs://federationcluster/tmp/2012.txt to destination 
    viewfs://federationcluster/warehouse/tablespace/managed/hive/db_bdpbase.db/employee/year=2012/delta_0000011_0000011_0000 (state=08S01,code=1)
    

    Solution:

    Change the load path to use the same mount point directory as the target table. Here, the table Employee is located at /warehouse/tablespace/managed/hive/db_bdpbase/Employee, so the data inpath must be under the same ViewFS mount point, /warehouse (a sketch of the mount table entry follows the example below).
    0: jdbc:hive2://c16f1n03.gpfs.net:2181,c16f1n> LOAD DATA INPATH 
    '/warehouse/2012.txt' INTO TABLE db_bdpbase.Employee PARTITION(year=2012);
    INFO  : Compiling command(queryId=hive_20190110024734_b5c59f01-9f08-4e47-8884-bb7dd8e131ca):
    LOAD DATA INPATH '/warehouse/2012.txt' INTO TABLE db_bdpbase.Employee PARTITION(year=2012)
    INFO  : Semantic Analysis Completed (retrial = false)
    INFO  : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
    INFO  : Completed compiling command(queryId=hive_20190110024734_b5c59f01-9f08-4e47-8884-bb7dd8e131ca); 
    Time taken: 0.118 seconds
    INFO  : Executing command(queryId=hive_20190110024734_b5c59f01-9f08-4e47-8884-bb7dd8e131ca): 
    LOAD DATA INPATH '/warehouse/2012.txt' INTO TABLE db_bdpbase.Employee PARTITION(year=2012)
    INFO  : Starting task [Stage-0:MOVE] in serial mode
    INFO  : Loading data to table db_bdpbase.employee partition (year=2012) from 
    viewfs://federationcluster/warehouse/2012.txt
    INFO  : Starting task [Stage-1:STATS] in serial mode
    INFO  : Completed executing command(queryId=hive_20190110024734_b5c59f01-9f08-4e47-8884-bb7dd8e131ca); 
    Time taken: 0.358 seconds
    INFO  : OK
    No rows affected (0.501 seconds)
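
    To check which paths share a mount point, inspect the ViewFS mount table in core-site.xml. Entries follow the fs.viewfs.mounttable.<cluster>.link.<path> pattern; the entry below is a sketch, and the target URI is an assumption for illustration:

    <property>
       <name>fs.viewfs.mounttable.federationcluster.link./warehouse</name>
       <value>hdfs://c16f1n10:8020/warehouse</value>
    </property>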
    
  6. The error Permission denied: Principal [name=hive, type=USER] does not have following privileges for operation DFS [ADMIN] is displayed in the Hive Beeline console when running HiveQL:
    0: jdbc:hive2://c16f1n03.gpfs.net:2181,c16f1n> load data local inpath 
    '/tmp/hive/kv2.txt' into table local_db_hdfs.passwd_ext_nonpart;
    Error: Error while compiling statement: FAILED: HiveAccessControlException 
    Permission denied: Principal [name=hive, type=USER] does not have following 
    privileges for operation LOAD [ADMIN] (state=42000,code=40000)
    0: jdbc:hive2://c16f1n03.gpfs.net:2181,c16f1n> dfs -ls /gpfs;
    Error: Error while processing statement: Permission denied: Principal [name=hive, type=USER] 
    does not have following privileges for operation DFS [ADMIN] (state=,code=1)
    0: jdbc:hive2://c16f1n03.gpfs.net:2181,c16f1n>
    

    Solution:

    Go to Ambari > Hive > CONFIGS > ADVANCED > Custom hive-site and set hive.users.in.admin.role to a comma-separated list of the users who require admin role authorization (such as the user hive). Restart the Hive services for the changes to take effect.
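
    If you edit hive-site.xml directly, the property takes the following form:

    <property>
       <name>hive.users.in.admin.role</name>
       <value>hive</value>
    </property>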

    The permission denied errors no longer occur after hive.users.in.admin.role=hive is added:
    0: jdbc:hive2://c16f1n03.gpfs.net:2181,c16f1n> dfs -ls /gpfs;
    +----------------------------------------------------+
    |                     DFS Output                     |
    +----------------------------------------------------+
    | drwxr-xr-x   - nobody root          0 2019-01-08 02:32 /gpfs/passwd_sparkshell |
    | -rw-r--r--   2 hdfs   root         52 2019-01-08 02:37 /gpfs/redhat-release |
    +----------------------------------------------------+
    25 rows selected (0.123 seconds)
    0: jdbc:hive2://c16f1n03.gpfs.net:2181,c16f1n> load data local inpath 
    '/tmp/hive/kv2.txt' into table local_db_hdfs.passwd_ext_nonpart;
    INFO  : Compiling command(queryId=hive_20190111020239_bb71b8c0-1b00-4f96-bec2-e0e899de62df): 
    load data local inpath '/tmp/hive/kv2.txt' into table local_db_hdfs.passwd_ext_nonpart
    INFO  : Semantic Analysis Completed (retrial = false)
    INFO  : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
    INFO  : Completed compiling command(queryId=hive_20190111020239_bb71b8c0-1b00-4f96-bec2-e0e899de62df); 
    Time taken: 0.285 seconds
    INFO  : Executing command(queryId=hive_20190111020239_bb71b8c0-1b00-4f96-bec2-e0e899de62df): 
    load data local inpath '/tmp/hive/kv2.txt' into table local_db_hdfs.passwd_ext_nonpart
    INFO  : Starting task [Stage-0:MOVE] in serial mode
    INFO  : Loading data to table local_db_hdfs.passwd_ext_nonpart from file:/tmp/hive/kv2.txt
    INFO  : Starting task [Stage-1:STATS] in serial mode
    INFO  : Completed executing command(queryId=hive_20190111020239_bb71b8c0-1b00-4f96-bec2-e0e899de62df); 
    Time taken: 0.545 seconds
    INFO  : OK
    No rows affected (0.947 seconds)
    
  7. The Oozie service check fails with the error: Error: E0904: Scheme [viewfs] not supported in uri [viewfs://hdpcluster/user/ambari-qa/examples/apps/no-op]

    Solution:

    Go to Ambari > Oozie > CONFIGS > ADVANCED > Custom oozie-site and add the following property:
    
    <property>
       <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
       <value>hdfs,viewfs</value>
    </property>
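
    Restart the Oozie service for the change to take effect, and then rerun the service check.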