Problem determination
This topic contains information on troubleshooting CES HDFS and CDP Private Cloud Base issues.
For CES HDFS known problem determination, see Troubleshooting.
- If Kerberos is enabled, ZooKeeper or YARN might sometimes fail to start because of issues with keytab generation.
Solution:
ZooKeeper:
- In Cloudera Manager, open the ZooKeeper service Configuration page and search for kerberos.
- Select the Enable Kerberos Authentication and Enable Server to Server SASL Authentication check boxes.
YARN:
- In Cloudera Manager, open the YARN service Configuration page and search for kerberos.
- Select the Enable Kerberos Authentication for HTTP Web-Consoles check box.
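To confirm that the keytabs were generated, you can list the principals they contain. This is a minimal check, assuming the usual Cloudera Manager process directory layout; the directory and keytab names below are placeholders and may differ on your hosts:
# klist -kt /run/cloudera-scm-agent/process/<latest ZooKeeper or YARN process directory>/<service>.keytab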
- Impersonation error with Hive/Oozie/Livy even when the proxyuser settings are already set within the IBM Storage Scale service in Cloudera Manager. For example, the following error is seen:
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User: <component>/<host>@<realm> is not allowed to impersonate <user>
Solution:
- Stop the IBM Storage Scale service in Cloudera Manager.
- Enable proxyuser settings for HDFS Transparency as mentioned in Enable and configure CES HDFS.
- Restart the IBM Storage Scale service from Cloudera Manager.
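For reference, the proxyuser settings follow the standard Hadoop hadoop.proxyuser form. The fragment below is an illustrative core-site.xml sketch for the hive user; the user names and wildcard values depend on your deployment:
<property>
  <name>hadoop.proxyuser.hive.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hive.groups</name>
  <value>*</value>
</property>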
- In the Cloudera Manager Hive service, for HiveServer2 to start and function normally, set hive.metastore.event.db.notification.api.auth to false in hive-site.xml.
- Solr does not start after adding Ranger.
Solution:
It is recommended to add the Solr and Ranger services together with the IBM Storage Scale service at the time of initial CDP Private Cloud Base cluster creation. However, if Ranger and Solr were added later, the following workaround is needed for Solr to start properly:
- Log in to the Cloudera Manager console.
- While adding the Solr service, set the ZooKeeper ZNode parameter to solr-infra in the configuration wizard. If you have already added Solr but it does not come up properly, open the Solr service Configuration page, search for ZNode, and set the ZooKeeper ZNode parameter to solr-infra.
- Ensure that the Kerberos check box is selected in the Solr configuration.
- Continue adding the Solr and Ranger services. Skip this step if you have already added them.
- After adding Ranger, the Solr service changes its name to CDP-INFRA-SOLR.
- Restart Solr and Ranger.
- Ensure that Solr and Ranger have started successfully. Do not proceed unless Solr and Ranger appear healthy.
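To verify that Solr registered under the expected znode, you can query ZooKeeper directly. This is an optional check, assuming the zookeeper-client wrapper is installed on the host; the server address is a placeholder:
# zookeeper-client -server <zookeeper host>:2181 ls /solr-infra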
- Ranger issues with TLS enabled. You may encounter one of the following issues if Ranger is enabled together with TLS security:
- NameNodes do not start after updating the configurations and throw the following error:
org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode. java.lang.IllegalArgumentException: bound must be positive
- NameNodes can start but Ranger policies do not work. The NameNode log shows the following error message:
org.apache.ranger.admin.client.RangerAdminRESTClient: Failed to get response, Error is : TrustManager is not specified
2022-08-23 13:16:37,809 ERROR org.apache.ranger.admin.client.RangerAdminRESTClient: Error getting Roles; Received NULL …. Error getting policies; Received NULL response!!. secureMode=true, ....
Root cause of the issue:
When the IBM Storage Scale service starts from Cloudera Manager, three additional configuration files (ranger-hdfs-security.xml, ranger-hdfs-policymgr-ssl.xml, and ranger-hdfs-audit.xml) are generated for HDFS Transparency. In certain scenarios, these three files do not get propagated to the IBM Storage Scale CCR, causing the above problems.
Solution:
Start the NameNodes as follows:
- To obtain the latest spectrumscale-TRANSPARENCY_NAMENODE directory under /run/cloudera-scm-agent/process/ on the NameNode, start the IBM Storage Scale service from Cloudera Manager.
- Stop the IBM Storage Scale service from Cloudera Manager.
- Log into any one of the HDFS Transparency NameNode hosts.
- Run the following commands to find the current <NameNode configuration directory>:
# cd /run/cloudera-scm-agent/process/
# ls -lrt | grep spectrumscale-TRANSPARENCY_NAMENODE | tail -n 1
- Run the following commands to upload the Ranger configuration files from the <NameNode configuration directory>, as found in the above step, to the IBM Storage Scale CCR:
# cd <NameNode configuration directory>
# mmhdfs config import --nocheck . ranger-hdfs-security.xml,ranger-hdfs-policymgr-ssl.xml,ranger-hdfs-audit.xml
# mmhdfs config upload
- Start the IBM Storage Scale service from Cloudera Manager.
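If the import step fails, a quick check is whether the three Ranger files are actually present in the <NameNode configuration directory>; if they are missing, repeat the first step to regenerate the process directory:
# cd <NameNode configuration directory>
# ls ranger-hdfs-security.xml ranger-hdfs-policymgr-ssl.xml ranger-hdfs-audit.xml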
- In the IBM Storage Scale service, metrics show NO DATA after enabling Kerberos.
After enabling Kerberos, the HTTP port value for the DataNode changes to a value less than 1024. Therefore, metrics start showing NO DATA.
Solution:
- In the Cloudera Manager GUI, open the IBM Storage Scale service Configuration page and type HTTP Port in the filter.
- Set transparency.datanode.http.port to 1006.
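You can confirm the effective value from an HDFS Transparency node; hdfs getconf prints the value of an arbitrary configuration key:
# hdfs getconf -confKey transparency.datanode.http.port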
- If Solr is pre-installed, the Solr znode changes to /solr-infra when you add Ranger, or when you add Atlas on a cluster with Ranger.
Solution:
- Rename the znode back to /solr.
- Renaming the znode causes an Atlas initialization issue. To address this issue, restart Atlas on the correct znode as follows:
- Stop Atlas.
- Go to the Atlas Service Actions and click Initialize Atlas.
- Installing the Ranger service may fail with the following SQL error from MySQL/MariaDB:
SQLException : SQL state: HY000 java.sql.SQLException: This function has none of DETERMINISTIC, NO SQL, or READS SQL DATA in its declaration and binary logging is enabled (you *might* want to use the less safe log_bin_trust_function_creators variable) ErrorCode: 1418
Solution:
- Before creating the database for Ranger, run the following command at the SQL prompt. The user account running the command must have MySQL or MariaDB administrator privileges:
SET GLOBAL log_bin_trust_function_creators = 1;
- After the Ranger installation completes, run the following command to roll back the above value to its default setting of 0:
SET GLOBAL log_bin_trust_function_creators = 0;
Note: The above operations might make the MySQL or MariaDB database less secure and less robust during this period. Other options that you can try are as follows:
- Use a commercial database for Ranger backend storage rather than an open source option such as MySQL or MariaDB.
- Use a separate MySQL or MariaDB instance exclusively for Ranger.
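To confirm the current setting before and after the change, you can query the variable at the SQL prompt:
SHOW VARIABLES LIKE 'log_bin_trust_function_creators';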
- Hive service shows a health check issue in the Hive Metastore Canary. The Hive Metastore canary fails with the following error:
2020-11-02 17:44:53,054 WARN com.cloudera.cmon.firehose.polling.CreateDirectoryTask: Exception while creating directory '/user/hue', for 'hue:hue', with permission: 775 org.apache.hadoop.security.AccessControlException: Permission denied: user=hue, access=WRITE, inode="/user":hdfs:supergroup:drwxr-xr-x
Solution:
- Add hue to the Hadoop supergroup.
- On all the HDFS Transparency nodes, run the following command:
usermod -aG supergroup hue
With the supergroup membership, the hue user can create the /user/hue directory, which had previously failed during the Hive Metastore canary health check.
- After the health check is resolved, remove the hue user from the Hadoop supergroup, as shown in the sketch below.
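A minimal sketch of verifying the membership and then removing it; gpasswd -d removes a user from a group:
# id hue
# gpasswd -d hue supergroup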
- Webhdfs does not work with the CES HDFS IP/hostname. For example, the hdfs dfs -ls webhdfs://<hdfs namespace>/ command throws an authentication error.
Solution:
- Stop the IBM Storage Scale service using Cloudera Manager.
- Create an additional NameNode HTTP principal with the CES HDFS IP/hostname. For example:
kadmin.local -q "addprinc -randkey HTTP/<myceshost>@IBM.COM"
- Update the spnego.service.keytab keytab file for each NameNode. For more information, see Setting up Kerberos for HDFS Transparency nodes.
- Back up /etc/security/keytabs/spnego.service.keytab.
- Update the spnego.service.keytab keytab file by importing the above HTTP principal.
For example:
kadmin.local -q "ktadd -k /etc/security/keytabs/spnego.service.keytab HTTP/myceshdfs.gpfs.net@IBM.COM"
- If you have used the supplied Kerberos script with HDFS Transparency v3.1.1-3, the NameNode host principal might be missing in the spnego.service.keytab file. In that case, import the host principal into the spnego.service.keytab file.
For example:
kadmin.local -q "ktadd -k /etc/security/keytabs/spnego.service.keytab host/nn01.gpfs.net@IBM.COM"
- Move the spnego.service.keytab file to the corresponding NameNode host.
- Set the dfs.web.authentication.kerberos.principal parameter to *:
<property>
  <name>dfs.web.authentication.kerberos.principal</name>
  <value>*</value>
</property>
- Ensure that the CES HDFS hostname resolves from DNS and not just from an entry in the /etc/hosts file.
- Start the IBM Storage Scale service using Cloudera Manager.
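To verify the fix end to end, authenticate and repeat the command that previously failed; the principal below is illustrative:
# kinit <user>@IBM.COM
# hdfs dfs -ls webhdfs://<hdfs namespace>/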
- Creating a Livy interactive session using the REST API as follows might fail or hang:
curl -u : --negotiate -X POST --data '{"kind" : "spark"}' -H "Content-Type: application/json" <livy host>:8998/sessions
This is particularly observed on the IBM® Power® platform.
Solution:
- Disable the Livy recovery mode within the Livy service by setting livy.server.recovery.mode to off.
- Recreate the session.
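If you apply the setting through a configuration safety valve for livy.conf rather than a dedicated field, the entry is a single line (where the safety valve lives depends on your Livy service version):
livy.server.recovery.mode = off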
- DataNode colocation: The Zeppelin service fails to start when the Zeppelin server is colocated with an HDFS Transparency DataNode, because the CM agent creates the /var/lib/zeppelin directory with root:root ownership.
Solution:
- Change the ownership of the directory to zeppelin:zeppelin, as shown below.
- Restart Zeppelin.
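A minimal sketch on the Zeppelin host; chown changes the directory ownership that the CM agent set incorrectly:
# chown zeppelin:zeppelin /var/lib/zeppelin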
- An error is seen when you try to create a Livy interactive session using the REST API, for example:
curl -u : --negotiate -X POST --data '{"kind" : "spark"}' -H "Content-Type: application/json" <livy host>:8998/sessions
The following error occurs:
org.apache.hadoop.ipc.RemoteException(java.lang.ArithmeticException): / by zero
org.apache.hadoop.hdfs.server.namenode.GPFSNamesystemV0.getAdditionalBlock(GPFSNamesystemV0.java:711)
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:864)
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:549)
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
java.security.AccessController.doPrivileged(Native Method)
javax.security.auth.Subject.doAs(Subject.java:422)
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
Solution:
- Ensure that the dfs.blocksize parameter in the Cloudera Manager GUI for the IBM Storage Scale service matches the dfs.blocksize parameter in the CES HDFS configuration in /var/mmfs/hadoop/etc/hadoop.
- Restart the Livy service after the two dfs.blocksize values match (see the comparison sketch below).
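One way to compare the two values, assuming the property is defined in hdfs-site.xml under the CES HDFS configuration directory; hdfs getconf prints the value seen by Hadoop clients:
# hdfs getconf -confKey dfs.blocksize
# grep -A1 dfs.blocksize /var/mmfs/hadoop/etc/hadoop/hdfs-site.xml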
- Unable to create managed Hive tables.
All tables that are created are external tables, even when you explicitly request managed tables.
Solution:
To create managed Hive tables, install the Hive on Tez service in CDP Private Cloud Base. For information on adding the Hive on Tez service, see Installing Hive on Tez and adding a HiveServer role.
- While creating an encryption key, you see an authorization exception in the Ranger KMS GUI.
Solution:
Add the following parameters in kms-site.xml from the Cloudera Manager GUI and retry:
- hadoop.kms.proxyuser.rangeradmin.hosts=*
- hadoop.kms.proxyuser.rangeradmin.groups=*
- hadoop.kms.proxyuser.rangeradmin.users=*
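If you add these through a kms-site.xml safety valve, each parameter takes the usual XML property form; the first one is shown below and the other two follow the same pattern:
<property>
  <name>hadoop.kms.proxyuser.rangeradmin.hosts</name>
  <value>*</value>
</property>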
- When uploading the Oozie shareLib in IBM Storage Scale, you see the error blocksize(xxxx) should be an integral mutiple of dataBlockSize(yyyy). This error occurs because Oozie tries different blocksize values when uploading the shareLib.
Solution:
- Go to the IBM Storage Scale service Configuration page in the Cloudera Manager GUI.
- Search for the HDFS client Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml and add or update dfs.namenode.fs-limits.min-block-size = <dfs.blocksize> (see the XML form after this list).
- Save and deploy the client configuration.
- Restart IBM Storage Scale and Oozie services.
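In the safety valve, the property takes the usual XML form; <dfs.blocksize> is a placeholder that must be replaced with your cluster's actual dfs.blocksize value:
<property>
  <name>dfs.namenode.fs-limits.min-block-size</name>
  <value><dfs.blocksize></value>
</property>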
- The hadoop_secure_web_ui configuration parameter is not effective for the YARN service when IBM Storage Scale is integrated.
In the YARN service, even if the hadoop_secure_web_ui configuration parameter is set to Enabled, the Resource Manager and the History Server web user interfaces still use simple authentication.
Solution:
This issue is fixed in Cloudera Manager 7.6.1 and later for IBM Storage Scale as the HDFS provider. Upgrade Cloudera Manager to 7.6.1 or later.