Troubleshooting InfoSphere Information Server on Hadoop

Use the information in this section to help you understand, isolate, and resolve issues with InfoSphere® Information Server on Hadoop.

This section provides descriptions of possible problems and the steps to correct them.

Table 1. Troubleshooting InfoSphere Information Server on Hadoop
Message or symptom	Action
You receive the following error message when you use a configuration file that statically defines nodes (fastname): `Host name verification failed for one or more nodes specified in your configuration file. Please check your config file to make sure it only has the fully qualified domain names for the hosts with unknown state. List of nodes that failed verification: Node: mynode.mycompany.com State: UNKNOWN` If a host is down and you are using a static configuration, you will see a similar message with State: `LOST`.	To avoid host name issues, use a dynamic configuration file. Using a configuration file with statically defined nodes can cause incorrect node names or node names that do not match the node name that is expected by Hadoop, which is usually the long host name.
You receive the following error message after you run an InfoSphere DataStage® job: `##F IIS-DSEE-TFIP-00079 20:01:02(007) <APT_TransformOperatorImplJob_TransformerinTransformer, 1> Fatal Error: The interface does not have a field named "input0Int8Comments_0"` The version of the shared libraries that is loaded during InfoSphere DataStage job run time is incorrect.	Network file system (NFS), Version 4 has a defect that loads the wrong shared libraries at run time. It affects any massively parallel processing (MPP) environment that uses NFS, Version 4 to share InfoSphere Information Server installation directories. To avoid this issue, use NFS, Version 3.
The wrong host name is returned by operating system commands.	Configure both YARN and InfoSphere Information Server so that the `hostname` and `hostname -f` commands return the fully qualified host name of the system.
When you run large InfoSphere DataStage jobs, you receive the following error message: `java/lang/OutOfMemoryError`.	Increase the heap size of the Java™ Virtual Machine (JVM) that is used by HDFS by setting the LIBHDFS_OPTS parameter, for example to `-Xmx1024m`. Raise or lower the value to change the heap size as needed.
When you run a parallel engine job from the InfoSphere DataStage client or run $APT_ORCHHOME/etc/yarn_conf/start-pxyarn.sh from the command line, you receive the following error message: `/bin/yarn: No such file or directory` The error is logged in the following file: /tmp/yarn_client.out.	This message is generated because the bin directory that contains the Hadoop YARN command is not in your PATH environment variable. Add this directory to your PATH environment variable, or set the HADOOP_HOME or HADOOP_YARN_HOME environment variables so that these two directories can be searched. If you run a parallel job from the InfoSphere DataStage client, set the environment variables at the job or project level. If you run the start-pxyarn.sh script from the command line, set the environment variables at the command prompt.
You see the following message in the InfoSphere DataStage job log: `Could not get the port number of PX YARN Client: Please check the health of YARN/Hadoop. For further information, check /tmp/yarn_client.out and PXEngine/logs/yarn_logs/yarn_client.out files for more information.`	In the /tmp/yarn_client.out file, look for this error: `/bin/yarn: No such file or directory` If you see this error, the bin directory that contains the Hadoop YARN command is not in your PATH environment variable. Add this directory to your PATH environment variable, or set the HADOOP_HOME or HADOOP_YARN_HOME environment variables so that these two directories can be searched. If you run a parallel job from the InfoSphere DataStage client, set the environment variables at the job or project level. If you run the start-pxyarn.sh script from the command line, set the environment variables at the command prompt.
You receive the following message: `IPv6 is not currently supported by Hadoop/Yarn and the required environment variable APT_USE_IPV4 is not set. It has been set to allow the job to continue.`	Set the APT_USE_IPV4 environment variable to `true`.
You receive the following message: `Big_Data_File_1,0: java.io.IOException: Failure to login using ticket cache file /home/hdfs/krb5cc_hdfs`	The InfoSphere DataStage administrator must be able to access the path that was provided for the APT_YARN_USER_CACHED_CRED_PATH environment variable. Verify that the InfoSphere DataStage administrator has access to the path.
You receive the following error message when you run InfoSphere DataStage jobs: `##F IIS-DSEE-TFPM-00493 11:32:39(015) <main_program> Fatal Error: Allocation of required 0 YARN containers has failed.` These jobs request zero containers from YARN. For example, you might be trying to import data.	Use the Peek operator in the InfoSphere DataStage job flow so that a container is requested from the Application Master. The issue occurs when zero containers are requested.
You receive the following error message when you open the InfoSphere DataStage Designer client or InfoSphere DataStage Director client: `Failed to connect to Information Server Engine: full-machine-name.domain.com, project project-name. This error can occur if you mapped Engine Credential User ID does not have sufficient access to the underline OS on the Engine system. (Internal Error (39202))`	This issue occurs when the value of the LD_LIBRARY_PATH environment variable begins with /usr/lib64. Verify that the library paths that InfoSphere Information Server uses do not start with /usr/lib64. If one or more of the paths start with /usr/lib64, move the /usr/lib64 entry to the middle or end of the path.
Your Hadoop first-in-first-out (FIFO) scheduling is not working.	Hadoop first-in-first-out (FIFO) scheduling is not supported by InfoSphere Information Server. Static configuration files, node pools, and data locality features in InfoSphere Information Server are incompatible with FIFO scheduling.
You receive the following error message in the yarnclient.out log when you run an InfoSphere DataStage job: `Error running YARN client daemon Permission denied: user=username, access=WRITE, inode="/user":group_name:group_name:drwxr-xr-x`	Add the user that does not have permission to write to the /user directory to the group that has permission to write to the directory. To add the user to the group, issue the following command: `usermod -a -G group_name username`
You receive the following error message when you run an InfoSphere DataStage parallel job: `2015-07-10 11:54:49 AM00050 WARNING ApplicationMaster$OSHRunnable: Error running parallel job with ID: 0271_0: Select timed out waiting for data.`	Increase the value that you set for the APT_YARN_MSG_TIMEOUT environment variable.
You receive the following error message in the InfoSphere DataStage job log: `WARNING shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.`	Add the path of the Hadoop native library directory to the LD_LIBRARY_PATH environment variable. For example: `LD_LIBRARY_PATH=HadoopInstallPath/hadoop/lib/native:$LD_LIBRARY_PATH`
You receive the following error message when you run an InfoSphere QualityStage® job: `Unable to open file ./RT_QS3/V0S72/CASS_26361_0 for output, reason code 13 Permission denied`	Ensure that the permissions on the file and its directory allow the user who runs the job to write to the file.
You receive the following error message: `Fatal Error: Could not read the host:port details of the Application Master from PX YARN Client: Error getting the ApplicationMaster details. Look into YARN Client's logs for more information at /opt/IBM/InformationServer/Server/PXEngine/logs/yarn_logs/yarn_client.out*`	This error occurs because the YARN client did not receive communication from the Application Master that it tried to start. Check the yarn_client.out file to find out why the client cannot listen to the Application Master. The Application Master might not have started for any of the following reasons: (1) You are running many jobs concurrently, and it is taking longer to start the required number of Application Masters. Pre-launch the required number of Application Masters by increasing the APT_YARN_AM_POOL_SIZE parameter to the number of concurrent jobs that you want to run. (2) Resources are unavailable in the cluster, in which case you might need to increase the cluster capacity. It is also possible that the minimum allocation for each container is too high a portion of the cluster memory, which wastes resources. Check the value of yarn.scheduler.minimum-allocation-mb to see whether you can reduce it. (3) The maximum number of active applications is lower than the number of jobs that you want to run concurrently, so the required number of Application Masters could not be started. Increase the value of the YARN configuration parameter yarn.scheduler.capacity.maximum-am-resource-percent to increase the number of possible active applications.
You receive the following error message when you run an InfoSphere DataStage job on SuSE Linux: `15/10/16 09:58:31 INFO client.RMProxy: Connecting to ResourceManager at ipsvm00443.swg.usma.ibm.com/9.70.141.242:8050 /usr/jdk64/java-1.8.0-openjdk-1.8.0/bin/java: relocation error: /opt/IBM/InformationServer/jdk/jre/lib/amd64/libnet.so: symbol JCL_Socketpair, version SUNWprivate_1.1 not defined in file libjava.so with link time reference`	This error indicates that the version of the Java Development Kit (JDK) that is installed with InfoSphere Information Server is different from the version that is installed with the YARN client that you are using with InfoSphere Information Server. To resolve this issue, update the value of the APT_YARN_JDK_LIBRARY_PATH environment variable to specify the libraries of the same JDK that the YARN client uses. You can modify this setting in the dsenv file, which is located in the following default directory: /opt/IBM/InformationServer/Server/DSEngine/. For example, update the APT_YARN_JDK_LIBRARY_PATH environment variable to /usr/jdk64/java-1.8.0-openjdk-1.8.0/jre/lib/amd64.
You receive the following error message in the yarnclient.out file: `Error : Exception in thread "main" java.lang.UnsupportedClassVersionError: com/ibm/iis/ds/px/bigdata/yarn/PXYarnClient : Unsupported major.minor version 51.0`	The error message indicates that the Java version on your system is earlier than 1.7. Take one of the following actions: Install Java 1.7 or later on the conductor node. If you start the YARN client through the job, add a job or project parameter JAVA_HOME that points to Java Version 1.7 or later. If you start the YARN client from the command line, export a JAVA_HOME variable that points to Java Version 1.7 or later.
You are using NFS mounting and are experiencing performance problems.	NFS mounting is an option that works for smaller clusters and is supported by InfoSphere Information Server on Hadoop. However, as clusters grow, NFS mounting can cause performance problems and create a single point of failure for the cluster, so it is not typically used in production systems. Consider using another method for copying the binaries, such as HDFS. For additional information, see Copying binaries to Hadoop nodes.
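
Several actions in Table 1 come down to adjusting environment variables before you run the start-pxyarn.sh script from the command line. The following is a minimal sketch of how those adjustments might be scripted, not a supported procedure. The helper names (`is_fqdn`, `prepend_once`, `demote_lib64`) and the `/opt/hadoop` default for HADOOP_HOME are assumptions made for illustration; substitute the paths for your own Hadoop installation.

```shell
#!/bin/sh
# Sketch only: helper names and the HADOOP_HOME default are hypothetical.

# Return 0 if the given host name contains at least one dot,
# which is a rough test for a fully qualified name.
is_fqdn() {
    case "$1" in
        *.*) return 0 ;;
        *)   return 1 ;;
    esac
}

# Print search path $2 with directory $1 prepended,
# unless $1 is already one of its entries.
prepend_once() {
    case ":$2:" in
        *":$1:"*) printf '%s\n' "$2" ;;
        *)        printf '%s\n' "$1${2:+:$2}" ;;
    esac
}

# Print search path $1 with a leading /usr/lib64 entry moved to the end.
demote_lib64() {
    case "$1" in
        /usr/lib64:*) printf '%s\n' "${1#/usr/lib64:}:/usr/lib64" ;;
        *)            printf '%s\n' "$1" ;;
    esac
}

# Assumed install location; replace with your own.
HADOOP_HOME=${HADOOP_HOME:-/opt/hadoop}

# Make the yarn command findable (rows about "/bin/yarn: No such file or directory").
PATH=$(prepend_once "$HADOOP_HOME/bin" "$PATH")

# Settings taken from the table: HDFS JVM heap size and IPv4 use.
LIBHDFS_OPTS=-Xmx1024m
APT_USE_IPV4=true

export HADOOP_HOME PATH LIBHDFS_OPTS APT_USE_IPV4
```

As a usage example, `is_fqdn "$(hostname)" || echo "hostname is not fully qualified" >&2` flags a short host name, and `LD_LIBRARY_PATH=$(demote_lib64 "$LD_LIBRARY_PATH")` moves a leading /usr/lib64 entry to the end of the library path. For jobs that you start from the InfoSphere DataStage client, set the same variables at the job or project level instead.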