Question & Answer
Question
After running InfoSphere DataStage jobs for a while with Operations Console and Workload Management Server (WLM) enabled, the Operations Console processes stop one after another with error "java.lang.OutOfMemoryError: Failed to create a thread" in odbqapp logs. For example, the error below in /opt/IBM/InformationServer/Server/DSODB/logs/odbqapp-20141124144639888.log:
2014-11-20 08:54:05,817 ERROR com.ibm.datastage.runtime.odbqapp.server.ODBQueryAppSocketServer.main (ODBQueryAppSocketServer.java:393) - The exception occurred while creating a server socket. The exception message is:Failed to create a thread: retVal -1073741830, errno 11
java.lang.OutOfMemoryError: Failed to create a thread: retVal -1073741830, errno 11
How can I solve the issue?
Cause
The native out-of-memory error "Failed to create a thread" indicates that the JVM is unable to create a new Java thread. Possible causes include inadequate user limits, too many threads running on the system, and operating system resource problems. On Linux, errno 11 is EAGAIN, which thread creation returns when system resources or a limit such as RLIMIT_NPROC are exhausted.
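To judge whether the number of threads is approaching a user's nproc limit, the threads can be tallied per user. The sketch below assumes a Linux system with the procps version of ps; threads_per_user is a hypothetical helper name and is not part of DataStage.

```shell
# Hypothetical helper: tally threads per user.
# Reads one user name per line on stdin (one line per thread)
# and prints "<count> <user>", highest count first.
threads_per_user() {
    awk '{ n[$1]++ } END { for (u in n) print n[u], u }' | sort -rn
}

# Typical use on Linux (procps ps assumed; -L lists one line per thread):
#   ps -eLo user= | threads_per_user
```

Comparing the count for dsadm or another DataStage user with that user's nproc soft limit shows how much headroom remains.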
Answer
Follow these steps to resolve the issue:
1. Ensure that the user limits (ulimit -u and ulimit -n) of root, dsadm, and other DataStage users are set to the values required for IBM InfoSphere Information Server, as given in the product documentation.
For example, consider the user limits recorded in the following javacore file generated on a Linux platform:
1CIUSERLIMITS User Limits (in bytes except for NOFILE and NPROC)
NULL ------------------------------------------------------------------------
NULL type soft limit hard limit
2CIUSERLIMIT RLIMIT_AS unlimited unlimited
2CIUSERLIMIT RLIMIT_CORE 0 unlimited
2CIUSERLIMIT RLIMIT_CPU unlimited unlimited
2CIUSERLIMIT RLIMIT_DATA unlimited unlimited
2CIUSERLIMIT RLIMIT_FSIZE unlimited unlimited
2CIUSERLIMIT RLIMIT_LOCKS unlimited unlimited
2CIUSERLIMIT RLIMIT_MEMLOCK 65536 65536
2CIUSERLIMIT RLIMIT_NOFILE 1024 65536
2CIUSERLIMIT RLIMIT_NPROC 1024 20480
2CIUSERLIMIT RLIMIT_RSS unlimited unlimited
2CIUSERLIMIT RLIMIT_STACK 10485760 unlimited
2CIUSERLIMIT RLIMIT_MSGQUEUE 819200 819200
2CIUSERLIMIT RLIMIT_NICE 0 0
2CIUSERLIMIT RLIMIT_RTPRIO 0 0
2CIUSERLIMIT RLIMIT_SIGPENDING 515207 515207
The soft limits of both nproc and nofile (RLIMIT_NPROC and RLIMIT_NOFILE above, 1024 each) are too low. They need to be increased to values suitable for the expected workload on the system.
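The two relevant lines can be pulled out of a javacore file directly. The sketch below is a minimal example that assumes the 2CIUSERLIMIT line layout shown above (tag, limit name, soft limit, hard limit); javacore_limits is a hypothetical helper name.

```shell
# Hypothetical helper: print the soft and hard NOFILE and NPROC limits
# recorded in the javacore file given as the first argument.
javacore_limits() {
    awk '$1 == "2CIUSERLIMIT" && ($2 == "RLIMIT_NOFILE" || $2 == "RLIMIT_NPROC") {
        print $2, "soft=" $3, "hard=" $4
    }' "$1"
}
```

For the javacore above, javacore_limits would report soft=1024 for both limits, which matches the values identified as too low.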
The example below shows one way to change the ulimit settings of nproc and nofile on Linux when all DataStage users including dsadm belong to the same 'dstage' group:
- Edit the /etc/security/limits.d/90-nproc.conf file to increase the soft limit of nproc as shown below
* soft nproc 10240
root soft nproc unlimited
- Edit the /etc/security/limits.conf file to set the soft and hard limits of nofile and nproc as shown below
@dstage soft nofile 10240
@dstage hard nofile 65536
@dstage soft nproc 10240
@dstage hard nproc 20480
- Reboot the system after the user limit changes
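After the reboot, it is worth confirming that each DataStage user actually picks up the new limits. The following is a minimal sketch; check_limit is a hypothetical helper, and the 10240 threshold is simply the soft limit used in the example above, not a required value.

```shell
# Hypothetical helper: compare a soft limit with a chosen minimum.
# Arguments: limit name, current value, minimum acceptable value.
check_limit() {
    name=$1 current=$2 minimum=$3
    if [ "$current" = "unlimited" ] || [ "$current" -ge "$minimum" ]; then
        echo "$name OK ($current)"
    else
        echo "$name too low ($current < $minimum)"
    fi
}
```

Logged in as dsadm in a bash or ksh session, the helper could be called as check_limit nproc "$(ulimit -u)" 10240 and check_limit nofile "$(ulimit -n)" 10240.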
2. Restore the default maximum heap size (-Xmx) for any Java processes whose heap you manually increased in earlier attempts to resolve the native out-of-memory issue. The native heap is different from the Java heap. Increasing the Java heap size can make this issue worse because it reduces the native memory available. The Java heap should be increased only if the default maximum heap size is no longer sufficient to run the process, as in the EngMonApp problem described in technote 1651827.
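One way to spot Java processes that run with an explicit -Xmx setting is to scan their command lines. This is only a sketch: extract_xmx is a hypothetical helper, it assumes a grep that supports -o (GNU and BSD grep do), and it does not show where the setting was configured.

```shell
# Hypothetical helper: print every -Xmx option found in the
# command lines read from stdin, one match per line.
extract_xmx() {
    grep -o -- '-Xmx[0-9][0-9]*[kKmMgG]\{0,1\}'
}

# Typical use on Linux:
#   ps -eo args= | grep '[j]ava' | extract_xmx
```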
Javacore files for heap and native out-of-memory errors look different. To find out whether you also have a heap issue, check the free and allocated heap space. For example,
The TITLE and MEMINFO sections below indicate a heap issue:
0SECTION TITLE subcomponent dump routine
NULL ===============================
1TISIGINFO Dump Event "systhrow" (00040000) Detail "java/lang/OutOfMemoryError" received
0SECTION MEMINFO subcomponent dump routine
NULL =================================
1STHEAPFREE Bytes of Heap Space Free: 0
1STHEAPALLOC Bytes of Heap Space Allocated: 18000000
The TITLE section below indicates a native issue:
0SECTION TITLE subcomponent dump routine
NULL ===============================
1TISIGINFO Dump Event "systhrow" (00040000) Detail "java/lang/OutOfMemoryError" "Failed to create a thread: retVal -1073741830, errno 11" received
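The distinction above can be checked with a short script. The sketch below is a heuristic that relies only on the 1TISIGINFO, 1STHEAPFREE and 1STHEAPALLOC lines shown in these examples; classify_javacore is a hypothetical helper name, and other javacore layouts may need different patterns.

```shell
# Hypothetical helper: report whether a javacore file points to the
# native "Failed to create a thread" error or to a possible heap issue.
classify_javacore() {
    siginfo=$(grep '^1TISIGINFO' "$1")
    case $siginfo in
        *'Failed to create a thread'*)
            echo "native: thread creation failed" ;;
        *'java/lang/OutOfMemoryError'*)
            echo "possible heap issue; inspect:"
            # Show the free and allocated heap figures for manual review
            grep -E '^1STHEAP(FREE|ALLOC)' "$1" ;;
        *)
            echo "no OutOfMemoryError dump event found" ;;
    esac
}
```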
3. Install the required WLM patches JR45668, JR47735 and JR46958.
4. Reduce the workload to avoid overstressing the system. When the system runs near capacity, operations start to fail.
For example,
a) Reduce the number of jobs running on the system.
b) Reduce the number of queries used by the Operations Console internally:
- Increase the refresh intervals on the Operations Console UI
- Reduce the number of active sessions by closing or logging out of the Operations Console UI.
5. On Linux, turn off Huge Pages to avoid the additional memory usage caused by the Linux kernel ‘Huge Page’ feature, as described in technote 1664196.
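Before changing anything, the current huge page usage can be inspected. The sketch below only reads the counters that Linux exposes in /proc/meminfo; hugepage_summary is a hypothetical helper, and whether these are the counters the technote refers to is an assumption. The technote remains the reference for how to turn the feature off.

```shell
# Hypothetical helper: print the huge page counters from a
# meminfo-style file (defaults to /proc/meminfo on Linux).
hugepage_summary() {
    grep -i 'huge' "${1:-/proc/meminfo}"
}
```

Running hugepage_summary with no argument lists lines such as AnonHugePages and HugePages_Total on kernels that report them.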
6. Disable WLM, or both Operations Console and WLM, if you do not plan to use them.
For example,
a) To disable WLM:
- Set WLMON to 0 in the /opt/IBM/InformationServer/Server/DSODB/DSODBConfig.cfg file:
WLMON=0
- Restart the DSEngine as dsadm:
cd `cat /.dshome`
. ./dsenv
bin/uv -admin -stop
bin/uv -admin -start
b) To disable both Operations Console and WLM:
- Set both DSODBON and WLMON to 0 in the DSODBConfig.cfg file:
DSODBON=0
WLMON=0
- Stop the AppWatcher service as dsadm:
/opt/IBM/InformationServer/Server/DSODB/bin/DSAppWatcher.sh -stop
- Restart the DSEngine as shown in step a) above.
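The configuration change in step 6 can also be scripted. The following is a minimal sketch, assuming each setting already appears in DSODBConfig.cfg as KEY=value at the start of a line; set_cfg_flag is a hypothetical helper, and it leaves the file unchanged if the key is not present.

```shell
# Hypothetical helper: set KEY=value in a config file, keeping a backup.
# Arguments: config file path, key, new value.
set_cfg_flag() {
    cfg=$1 key=$2 value=$3
    # Keep a copy of the original file as <file>.bak
    cp -p "$cfg" "$cfg.bak" || return 1
    # Rewrite only lines that start with KEY= (assumed layout)
    sed "s/^$key=.*/$key=$value/" "$cfg.bak" > "$cfg"
}
```

For example, set_cfg_flag /opt/IBM/InformationServer/Server/DSODB/DSODBConfig.cfg WLMON 0 run as dsadm would disable WLM; the DSEngine still needs to be restarted afterwards.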
Document Information
Modified date:
23 July 2019
UID
swg21701446