IBM Support

How to solve java.lang.OutOfMemoryError, failed to create a thread, when IBM InfoSphere DataStage and QualityStage Operations Console and Workload Management Server are enabled

Question & Answer


Question

After running InfoSphere DataStage jobs for a while with Operations Console and Workload Management Server (WLM) enabled, the Operations Console processes stop one after another with the error "java.lang.OutOfMemoryError: Failed to create a thread" in the odbqapp logs. For example, the following error appears in /opt/IBM/InformationServer/Server/DSODB/logs/odbqapp-20141124144639888.log:

2014-11-20 08:54:05,817 ERROR com.ibm.datastage.runtime.odbqapp.server.ODBQueryAppSocketServer.main (ODBQueryAppSocketServer.java:393) - The exception occurred while creating a server socket. The exception message is:Failed to create a thread: retVal -1073741830, errno 11
java.lang.OutOfMemoryError: Failed to create a thread: retVal -1073741830, errno 11

How can I solve the issue?

Cause

The native out-of-memory error "Failed to create a thread" indicates that the JVM is unable to create a new Java thread. Possible causes include inadequate user limits, too many threads running on the system, and operating system resource problems.
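
On Linux, errno 11 is EAGAIN, which thread creation typically returns when a resource limit such as the per-user process/thread limit (nproc) has been reached. As a rough way to see how close a user is to that limit, the sketch below sums the thread counts of all of that user's processes. It assumes a Linux system whose ps command supports the nlwp output column (procps); sum_threads is a hypothetical helper name, and dsadm is used only as an example user.

```shell
# Hypothetical helper: add up one thread count per input line.
sum_threads() {
    awk '{ total += $1 } END { print total + 0 }'
}

# Example call (assumes Linux procps ps):
#   ps -u dsadm -o nlwp= | sum_threads
```

Comparing the result with the output of ulimit -u for the same user gives a first indication of whether the nproc limit is the bottleneck.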

Answer

Please follow these steps to resolve:

1. Ensure that the user limits (ulimit -u and ulimit -n) of root, dsadm and other DataStage users are set to the values required for IBM InfoSphere Information Server, as given in the product documentation.

The user limits recorded in a javacore file generated on a Linux platform look like this:

1CIUSERLIMITS  User Limits (in bytes except for NOFILE and NPROC)
NULL           ------------------------------------------------------------------------
NULL           type                            soft limit           hard limit
2CIUSERLIMIT   RLIMIT_AS                        unlimited            unlimited
2CIUSERLIMIT   RLIMIT_CORE                              0            unlimited
2CIUSERLIMIT   RLIMIT_CPU                       unlimited            unlimited
2CIUSERLIMIT   RLIMIT_DATA                      unlimited            unlimited
2CIUSERLIMIT   RLIMIT_FSIZE                     unlimited            unlimited
2CIUSERLIMIT   RLIMIT_LOCKS                     unlimited            unlimited
2CIUSERLIMIT   RLIMIT_MEMLOCK                       65536                65536
2CIUSERLIMIT   RLIMIT_NOFILE                         1024                65536
2CIUSERLIMIT   RLIMIT_NPROC                          1024                20480
2CIUSERLIMIT   RLIMIT_RSS                       unlimited            unlimited
2CIUSERLIMIT   RLIMIT_STACK                      10485760            unlimited
2CIUSERLIMIT   RLIMIT_MSGQUEUE                     819200               819200
2CIUSERLIMIT   RLIMIT_NICE                              0                    0
2CIUSERLIMIT   RLIMIT_RTPRIO                            0                    0
2CIUSERLIMIT   RLIMIT_SIGPENDING                   515207               515207

The soft limits of both nproc (RLIMIT_NPROC) and nofile (RLIMIT_NOFILE) shown above, 1024 each, are too low. They need to be increased to values suitable for the expected work load on the system.
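
To read these values out of a javacore without scanning it by eye, a small awk filter is enough. The sketch below assumes the 2CIUSERLIMIT line layout shown above (tag, limit name, soft limit, hard limit); javacore_soft_limit is a hypothetical helper name, not an IBM tool.

```shell
# Hypothetical helper: print the soft limit recorded for one RLIMIT_* entry.
# Usage: javacore_soft_limit <javacore-file> <limit-name>
javacore_soft_limit() {
    awk -v name="$2" '$1 == "2CIUSERLIMIT" && $2 == name { print $3 }' "$1"
}
```

For the javacore excerpt above, javacore_soft_limit <file> RLIMIT_NPROC prints 1024.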

The example below shows one way to change the ulimit settings of nproc and nofile on Linux when all DataStage users including dsadm belong to the same 'dstage' group:

- Edit the /etc/security/limits.d/90-nproc.conf file to increase the soft limit of nproc as shown below
*        soft    nproc     10240
root     soft    nproc     unlimited

- Edit the /etc/security/limits.conf file to set the soft and hard limits of nofile and nproc as shown below
@dstage     soft    nofile     10240
@dstage     hard    nofile     65536
@dstage     soft    nproc      10240
@dstage     hard    nproc      20480

- Reboot the system after the user limit changes
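
After the reboot, it is worth confirming that the new limits are in effect for the DataStage users. The sketch below compares a limit value with a required minimum. check_limit is a hypothetical helper, and the threshold 10240 mentioned in the usage note is only the example value from this technote, not an official requirement.

```shell
# Hypothetical helper: compare a limit value with a required minimum.
# Usage: check_limit <name> <current-value> <required-minimum>
check_limit() {
    name=$1
    current=$2
    required=$3
    if [ "$current" = "unlimited" ]; then
        echo "$name: OK (unlimited)"
    elif [ "$current" -ge "$required" ] 2>/dev/null; then
        echo "$name: OK ($current >= $required)"
    else
        # Also reached when the value is not a number.
        echo "$name: LOW ($current < $required)"
    fi
}
```

For example, in a bash session as dsadm, check_limit nproc "$(ulimit -u)" 10240 and check_limit nofile "$(ulimit -n)" 10240 report whether the soft limits of that session meet the example values.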

2. Restore the default maximum heap size (-Xmx) for any Java processes whose heap you manually increased in earlier attempts to resolve the native out-of-memory issue. The native heap is different from the Java heap. Increasing the Java heap size can make this issue worse because it reduces the native memory available. The Java heap should only be increased if the default maximum heap size is no longer sufficient to run the process, as in the EngMonApp problem described in technote 1651827.

The javacore files for a Java heap problem and for a native memory problem look different. To find out whether you also have a heap issue, check the TITLE section and the free and allocated heap space in the MEMINFO section, as in the examples below.

The TITLE and MEMINFO sections below indicate a heap issue:
0SECTION       TITLE subcomponent dump routine
NULL           ===============================
1TISIGINFO     Dump Event "systhrow" (00040000) Detail "java/lang/OutOfMemoryError" received

0SECTION       MEMINFO subcomponent dump routine
NULL           =================================
1STHEAPFREE    Bytes of Heap Space Free: 0 
1STHEAPALLOC   Bytes of Heap Space Allocated: 18000000

The TITLE section below indicates a native issue:
0SECTION       TITLE subcomponent dump routine
NULL           ===============================
1TISIGINFO     Dump Event "systhrow" (00040000) Detail "java/lang/OutOfMemoryError" "Failed to create a thread: retVal -1073741830, errno 11" received 
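
The distinction above can be checked mechanically when many javacore files have to be triaged. The following is a heuristic sketch based only on the two 1TISIGINFO examples in this technote; classify_oom is a hypothetical helper name, and a "heap" result should still be confirmed against the 1STHEAPFREE value in the MEMINFO section.

```shell
# Hypothetical helper: classify a javacore by its first 1TISIGINFO line.
# Prints "native", "heap" or "unknown".
classify_oom() {
    sig=$(grep '^1TISIGINFO' "$1" | head -n 1)
    case $sig in
        *"Failed to create a thread"*) echo native ;;
        *java/lang/OutOfMemoryError*)  echo heap ;;
        *)                             echo unknown ;;
    esac
}
```

The "Failed to create a thread" test comes first because the native example also contains the java/lang/OutOfMemoryError detail string.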

3. Install the required WLM patches JR45668, JR47735 and JR46958.

4. Reduce the work load so that the system does not run near capacity, where failures become likely.
For example,

a) Reduce the number of jobs running on the system.

b) Reduce the number of queries used internally by the Operations Console:
   - Increase the refresh intervals on the Operations Console UI.
   - Reduce the number of active sessions by closing or logging out of the Operations Console UI.


5. On Linux, turn off Huge Pages to avoid the additional memory usage caused by the Linux kernel ‘Huge Page’ feature, as described in technote 1664196.

6. Disable WLM, or both Operations Console and WLM, if you do not plan to use them.
For example,

a) To disable WLM:
   - Set WLMON to 0 in the /opt/IBM/InformationServer/Server/DSODB/DSODBConfig.cfg file:
     WLMON=0

   - Restart the DSEngine as dsadm:
     cd `cat /.dshome`
     . ./dsenv
     bin/uv -admin -stop
     bin/uv -admin -start

b) To disable both Operations Console and WLM:
   - Set both DSODBON and WLMON to 0 in the DSODBConfig.cfg file:
     DSODBON=0
     WLMON=0

   - Stop the AppWatcher service as dsadm:
     /opt/IBM/InformationServer/Server/DSODB/bin/DSAppWatcher.sh -stop

   - Restart the DSEngine as shown in step a) above.
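
If the configuration change has to be scripted, for example across several engine hosts, the edit can be sketched as below. This assumes that DSODBConfig.cfg holds plain KEY=value lines with no spaces around the equals sign, as in the WLMON=0 and DSODBON=0 lines above; set_cfg_flag is a hypothetical helper name, and it is safer to try it on a backup copy of the file first.

```shell
# Hypothetical helper: set KEY=value in a plain KEY=value config file.
# Usage: set_cfg_flag <file> <key> <value>
set_cfg_flag() {
    file=$1
    key=$2
    value=$3
    tmp=$(mktemp) || return 1
    if grep -q "^${key}=" "$file"; then
        # Key already present: rewrite its line.
        sed "s/^${key}=.*/${key}=${value}/" "$file" > "$tmp"
    else
        # Key missing: keep the file and append the setting.
        cat "$file" > "$tmp"
        echo "${key}=${value}" >> "$tmp"
    fi
    # Write back in place so ownership and permissions are kept.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

For example, set_cfg_flag /opt/IBM/InformationServer/Server/DSODB/DSODBConfig.cfg WLMON 0 corresponds to the manual edit in step a); the DSEngine restart is still required afterwards.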


Document Information

Modified date:
23 July 2019

UID

swg21701446