IBM Support

Unable to get Java stacks of threads from Javacores, especially during hangs and deadlocks, on IA 32

Troubleshooting


Problem

The Java stacks of the threads involved in a deadlock or a hang are sometimes missing from 1.4.2 Javacores on IA 32. This makes hang and exception symptoms difficult to debug and diagnose, because the Java stacks of the involved threads are required to determine what is causing the hang or the deadlock.

Symptom

The missing Java stacks can be seen in 1.4.2 Javacores.

An example of a deadlock in the JRE classes is shown below (similar scenarios can apply to non-JRE deadlocks/hangs as well):

1LKDEADLOCK Deadlock detected !!!

2LKDEADLOCKTHR Thread "Thread-0" (8320600)
3LKDEADLOCKWTR is waiting for:
4LKDEADLOCKMON sys_mon_t:0x08079628 infl_mon_t: 0x00000000:
4LKDEADLOCKOBJ main.childcl@1006DD50/1006DD58:
3LKDEADLOCKOWN which is owned by:
2LKDEADLOCKTHR Thread "main" (8078370)
3LKDEADLOCKWTR which is waiting for:
4LKDEADLOCKMON sys_mon_t:0x08079688 infl_mon_t: 0x00000000:
4LKDEADLOCKOBJ main.parentcl@1006DDD0/1006DDD8:
3LKDEADLOCKOWN which is owned by:
2LKDEADLOCKTHR Thread "Thread-0" (8320600)


3XMTHREADINFO "Thread-0" (TID:1006DCF8, sys_thread_t:8320600, state:CW, native ID:1600B) prio=5

3HPREGISTERS Register Values

3HPREGVALUES EAX : 00000001, EBX : 083208F8, ECX : 00000000
3HPREGVALUES EDX : 08320600, ESI : 083209C0, EDI : BE5FEE90
3HPREGVALUES EBP : BE5FEEA0, ESP : BE5FED88, EIP : 4041FEB4
3HPREGVALUES EFLAGS : 00000202


3HPNATIVESTACK Native Stack of "Thread-0" PID 28909

3HPSTACKLINE sysMonitorWait at 4041FEB4 in libhpi.so
3HPSTACKLINE lkMonitorEnter at 402FA406 in libjvm.so
3HPSTACKLINE mmipInvokeSynchronizedJavaMethodWithCatch at 4037D619 in libjvm.so
3HPSTACKLINE L0_invokevirtual_quick__ at 40355CD7 in libjvm.so


3XMTHREADINFO "main" (TID:1006E1B8, sys_thread_t:8078370, state:CW, native ID:2000) prio=5

3HPREGISTERS Register Values

3HPREGVALUES EAX : 00000001, EBX : 08078668, ECX : 00000000
3HPREGVALUES EDX : 08078370, ESI : 08078730, EDI : BFFEC334
3HPREGVALUES EBP : BFFEC344, ESP : BFFEC22C, EIP : 4041FEB4
3HPREGVALUES EFLAGS : 00000202


3HPNATIVESTACK Native Stack of "main" PID 28897

3HPSTACKLINE sysMonitorWait at 4041FEB4 in libhpi.so
3HPSTACKLINE lkMonitorEnter at 402FA406 in libjvm.so
3HPSTACKLINE mmipInvokeSynchronizedJavaMethod at 4037D370 in libjvm.so
3HPSTACKLINE L0_invokevirtual_quick__ at 40355CD7 in libjvm.so
3HPSTACKLINE L0_invokenonvirtual_quick__ at 40355DE5 in libjvm.so
3HPSTACKLINE L0_invokevirtual_quick__ at 40355CD7 in libjvm.so
3HPSTACKLINE L0_invokevirtual_quick__ at 40355CD7 in libjvm.so
3HPSTACKLINE mmipExecuteJava at 40350420 in libjvm.so

As the preceding example shows, the Javacore contains only the register values and native stacks of the threads involved in the deadlock. Their Java stacks are missing.
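One way to spot affected threads in a large Javacore is to scan for thread entries that have no Java frames. The following is a minimal sketch in Java, not an IBM tool. It assumes that Java frames, when present, appear on lines tagged 4XESTACKTRACE after the thread's 3XMTHREADINFO line. That tag does not appear in the example above, so verify it against a Javacore that does contain Java stacks. The class and method names are hypothetical, and the sketch needs a Java 7 or later runtime to run.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class JavacoreStackCheck {
    // Captures the quoted thread name on a 3XMTHREADINFO line.
    private static final Pattern THREAD_INFO =
            Pattern.compile("^3XMTHREADINFO\\s+\"([^\"]*)\"");

    // Returns the names of threads that have no Java frame lines.
    // ASSUMPTION: Java frames are tagged 4XESTACKTRACE.
    static List<String> threadsWithoutJavaStack(List<String> javacoreLines) {
        List<String> missing = new ArrayList<>();
        String current = null;
        boolean sawJavaFrame = false;
        for (String raw : javacoreLines) {
            String line = raw.trim();
            Matcher m = THREAD_INFO.matcher(line);
            if (m.find()) {
                // A new thread entry starts: close out the previous one.
                if (current != null && !sawJavaFrame) {
                    missing.add(current);
                }
                current = m.group(1);
                sawJavaFrame = false;
            } else if (line.startsWith("4XESTACKTRACE")) {
                sawJavaFrame = true;
            }
        }
        if (current != null && !sawJavaFrame) {
            missing.add(current);
        }
        return missing;
    }

    public static void main(String[] args) {
        // Lines taken from the Javacore example above.
        List<String> lines = Arrays.asList(
            "3XMTHREADINFO \"Thread-0\" (TID:1006DCF8, sys_thread_t:8320600, state:CW, native ID:1600B) prio=5",
            "3HPSTACKLINE sysMonitorWait at 4041FEB4 in libhpi.so",
            "3XMTHREADINFO \"main\" (TID:1006E1B8, sys_thread_t:8078370, state:CW, native ID:2000) prio=5",
            "3HPSTACKLINE sysMonitorWait at 4041FEB4 in libhpi.so");
        System.out.println(threadsWithoutJavaStack(lines)); // prints [Thread-0, main]
    }
}
```

For a real Javacore, the lines could be read from the file instead of being embedded in the program.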

Cause

The Java stacks are not printed mainly for two reasons:

  • This is typical of the IA 32 platform.

  • Java stack generation happens asynchronously. In this mode, the JVM checks whether the current methodblock or compiled code can be retrieved without overloading the VM. If it can, the VM prints the stack; otherwise it skips it. On IA 32, the Java stack is only a placeholder that contains dummy and GLUE frames created by the JVM. The remaining frames (that is, the actual frames for the Java and native methods) are on the native stack, which is created by the OS. So when the JIT traverses the frames, it might not find a particular field at the expected offset, because the frame and stack layouts are different.

    The JIT frame traversal code is unable to identify the topmost frame correctly, so it returns a null method name and the methods are not printed. Because the topmost frame itself was not identified, the JVM cannot traverse the rest of the stack, and the Java stacks are not produced.

    This problem is not limited to deadlocks. It can affect other threads on the IA 32 platform whenever the above condition occurs. It is a limitation of the current frame traversal code. Removing the limitation would risk overloading the JVM and could lead to unsafe behavior, which is more hazardous than the missing stacks. It therefore remains a permanent limitation of the current frame traversal code.

    Because of this limitation, alternative methods need to be used to get the Java stacks of the threads involved in the deadlock or the hang.

Environment

IBM JDK 1.4.2 on the IA 32 platform

Resolving The Problem


To work around this problem, use one of the following alternative methods to determine the Java stacks of the threads concerned.

1> Using jformat:

Ask the customer to collect a system core file, then generate the SDFF file by running the jextract command on it. (For information on how to run jextract, refer to the 1.4.2 Diagnostic Guide: http://www-128.ibm.com/developerworks/java/jdk/diagnosis/142.html).

Once the SDFF is generated, run jformat as follows:

JAVA_HOME/bin/jformat -J-Xmx2000M corefile.sdff

Once the jformat session is up, run the following sequence of commands:

a> dis os
b> dis js(*) -> This will provide the Java stacks of all the threads (along with the thread names)

Map the thread names from the Javacore to the thread names in the "dis js" output to identify the threads involved in the hang or the deadlock.
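The mapping can also be done by thread ID. In the example in this document, the native ID on the Javacore 3XMTHREADINFO line (1600B for "Thread-0", 2000 for "main") matches the thread ID that jformat reports and accepts in "set thread=" (1600b and 2000). The following minimal Java sketch extracts those IDs from Javacore lines. It assumes that this correspondence holds for the dump being analyzed, which should be confirmed against the "dis proc" output. The class and method names are hypothetical, and the sketch needs a Java 7 or later runtime to run.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class JformatThreadMap {
    // Captures the thread name and the hexadecimal native ID
    // from a 3XMTHREADINFO line.
    private static final Pattern THREAD_INFO = Pattern.compile(
            "^3XMTHREADINFO\\s+\"([^\"]*)\".*native ID:([0-9A-Fa-f]+)");

    // Maps each thread name to its native ID in lowercase, the form
    // used by "set thread=" in the example session.
    // ASSUMPTION: the Javacore native ID equals the jformat thread ID.
    static Map<String, String> jformatIdsByName(List<String> javacoreLines) {
        Map<String, String> ids = new LinkedHashMap<>();
        for (String line : javacoreLines) {
            Matcher m = THREAD_INFO.matcher(line.trim());
            if (m.find()) {
                ids.put(m.group(1), m.group(2).toLowerCase(Locale.ROOT));
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // Lines taken from the Javacore example in this document.
        List<String> lines = Arrays.asList(
            "3XMTHREADINFO \"Thread-0\" (TID:1006DCF8, sys_thread_t:8320600, state:CW, native ID:1600B) prio=5",
            "3XMTHREADINFO \"main\" (TID:1006E1B8, sys_thread_t:8078370, state:CW, native ID:2000) prio=5");
        for (Map.Entry<String, String> e : jformatIdsByName(lines).entrySet()) {
            // Prints the jformat command that selects each thread.
            System.out.println(e.getKey() + ": set thread=" + e.getValue());
        }
    }
}
```

With the two lines above, the sketch prints "Thread-0: set thread=1600b" and "main: set thread=2000", which are the commands used in the example session.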

If no Javacore is available, run the "deadlock" command on the SDFF in jformat after running the "dis os" command. It checks for a deadlock and, if one is found, reports the names of the threads involved. With those names, the above steps can be used to determine the Java stacks.

Example:
*******

Ready......
deadlock

......command executing


Deadlock(s) detected !!!


==============================

Thread 0x0x2000 "main"
is waiting to be notified for:
(0x1006ddd0) "main/parentcl"
which is owned by:

Thread 0x0x1600b "Thread-0"
which is waiting to be notified for:
(0x1006dd50) "main/childcl"
which is owned by:

Thread 0x0x2000 "main"

==============================




Ready......
dis proc

......command executing

Process Information
===================
Architecture: 32 bit - Little Endian


AddressSpace: 0 Process: 30188
Signal : SIGSEGV ............

Thread: 0x2000 ExecEnv: 0x08078194 Thread name: main
Thread: 0x1600b ExecEnv: 0x0831c10c Thread name: Thread-0
Thread: 0x10008 ExecEnv: 0x0816ead4 Thread name: GC Helper 3
Thread: 0xe007 ExecEnv: 0x0816d564 Thread name: GC Helper 2
Thread: 0xc006 ExecEnv: 0x0816bff4 Thread name: GC Helper 1
Thread: 0xa005 ExecEnv: 0x0816956c Thread name: Finalizer
Thread: 0x8002 ExecEnv: 0x081671ec Thread name: Reference Handler
Thread: 0x6004 ExecEnv: 0x08164bdc Thread name: Signal dispatcher



Ready......
set thread=1600b

......command executing
Trying to change thread from "2000" to "1600b"
Changed thread to "1600b"


Ready......
dis js

......command executing

Java stack for thread 1600b - Thread-0
==========================

at main.childcl.loadClass(test.java:61)
at java.lang.ClassLoader.loadClass(ClassLoader.java:502)
at java.security.Security.createAlgInstance(Security.java:1370)
at java.security.Security.createAlgInstance(Security.java:1325)
at java.security.Security.getImpl(Security.java:1216)
at java.security.MessageDigest.getInstance(MessageDigest.java:135)
at sun.security.util.ManifestEntryVerifier.setEntry(ManifestEntryVerifier.java:148)
at java.util.jar.JarVerifier.beginEntry(JarVerifier.java:170)
at java.util.jar.JarVerifier$VerifierStream.<init>(JarVerifier.java:383)
at java.util.jar.JarFile.getInputStream(JarFile.java:451)
at sun.misc.URLClassPath$5.getInputStream(URLClassPath.java:683)
at sun.misc.Resource.getBytes(Resource.java:75)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:473)
at java.net.URLClassLoader.access$500(URLClassLoader.java:110)
at java.net.URLClassLoader$ClassFinder.run(URLClassLoader.java:845)
at java.security.AccessController.doPrivileged1(Native Method)
at java.security.AccessController.doPrivileged(AccessController.java:389)
at java.net.URLClassLoader.findClass(URLClassLoader.java:372)
at java.lang.ClassLoader.loadClass(ClassLoader.java:570)
at main.parentcl.loadClass(test.java:73)
at java.lang.ClassLoader.loadClass(ClassLoader.java:502)
at main.test$1.run(test.java:30)




Ready......
set thread=2000

......command executing
Trying to change thread from "1600b" to "2000"
Changed thread to "2000"



Ready......
dis js

......command executing

Java stack for thread 2000 - main
==========================

at main.parentcl.loadClass(test.java)
at java.lang.ClassLoader.loadClass(ClassLoader.java:561)
at main.childcl.loadClass(test.java:58)
at java.lang.ClassLoader.loadClass(ClassLoader.java:502)
at main.test.main(test.java:37)




2> Using DTFJ/DumpAnalyzer:

The DTFJ/DumpAnalyzer can be run against the SDFF. It provides the Java stacks of all the threads in the dump. The Java stacks of the required threads can then be found by mapping the thread names from the Javacore to the thread names in the DumpAnalyzer output.

The DumpAnalyzer can be run via ISA (IBM Support Assistant) or through ECUREP.


NOTE: The PMRs that have reported this problem show that, on Linux kernel 2.4, other tools such as jformat, DTFJ, and lcore sometimes fail to give the stack trace as well. This can also happen on a 2.6 kernel if LD_ASSUME_KERNEL has been set to 2.4. A possible alternative is therefore to ask the customer to remove the environment setting and check whether the other tools provide the Java stack traces when running on the 2.6 kernel.
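As a quick check for the environment setting described in the note, the following minimal Java sketch reports whether LD_ASSUME_KERNEL is set to a 2.4 value. It is an illustration only. It inspects the environment of the process that runs it, so it must be started from the same shell or startup script as the affected JVM. It also uses System.getenv(), which requires a Java 5 or later runtime and so cannot run on the 1.4.2 JVM itself. The class and method names are hypothetical.

```java
import java.util.Map;

class KernelEnvCheck {
    // Describes the LD_ASSUME_KERNEL setting found in the given environment.
    // ASSUMPTION: any value starting with "2.4" requests 2.4 kernel behavior.
    static String describe(Map<String, String> env) {
        String value = env.get("LD_ASSUME_KERNEL");
        if (value == null || value.isEmpty()) {
            return "LD_ASSUME_KERNEL is not set";
        }
        if (value.startsWith("2.4")) {
            return "LD_ASSUME_KERNEL=" + value
                    + " (2.4 behavior requested; consider removing the setting)";
        }
        return "LD_ASSUME_KERNEL=" + value;
    }

    public static void main(String[] args) {
        // Checks the environment inherited by this process.
        System.out.println(describe(System.getenv()));
    }
}
```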


Document Information

Modified date:
15 June 2018

UID

swg21304911