This is a very cool and easy way to debug a problem that is happening in your environment. It is particularly useful with highly complex application servers that are doing, and connecting to, a lot of different stuff under the hood. I’ve used this method dozens of times to solve problems, for example where WebSphere was trying to open a file that didn’t exist and was causing major performance problems for the application it was hosting. Very cool stuff.
Running perfpmr (IBM’s performance diagnostics tool) gathers about 500MB of information.
This will spit out a ton of data. Some of it you can read, but much of it you cannot, because it is compiled in a raw format.
curt -i trace.raw-0
This will read the compiled raw data. In the application summary, about halfway through the output, you will see which PID (and its TID, which is what we’re interested in) is using the most system CPU time. In this case it looks like:
100.7497 6.6511 94.0986 49.3480 3.2578 46.0903 (667738 4042975)
Here 667738 is the PID and 4042975 is the TID.
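If the summary has a lot of rows, picking out the worst offender by eye gets tedious. Here is a minimal Python sketch that pulls the PID and TID out of a line shaped like the one above and picks the row with the most system call time. The column meanings are an assumption on my part: I am treating the six numbers as combined, application and syscall time in msec, followed by the same three as percentages, which fits the sample (6.6511 + 94.0986 = 100.7497). The names ThreadUsage, parse_summary_line and busiest_by_syscall are made up for this example.

```python
import re
from typing import Iterable, NamedTuple, Optional


class ThreadUsage(NamedTuple):
    # Assumed column order; check it against the header in your own curt output.
    combined_ms: float
    application_ms: float
    syscall_ms: float
    combined_pct: float
    application_pct: float
    syscall_pct: float
    pid: int
    tid: int


# Six decimal numbers, then "(PID TID)" at the end of the line.
# Anything between the numbers and the parenthesis (such as a name) is skipped.
_ROW = re.compile(r"^\s*((?:\d+\.\d+\s+){6}).*?\((\d+)\s+(\d+)\)\s*$")


def parse_summary_line(line: str) -> Optional[ThreadUsage]:
    """Return the parsed row, or None if the line is not a summary row."""
    match = _ROW.match(line)
    if match is None:
        return None
    numbers = [float(value) for value in match.group(1).split()]
    return ThreadUsage(*numbers, int(match.group(2)), int(match.group(3)))


def busiest_by_syscall(lines: Iterable[str]) -> Optional[ThreadUsage]:
    """Return the row with the largest syscall time, or None if no row parses."""
    rows = [row for row in map(parse_summary_line, lines) if row is not None]
    return max(rows, key=lambda row: row.syscall_ms) if rows else None


sample = "100.7497 6.6511 94.0986 49.3480 3.2578 46.0903 (667738 4042975)"
top = busiest_by_syscall([sample])
print(top.pid, top.tid, top.syscall_ms)
```

To use it on real output, you would feed it the lines of the file that curt produced; lines that do not look like summary rows are simply ignored.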
Once we have located the PID and TID that are hogging our system CPU cycles, we need to check out the interval traces, which are also compiled and unreadable.
This will output a file called trace.int.
Inside you’ll see exactly what the application was doing: what files it was trying to open, and so on.
Find out which PID you want to debug (one that’s hung).
Once you have that PID, type: dbx -a THAT_PID
This will bring you into the dbx command line.
Type in thread and you’ll see something similar to:
>$t1 run blocked 417997 k no pro _event_sleep
$t2 wait 0xf10002000101ca08 running 401633 k no pro _ptrgl
$t3 wait 0xf10002000101a208 running 438487 k no pro _ptrgl
If the process is threaded, it will give you a list of TIDs (thread IDs).
You’ll want to look for threads that look abnormal, i.e. that are in an abnormal state, or in a WAIT or DEADLOCK condition.
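With a process that has hundreds of threads, scanning that list by hand is slow. The following is a rough Python sketch that parses lines like the sample output above and flags the threads that are waiting or blocked. It assumes the columns are thread, kernel state, an optional wait channel, user state, kernel TID, mode, held, scope and function, which matches the sample but should be checked against the header dbx prints on your system. DbxThread, parse_thread_line and suspicious_threads are names invented for this example.

```python
import re
from typing import Iterable, List, NamedTuple, Optional


class DbxThread(NamedTuple):
    number: int            # the number after $t
    current: bool          # True when the line is marked with ">"
    state_k: str           # assumed: kernel state (run, wait, ...)
    wchan: Optional[str]   # assumed: wait channel, missing on some lines
    state_u: str           # assumed: user state (running, blocked, ...)
    k_tid: int             # assumed: kernel thread ID
    function: str          # last column


def parse_thread_line(line: str) -> Optional[DbxThread]:
    """Parse one line of thread output, or return None if it does not fit."""
    text = line.strip()
    current = text.startswith(">")
    tokens = text.lstrip(">").split()
    if not tokens:
        return None
    name = re.fullmatch(r"\$t(\d+)", tokens[0])
    if name is None:
        return None
    if len(tokens) == 8:      # no wait channel column on this line
        state_k, wchan, state_u, k_tid = tokens[1], None, tokens[2], tokens[3]
    elif len(tokens) == 9:    # wait channel present
        state_k, wchan, state_u, k_tid = tokens[1], tokens[2], tokens[3], tokens[4]
    else:
        return None
    if not k_tid.isdigit():
        return None
    return DbxThread(int(name.group(1)), current, state_k, wchan,
                     state_u, int(k_tid), tokens[-1])


def suspicious_threads(lines: Iterable[str]) -> List[DbxThread]:
    """Keep threads whose kernel state is wait or whose user state is blocked."""
    threads = [t for t in map(parse_thread_line, lines) if t is not None]
    return [t for t in threads if t.state_k == "wait" or t.state_u == "blocked"]


sample_output = [
    ">$t1 run blocked 417997 k no pro _event_sleep",
    "$t2 wait 0xf10002000101ca08 running 401633 k no pro _ptrgl",
    "$t3 wait 0xf10002000101a208 running 438487 k no pro _ptrgl",
]
for thread in suspicious_threads(sample_output):
    print(thread.number, thread.state_k, thread.state_u, thread.function)
```

A waiting thread is not automatically a problem, so treat the result as a shortlist of threads to look at, not as a verdict.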
To examine a thread, type thread current followed by the number right after the $t.
E.g.: thread current 2
This doesn’t really do anything by itself; it just switches the focus to that thread.
Next type in x. This will give you a thread dump, and you can sometimes see where it is hanging.
Next type in where. This can also give you an idea of where it is hanging.
When you are done, type detach to exit out of the dbx session. If you type quit instead, it will exit the session but also kill the original process.
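To keep the order of the steps straight, here is a tiny Python sketch that just builds the list of commands described above for one thread, always ending with detach so the original process is left running. It does not run dbx; it only produces the text you would type at the dbx prompt. The function name dbx_inspection_commands is made up for this example.

```python
from typing import List


def dbx_inspection_commands(thread_number: int) -> List[str]:
    """Return the dbx commands from this walkthrough for one thread, in order."""
    if thread_number < 1:
        raise ValueError("thread_number must be the positive number shown after $t")
    return [
        "thread",                            # list the threads
        f"thread current {thread_number}",   # switch focus to the chosen thread
        "x",                                 # dump its state
        "where",                             # show where it is hanging
        "detach",                            # leave dbx without killing the process
    ]


print("\n".join(dbx_inspection_commands(2)))
```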
I have used this to solve a lot of problems where things are hanging. About half of the time it’ll tell me EXACTLY where it is hanging. One time I did this, I found out the process was trying to read a file it didn’t have permission to read, and two threads were deadlocked because of it.