WAIT: A great, new (and free!) tool from IBM Research for helping pinpoint performance issues
kgibm
IBM Research recently made its WAIT tool (Whole-system Analysis of Idle Time) available to the public: http
The WAIT tool takes javacores and operating system statistics snapshots collected over a period of time as input, and produces a rich webpage that visualizes this data. Since WAIT only uses javacores and operating system scripts, you don't need to install anything or even restart the Java processes to use it. For administrators familiar with the performance, hang, and high CPU MustGathers, the WAIT data collector scripts are very similar to the collector scripts in those MustGathers. (Note: you must sign up for a free WAIT account to access the collector scripts and use WAIT.) In fact, you can upload the results of those MustGathers to WAIT and it will understand them for the most part. You can even zip up a set of javacores, the core artifacts for WAIT, and upload the zip, although without the operating system data you won't get the full analysis, such as processor utilization.
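To make the idea of "javacores for a period of time" concrete, here is a minimal Python sketch of periodic collection. This is not the WAIT collector script. It only assumes a Unix-like system where sending SIGQUIT (kill -3) to a JVM requests a thread dump (a javacore on IBM JVMs). The function name, parameters, and defaults are hypothetical, and it skips the operating system statistics that the real collectors also gather.

```python
import os
import signal
import time


def collect_javacores(pid, count=3, interval_s=30.0,
                      send_signal=os.kill, sleep=time.sleep):
    """Request `count` thread dumps from the JVM with process id `pid`.

    Hypothetical helper, for illustration only. SIGQUIT is the same
    signal that `kill -3 <pid>` sends; the JVM writes the dump itself
    and keeps running. `send_signal` and `sleep` are injectable so the
    loop can be exercised without a real JVM.
    """
    for i in range(count):
        send_signal(pid, signal.SIGQUIT)
        if i < count - 1:
            sleep(interval_s)  # wait between snapshots, not after the last
    return count
```

Something like `collect_javacores(12345, count=5, interval_s=30)` would then request five dumps, 30 seconds apart, from process 12345. Keep the known defect described below in mind before taking long captures on older IBM JVMs.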
There is a gallery of examples to demonstrate WAIT: http
WAIT supports both IBM and HotSpot JVMs, although more information is available for IBM JVMs.
Important note: There is a known defect in the IBM JVM, IZ86722, which can cause a full hang of a JVM when requesting a javacore (the symptom is a javacore that is truncated in certain sections). This impacts IBM Java 5 < SR12-FP3 and IBM Java 6 < SR9. Therefore, I recommend that WAIT only be used for long captures with WAS >= 188.8.131.52 and WAS >= 184.108.40.206 (htt
What I find most useful is the way that WAIT logically groups thread stacks. For example, in the first WAIT showcase: http
Next, in the category breakdown, WAIT groups stacks both by thread state and by what each thread is actually doing (through some very clever analysis).
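As a rough illustration of grouping stacks by state and activity, the Python sketch below counts threads in a javacore by their state and top Java stack frame. It is a simplification and not WAIT's actual analysis, which is considerably smarter. It assumes IBM javacore lines of the form `3XMTHREADINFO "name" ... state:CW, ...` followed by `4XESTACKTRACE at ...` frames. The function name and the "<no Java stack>" label are hypothetical.

```python
import re
from collections import Counter

# Assumed IBM javacore line shapes (simplified):
#   3XMTHREADINFO      "name" J9VMThread:0x..., ..., state:CW, prio=5
#   4XESTACKTRACE                at java/lang/Object.wait(Native Method)
THREAD_RE = re.compile(r'^3XMTHREADINFO\s+"[^"]*".*?state:(?P<state>\w+)')
FRAME_RE = re.compile(r'^4XESTACKTRACE\s+at\s+(?P<frame>[^(\s]+)')

NO_STACK = "<no Java stack>"


def group_threads(javacore_text):
    """Count threads by (state, top Java frame)."""
    groups = Counter()
    pending_state = None  # state of a thread whose top frame is not yet seen
    for line in javacore_text.splitlines():
        thread = THREAD_RE.match(line)
        if thread:
            if pending_state is not None:
                # The previous thread had no Java frames at all.
                groups[(pending_state, NO_STACK)] += 1
            pending_state = thread.group("state")
            continue
        frame = FRAME_RE.match(line)
        if frame and pending_state is not None:
            # Only the first (top) frame of each thread is counted.
            groups[(pending_state, frame.group("frame"))] += 1
            pending_state = None
    if pending_state is not None:
        groups[(pending_state, NO_STACK)] += 1
    return groups
```

Calling `group_threads(text).most_common()` on the contents of one javacore gives a crude "what are most threads doing" ranking. Grouping by the top frame alone is much coarser than the categories WAIT derives.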
In summary, WAIT analyzes javacore snapshots using visualization and clever stack grouping to help you isolate problems such as high processor utilization, low throughput, network issues, and backend slowness. Give it a shot.