The IBM Whole-system Analysis of Idle Time (“WAIT”) tool is a web-based tool for diagnosing performance and scalability bottlenecks. It is aimed especially at deployed enterprise environments, but it is useful throughout the software life cycle, from development and test to deployed customer environments and IBM customer support.
WAIT's main advantage is that it can be used with any deployed system running a JVM: no restart, no agents, no special flags, no special versions, and no kernel patches are required.
WAIT is lightweight, and its output starts at a high level (e.g., delayed waiting on a database, dashboards loading, or artifacts loading).
WAIT employs an expert rule system to look at how Java code communicates with the wider world to provide a high-level view of system and application bottlenecks.
Finally, WAIT's output is viewed in a standard browser (Firefox, Chrome, Safari, Internet Explorer), so no install is needed.
Collecting WAIT Data:
WAIT data can be collected by running the downloaded data collector.
For example, the Figure below illustrates a simple use of the WAIT data collector for AIX, Linux, Solaris, and HP-UX. In this example, the user specifies the process ID (PID) of the Java Virtual Machine (JVM) from which javacores are to be collected. This PID could belong to a JVM running the WebSphere Application Server, or to any other JVM.
In this example, javacores are collected every 15 seconds, although the current default is every 30 seconds. When the system is generally performing acceptably, we recommend collecting WAIT data infrequently, about once every 20 minutes, to provide a baseline of performance with minimal impact and data storage requirements. When problems occur, we recommend increasing the collection rate to every 30 seconds and collecting data for 5-10 minutes at this increased rate.
Once you feel that a “sufficient” number of javacores have been collected, press CTRL-C to stop collection. The script then archives all of the javacores into a single file, named waitData.tar.gz by default. This file can then be uploaded to the WAIT server for analysis and display, as described in the next section. The meaning of a “sufficient” number of javacores varies with context. If minimal data and fast turnaround are needed, a single javacore will suffice. More typically, 5 to 40 javacores collected 30 seconds apart are helpful to pinpoint problems. (As noted previously, it is also good practice to run an ongoing, slow collection with samples spaced 20 minutes apart.)
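To make the sampling guidance above concrete, the following is a minimal sketch of the arithmetic that relates collection duration, sampling interval, and the approximate number of javacores produced. The helper name samples_for is hypothetical and is not part of the WAIT data collector; the 8-hour baseline window is likewise only an illustrative assumption.

```shell
#!/bin/sh
# Hypothetical helper (not part of WAIT): approximate number of javacores
# produced when sampling every SLEEP_SECONDS for DURATION_MINUTES.
# Uses integer arithmetic, so the result is rounded down.
samples_for() {
    duration_minutes=$1
    sleep_seconds=$2
    echo $(( duration_minutes * 60 / sleep_seconds ))
}

# Problem investigation: 30-second spacing for 5 to 10 minutes.
samples_for 5 30       # prints 10
samples_for 10 30      # prints 20

# Baseline: one javacore every 20 minutes (1200 seconds) over an
# assumed 8-hour window (480 minutes).
samples_for 480 1200   # prints 24
```

Under these assumptions, the recommended 5-10 minutes at 30-second spacing yields roughly 10 to 20 javacores, which falls inside the 5 to 40 range described above.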
Many options are available beyond specifying the PID of the JVM. For example, the data collector supports collecting data from all JVMs in the system, or collecting N samples and then stopping. Invoking the data collector (waitDataCollector.sh) with no arguments prints the full list of options, as illustrated in the Figure below:
You must either specify one or more valid PIDs, or use the option: --processName NAME
USAGE: ./waitDataCollector_2013-02-20.sh [options] [PID_1] [PID_2] [... PID_N]
--sleep N: Number of seconds to sleep between javacore samples. Default is 30 seconds
A sleep interval of less than 15 seconds is not recommended.
--iters N: Number of javacores to triggering before exiting.
Leave blank to continue collecting until CTRL-C is pressed.
--javacoreDir DIR: Full path to the directory where javacores are written by the JVM. If not specified, this directory is computed based on the CWD of the JVM PID.
--skipJvmVersionCheck: Do not check the the JVM version and issue warnings for older JVMs.
--continueIfHeapdumpsOccur: Force the wait collector to continue even if heap dumps are occuring
--psInterval N: Modify the number of seconds to sleep between invocations of the ps command.
If unspecified this value is computed based on the sleep interval.
--outputDir DIR: Directory that the WAIT data collector can use to store the collected data. Must be empty, or nonexistant.
--outputZip FILE_PREFIX: The prefix of the tar.gz file produced. (default is waitData.tar.gz)
--noDelete: Do not delete the raw datafiles from /tmp
--noJstack: Do not use jstack for Oracle JVMs even if it is available
--noZip: Do not zip up the output produced by the data collector.
--noJavacoreTriggers: Do not trigger javacores, but still look for and archive them
--processName NAME: Monitor all processes with this name
--mustGather: Run with the behavior of the WAS performance must gather script
As the figure indicates, these options provide the ability to monitor all JVMs or only a subset of interest, and also provide control over naming and location of files and the interval at which javacores are collected and for how long.
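As an illustration of how the options listed above might be combined, here is a small sketch that prints candidate command lines instead of running them. The option names come from the usage text; the script path, the process name "java", the PIDs, the output directory, and the prefix "incident42" are placeholders chosen for this example, and dry_run is a hypothetical helper.

```shell
#!/bin/sh
# Placeholder path; use the name of the script you actually downloaded.
COLLECTOR=${COLLECTOR:-./waitDataCollector.sh}

# Hypothetical helper: print the command line instead of executing it,
# so the sketch is safe to try on any machine.
dry_run() {
    echo "$COLLECTOR $*"
}

# 20 javacores, 30 seconds apart, from every process named "java"
# (the process name is an assumption about your system).
dry_run --processName java --sleep 30 --iters 20

# Two specific JVMs (PIDs 4242 and 4243 are placeholders), with a
# custom output directory and archive prefix.
dry_run --outputDir /tmp/waitRun1 --outputZip incident42 4242 4243
```

Once the printed command looks right for your environment, it can be run directly. Note that, per the usage text, the directory given to --outputDir must be empty or must not yet exist.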
Once the data has been collected, it is ready to be uploaded to the WAIT server, as described below.
However, if you prefer not to use the WAIT Data Collector scripts, WAIT’s primary input, javacores, can be obtained directly by signaling the JVM. Directions for this manual method are available under the “Quick Start” link in Box 2, “Collect Data From Your System”. For convenience, we reproduce the Quick Start information here:
The WAIT data collector essentially automates these manual steps under the covers. Note that “kill -3” on the JVM process ID does not kill the JVM; it only causes the JVM to dump a javacore. The WAIT data collector also collects data from ps and vmstat, as well as lparstat if available. These utilities are not required. The WAIT report is of higher quality when they are available, but it provides a great deal of useful information even without them.
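The manual method can be sketched as a short shell loop built on the “kill -3” behavior described above. This is only an illustration under stated assumptions, not the collector's actual implementation: the function name trigger_javacores is hypothetical, the PID in the usage comment is a placeholder, and the sketch does not gather ps, vmstat, or lparstat output or archive the resulting javacores.

```shell
#!/bin/sh
# Hypothetical helper (not part of WAIT): send signal 3 (SIGQUIT) to a
# JVM COUNT times, INTERVAL seconds apart. As noted in the text, this
# makes the JVM dump a javacore and does not terminate it.
trigger_javacores() {
    pid=$1
    count=$2
    interval=$3
    i=1
    while [ "$i" -le "$count" ]; do
        # Stop early if the signal cannot be delivered (e.g. wrong PID).
        kill -3 "$pid" || return 1
        echo "triggered javacore $i of $count for PID $pid"
        # No need to wait after the final sample.
        [ "$i" -lt "$count" ] && sleep "$interval"
        i=$((i + 1))
    done
}

# Example (12345 is a placeholder PID): 10 javacores, 30 seconds apart.
# trigger_javacores 12345 10 30
```

The javacores are written by the JVM itself, so after the loop finishes they still need to be located and packaged by hand before upload.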
Upload data to WAIT Server
Having collected data (or just using the sample input from Box 2 of WAIT’s main screen shown on the cover), we can go to Step 3, “Upload Generated Data”. Clicking on “Use WAIT Now” shows the following screen:
Using the green “Select a File” button, you may select the archive file created by the data collector process described in the previous section. By default, this file will be named waitData.tar.gz. Select this file using the standard file chooser that your browser presents on Windows, MacOS, or another platform. The optional description field can be used to describe the content of the report and to enter searchable phrases. Your email address is filled in by default and provides a useful key for finding reports in your history. With this information (file, optional description, and email address, the last completed by WAIT based on your login), you are ready to click the green “Submit for Analysis” button.
After doing so, the WAIT report will pop up in your browser in a few seconds (although times can vary depending on your connection speed and the number / size of the javacores and other data being uploaded).