Five easy-to-use performance tools for Linux on PowerPC

developerWorks

Level: Introductory

Katie Mata (krenfro@us.ibm.com), eServer Solution Enablement, IBM 
Joseph Pu (joepu@us.ibm.com), AIX Technical Consultant, IBM 
Jimmy DeWitt (jdewitt@us.ibm.com), WBI Performance Tools, IBM 

20 Jun 2004

This article describes Performance Inspector, which contains a suite of performance tools for Linux. The authors describe how to download and install the required software and tools, and how to collect performance data. They also provide details on how to use the five basic tools, which you can use to analyze the performance of your C/C++ and Java applications, as well as the performance of your system as a whole.

Performance Inspector

The Performance Inspector package contains a suite of performance tools for Linux. The tools can be used to analyze performance of your C/C++ and Java applications, as well as performance of your system as a whole. The five easy-to-use tools covered in this article are:

  • Timer Profiler (Tprof)
  • Above idle (AI), for CPU utilization
  • Per thread time (PTT)
  • Java lock monitor (JLM)
  • Java heap dump (heap dump)
Performance Inspector consists of Tprof, AI, PTT, JLM, post, jprof, a2n, swtrace, and heap dump.

Platforms

Currently, Performance Inspector is available for SUSE United Linux 1.0 distributions on the following machine architectures: i386, ia64, x86_64, ppc, ppc64, s390x, and s390. Performance Inspector is available for additional Linux distributions on selected machine architectures. See the Performance Inspector site for current details.

For the PowerPC machine architectures, ppc and ppc64, Performance Inspector is supported only on the 2.4.19 kernel, which is available in the SUSE Linux Enterprise Server 8 (SLES 8) distribution. The code is being ported to the 2.6 kernel, so check the Performance Inspector site for the most up-to-date information.
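Before patching, it can help to confirm that a PowerPC machine is actually running a 2.4.19 kernel. The following is a minimal sketch; pi_ppc_kernel_ok is a hypothetical helper, not part of Performance Inspector, and it assumes that uname -r for such a kernel prints 2.4.19, possibly followed by a suffix.

```shell
#!/bin/bash
# Hypothetical pre-install check (not part of Performance Inspector).
# Succeeds only for a ppc or ppc64 machine running a 2.4.19 kernel.
# Arguments are optional and default to the running system.
pi_ppc_kernel_ok() {
    local release="${1:-$(uname -r)}"   # kernel release string
    local machine="${2:-$(uname -m)}"   # machine hardware name
    case "$machine" in
        ppc|ppc64) ;;                    # PowerPC: go on to check the kernel level
        *) return 1 ;;                   # this check covers PowerPC only
    esac
    case "$release" in
        2.4.19|2.4.19[-.]*) return 0 ;;  # exactly 2.4.19, optionally with a suffix
        *) return 1 ;;
    esac
}
```

For example, pi_ppc_kernel_ok || echo "not a 2.4.19 PowerPC kernel; see the Performance Inspector site".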



Collecting performance data

The Performance Inspector package contains a kernel patch file and a driver patch file. Performance Inspector is installed by applying the kernel and driver patches and rebuilding the kernel, then booting the patched kernel. The kernel patch is a hardware-specific patch that inserts trace hooks into kernel routines such as schedule, dispatch, interrupt, fork, exec, timer, and vma. The driver patch contains the Performance Inspector device driver code. The kernel trace hooks call functions in the device driver code, which log the information provided by the kernel trace hooks in pinned buffers. The Performance Inspector tools, such as swtrace, interact with the driver to make use of the information stored in the pinned buffers. There is one pinned buffer per CPU.



Downloading and installing Performance Inspector

You can download the Performance Inspector package from developerWorks Performance Inspector. From the Performance Inspector home page, click Downloads, then click on the link to accept the license. Click on common package. The common package is currently named pi.package.2004-04-14.tar.gz.

Installing the package

Complete instructions for installing Performance Inspector are at Performance Inspector, immediately following the package download link.

The instructions below are equivalent to those on the Web site, but include specific pointers for PowerPC.
  1. You must be the root user. To become the root user, enter the command # su -
  2. Untar the Performance Inspector package on your Linux machine using the command:
    # tar -xzf pi.package.2004-04-14.tar.gz

    If the name of the package has been updated, substitute the new package name instead.
  3. Untarring the package will result in two files, piinstall and pi_tools.tar.gz. Execute the script # ./piinstall

    Running this script will create a directory, which by default is /piperf, and will untar the pi_tools.tar.gz file to that directory and its subdirectories. If you do not wish to accept the default value of /piperf, then the script will let you specify your own directory.
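Because the package name changes with each release, a small helper can pick the newest one. This sketch assumes the date-stamped naming pattern of pi.package.2004-04-14.tar.gz; newest_pi_package is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical helper: print the newest pi.package.YYYY-MM-DD.tar.gz in a
# directory (default: the current directory), or fail if there is none.
newest_pi_package() {
    local dir="${1:-.}" newest="" f
    for f in "$dir"/pi.package.*.tar.gz; do
        # Glob results come back sorted, and ISO dates sort correctly as
        # plain text, so the last existing match is the newest package.
        if [ -e "$f" ]; then
            newest="$f"
        fi
    done
    [ -n "$newest" ] || return 1
    printf '%s\n' "$newest"
}
```

For example: pkg=$(newest_pi_package .) && tar -xzf "$pkg" && ./piinstall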

Next, you will need to install Java if you want to use the Performance Inspector Java tools. If you don't want to use the Java tools, the next step is to patch, build, and boot the kernel.



Installing Java

Two of the five tools, JLM and heap dump, work with the IBM 32-bit and 64-bit Runtime Environment for Linux v1.4.0 and more recent versions. The JLM tool will not work with non-IBM JVMs because it uses IBM extensions to the JVM Profiler Interface (JVMPI). The heap dump tool also works with non-IBM JVMs, because it uses only the standard JVMPI events.
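If you are not sure which JVM is first on your PATH, a quick check can tell you whether JLM will work with it. This is a sketch under one assumption: that IBM JVMs include the string IBM in their java -version output, which java prints on standard error. is_ibm_jvm is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the JVM looks like an IBM JVM.
# Assumption: IBM JVMs mention "IBM" in their `java -version` text.
# Pass the version text as $1 to test it; otherwise `java` on PATH is run.
is_ibm_jvm() {
    local version_text
    if [ "$#" -gt 0 ]; then
        version_text="$1"                      # text supplied by the caller
    else
        version_text=$(java -version 2>&1) || return 1
    fi
    printf '%s\n' "$version_text" | grep -q 'IBM'
}
```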

The IBM 32-bit and 64-bit Runtime Environment for Linux is available as part of the IBM Developer Kit for Java. The 64-bit IBM Developer Kit for Java is no longer supported on SLES 8, so you will need to use the 32-bit IBM Developer Kit for Java. Use the following steps to download and install the IBM Developer Kit for Java.

  1. Download the IBM Developer Kit for Java, called the SDK. Click Download on the right side of the page. Select IBM SDK for 32-bit iSeries/pSeries to download the 32-bit SDK. There may be several versions available, so unless you need a specific version, you will probably want to download the latest version. Currently the latest version is 1.4.1 SR2. Once you select the version, select Continue and you'll be asked for registration information. Once you supply this information, you'll be directed to a Downloads page similar to Figure 1.
    Figure 1. Download

    You can download the SDK as either a .tgz or an .rpm package. The easiest way to install the SDK is to download the .rpm package. To do this, look in the box labeled SDK for a package named IBMJava2-SDK-ppc-1.4.1-2.0.ppc.rpm or something similar, depending on the version you chose. Click on Accept License to the right of the package name.

  2. Once you've downloaded the .rpm package, you can install it by running the following command, substituting your package name if necessary for the package name supplied here:
    # rpm -i IBMJava2-SDK-ppc-1.4.1-2.0.ppc.rpm

  3. Once you've installed the SDK, set the PATH and JDKDIR environment variables with the correct information. The following are just examples; the paths may be different, depending on which version of the SDK you installed:
    # export PATH=/opt/IBMJava2-ppc-141/jre/bin:/opt/IBMJava2-ppc-141/bin:$PATH
    # export JDKDIR=/opt/IBMJava2-ppc-141

    If you are using an older, supported 64-bit SDK you will also need to set the following variable:
    # export JAVA64=64
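The exports in step 3 can be wrapped in a function so that PATH and JDKDIR always point at the same SDK. This is a minimal sketch; set_pi_java_env is hypothetical, and /opt/IBMJava2-ppc-141 is only the example path used above.

```shell
#!/bin/bash
# Hypothetical helper: set PATH and JDKDIR from one SDK root.
# Pass "64" as the second argument only for an older, supported 64-bit SDK.
set_pi_java_env() {
    local sdk="${1:-/opt/IBMJava2-ppc-141}"   # example path from the article
    if [ ! -d "$sdk/jre/bin" ] || [ ! -d "$sdk/bin" ]; then
        echo "set_pi_java_env: $sdk does not look like an SDK root" >&2
        return 1
    fi
    export PATH="$sdk/jre/bin:$sdk/bin:$PATH"
    export JDKDIR="$sdk"
    if [ "${2:-}" = "64" ]; then
        export JAVA64=64
    fi
}
```

Define the function in your login shell (or source a file that contains it) and then run set_pi_java_env with your SDK directory; running it as a separate script would not change the environment of your shell.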

The next step is to patch, build, and boot the kernel.



Patching, building, and booting the kernel

Follow the steps in this section to patch, build, and boot the SLES 8 kernel so it runs the Performance Inspector code required by the Performance Inspector tools.

If you have not built and booted an SLES 8 kernel before, first make sure you can build and boot a "stock kernel" using unpatched, unmodified SLES 8 kernel code. That way, if you have problems building or booting the kernel with the Performance Inspector patches, you will know whether your problems are due to the Performance Inspector patches or to an error in the build and boot process.

To build and boot a stock kernel, follow the instructions below. Skip instructions 4-8, since they specify how to apply the Performance Inspector patches. When you do step 9, run the make xconfig command as suggested, then in the window that appears, immediately click on Save and Exit. Omit the instruction in step 9 to click on the Performance Inspector button; until you apply the patches, Performance Inspector will not even appear in the window. Other than those exceptions, building and booting a stock kernel is identical to building and booting a patched kernel.

Once you can successfully build and boot a stock kernel, do these steps again to build the kernel with the Performance Inspector patches, but without omitting any steps.

  1. You need to install the kernel source from the SLES 8 base installation CD, if you have not already. The name of the kernel source package is kernel-source. Make sure you install the source from the base CDs instead of the service pack CDs. The Performance Inspector patches currently only work on the 2.4.19 kernel. The service pack CDs contain the kernel source for later versions of the kernel, which will not work with the patches. Figure 2 shows the kernel-source package selected for installation in the yast2 tool.
    Figure 2. Kernel-source package

  2. You also need to install a 64-bit cross-compiler if one is not installed. This compiler is necessary to build a 64-bit kernel. You must build the kernel in 64-bit mode since the 32-bit kernel is not supported on SLES 8 for PowerPC. The name of this package is cross-ppc64-gcc. You can install the compiler package from either the base or the service pack CDs.

    Once you have installed this package, make sure that /opt/cross/bin is part of your $PATH environment variable so the tools in the package will be accessible during the kernel build process. Figure 3 shows the cross-ppc64-gcc package selected for installation in the yast2 tool.


    Figure 3. Package selected for installation

  3. Verify that the /usr/src/linux link points to the 2.4.19 kernel source (the -l flag refers to the letter l, not the digit 1):
    # ls -l /usr/src/linux

    You should see output similar to:
    lrwxrwxrwx    1 root     root     17 May 19 09:56 /usr/src/linux -> linux-2.4.19.SuSE

    Make a back-up copy of this kernel source tree using the command:

    # cp -r /usr/src/linux-2.4.19.SuSE /usr/src/linux-2.4.19.SuSE.bak

    To avoid errors, build from a clean version of the kernel source. If you've previously tried to build from the directory that the /usr/src/linux link currently points to, redirect the /usr/src/linux link to point to a clean kernel source directory, such as your back-up directory:

    # rm /usr/src/linux
    # ln -s /usr/src/linux-2.4.19.SuSE.bak /usr/src/linux

    If your kernel source is not clean and you do not have a back-up directory available, you can uninstall the kernel-source package then reinstall it. After reinstalling, verify that the /usr/src/linux link points to the clean kernel source tree that you just reinstalled, then continue to the next step.

  4. Make sure you are in the /usr/src/linux directory, so you can install the kernel and driver patches:
    # cd /usr/src/linux

  5. Install the kernel patch (the 1 following strip is the digit 1, not the letter l):
    # patch --strip 1 --verbose </piperf/patches/ppc64/pi.SuSE.2419.patch >/tmp/patch.log

  6. Run the following grep command to make sure that the patch installed successfully. This command should return nothing:
    # grep -i fail /tmp/patch.log

  7. Install the driver patch. The name of the driver patch might have changed, so you may want to check the /piperf/patches directory to find the most recent pi.driver patch (the 1 following strip is the digit 1, not the letter l):
    # patch --strip 1 --verbose </piperf/patches/pi.driver.2004-04-01.patch >/tmp/driver.log

  8. Run the following grep command to make sure that the patch installed successfully. This command should return nothing:
    # grep -i fail /tmp/driver.log

  9. You will need to create a .config file to be used during the kernel build process. To create a .config, first make sure that you are in the /usr/src/linux directory using # cd /usr/src/linux, then run the # make xconfig command.

    The make xconfig command requires Xwindows. If you're not running Xwindows on your Linux machine, you will first need to export the display to a machine that is running Xwindows by entering the # xhost + command on the machine running Xwindows. Then on the machine running Linux, enter the following command, substituting the name of the machine running Xwindows for minotaur.austin.ibm.com:

    # export DISPLAY=minotaur.austin.ibm.com:0.0

    When you run the make xconfig command, a window will appear similar to Figure 4.


    Figure 4. Kernel configuration

    If the Performance Inspector patches were installed correctly, there should be a Performance Inspector button in this window. If you don't see it, make sure you have entered the commands to apply the patches exactly as shown above. Sometimes, if you enter the patch commands incorrectly, no "fail" messages appear in the log files, but the patches do not get applied. You can re-run the patch commands if you mistyped them, then run make xconfig again. Now click on Performance Inspector, and you should see the window shown in Figure 5.


    Figure 5. Performance Inspector

    Select m for module then click Main Menu. In the original window click Save and Exit.

    Once you've exited from the make xconfig command, a .config file should have been created for you in /usr/src/linux. You can verify that the correct Performance Inspector flag has been placed in this .config file by searching for the line CONFIG_PERFORMANCE_INSPECTOR=m.

    You could have selected y instead of m in the Performance Inspector window. The y option builds the Performance Inspector code directly into the kernel. It is recommended that you use the m value to build Performance Inspector as a module. This makes it easier to upgrade to a newer level of the tools package, as you would not have to rebuild and reboot the kernel.

  10. Now you will need to build the kernel. From the /usr/src/linux directory, run the following command:
    # make dep CROSS_COMPILE=powerpc64-linux- SUBARCH=ppc64 ARCH=ppc64 zImage modules modules_install

  11. Copy the new kernel to the boot directory:
    # cp arch/ppc64/boot/zImage  /boot/piperf.zImage

  12. To update the boot loader, add the following stanza to /etc/yaboot.conf:
    image=/boot/piperf.zImage 
    label=piperf.zImage
    root=/dev/sda3

    The root field indicates which disk partition you want to use as the root partition. The value assigned to this field will depend on your particular configuration, so the value shown above, /dev/sda3, may be incorrect. To determine the correct value to assign to the root field, you can run the # df / command.

  13. Reboot using # shutdown -r now
  14. When yaboot comes up, you can press the Tab key to list all of the available kernels. Your new kernel should be on that list. Type its label (piperf.zImage in the stanza above) and press Enter. The new kernel should now boot.
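Several of the checks in steps 3 through 12 can be scripted. The following is a minimal sketch, not part of Performance Inspector; the function names are hypothetical, and the two extra log patterns (can't find file to patch, Skipping patch) are assumptions about GNU patch's messages rather than something this article documents. It does not replace looking for the Performance Inspector button in make xconfig.

```shell
#!/bin/bash
# Hypothetical helpers for checking steps 3-12; not part of Performance Inspector.

# Step 3: succeed if the link (default /usr/src/linux) points at a 2.4.19 tree,
# such as linux-2.4.19.SuSE or the linux-2.4.19.SuSE.bak back-up copy.
check_src_link() {
    local target
    target=$(readlink "${1:-/usr/src/linux}") || return 1
    case "$(basename "$target")" in
        linux-2.4.19*) return 0 ;;
        *) return 1 ;;
    esac
}

# Steps 6 and 8: fail if the log is empty or mentions a problem. The last two
# patterns are GNU patch messages for files it could not locate (assumption).
check_patch_log() {
    local log="$1"
    if [ ! -s "$log" ]; then
        echo "$log is missing or empty" >&2
        return 1
    fi
    if grep -Eqi "fail|can't find file to patch|skipping patch" "$log"; then
        echo "$log reports problems" >&2
        return 1
    fi
}

# Step 9: the .config written by make xconfig must enable Performance Inspector.
check_pi_config() {
    grep -Eq '^CONFIG_PERFORMANCE_INSPECTOR=(m|y)$' "${1:-/usr/src/linux/.config}"
}

# Step 12: print the device mounted on /, the value for root= in yaboot.conf.
root_device() {
    df -P / | awk 'NR == 2 { print $1 }'
}
```

For example: check_src_link && check_patch_log /tmp/patch.log && check_patch_log /tmp/driver.log && check_pi_config && echo "root=$(root_device)"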

The next step is to build the Performance Inspector tools.



Building the Performance Inspector tools

If you want to use the Performance Inspector Java tools, make sure you've installed Java and set the following PATH and JDKDIR environment variables with the correct information. If these variables are not set, the Java tools will not be built. The following are just examples; the paths may be different, depending on which version of the SDK you installed:

# export PATH=/opt/IBMJava2-ppc-141/jre/bin:/opt/IBMJava2-ppc-141/bin:$PATH
# export JDKDIR=/opt/IBMJava2-ppc-141

If you are using an older, supported 64-bit SDK you will also need to set the variable # export JAVA64=64

Now you will need to run the tinstall script to build the Performance Inspector tools:

# cd /piperf/bin
# ./tinstall

Configuring Performance Inspector

You need to configure Performance Inspector every time you log in by sourcing the setrunenv script:

# cd /piperf/bin
# . setrunenv
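Because setrunenv must be sourced in every new login session, you may prefer to do it from root's ~/.bashrc. A minimal sketch, assuming the default /piperf directory; load_pi_env is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical helper for root's ~/.bashrc: source setrunenv at login.
# Like the manual steps, it sources setrunenv from inside its own
# directory, then returns to the directory you were in.
load_pi_env() {
    local bindir="${1:-/piperf/bin}" here="$PWD" rc
    [ -f "$bindir/setrunenv" ] || return 1
    cd "$bindir" || return 1
    . ./setrunenv          # runs in the current shell, so its settings persist
    rc=$?
    cd "$here" || return 1
    return "$rc"
}
```

Add the function to ~/.bashrc, followed by a line that calls load_pi_env.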

Now the Performance Inspector tools are ready to use. The following sections explain how to use them.



Tprof

The Tprof tool is a timer profiler that identifies what code is running on the CPU during a user-specified time interval. It is often used to find hot spots in CPU usage. While it is running, it records the address of the instruction that is being executed every time a system-clock interrupt occurs. The interrupts occur 100 times a second per processor on most systems. When the user-specified interval is over, Tprof groups the instructions together by process, thread, module, and subroutine. Then it generates a report that lists how many "ticks" each of these units of code received (how many times a system-clock interrupt occurred when that particular unit of code was running). The Tprof tool provides this information for multiple types of code, including application code, library routines, and kernel code.

How does Tprof work?

The Tprof tool is really a shell script called run.tprof that runs the swtrace tool, which gathers trace data from the trace hooks in the kernel. When Tprof calls swtrace, it indicates which specific trace hooks it needs. The swtrace tool then provides the addresses of each instruction that was running when a system-clock interrupt occurred, and the process id and thread id corresponding to each instruction, so that ticks can later be assigned by process id and thread id. This information is stored in trace buffers that have been allocated for each processor. The default size for each trace buffer is five Megabytes, but this can be overridden when you run the Tprof tool. Address to symbol name mapping is then performed by the a2n shared library, with the help of the jprof tool for Java code. The address to symbol name mapping allows Tprof to assign ticks by module and subroutine. The post tool is then used to produce the final Tprof report. The swtrace, a2n library, jprof, and post tools are all installed as part of the Performance Inspector installation.

Can Tprof profile what code is running at particular events (other than system-clock interrupts)?

Instead of recording which code is running at each system-clock interrupt, some versions of Tprof can record which code is running every time a certain event occurs, such as a cache miss or a branch taken. This feature is available on the x86 architecture only, and is not currently available for PowerPC.

Does Tprof measure CPU time or wall-clock time?

Because Tprof records what code is running on the CPU at each system-clock interrupt, you could say that it roughly measures the amount of CPU time that each unit of code, such as process or subroutine, consumes. It does not measure wall-clock time. It just lists the total amount of ticks received over the interval that it ran; it does not list at what points during the interval these ticks occurred. When code disables interrupts, the ticks will not be attributed to the code until the point where the interrupts are re-enabled.

How do I run Tprof?

Assuming you installed the Performance Inspector tool suite in the default location, the /piperf directory, the script that runs Tprof can be found in /piperf/bin. This script can be invoked using run.tprof [buffer_size]. On PowerPC this script takes one optional parameter, buffer_size, which is specified in Megabytes. This refers to the size of the trace buffers that are created for each processor to store the trace information that is generated while Tprof is running. If you omit this parameter, Tprof will use the default buffer size of five Megabytes.

Once you run the run.tprof script, Tprof will ask you to press Enter when you're ready to start the trace. This gives you a chance to start all programs that you wish to include in the trace. Once you press Enter, Tprof will continue to collect data until you stop the trace by pressing Enter again, or until your trace buffers are full. When you stop the trace, a Tprof report called Tprof.out is generated in your current working directory. A log file called run.tprof.log, which contains run information, is also created in the same directory.

Example 1:

# /piperf/bin/run.tprof

Example 2:

# /piperf/bin/run.tprof 10
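Because run.tprof allocates one trace buffer per processor, the memory it uses is the buffer_size argument multiplied by the processor count. The sketch below computes that before you start a trace. tprof_buffer_total_mb is a hypothetical helper; it assumes the size is a whole number of megabytes and that getconf _NPROCESSORS_ONLN reports the number of online processors.

```shell
#!/bin/bash
# Hypothetical helper (not part of Performance Inspector): estimate the
# memory run.tprof will use for trace buffers, one buffer per processor:
# total = buffer_size (MB) x number of online processors.
tprof_buffer_total_mb() {
    local size_mb="${1:-5}"                          # 5 MB is the documented default
    local ncpu="${2:-$(getconf _NPROCESSORS_ONLN)}"
    case "$size_mb" in
        ''|0*|*[!0-9]*)                              # empty, leading zero, or non-digits
            echo "buffer size must be a positive whole number of MB" >&2
            return 1 ;;
    esac
    echo $((size_mb * ncpu))
}
```

For example: echo "Tprof will use $(tprof_buffer_total_mb 10) MB of buffers"; /piperf/bin/run.tprof 10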

How do I interpret a Tprof report?

The Tprof.out report consists of a header and the Tprof report summary. The header of a Tprof.out file looks like the following:

 
Tprof Reports

     ProcessorSpeed      1451000000
     TraceCycles         2744696387
     TraceTime           15.123(sec)

TOTAL TICKS            3026
(Clipping Level :      0.0 %    0 Ticks)

The ProcessorSpeed is the processor speed given in Hertz. The TraceCycles is the total number of CPU cycles that were executed during the trace. The TraceTime is the total trace interval. TOTAL TICKS is the number of times that the system-clock interrupt occurred during the trace interval. The Clipping Level is the bottom x% of ticks that were clipped, or omitted from the report. The clipping level is used to limit the amount of data that appears in the report for code that received few ticks.
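Because system-clock interrupts arrive about 100 times a second per processor, TOTAL TICKS divided by (TraceTime * 100) gives a rough processor count, which is a handy sanity check on a report. In the sample header, 3026 / (15.123 * 100) is about 2.00, consistent with a two-processor system if the 100-per-second rate applies. The sketch below, with the hypothetical name tprof_cpu_estimate, assumes the header layout shown above.

```shell
#!/bin/bash
# Hypothetical helper: estimate the processor count from a Tprof.out header,
# assuming 100 system-clock ticks per second per processor.
tprof_cpu_estimate() {
    awk '
        $1 == "TraceTime"              { sub(/\(sec\)/, "", $2); secs = $2 }
        $1 == "TOTAL" && $2 == "TICKS" { ticks = $3 }
        END {
            if (secs <= 0) exit 1                 # header not found
            printf "%.2f\n", ticks / (secs * 100)
        }' "${1:-Tprof.out}"
}
```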

The next section you'll see is the Tprof report summary. The data that Tprof collects is the number of ticks received per instruction, but this data can be presented in different ways. The first section in the report summary shows the amount of ticks assigned by process:

================================
  )) Process
================================
LAB     TKS       %%%      NAMES

PID    1612      53.27     SystemProcess_0000
PID     727      24.03     java_06a3
PID     687      22.70     in.telnetd_0438

The Process section shows how many ticks have been assigned to each process that ran during the interval. The names of the processes are listed in the right column labeled NAMES. The TKS and %%% columns show the total number of ticks that each process received, and the percentage of ticks the process received, respectively. The LAB column indicates what type of code unit this is. LAB is short for LABEL. In this section, all code units will be PID or processes. The other types of code units, which you will see in later sections, are TID for kernel thread, MOD for module, and SYM for symbol (subroutine name).
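Each %%% value is TKS divided by the sum of TKS across all processes, times 100. In the sample, 1612 + 727 + 687 = 3026, which equals TOTAL TICKS, and 100 * 727 / 3026 rounds to 24.03. The following awk sketch recomputes the column from a Tprof.out file. The function name is hypothetical, and it assumes the Process section layout shown above with no clipping.

```shell
#!/bin/bash
# Hypothetical helper: recompute the %%% column of the Process section
# as 100 * TKS / (sum of TKS over all PID lines in that section).
tprof_process_pct() {
    awk '
        /\)\) Process[ \t]*$/           { insec = 1; next }   # section title
        insec && /^=+[ \t]*$/ && seen   { insec = 0 }         # next section starts
        insec && $1 == "PID"            { name[++n] = $4; tks[n] = $2; total += $2; seen = 1 }
        END {
            if (total == 0) exit 1
            for (i = 1; i <= n; i++)
                printf "%-24s %6d %6.2f\n", name[i], tks[i], 100 * tks[i] / total
        }' "${1:-Tprof.out}"
}
```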

The other sections in the Tprof report summary are a little more complicated, but are just variations on this theme. Below is an excerpt from the Process_Thread_Module_Symbol section:

================================
  )) Process_Thread_Module_Symbol
================================

LAB   TKS   %%%     NAMES

PID   1612 53.27    SystemProcess_0000
    TID   1612  53.27     tid_0000
       MOD   1611  53.24      vmlinux
          SYM   1607  53.11       .idled
          SYM      4    0.13      .schedule

       MOD      1   0.03      NoModule
          SYM      1    0.03       NoSymbols

The Process_Thread_Module_Symbol section lists each process from the Process section. This excerpt only shows the first process, since listing the other processes would take pages! The statistics for SystemProcess_0000 are the same as in the Process section. The type of code unit is PID, the number of ticks is 1612, and the percentage of total ticks used is 53.27.

This section differs from the Process section in that it subdivides the process code into smaller code units. The code that runs in a process can be subdivided into threads, which can be subdivided into modules, which can be subdivided into individual subroutines. In the above data, SystemProcess_0000 runs code from one kernel thread, tid_0000. Performance Inspector will support the Native POSIX Threading Library (NPTL) in later releases, but currently you can assume that TID refers to kernel thread.

This tid_0000 thread in turn ran code from two modules. The first module the thread ran is vmlinux, or kernel code. The subroutines that ran in the kernel code are idled (the process that is run when the CPU is idle) and schedule (the kernel scheduler). The second module listed for this process is NoModule. Examples of code that would be considered NoModule are device drivers or Java code that was executed without the Java profiler, so that Tprof could not perform the mapping from instruction address to module and subroutine name. The other sections in the Tprof report summary are similar to this one, so once you understand this section, interpreting the rest of the report should be straightforward.
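In this section, ticks roll up from symbols to modules to threads: .idled (1607) plus .schedule (4) gives the 1611 ticks of vmlinux, and adding the 1 tick of NoModule gives the 1612 ticks of tid_0000. To find hot spots across a long report, you can pull out every SYM line with its module and sort by ticks. This is a sketch assuming the layout shown above; tprof_top_symbols is hypothetical, and a symbol appears once for each process and thread it ran in.

```shell
#!/bin/bash
# Hypothetical helper: list the N hottest SYM entries (default 10) from the
# Process_Thread_Module_Symbol section as "TKS %%% module:symbol".
tprof_top_symbols() {
    local n="${2:-10}"
    awk '
        /\)\) Process_Thread_Module_Symbol/ { insec = 1; next }
        insec && /\)\) /                    { insec = 0 }     # next section title
        insec && $1 == "MOD"                { mod = $4 }      # remember current module
        insec && $1 == "SYM"                { print $2, $3, mod ":" $4 }
    ' "${1:-Tprof.out}" | sort -k1,1nr | head -n "$n"
}
```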



Above idle (AI)

The AI tool displays CPU utilization statistics during a user-specified interval. It displays three statistics to describe CPU utilization:

  • Percentage of time the CPU is idle
  • Percentage of time the CPU is busy
  • Percentage of time the CPU is handling interrupts
If you have multiple CPUs, it will provide this information per CPU and the system-wide average of all CPUs.

The AI command is actually a specialized feature provided by swtrace, the software tracing tool that is installed as part of the Performance Inspector tool suite. At the requested interval, swtrace will make an ioctl() call to the Performance Inspector device driver to retrieve idle information.

Does the percentage of time the CPU is busy include the percentage of time the CPU is handling interrupts?

No, the percentage of time that the CPU spends handling interrupts is not a subset of the total busy time for that CPU. The three statistics that AI generates refer to three separate buckets of time.

How do I run AI?

Assuming you installed the Performance Inspector tool suite in the default location, the /piperf directory, the AI tool can be found in /piperf/bin. You can run the AI tool by running the command:

swtrace ai [sample_interval [num_samples]]

The sample_interval and num_samples parameters are optional. The sample_interval parameter, specified in seconds, is how often the CPU utilization should be sampled and displayed. The num_samples parameter specifies how many samples should be taken. If the sample_interval parameter is omitted, the default sample interval is one second. If the num_samples parameter is omitted, sampling continues until you press Ctrl-C or Ctrl-Break.

Example 1:

# /piperf/bin/swtrace ai

Example 2:

# /piperf/bin/swtrace ai 5

Example 3:

# /piperf/bin/swtrace ai 2 3

How do I interpret the output of the AI command?

On a system with 2 CPUs, the output will look similar to the following:

Percentage by processor:
   CPU 0  IDLE=  0.00, BUSY=100.00, INTR=  0.00
   CPU 1  IDLE=  0.00, BUSY=100.00, INTR=  0.00

Percentage by System:
          IDLE=  0.00, BUSY=100.00, INTR=  0.00

The "Percentage by processor" lists statistics for each CPU on the system. For each CPU, it shows the percentage of time the CPU was idle, the percentage of time the CPU was busy, and the percentage of time the CPU spent handling interrupts. The "Percentage by System" lists the same statistics, but instead of breaking them down CPU by CPU, it shows the system-wide average across all CPUs.
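Because IDLE, BUSY, and INTR are separate buckets, the three values for each CPU should add up to about 100, and the system line is described as the average across all CPUs. The sketch below (ai_check is a hypothetical name) reads swtrace ai output on standard input, prints each CPU's total, and prints the plain average of the per-CPU values so you can compare it with the "Percentage by System" line. It assumes the output format shown above; when several samples are read, the average covers all of them.

```shell
#!/bin/bash
# Hypothetical helper: sanity-check `swtrace ai` output read from stdin.
ai_check() {
    awk '
        /CPU [0-9]+/ {
            line = $0
            gsub(/[=,]/, " ", line)      # "IDLE=  0.00," becomes "IDLE   0.00 "
            split(line, f, " ")          # f: CPU <n> IDLE <v> BUSY <v> INTR <v>
            printf "CPU %s total=%.2f\n", f[2], f[4] + f[6] + f[8]
            idle += f[4]; busy += f[6]; intr += f[8]; ncpu++
        }
        END {
            if (ncpu == 0) exit 1
            printf "average IDLE=%.2f BUSY=%.2f INTR=%.2f\n", idle / ncpu, busy / ncpu, intr / ncpu
        }'
}
```

For example: /piperf/bin/swtrace ai 2 3 | ai_check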



PTT

The PTT tool collects statistics on all processes that run during a user-specified interval. At this time, PTT does not support the Native POSIX Threading Library (NPTL), so the term thread is synonymous with process. PTT collects the following statistics for each process:

  • Process name
  • Process id
  • Number of CPU cycles that the process used
  • Percentage of total CPU cycles that the process used
  • Number of interrupts that occurred while the process was running
  • Number of times that the process was dispatched (or how many times the kernel dispatch subroutine relinquished the CPU to that process)

The PTT tool can be invoked as a script called run.ptt. This script calls the command-line tools ptt and pttstat to collect the necessary statistics.

What about the PTT APIs?

In addition to the command-line tools ptt and pttstat, and the run.ptt script that invokes them, there is also a set of PTT APIs you can use to instrument your Java and C applications. Currently for PowerPC these APIs provide the number of time-base ticks that have occurred between successive API calls in an instrumented program. See Linux Per Thread Time APIs for information on how to use the PTT APIs in your programs.
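Time-base ticks, such as the values the PTT APIs return on PowerPC or the JLM_Interval_Time field described in the JLM section, must be divided by the time-base frequency to get seconds. A minimal sketch, assuming the kernel reports that frequency on a timebase line in /proc/cpuinfo (many PowerPC Linux kernels do, but check yours); both function names are hypothetical.

```shell
#!/bin/bash
# Hypothetical helpers: convert PowerPC time-base ticks to seconds.

# Print the time-base frequency in Hz from a "timebase : <Hz>" line in
# /proc/cpuinfo (assumption); prints nothing if there is no such line.
timebase_hz() {
    awk -F: '$1 ~ /^timebase[ \t]*$/ { gsub(/[ \t]/, "", $2); print $2; exit }' /proc/cpuinfo
}

# Usage: tb_ticks_to_seconds TICKS [FREQUENCY_HZ]
tb_ticks_to_seconds() {
    local ticks="$1" hz="${2:-$(timebase_hz)}"
    if [ -z "$hz" ]; then
        echo "time-base frequency unknown; pass it as the second argument" >&2
        return 1
    fi
    awk -v t="$ticks" -v f="$hz" 'BEGIN { printf "%.6f\n", t / f }'
}
```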

How do I run PTT?

Assuming you installed the Performance Inspector tool suite in the default location, the /piperf directory, then the PTT tool can be found in /piperf/bin. The PTT tool can be invoked by running the run.ptt script. This script does not take any parameters. When you run this script, the tool immediately starts collecting statistics. When you want to stop collecting statistics, press Enter and a report will be generated. For example,

/piperf/bin/run.ptt

How do I interpret the output of PTT?

Once you press the Enter key, you will see statistics similar to the following:

ptt version 2040024
* Opening ptt.out For ptt Output
total cycles     1,461,557,204

DISP    IRQ          Cycles        %       PID     PNM
410      4     1,459,143,201     99.8        0     IDLE
403      0         2,256,862      0.2       15     khvcd
  2      1           126,544      0.0     9227     bash
  4      0            19,189      0.0     8197     in.telnetd
  1      0             4,654      0.0       10     kupdated
  1      0             3,649      0.0        1     init
  2      0             2,593      0.0      774     nscd
  1      0               512      0.0       18     kreiserfsd

Interpreting the statistics is straightforward. The total number of CPU cycles that occurred during the interval that the PTT tool collected statistics is shown at the top, in the row "total cycles." The chart below that lists all the processes that ran during the interval, with the statistics accumulated for each process. The name of the process is on the right side in the column labeled PNM. The process id is just left of that, in the column labeled PID. Just left of that are the Cycles and % columns. The number in the Cycles column lists the number of CPU cycles that the process used during the interval that PTT collected statistics. The % column shows the percentage of CPU cycles that the process used, calculated by dividing the number in the Cycles column by the "total cycles" value at the top of the chart and multiplying by 100. The IRQ column indicates the number of interrupts that occurred while that process was running. Finally, the DISP column indicates the number of times the process was dispatched during the interval that the PTT tool collected statistics.
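The % column can be reproduced from the Cycles column, and the per-process Cycles should add up to total cycles; in the sample they do exactly (1,461,557,204). The sketch below strips the thousands separators before doing arithmetic. It assumes the layout shown above and that the report is in ptt.out, the file named in the sample's Opening line; ptt_summary is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical helper: recompute the % column of a ptt report and compare
# the sum of the per-process Cycles with the "total cycles" line.
ptt_summary() {
    awk '
        $1 == "total" && $2 == "cycles" { total = $3; gsub(/,/, "", total); total += 0; next }
        $1 ~ /^[0-9]+$/ && NF >= 6 {                   # DISP IRQ Cycles % PID PNM
            c = $3; gsub(/,/, "", c); c += 0
            sum += c
            if (total > 0) printf "%-12s %6.1f\n", $6, 100 * c / total
        }
        END { printf "sum=%.0f total=%.0f\n", sum, total }' "${1:-ptt.out}"
}
```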



JLM

The JLM tool provides statistics on locks in Java programs. These statistics include how the lock was acquired, the total amount of time the lock was held, and the average amount of time that the lock was held. It works with the IBM 32-bit and 64-bit Runtime Environment for Linux v1.4.0 and more recent versions. The JLM tool will not work with non-IBM JVMs because the tool uses IBM extensions to the JVM Profiler Interface (JVMPI).

The JLM is a specialized function provided by the Java profiler, which is generally referred to as jprof. The Java profiler is implemented as the libjprof.so library on PowerPC. The rtdriver tool, a socket-based command-line application, provides an interface to the Java profiler. Like the other Performance Inspector tools, rtdriver and libjprof.so can be found in /piperf/bin.

How do I run JLM?

  1. First you need to run your Java program with the -Xrunjprof:socket option, which starts the Java profiler. For example, if your Java class is named MyJava, run the command:
    # java -Xrunjprof:socket MyJava

  2. Run the rtdriver application (the -l flag passed to the following command is the lowercase letter l, not the digit 1):
    # rtdriver -l

  3. Once the rtdriver application connects to the local host, it will display a Command> prompt. Enter the following command at the prompt to instruct the Java profiler to start collecting lock statistics:
    Command> jlmstart

  4. The Java profiler can generate a report on the current lock statistics at any time. Also, you can generate multiple reports during the same rtdriver session. Each report will be stored in the current working directory in the file log_jlm.#_pppp, where # is the sequence number of the report, starting at 1 for the first report, and increasing by one for every subsequent report, and pppp refers to the process id. To generate a report, enter the following command at the Command> prompt:
    Command> jlmdump

  5. When you are finished collecting lock statistics, enter the following command at the Command> prompt:
    Command> jlmstop

  6. To exit rtdriver, enter the following command at the Command> prompt:
    Command> quit
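When you generate several reports in one session, a short script can locate them by the naming convention described in step 4. The sketch below assumes the file names follow log_jlm.<sequence>_<pid> exactly. The regular expression and both helper functions are hypothetical and are not part of Performance Inspector.

```python
import re
from pathlib import Path

# Assumed naming pattern: log_jlm.<sequence number>_<process id>
JLM_REPORT_RE = re.compile(r"^log_jlm\.(\d+)_(\d+)$")


def jlm_report_name(seq: int, pid: int) -> str:
    """Expected file name of the seq-th jlmdump report for one JVM process."""
    return f"log_jlm.{seq}_{pid}"


def find_jlm_reports(directory: str, pid: int) -> list[Path]:
    """Return this process's JLM reports in `directory`, ordered by sequence number."""
    found = []
    for path in Path(directory).iterdir():
        match = JLM_REPORT_RE.match(path.name)
        if match and int(match.group(2)) == pid:
            found.append((int(match.group(1)), path))
    # Sort numerically so that report 10 comes after report 9.
    return [path for _, path in sorted(found, key=lambda item: item[0])]


print(jlm_report_name(1, 4242))
```

Sorting on the parsed sequence number, rather than on the file name, keeps report 10 after report 9.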

How do I interpret the JLM report?

Below is an excerpt from a log-jlm file:

Java Lock Monitor Report
Version_5.0
Built : (Tue Apr 27 13:57:19 PDT 2004)

Platform : (Linux-ppc64)

JLM_Interval_Time 26896040

System (Registered) Monitors

%MISS GETS NONREC SLOW REC TIER2 TIER3 %UTIL AVER-HTM MON-NAME
 0     4     2     0    2    0     0     0   26898   Monitor Cache lock
 0     2     2     0    0    0     0     1   151640  Thread queue lock

The report consists of three parts: a header followed by two sections, System (Registered) Monitors and Java (Inflated) Monitors. The locks in the System (Registered) Monitors section are taken by the JVM and its components, whereas the locks in the Java (Inflated) Monitors section are taken in user Java code. The Java (Inflated) Monitors section is omitted from the excerpt above for brevity. Its statistics are interpreted in exactly the same way as those in the System (Registered) Monitors section.

The header contains information about the JLM tool: which version was run, when it was built, and which platform it ran on. The JLM_Interval_Time field gives the number of time base ticks that elapsed between the jlmstart command and the jlmdump command.

Next, the report shows statistics for the System (Registered) Monitors. These statistics require a brief explanation of how locks are acquired. A lock can be acquired in one of two ways: recursively or non-recursively. In a recursive acquisition, the requesting thread already owns the lock. In a non-recursive acquisition, it does not. Non-recursive acquisitions are in turn either fast or slow. In a fast acquisition, the lock is free and is acquired immediately, with no waiting. In a slow acquisition, the lock is owned by another thread, and the requester must wait for that thread to release it.
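These acquisition categories map directly onto the counters in the report. The Python model below is purely illustrative. The LockCounters class is hypothetical and does not reflect how the JVM or JLM actually records statistics. It classifies a single acquisition from the lock's owner at the moment of the request and names its counters after the report columns.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LockCounters:
    """Hypothetical model of one lock's counters, named after the JLM report columns."""
    nonrec: int = 0  # acquisitions by a thread that did not already own the lock
    slow: int = 0    # non-recursive acquisitions that had to wait for another owner
    rec: int = 0     # acquisitions by a thread that already owned the lock

    def record(self, owner: Optional[str], requester: str) -> str:
        """Classify one acquisition, given who owned the lock when it was requested."""
        if owner == requester:
            self.rec += 1
            return "recursive"
        self.nonrec += 1
        if owner is None:
            return "fast"  # lock was free: acquired immediately
        self.slow += 1
        return "slow"      # another thread held it: the requester had to wait

    @property
    def fast(self) -> int:
        return self.nonrec - self.slow

    @property
    def gets(self) -> int:
        return self.fast + self.slow + self.rec


c = LockCounters()
c.record(None, "T1")  # fast
c.record("T1", "T1")  # recursive
c.record("T1", "T2")  # slow
print(c.gets, c.nonrec, c.slow, c.rec)  # prints: 3 2 1 1
```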

The statistics in the System (Registered) Monitors chart are interpreted as follows. Each row refers to a different lock, whose name appears on the right, in the MON-NAME column.

Column          Meaning
%MISS           Percentage of non-recursive acquisitions that had to wait: %MISS = 100 * (SLOW / NONREC)
GETS            Total number of times the lock was acquired: GETS = FAST + SLOW + REC, where FAST = NONREC - SLOW is the number of non-recursive acquisitions that did not have to wait (equivalently, GETS = NONREC + REC)
NONREC          Number of times the lock was acquired non-recursively
SLOW            Number of times the lock was acquired non-recursively but had to be waited on
REC             Number of times the lock was acquired recursively
TIER2, TIER3    Not applicable on PowerPC; these columns can be ignored
%UTIL           Percentage of the interval between the jlmstart and jlmdump commands during which the lock was held: %UTIL = 100 * (Hold_Time / JLM_Interval_Time), where Hold_Time is the number of time base ticks during which the lock was held
AVER-HTM        Average number of time base ticks the lock was held per non-recursive acquisition
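As a worked check of these formulas, the Python sketch below recomputes the derived columns for the two rows in the excerpt above. The report does not print FAST, so the sketch derives it as NONREC - SLOW. The report does not print Hold_Time either. The sketch approximates it as AVER-HTM * NONREC, which follows from the definition of AVER-HTM but is an estimate, not the tool's own calculation. MonitorRow and derived_columns are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class MonitorRow:
    """Raw counters from one line of a JLM monitor table (hypothetical type)."""
    name: str
    nonrec: int    # NONREC column
    slow: int      # SLOW column
    rec: int       # REC column
    aver_htm: int  # AVER-HTM column: ticks held per non-recursive acquisition


def derived_columns(row: MonitorRow, interval_ticks: int) -> dict:
    fast = row.nonrec - row.slow                   # acquired without waiting
    gets = fast + row.slow + row.rec               # GETS = FAST + SLOW + REC
    miss = 100.0 * row.slow / row.nonrec if row.nonrec else 0.0  # %MISS
    # Assumption: total hold time is approximately AVER-HTM * NONREC.
    est_hold = row.aver_htm * row.nonrec
    util = 100.0 * est_hold / interval_ticks       # estimated %UTIL
    return {"FAST": fast, "GETS": gets, "%MISS": miss, "%UTIL": util}


INTERVAL = 26896040  # JLM_Interval_Time from the excerpt
rows = [
    MonitorRow("Monitor Cache lock", nonrec=2, slow=0, rec=2, aver_htm=26898),
    MonitorRow("Thread queue lock", nonrec=2, slow=0, rec=0, aver_htm=151640),
]
for r in rows:
    d = derived_columns(r, INTERVAL)
    print(f"{r.name}: GETS={d['GETS']} %MISS={d['%MISS']:.0f} %UTIL~{d['%UTIL']:.2f}")
```

With the excerpt values, this gives GETS values of 4 and 2, which match the report. The %UTIL estimates are roughly 0.20 and 1.13, consistent with the reported %UTIL values of 0 and 1.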





Heap dump

The heap dump tool provides statistics about the objects on the Java heap, including the number of objects of each class, the total number of bytes allocated to objects of each class, and the average size in bytes of those objects. It also reports the total size of the Java heap and the memory addresses at which the heap begins and ends. It works with the IBM 32-bit and 64-bit Runtime Environment for Linux v1.4.0 and later. It should also work with non-IBM JVMs, since it uses standard JVM Profiler Interface (JVMPI) events.

Like the JLM, the heap dump is a specialized function provided by the Java profiler. The rtdriver application is used as an interface to the Java profiler. Through the rtdriver interface, a request can be made to generate a heap dump report at any time. The rtdriver application and the Java profiler, which is implemented as libjprof.so on PowerPC, can be found in /piperf/bin.

How do I run heap dump?

  1. First, run your Java program with the -Xrunjprof:socket,classloadinfo option, which starts the Java profiler. For example, if your Java class is named MyJava, run the following command:
    # java -Xrunjprof:socket,classloadinfo MyJava

  2. Run the rtdriver application (the -l flag passed to the following command is the lowercase letter l, not the digit 1):
    # rtdriver -l

  3. Once the rtdriver application connects to the local host, it displays a Command> prompt. You can request a report on current Java heap usage at any time, and you can generate multiple reports during the same rtdriver session. Each heap dump report consists of two files stored in the current working directory, named log_hd.#_pppp and log.hdcnm.#_pppp. Here # is the report's sequence number, which is 1 for the first report and increases by one for each subsequent report, and pppp is the process ID. To generate a report, enter the following command at the Command> prompt:
    Command> heapdump=1

  4. To exit rtdriver, enter the following command at the Command> prompt:
    Command> quit

How do I interpret a heap dump report?

The heap dump report consists of two files, a log-hd file and a log-hdcnm file. First examine the log-hd file. The contents should look similar to the following:

 # HEAPDUMP
 # level             1
 # begin             0x8004ea3090
 # end               0x8004ed11db
 # num_traces        5
 # traces            0x7fe05e0cc0
 # HEAPDUMP SIZE  =  188747

The total size of the Java heap is given in decimal by the HEAPDUMP SIZE label; in this example it is 188747 bytes. The Java heap extends from the memory address given by the "begin" label to the memory address given by the "end" label. In this example, that is from 0x8004ea3090 to 0x8004ed11db.
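The log-hd header is easy to read programmatically. The minimal Python sketch below assumes the "# key value" layout shown above, and the parser functions are hypothetical, not part of Performance Inspector. In this excerpt, end minus begin is 0x2e14b, which is exactly 188747, the HEAPDUMP SIZE value. The sketch checks that relationship, but whether it holds for every dump is an assumption.

```python
import re

HEADER = """\
 # HEAPDUMP
 # level             1
 # begin             0x8004ea3090
 # end               0x8004ed11db
 # num_traces        5
 # traces            0x7fe05e0cc0
 # HEAPDUMP SIZE  =  188747
"""

SIZE_RE = re.compile(r"#\s*HEAPDUMP SIZE\s*=\s*(\d+)")


def parse_number(token: str) -> int:
    """Header values are either hexadecimal (0x...) or decimal."""
    return int(token, 16) if token.lower().startswith("0x") else int(token)


def parse_heapdump_header(text: str) -> dict:
    """Collect the '# key value' fields of a log-hd header into a dict."""
    fields = {}
    for raw in text.splitlines():
        line = raw.strip()
        match = SIZE_RE.match(line)
        if match:
            fields["size"] = int(match.group(1))  # HEAPDUMP SIZE is decimal
            continue
        parts = line.lstrip("#").split()
        if line.startswith("#") and len(parts) == 2:
            fields[parts[0]] = parse_number(parts[1])
    return fields


hdr = parse_heapdump_header(HEADER)
print(hdr["end"] - hdr["begin"], hdr["size"])  # prints: 188747 188747
```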

Now examine the log-hdcnm file. Here is an excerpt:

MaxObjectSize : 19088
  
OBJS    BYTES     AVER      %
4765 -539437512 -113208.0  100.0

OBJS    BYTES     AVER     %    CLASS
====    =====    =====  =====   =====
1331   128512     96.0   -0.0   CHAR[]
  24    42160   1756.0   -0.0   BYTE[]
1115    35680     32.0   -0.0   java/lang/String

First you will see the MaxObjectSize field, which gives the size in bytes of the largest object on the Java heap. Next come statistics for all objects on the Java heap: a summary line for the heap as a whole, followed by a class-by-class breakdown. Consider the CHAR[] class, for example. There are 1331 CHAR[] objects on the heap, occupying a total of 128512 bytes. The AVER column reports 96.0 bytes per CHAR[] object. Since 128512 / 1331 is about 96.6, the tool appears to truncate averages to whole bytes. The % column is intended to show the percentage of the total Java heap space that the CHAR[] objects use. In this excerpt, however, the BYTES total on the summary line is negative, so the summary AVER value and the per-class % values (printed as -0.0) are not meaningful. The statistics for the other classes are read in the same way.
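The class rows of a log-hdcnm file can be tallied in the same way. This Python sketch assumes the five-column OBJS BYTES AVER % CLASS layout shown above, and ClassStats and parse_class_rows are hypothetical names. It recomputes AVER as a whole-number average truncated toward zero, which reproduces the AVER values for the three class rows in the excerpt.

```python
from dataclasses import dataclass

SAMPLE = """\
OBJS    BYTES     AVER     %    CLASS
====    =====    =====  =====   =====
1331   128512     96.0   -0.0   CHAR[]
  24    42160   1756.0   -0.0   BYTE[]
1115    35680     32.0   -0.0   java/lang/String
"""


@dataclass
class ClassStats:
    objs: int    # OBJS: number of objects of this class
    nbytes: int  # BYTES: total bytes allocated to those objects
    cls: str     # CLASS: class name

    @property
    def aver(self) -> int:
        """Average object size, truncated toward zero to whole bytes."""
        return int(self.nbytes / self.objs)


def parse_class_rows(text: str) -> list[ClassStats]:
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Per-class rows have five fields and start with an object count;
        # the header, the ==== rule, and the 4-field summary line are skipped.
        if len(parts) == 5 and parts[0].isdigit():
            rows.append(ClassStats(int(parts[0]), int(parts[1]), parts[4]))
    return rows


for r in parse_class_rows(SAMPLE):
    print(f"{r.cls:20} {r.objs:6} {r.nbytes:8} {r.aver:6}")
```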



Resources

  • Performance Inspector is the best source for information. It includes the Performance Inspector packages for download on multiple platforms, and detailed installation and usage instructions.

  • SUSE Linux Enterprise Server 8 provides an overview, list of products and solutions, and a description of services.

  • Visit Tprof for more information about the tool.

  • Go to AI for more details about the tool and swtrace.

  • Read more about the PTT tool.

  • JLM is a repository of information about the tool.

  • Visit JPROF to learn more about heap dump.


About the authors

Katie Mata is an AIX technical consultant in the Solutions Enablement Group in Austin, TX. She has 5 years of experience in AIX development. She can be reached at krenfro@us.ibm.com.


Joseph Pu is an AIX technical consultant in the IBM Systems group. His focus is AIX performance, tuning, and sizing. Joe has extensive experience in software development, from graphics to software simulation. He started his AIX development experience more than 10 years ago. Joe graduated from the University of Texas at Austin with a degree in Computer Science. He can be reached at joepu@us.ibm.com.


Jimmy DeWitt is a performance tools developer at IBM Austin. He received his B.S. in Computer Science from St. Edward's University. Jimmy has worked for IBM for over twenty years on projects ranging from 3270 emulation, LAN transports, and DCE to performance-related work. He is currently part of the WBI Performance group in Austin, Texas, where he has worked as a performance analyst and tools developer. You can contact him at jdewitt@us.ibm.com.




Rate this page


Please take a moment to complete this form to help us better serve you.



YesNoDon't know
 


 


12345
Not
useful
Extremely
useful
 


Back to top