Troubleshooting Java on AIX: Data collection for debugging hangs, high CPU, and performance issues

This article gives instructions for troubleshooting Java™ for the IBM® AIX® operating system. It provides short, simple, and complete instructions for collecting data for debugging hangs, slow responsiveness, or looping situations with Java applications running on AIX. By following the instructions in the article, you'll be able to collect the right data and complete the required steps before contacting IBM Support, thereby expediting your call. This article does not provide information for the analysis of any of the data collected, nor does it provide operating system or application recommendations for resolving issues. This article is provided by the IBM eServer UNIX and ISV Technical Support Team for AIX and Java in Austin, Texas.

Roger Leuckie (rog@us.ibm.com), Previous Team Lead for ISV and UNIX Technical Support for Java, IBM, Software Group

Roger Leuckie is team lead for the ISV and UNIX Technical Support for Java team located in Austin, Texas. You can contact him at *rog@us.ibm.com.



Dawn Patterson (dpatter@us.ibm.com), Previous ISV and UNIX Technical Support for Java, IBM, Software Group

Dawn Patterson is a former member of ISV and UNIX Technical Support for Java team located in Austin, Texas. You can contact her at *dpatter@us.ibm.com.



Rajeev Palanki (rpalanki@in.ibm.com), Level 3 Java Support, IBM, Software Group

Rajeev Palanki provides Level 3 Java support and is located in India. You can contact him at *rpalanki@in.ibm.com.

*We ask that you do not contact the authors directly for support related calls. Instead you should contact 1-800-IBMSERV to have a service call opened.



29 April 2004


Important notice

This article is provided as a convenience to customers. The information contained in this article is provided "as is", with no warranty or support. The contents of this article will be updated on a best effort basis as time permits.

If you need, you can obtain detailed information on supported documentation and tools using the following references:


Having the right expectation

The JVM may be considered hung if the process is still present but is not responsive, or is consuming the majority of the CPU time. Lack of response could be the result of any of the following conditions within the JVM (caused by JVM native code, GC, JIT, AIX, other native code, or the Java application):

  • Deadlock
  • Timeout
  • Blocked on resource
  • Suspended threads
  • Thread hogs, high CPU utilization, looping
  • Memory constraints

Collecting information might require reproducing the issue several times to collect and eliminate information to identify and resolve the issue. Sometimes a Javacore alone may be sufficient; other times it might involve collecting kernel traces, AIX core files, and other information. If the issue cannot be reproduced in a test, lab, or development environment, data may need to be collected from a production system. If data cannot be collected from a system demonstrating the issue, the Support teams might not be able to resolve the issue. We understand that collecting data in production environments can result in downtime for the application, which can impact revenue, stability, and customer perception. Support teams will make every effort to keep the number of iterations for collecting information to a minimum and try to resolve the issue as soon as possible.

Service notice

See IBM developer kits for AIX, Java technology edition for important information about support of Java on AIX. Note the End Of Service dates on the main download page, and carefully read the terms and conditions for support of Java on AIX provided below the download table.


Javacore versus AIX core

A Javacore file is a text representation of the Java application at an instant in time. The Javacore file is created by the Java process when a user runs the kill -3 command on the Java process, or when the process aborts or terminates due to an internal exception. For specific information on the Javacore file, see the IBM JVM Diagnostic Guides.

For some situations, providing a collection of Javacore files generated from the same process can be helpful for understanding how the application is behaving. When a process has become hung or has reached an unstable condition, the process may be unable to create the Javacore file.
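
A series of Javacore files from the same process can be requested by sending the kill -3 signal several times with a pause between signals. The following is a minimal sketch only: the function name, the default count and interval, and the KILL_CMD override (which allows a dry run) are assumptions made for illustration and are not part of any IBM tool.

```shell
#!/bin/sh
# Sketch: request several Javacore files from one Java process.
# KILL_CMD is a hypothetical override so the loop can be dry-run (KILL_CMD=echo).
KILL_CMD=${KILL_CMD:-kill}

# request_javacores PID [COUNT] [INTERVAL_SECONDS]
request_javacores() {
    pid=$1
    count=${2:-3}       # number of Javacore files to request (assumed default)
    interval=${3:-30}   # seconds between requests (assumed default)
    i=1
    while [ "$i" -le "$count" ]; do
        # kill -3 asks the JVM to write a Javacore file
        $KILL_CMD -3 "$pid" || return 1
        # no pause is needed after the final request
        [ "$i" -lt "$count" ] && sleep "$interval"
        i=$((i + 1))
    done
    return 0
}
```

For example, request_javacores java_pid 3 30 asks for three Javacore files, 30 seconds apart. As noted above, a hung or unstable process may be unable to write them.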

An AIX core is a binary representation of the process in memory at some instant in time. From this core, we might get much of the same information as reported in a Javacore, plus additional process information not reported by the Javacore file. For this reason, in most cases, the support teams will request an AIX core file over a Javacore file.

There could be situations that call for both an AIX core and Javacore file to be collected, and it might be necessary to reproduce the issue multiple times.


Setting up the operating system

An AIX core file is created in the current or present working directory from which the Java application was started. This directory might be different from the present working directory of the command prompt if the Java application is started from within a startup script, or launched from another process. Any reference to core in this article is specific to an AIX or UNIX or binary core file, and not a Javacore file unless otherwise specified. The core file provides a snapshot of the process at the time of the issue.

If you're unfamiliar with the commands in this article, see the AIX documentation for additional information and syntax.

When running the commands, replace the italicized text with the appropriate value, or you'll get errors and unexpected results.

To enable the AIX operating system for generating full or complete core files, follow these steps:

  1. Set the user process limits to unlimited by running this command as the root user:
    chuser fsize=-1 data=-1 core=-1 user_id_running_application

    where user_id_running_application is the name of the user running the Java application. For example, chuser fsize=-1 data=-1 core=-1 root

    This will change the file size (fsize), core file size (core), and memory size (data) limits of the user to unlimited. There can be risks associated with using these settings, so once information has been collected, the original settings should be restored. Once the changes have been made, remove references to the ulimit command in login scripts, user profiles, or any startup scripts. You must log in again as the user prior to starting the application. Also, any application started from the /etc/inittab file may require a system reboot, because the changes might need to take effect in the init process. Making these changes does allow the user for which the changes were made to potentially run a process that could consume system resources.

    Verify the change by running the following command prior to starting the Java application:

     ulimit -a
  2. Enable full core dumps for the system by running the following command as root user:
    chdev -l sys0 -a fullcore=true

    This change does not require a system reboot. If you are familiar with the SMIT utility, the setting can be changed by running the command smitty chgsys, then setting the value for Enable full CORE dumps to true.

  3. Ensure that the current working directory has enough disk space available to write the core file. Use the following command to check the free space of the file system:
    df -kP current_working_directory_of_application

    You can redirect AIX core files to an alternate location by creating a symbolic link from the location where the core file will be created to the location where the core file is to be saved. To do this, you must create a symbolic link from the current working directory from which the application is started to an alternate directory that is to contain the core file. This alternate core file does not need to exist prior to starting the application; it will be created automatically. To assign the alternate location, run the following command:

     ln -s alternate_directory/core current_working_directory_of_application/core

    The syscorepath utility, provided with AIX v5.2 or higher, can be used to specify a single system-wide directory where all core files of any processes will be saved. The syntax for this command is:

    syscorepath -p alternate_directory
  4. Verify directory and file ownership, and permissions for core file, by using:
    aclget directory_or_file

    to verify that the user running the application has authorization to write to the destination directory. If there is any doubt, run the following command while logged in as the user running the application:
    touch directory/core

    Use the chown or chmod commands to modify ownership or permissions, respectively, or run smitty user to modify characteristics of the user account. Modifications to the user account require logging in again as that user.

  5. Ensure there is adequate disk space for saving the core file. The core file can be as large as the size of the process in memory. The RSS (process size) field from the ps command output may be used to provide an approximate size of the core file.
    ps avwwg java_pid

    If you need additional space, free space by deleting unwanted or older files, or increase the size of the destination file system.

  6. A set of AIX commands or utilities will be used to collect information. The following AIX filesets must be installed before continuing:
     File                       Fileset               
     ----------------------------------------------------------
    /usr/bin/vmstat             bos.acct              
    /usr/bin/trace              bos.sysmgt.trace      
    /usr/bin/gennames           bos.perf.tools        
    /usr/bin/tprof              bos.perf.tools
    /usr/bin/uudecode           bos.net.uucp  (used by libsGrabber.sh)
    /usr/bin/syscorepath        bos.rte.control 
    /usr/sbin/snapcore          bos.rte.serv_aid ( also /usr/bin/truss )

    To ensure all filesets are properly installed, run the command:

    lslpp -l fileset_name

    Any missing fileset should be installed from AIX base installation media, then upgraded to the most current level using Fix Central.
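
The checks in steps 1, 3, and 5 above can be sketched as a few small helper functions. This is an illustration only: the function names are hypothetical, and the comparison assumes that the RSS value from ps and the free space from df -kP are both expressed in KB.

```shell
#!/bin/sh
# Sketch: preflight checks to run before reproducing the problem.
# All function names here are hypothetical helpers, not AIX commands.

# Succeeds only if every value passed in is the word "unlimited".
all_unlimited() {
    for value in "$@"; do
        [ "$value" = "unlimited" ] || return 1
    done
    return 0
}

# Prints the free space, in KB, of the file system that holds directory $1.
# df -kP prints a header line, then one line whose fourth field is the available KB.
free_kb() {
    df -kP "$1" | awk 'NR == 2 { print $4 }'
}

# Succeeds if a core file of about $1 KB (the RSS value) fits in $2 KB of free space.
core_fits() {
    [ "$1" -lt "$2" ]
}

# Example use, run as the user that starts the Java application:
#   all_unlimited "$(ulimit -f)" "$(ulimit -d)" "$(ulimit -c)" || echo "limits are not unlimited"
#   core_fits rss_kb "$(free_kb current_working_directory_of_application)" || echo "not enough free space"
```

Here rss_kb stands for the RSS value read from the ps avwwg java_pid output. Because RSS is only an approximation of the core file size, leaving extra headroom is prudent.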


Disabling Java signal handling

As discussed in Javacore versus AIX core, Javacore files are not always the best tool for debugging a hang situation. A binary AIX core file might provide more useful information. To get a good AIX core file, the JVM has to be set up so it does not create the Javacore when it receives a signal sent to the process. The following changes must be made in the environment running the application, prior to the application being started.

For situations where the application is started by another process, such as WebSphere, setting this environment could impact all Java processes. For these situations, see that application's documentation for enabling environment settings specific to the application.

Set the DISABLE_JAVADUMP environment variable by running the command:

export DISABLE_JAVADUMP=true

If you run the command:

 java -fullversion


and the build date located at the end of the output shows a date prior to Jan 2003 (for example, caXXX-2002xxxx), you need to set the IBM_NOSIGHANDLER environment variable as well. For example:

export DISABLE_JAVADUMP=true 
export IBM_NOSIGHANDLER=true

For all other builds, do not enable IBM_NOSIGHANDLER unless explicitly instructed by the Support teams, as it might result in inaccurate data being collected.

The application must be restarted for this change to become active. This will prevent Javacore and heapdump files from being created when running kill -3, or when any other signal is raised.
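
The build-date rule above could be applied in the script that starts the application. The following is a sketch under stated assumptions: the function names are hypothetical, the version string in the comments is an illustrative sample of the caXXX-YYYYMMDD pattern rather than real output, and the script assumes that java -fullversion writes its text to standard error.

```shell
#!/bin/sh
# Sketch: choose the signal-handling variables from the JVM build date.
# Function names are hypothetical; sample strings are illustrative only.

# Prints the 8-digit build date (YYYYMMDD) that follows the last "-" in $1,
# for example the "20021107" in a string ending "build ca131-20021107".
build_date() {
    printf '%s\n' "$1" | sed -n 's/.*-\([0-9]\{8\}\).*/\1/p'
}

# Succeeds if a build date was found and it is earlier than January 2003.
needs_nosighandler() {
    bdate=$(build_date "$1")
    [ -n "$bdate" ] && [ "$bdate" -lt 20030101 ]
}

# Exports DISABLE_JAVADUMP always, and IBM_NOSIGHANDLER only for older builds.
set_signal_env() {
    export DISABLE_JAVADUMP=true
    if needs_nosighandler "$1"; then
        export IBM_NOSIGHANDLER=true
    fi
}

# Usage, before starting the application in the same shell:
#   set_signal_env "$(java -fullversion 2>&1)"
```

If no build date can be found in the text, only DISABLE_JAVADUMP is set, which matches the guidance not to enable IBM_NOSIGHANDLER unless it is needed.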


Collecting data

When a problem situation occurs, the objective is to collect as much information as possible that will either identify the cause of the issue, or provide a direction in understanding the cause. The issue could be in the operating system (kernel/library), the JVM (or JIT), other native code, or the Java application code. This section includes a collection of commands that will collect data from each area. Most of the commands listed must be executed as the root user.

The following commands are used to get a snapshot of basic operating system settings, and to collect trace information. Unlike the AIX core file, which provides a single snapshot, the traces provide historical information that might show a sequence of events as it occurs.

  1. When the problem occurs, the following commands need to be executed in the order listed, with little delay between commands. If running the commands manually would introduce delays between them, we suggest that you save the commands in a shell script, then run the shell script when the problem occurs.
    vmstat 1 20 > vmstat.out
    trace -a -n -l -L50000000 -T25000000 -o trace.out
    sleep 30
    trcstop
    ps -ef > ps.out
    ps avwwg >> ps.out
    pdump.sh java_pid (see below)
    pdump.sh java_pid (see below)
    pdump.sh java_pid (see below)
    kill -11 java_pid

    To collect data needed for analyzing the trace, run these after trace data is collected:

    trcnm > trcnm.out
    gennames > gennames.out
    gensyms > gensyms.out                      (AIX v5.2 only)
    cp /etc/trcfmt trcfmt.out
    trcrpt -r trace.out > trcrpt.out
    tprof -ske -n gennames.out -i trcrpt.out   (AIX v4.3.3 and AIX v5.1 only)
    tprof -ske -r trace.out                    (AIX v5.2 only)

    To collect system information, run the following commands after trace data is collected. Alternatively, you can run the AIX snap utility instead.

    errpt -a > errpt.out
    lslpp -lc > lslpp.out
    instfix -i > instfix.out
    bootinfo -K > bootinfo.out
    lsattr -El sys0 > lsattr.out
    lsps -s > lsps.out

    Download the libsGrabber.sh and pdump.sh utilities.

    The pdump.sh utility is a shell script that runs the AIX kernel debugger kdb.

  2. Package the output files by running the following commands:
    tar -cf - *.out pdump.* __* | compress -c > trace-data.tar.Z

    For any AIX core file created, run the following commands:

    mv core core.001

    If you used one of the methods above to direct the core to an alternate location, run the above command on the core file in the alternate directory, not on the symbolic link.

    export PATH=/java_home/jre/bin:${PATH}

    Replace java_home with the correct location of the SDK.

    which java
    java -fullversion

    If this does not report the correct version, correct the PATH variable in the previous step and rerun. The file listed should not be the Java shell /java_home/bin/java.

    In addition to the core file, the support teams will also need a copy of the Java or JNI executable and all library (.a or .so) files referenced by the process. There are two methods for collecting this additional information: the AIX utility snapcore, or a script provided by the Support teams called libsGrabber.sh. The snapcore utility uses commands provided with the operating system (assuming all of the above filesets have been installed), but requires much more disk space because it creates a copy of all files (including the core file, which can be large).

    There is a current limitation with the snapcore utility that prevents it from collecting all of the required information. This issue is being investigated and a resolution should be provided in the near future. It is recommended that you use libsGrabber.sh. The steps for using these are:

    Download the libsGrabber.sh utility.

    libsGrabber.sh core.001

    This creates core-libs.tar.Z, which contains the libraries and the executable.

    compress core.001

    This creates core.001.Z. Make sure your Java executable file is inside core-libs.tar.Z:

    uncompress -c < core-libs.tar.Z | tar -tvf -

    or

    snapcore -d save_directory core.001 fullpath_executable
    For example, snapcore -d /tmp/savedir core.001 /usr/java131/jre/bin/java. This will create an archive (snapcore_pid.pax.Z) in the directory /tmp/savedir. Should the problem occur with the 64-bit JVM, you should run the snapcore utility rather than the shell script provided by the Support teams. Using the shell script requires less disk space than using snapcore.
  3. There is additional data that, if collected, will help us better understand the situation.
    1. When using AIX 5, run the truss utility when the issue occurs or soon after the process has started:
      truss -eafo truss.out -p java_pid

      or
      truss -eafo truss.out java options ...
    2. When using newer releases of Java 1.3.1 and 1.4, use the Java HPROF facility to collect the time used by methods. Since this is a JVM feature, it provides a means to compare JVMs across multiple platforms.
      java -Xrunhprof:cpu=samples,monitor=y,format=a,file=hprof_output.txt your_other_java_options

      This creates the file hprof_output.txt with statistical information.

      For more documentation on using hprof, you can reference the Sun documentation at http://java.sun.com/j2se/1.3/docs/guide/jvmpi/jvmpi.html#hprof
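
Step 1 above suggests saving the command sequence in a shell script so that the commands run with little delay between them. One possible shape for such a script is sketched below. The helper names and the DRY_RUN switch are assumptions made for illustration. Unlike the listing in step 1, this sketch appends to the output files, so it is best run in an empty working directory, and it assumes pdump.sh can be found through the PATH variable.

```shell
#!/bin/sh
# Sketch: the step 1 command sequence as one function. Run as the root user.
# DRY_RUN=1 is a hypothetical switch that prints each command instead of running it.

# run CMD...: run the command, or print it in dry-run mode
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# run_to FILE CMD...: run the command and append its output to FILE
run_to() {
    out_file=$1
    shift
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$* >> $out_file"
    else
        "$@" >> "$out_file"
    fi
}

# collect_hang_data JAVA_PID
collect_hang_data() {
    java_pid=$1
    run_to vmstat.out vmstat 1 20
    run trace -a -n -l -L50000000 -T25000000 -o trace.out
    run sleep 30
    run trcstop
    run_to ps.out ps -ef
    run_to ps.out ps avwwg
    # pdump.sh is run three times in a row, as in the step 1 listing
    for n in 1 2 3; do
        run pdump.sh "$java_pid"
    done
    # the last command in the listing sends signal 11 to the Java process
    run kill -11 "$java_pid"
}
```

Calling collect_hang_data java_pid with DRY_RUN=1 set first prints the sequence, which is a cheap way to review it before the problem occurs.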


Packaging and sending data to IBM Support

To package and send your data to IBM Support:

  1. Tar (archive) the files using the filename xxxxx.byyy.czzz.#.tar
    tar -cf xxxxx.byyy.czzz.#.tar trace-data.tar.Z core.001.Z 
       core-libs.tar.Z optional-files

    or
    tar -cf xxxxx.byyy.czzz.#.tar trace-data.tar.Z 
       snapcore_pid.pax.Z optional-files

    where:
    • xxxxx is the PMR number
    • yyy is the branch code
    • zzz is the country code
    • # is a sequence number or date necessary to ensure that each file placed on the testcase server is unique
    Note the size of the file.

    Before sending the file, verify each archive file is valid and that the following is included:

    • trace-data.tar.Z
    • core.001.Z and core-libs.tar.Z, or snapcore_pid.pax.Z
    • optional-files (standard output, error logs, Javacore files, and so on)
  2. The files should be sent to the IBM testcase server using a unique filename for each file. To ensure timely response from the AIX Support teams, it is important to follow the instructions below. If you have trouble connecting or sending data to testcase servers, check the firewall and proxy settings within your network.
    ftp testcase.boulder.ibm.com
    login: anonymous
    password: user@host.com
    > cd /aix/toibm
    > bin
    > put xxxxx.byyy.czzz.#.tar
    > dir xxxxx.byyy.czzz.#.tar          
       (make sure the uploaded file size is equal to the original)
    > quit
  3. Only after the data has been transferred, contact IBM Support by sending e-mail or calling the IBM support hotline.
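
The naming and archiving in step 1 above can be sketched as two small functions. This is an illustration, not a Support-provided tool: the function names are hypothetical, and the PMR, branch, and country values in the comments are made-up placeholders.

```shell
#!/bin/sh
# Sketch: build the xxxxx.byyy.czzz.#.tar file name and create the archive.
# Function names and sample values are hypothetical.

# archive_name PMR BRANCH COUNTRY SEQ
# For example, archive_name 12345 678 000 1 prints 12345.b678.c000.1.tar
archive_name() {
    printf '%s.b%s.c%s.%s.tar\n' "$1" "$2" "$3" "$4"
}

# package_for_support NAME FILE...
# Creates the tar file, then lists it to confirm that the archive is readable.
package_for_support() {
    name=$1
    shift
    tar -cf "$name" "$@" && tar -tf "$name" > /dev/null
}

# Usage:
#   name=$(archive_name 12345 678 000 1)
#   package_for_support "$name" trace-data.tar.Z core.001.Z core-libs.tar.Z
#   ls -l "$name"    # note the size, to compare after the upload
```

Listing the archive right after creating it catches a truncated or unreadable tar file before it is uploaded.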
