Java diagnostics, IBM style, Part 5: Optimizing your application with the Health Center

Quickly and easily fix performance problems, identify configuration issues, and monitor Java applications

IBM Monitoring and Diagnostic Tools for Java - Health Center is a tool for monitoring a running Java application. It reports on all aspects of system health via charts, graphs, and tables, and it makes recommendations for fixing problems. The Health Center includes an extremely low-overhead method profiler, a garbage-collection visualizer, a locking profiler to identify contention bottlenecks, and a configuration explorer. Find out how you can use this tool to diagnose and fix performance, configuration, and stability issues in your applications.

18 July 2013: Added a new resource item, "Java Technology Community," to Resources.


Toby Corbin, Software Engineer, IBM  

Toby Corbin is a software engineer currently developing RAS tooling within the IBM Java Technology Centre. He joined IBM in 2001 and spent four years developing national language support and globalization of the Java Runtime Environment, followed by two years developing the Swing and AWT libraries.



Dr. Holly Cummins, Software Engineer, IBM

Holly Cummins is a performance-tooling developer within the IBM Java Technology Centre. She leads development on IBM Monitoring and Diagnostic Tools for Java - Health Center and was the author of the GC and Memory Visualizer tool. Her tooling work builds on her experience working as a performance engineer within the garbage-collection development team.



18 July 2013 (First published 07 October 2009)

Also available in Chinese, Russian, and Japanese

About this series

Java diagnostics, IBM style explores new tooling from IBM that can help resolve problems with Java applications and improve their performance. You can expect to come away from every article with new knowledge that you can put immediately to use.

Each of the authors contributing to the series is part of a team that creates tools to help you resolve problems with your Java applications. The authors have a variety of backgrounds and bring different skills and areas of specialization to the team.

Please contact the authors individually with comments or questions about their articles.

What's causing my application's performance problems? How can I fix them without needing to become a performance expert? Is my application stable? Is it configured sensibly? IBM Monitoring and Diagnostic Tools for Java - Health Center is new tooling from IBM designed to answer these questions and more. It checks garbage-collection activity, method execution, application synchronization, and configuration. As well as providing the information needed to diagnose problems, the expert system in the Health Center will solve problems for you: providing analyses, flagging areas for concern, giving recommendations, and suggesting command lines. The Health Center is so lightweight it can even be used in production. This article shows you how to download and install the Health Center and how to use it to troubleshoot your applications.

Getting started with the Health Center

The Health Center tool is provided in two parts: the client and the agent. The agent sends information from the monitored JVM to the client. The client connects to the agent and shows information in a GUI about the health of a running Java application.

JVM requirements

The Health Center is designed to run on IBM JVMs, Java 5 and above. It requires a Java level of at least Java 5 service refresh 8 or Java 6 service refresh 1. To be suitable for use in production, you need Java 5 service refresh 10 or Java 6 service refresh 5.

Installing the client

The Health Center client is a part of the IBM Support Assistant (ISA). To install the client:

  1. Download and install the ISA Workbench.
  2. Start the ISA Workbench, and from the menu bar, select Update > Find New... > Tools Add-ons.
  3. In the Find new tools add-ons wizard, type health in the search box, then expand the Twistie next to JVM-based Tools to show the Health Center entry, as shown in Figure 1:
    Figure 1. Installing the Health Center client into ISA
    The installation process into ISA
  4. Select the Health Center entry, click Next, and follow the prompts to complete the install and restart ISA.

Installing the agent


Once you've installed the client, you must download the agent from within the client and install it:

  1. Click Analyze Problem on the ISA Welcome page.
  2. Select the Tools subtab. From the list of installed tools, select IBM Monitoring and Diagnostic Tools for Java - Health Center and then click Launch to open the Health Center connection wizard, shown in Figure 2:
    Figure 2. The connection wizard
    The Health Center connection wizard
  3. Click the Enabling the application for monitoring link.
  4. On the next page that displays, click Installing the Health Center agent into an IBM JVM.
  5. On the Installing the Health Center agent into an IBM JVM page, click the link corresponding to the JVM on your system to download the zipped archive of agent files.
  6. Unzip the archive into the root directory of the JVM to be monitored. From Java 6 service refresh 2 and Java 5 service refresh 8, the Health Center agent is included with the JVM. However, the included agent may not be the most current version, so it is still best to overlay the existing Health Center agent with a new one. Answer yes if prompted to overwrite files during the install.

Starting the application to be monitored

To monitor an application with the Health Center, you must launch your application with a command-line option that enables the Health Center agent. The option syntax depends on your JVM version; example launch commands follow the list:

  • For Java 6 SR5 and above, use -Xhealthcenter.
  • For Java 6 SR1 to SR4, use the -agentlib:healthcenter -Xtrace:output=perfmon.out options.
  • For Java 5 SR10 and above, use -Xhealthcenter; for earlier Java 5 service refreshes, use -agentlib:healthcenter -Xtrace:output=perfmon.out.
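For example, to monitor an application packaged as a JAR file on Java 6 SR5 or later, you might launch it like this (the JAR name is just a placeholder):

    java -Xhealthcenter -jar myApplication.jar

On Java 6 SR1 to SR4, or on Java 5 before SR10, the equivalent launch would be:

    java -agentlib:healthcenter -Xtrace:output=perfmon.out -jar myApplication.jar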

If you don't know your Java version, you can print it to the console by running the java -version command, as shown in Figure 3:

Figure 3. Getting the Java version
Version listing from a Java 6 SR5 JVM

A simpler way to determine which command option to use for enabling the agent is to try the -Xhealthcenter option first. If it doesn't work, try -agentlib:healthcenter -Xtrace:output=perfmon.out.

If the Health Center agent starts successfully when you launch your application, a message prints to the console. Figure 4, for example, shows that the Health Center agent started on port 1972 when we ran java -version with the -Xhealthcenter option. The agent will usually listen on port 1972 but will autoincrement the port if 1972 is already in use (by another Health Center agent, for example).

Figure 4. Running java -version with the Health Center enabled
Running java -version with the Health Center enabled

Once the agent has started, you can use the Health Center client to monitor your application and solve a range of performance and configuration problems.


Example 1: Triaging and fixing a performance problem

Performance optimization must be guided by concrete evidence. It is easy and tempting to identify potential performance improvements by inspecting application code and then charging in with a fix. However, premature optimization can be counterproductive and dangerous. Optimized code is often less maintainable than its slower but more natural equivalent, and making these optimizations is time-consuming. The cost of optimization and the maintenance penalty of heavily optimized code are undoubtedly worth it when performance improves; however, many optimizations yield almost no performance gain. Optimizations must be targeted at the areas where they will make a difference, and the process of identifying those areas is the process of finding bottlenecks.

Analyzing performance problems

Essentially, all performance problems are caused by a limited resource. Several basic kinds of computational resources can affect performance: the CPU, memory, I/O, and locks. A CPU-bound application can't get enough processor time to complete the work it's being asked to do. A memory-bound application needs more memory than is available. An I/O-bound application is trying to do I/O faster than the system can handle. A lock-bound application is one in which multiple threads are fighting over the same locks. Synchronization between the threads is causing contention on the locks. As systems become increasingly parallel, synchronization becomes a limiting factor in their scalability.

By identifying the bottleneck, you identify which resource is limited. The Health Center has a number of facilities for investigating and diagnosing poor application performance. Performance analysis is a process of finding the bottleneck, fixing it, identifying the next bottleneck, correcting it, and repeating the process until application performance is satisfactory. The Health Center automates many of the processes required to identify which resource is limited. It cannot currently analyze I/O usage, but it provides visualizations and recommendations for CPU usage, memory usage, and lock usage.

The Health Center's Status perspective shows a dashboard with a status indicator for each potentially limited resource. Red or orange statuses highlight areas where application performance can be improved. Figure 5 shows the opening Status perspective:

Figure 5. The Status perspective
The status perspective

Investigating garbage collection

Excessive garbage collection (GC) is a common cause of poor application performance. GC is the process the JVM uses to manage application memory automatically. Its benefits — in terms of code safety and simplicity, and often even in terms of performance — are enormous. However, the garbage collector does use processing resources, and pathological patterns of memory usage in the application or inappropriately configured heap sizes can cause the garbage collector to become a significant performance bottleneck.

The Health Center provides detailed visualizations of GC activity and recommendations about GC tuning. Figure 6 shows the Health Center's Garbage Collection perspective. It includes visualizations of the used heap (useful for assessing the application's memory usage) and pause times (useful for assessing the performance impact of GC). There is also a summary table of GC statistics and a set of recommendations, including a suggested command line if needed.

Figure 6. The Garbage Collection perspective
The Garbage Collection perspective


When assessing GC's performance effects, the most important items to check are the pause times and the overhead: the proportion of time spent garbage collecting instead of doing application work. Figure 7 shows the Health Center's calculation of GC overhead. If the overhead is high, not enough processing time will be available for the application. If pause times are high, application responsiveness can suffer because application activity is halted during GC pauses.
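For example, if an application spends a total of 300 milliseconds paused for garbage collection during a 10-second sampling interval, its GC overhead over that interval is 3 percent.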

Figure 7. Table showing GC overhead
Garbage Collection summary table with the overhead row circled

Strategies for improving GC performance

GC's performance impact can be reduced in several ways. The first is to generate less garbage. Is the application creating objects and then throwing them away again immediately? Are strings being repeatedly concatenated without string buffers? Could autoboxing be creating unwanted primitive wrappers under the covers? Is code creating needlessly large objects? Performance can also be improved by increasing the heap size, playing with the nursery size (for generational collectors), and trying a different GC policy. However, some care is required with all of these techniques.
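As a minimal sketch of the kind of code worth looking for when trying to generate less garbage (the class and method names are purely illustrative), the first method below allocates a new String, plus a hidden temporary StringBuilder, on every iteration; the second reuses a single StringBuilder; the third shows autoboxing quietly creating Integer wrapper objects that a primitive accumulator avoids:

    // Illustrative sketch only: common sources of avoidable garbage.
    public class GarbageExamples {
        // Each += on a String allocates a new String (and a temporary
        // StringBuilder) on every iteration.
        static String concatenate(String[] lines) {
            String report = "";
            for (String line : lines) {
                report += line + "\n";
            }
            return report;
        }

        // Reusing one StringBuilder creates far fewer temporary objects.
        static String buildUp(String[] lines) {
            StringBuilder sb = new StringBuilder();
            for (String line : lines) {
                sb.append(line).append('\n');
            }
            return sb.toString();
        }

        // The Integer accumulator is boxed and unboxed on every iteration;
        // the long accumulator creates no wrapper objects.
        static void sums() {
            Integer boxedTotal = 0;
            long total = 0;
            for (int i = 0; i < 1000000; i++) {
                boxedTotal += i;
                total += i;
            }
        }
    }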

For example, you must use caution when tuning code to improve GC. Some apparent optimizations can turn out to worsen performance. For example, it is tempting to avoid repeatedly creating objects by simply reusing existing object instances. However, managing the object pool safely can result in a synchronization overhead. If the pool size is not tuned correctly, many unused object instances can end up hanging around, using up memory without contributing value. Most seriously, though, with a generational garbage collector (such as that used with the gencon policy), short-lived objects are effectively "free" to collect while longer-lived objects can be very expensive. Repeatedly creating and throwing out objects carries very little performance overhead, but holding onto existing object instances for reuse creates significant work for the garbage collector.
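To make that trade-off concrete, here is a minimal sketch of a naive object pool (the class name and buffer size are illustrative). The synchronized methods introduce a potential contention point, an untuned pool never releases idle instances, and the pooled buffers become long-lived objects, which a generational collector handles less cheaply than short-lived ones; with the gencon policy, simply allocating a new buffer each time is often faster:

    import java.util.LinkedList;

    // Illustrative sketch of an object pool and its hidden costs.
    public class BufferPool {
        private final LinkedList<byte[]> pool = new LinkedList<byte[]>();

        public synchronized byte[] acquire() {
            byte[] buffer = pool.poll();           // synchronization: contention point
            return (buffer != null) ? buffer : new byte[8192];
        }

        public synchronized void release(byte[] buffer) {
            pool.addFirst(buffer);                 // pool never shrinks if untuned
        }
    }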

Another GC tuning trap is focusing too much on reducing the headline GC overhead figure (see Figure 7) and forgetting that the overall goal is optimizing application performance. Although reducing GC overhead generally increases application performance, there are exceptions. For example, the time spent doing GC with the gencon policy is generally higher than the time spent collecting with the optthruput policy (see Resources). However, because of how it lays out objects in the heap, the gencon policy enables much faster object allocation and faster object access. Because applications usually spend lots of time allocating and accessing objects, the performance gains in this area (which aren't reflected in the GC overhead) outweigh the extra cost of the GC itself for many applications. Even more dramatically, the optavgpause policy does almost all the GC work concurrently, with only very brief stop-the-world pauses. This creates a very low GC overhead, usually well under 1 percent, and excellent application responsiveness. However, application throughput is not as good as with the optthruput or gencon policies.
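On IBM JVMs, the GC policy is selected with the -Xgcpolicy command-line option (for example, -Xgcpolicy:gencon or -Xgcpolicy:optavgpause).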

Investigating lock contention

Lock contention occurs when a lock is currently in use and another thread tries to acquire it. High levels of contention can occur on locks that are obtained frequently, held for a long time, or both. A highly contended lock (one in which many threads try to gain access) can become a bottleneck in the system, because each running thread pauses its execution until the lock it requires becomes available, limiting application performance. Synchronization is especially likely to hinder application scalability; the higher the number of threads in the system, the greater the proportion of time spent waiting on locks instead of doing useful work. The Health Center visualizes lock activity and highlights locks that are blocking requests and potentially affecting performance.

Figure 8 shows the Health Center's Locking perspective:

Figure 8. The Locking perspective
The Locking perspective


To understand what the graph in Figure 8 shows, you need to understand the data being displayed. The bars in the graph are color-coded from green through to red to indicate how contended each lock is. A heavily contended lock shows as bright red, and an uncontended lock shows as bright green. The height of each bar shows, relative to the others, how often that lock blocked. The color and the height together therefore show which locks are the most contended.

A lock can have a very tall bar but still be green. This means that although the lock blocked a large number of requests, it was acquired successfully far more often, so the percentage of requests that blocked is low. Even so, such a lock is still likely to affect performance. In the locking graph in Figure 9, you can see two tall orange-colored bars:

Figure 9. The locking graph
The Locking perspective graph

In Figure 9, the two locks are prime candidates for further investigation, because both bars indicate high levels of contention. They are the highest two bars, which tells us they have the highest number of blocked requests, and they are also moving toward red, which indicates that the overall percentage of blocked requests for those locks is high.

Strategies for improving locking performance

Any locks with high slow counts are likely to affect performance. How can the performance impact of these locks be reduced? The first step is to identify which code is using the problematic lock. The Health Center shows the class of the lock. In some cases, this is sufficient to identify the lock uniquely, but in others it is not. For example, if the lock object is of class Object, it may not be easy to search the code for references to it. Figure 10 shows the Health Center analysis of lock activity, with various locks of class DataStore and of class Object. If the Health Center highlights locking as a performance problem but it is not obvious from the Health Center analysis which lock is causing problems, you can refactor the code to use more distinctive class names in lock objects or take a series of javacores to capture the call stacks invoking the most popular locks.

Figure 10. The locking table
Close-up of the locking table
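As a minimal sketch of that refactoring (the class and field names are purely illustrative), giving the monitor its own descriptively named class means the Health Center's locking table reports a recognizable name rather than java/lang/Object:

    // Illustrative sketch: a dedicated, descriptively named lock class.
    public class OrderProcessor {
        private static final class OrderQueueLock { }

        private final OrderQueueLock orderQueueLock = new OrderQueueLock();

        public void process() {
            synchronized (orderQueueLock) {
                // ... critical section guarding the order queue ...
            }
        }
    }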

Once you have identified problematic locks, you should rewrite the application to reduce the contention on the lock. You can minimize contention in two ways: reduce the amount of time locks are held, which reduces the chances of a conflict, or increase the number of lock objects used, so that each lock object is used in fewer contexts (see Resources).
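The sketch below (all names illustrative) applies both techniques: the expensive formatting work is done outside the synchronized block, so each lock is held only for the map update itself, and a single coarse lock is split into one lock per independent data structure so that unrelated operations no longer contend:

    import java.util.HashMap;
    import java.util.Map;

    // Illustrative sketch: shorter lock hold times and finer-grained locks.
    public class DataStore {
        // (Named lock classes, as shown earlier, would make these easier
        // to identify in the Health Center locking table.)
        private final Object customerLock = new Object();
        private final Object orderLock = new Object();
        private final Map<String, String> customers = new HashMap<String, String>();
        private final Map<String, String> orders = new HashMap<String, String>();

        public void storeCustomer(String id, String rawRecord) {
            String record = format(rawRecord);     // done outside the lock
            synchronized (customerLock) {
                customers.put(id, record);
            }
        }

        public void storeOrder(String id, String rawRecord) {
            String record = format(rawRecord);     // done outside the lock
            synchronized (orderLock) {             // separate lock from customers
                orders.put(id, record);
            }
        }

        private String format(String raw) {
            return raw.trim();
        }
    }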

Investigating code execution

A method profile tells you what code the application is spending its time running. (It doesn't tell you when your application is waiting on a lock instead of running your code, and it doesn't tell you when the JVM is collecting garbage instead of running your code.) If one or two methods are consuming a disproportionate amount of CPU time, optimizing these methods can yield significant performance gains. Figure 11 shows the Health Center's Profiling perspective. It includes a table of the most active methods, sorted by activity level (hotness), and a set of recommendations.

Figure 11. The Profiling perspective
Profiling perspective


If no methods are colored orange or red, application execution is relatively evenly balanced between the methods, and optimizing the hottest methods is likely to yield only modest gains; the real constraint may lie elsewhere (for example, a method may be making excessive use of I/O). In Figure 11, however, one method is clearly using more CPU than the rest, and so it's colored red. By reading the value in the left-hand Self column, you can see that 42.1 percent of the time the JVM checked what the application was doing, it was executing the DataStore.storeData(I) method. The Tree column on the right shows how much time the application spent in both the DataStore.storeData(I) method and its descendants. In a simple single-threaded application, 100 percent of the Tree time would be spent in the main() method.

Strategies for optimizing code

The aim in performance tuning is to make the application do less work. Because the most work is happening in methods at the top of the method profile, these are the ones that you should consider. You can safely ignore methods near the bottom of the method profile.

Sampling profilers versus tracing profilers

The Health Center generates its method profiles by sampling the call stack every 2 milliseconds. Some profilers, known as tracing profilers, instead track (trace) method entry and exits. This gives detailed information about all methods called and their duration, but it imposes a significant overhead on the profiled application. Sampling instead of tracing is one of the ways the Health Center is able to achieve such low overhead. However, because they don't know when methods are started and finished, sampling profilers cannot distinguish between a method that appears frequently on the call stack because it is called often and a method that appears frequently because it takes a long time to complete.

A method appears near the top of a method profile for one of two reasons. Either the method is being called too often, or it is doing too much work when it's called. Methods with loops, and particularly nested loops, can be time-consuming to execute. Methods that contain many lines of code are also likely to take more time to execute than very short methods. A method without much code and with no loops at the top of a profile is probably being called very frequently.

An application can be optimized by doing less work in time-consuming methods and by invoking frequently called methods less often. The most effective way to do less work in methods is to move as much code as possible out of loops. The way to call a method less often is to find out what code is calling that method and then not make those calls. Often, at least in the first passes of optimization, a surprising proportion of the calls to the top method are totally unnecessary and can be trivially eliminated. Selecting a method in the Health Center's method profile shows its invocation paths at the bottom of the screen. Figure 12 shows that 90.3 percent of the calls to the DataStore.storeData method were from the StoreData.run method:

Figure 12. An invocation-path tree
Invocation of a method
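As a minimal sketch of the first technique (the class and method names are illustrative), the invariant work in the loop below, creating the date formatter and formatting today's date, produces the same result on every iteration, so hoisting it out of the loop makes the hot method do far less work per call:

    import java.text.SimpleDateFormat;
    import java.util.Date;
    import java.util.List;

    // Illustrative sketch: hoisting loop-invariant work out of a hot loop.
    public class LabelBuilder {
        // Before: the formatter and timestamp are recreated on every iteration.
        static void labelSlow(List<String> ids, List<String> labels) {
            for (String id : ids) {
                SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd");
                labels.add(format.format(new Date()) + "-" + id);
            }
        }

        // After: the invariant work is done once, outside the loop.
        static void labelFast(List<String> ids, List<String> labels) {
            SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd");
            String today = format.format(new Date());
            for (String id : ids) {
                labels.add(today + "-" + id);
            }
        }
    }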

Focusing on different time intervals

Sometimes an application's behavior changes dramatically during a certain period of time; garbage-collection activity becomes frantic, a lock becomes highly contended, or a method rockets to the top of the method profile. The Health Center allows analyses to be done for just a selected period of time. Dragging a rectangle on a graph narrows down the period of time shown. This cropping affects all the perspectives. Recommendations are updated to be based only on the selected time period. The method-profiling table is also updated so that the sample counts and percentages are only for the selected time period. This allows you to see only the methods executing when GC went wrong, for example, or to exclude data analyzed during application startup.


Example 2: Fixing configuration issues

In modern environments, a single Java application will most likely have been created and deployed by several individuals, or possibly even by several large teams. This is especially true when the application is running within an application server or framework. Launch parameters can be set in several different places, and it can be quite difficult to work out which JVM is running which application with which configuration. While this may be perfectly acceptable under normal circumstances, if problems occur, it can be vital to have a better understanding of how the Java application is configured. Poor configuration can cause performance problems that can't always be identified using the techniques already described, cause unexpected application behavior, and jeopardize serviceability.

The Health Center includes a perspective called the Environment perspective, which exposes the monitored application's classpath, system properties, and environment variables. It also analyzes the Java configuration and provides recommendations about potential problems. Figure 13 shows the Environment perspective:

Figure 13. The Environment perspective
The Environment perspective


Which JVM am I running?

One of the simplest pieces of information shown in the Environment perspective is the location of the JVM being monitored. It's surprising how often this information is sufficient to solve a problem. Are expected classes unavailable? Are the performance characteristics different from what was expected? Have changes to files in JAVA_HOME (such as logging or security configuration or library changes) not been picked up? In diagnosing any of these issues, the first thing to check is that the JVM being run is the one you expected. Most modern systems have many JVMs installed, and large applications can even include several different JVMs in different locations.
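If you want to cross-check this outside the Health Center, the standard system properties identify the JVM in use; a minimal sketch (the class name is illustrative):

    // Illustrative sketch: print the standard properties that identify the JVM.
    public class WhichJvm {
        public static void main(String[] args) {
            System.out.println("java.home    = " + System.getProperty("java.home"));
            System.out.println("java.vendor  = " + System.getProperty("java.vendor"));
            System.out.println("java.version = " + System.getProperty("java.version"));
        }
    }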

What launch parameters am I running with?

Java technology allows a dizzying range of command-line options. Some of these options are suitable for some environments but not for others. In a modern application deployment, options can be set in several nested launch scripts, and it's not always clear which options the application is actually being run with. The Health Center provides a view of the final Java command line, which allows you to identify unexpected application configurations. For example, the maximum heap size may have been capped very small. This may be acceptable for a small utility application, but it can lead to crashes for an application that manipulates large data sets in-memory. If the team responsible for the crashing application doesn't know that the maximum heap size has been set, it can be difficult to understand why there's never enough memory. Figure 14 shows the Health Center listing of a set of command-line options:

Figure 14. Listing of command-line options
Java parameters

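The same information can also be retrieved from inside the running application through the standard java.lang.management API; a minimal sketch (the class name is illustrative):

    import java.lang.management.ManagementFactory;
    import java.lang.management.RuntimeMXBean;

    // Illustrative sketch: print the JVM options the process was launched with.
    public class ShowJvmOptions {
        public static void main(String[] args) {
            RuntimeMXBean runtime = ManagementFactory.getRuntimeMXBean();
            // getInputArguments() returns the JVM options only; it does not
            // include the main class name or the application's own arguments.
            for (String option : runtime.getInputArguments()) {
                System.out.println(option);
            }
        }
    }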

The Health Center also analyzes the Java configuration and flags options that can have undesired consequences. For example, options that were added during the testing phase of the deployment may not have been removed before the move to production. In particular, a number of debug options (such as -Xdebug and the -Xcheck options) can be handy for finding problems during testing. However, these options carry a performance overhead and should be avoided when optimum performance is desired. Other options, such as -Xnoclassgc, can give a small performance benefit but at the risk of indefinitely expanding memory requirements.

Will I have enough information to diagnose crashes?

Not all configuration issues relate to the Java command line. Some are caused by the underlying system properties. The most significant of these relate to Linux® and AIX® systems' limitations on the resources used by a process. The CPU usage, amount of virtual memory consumed, file sizes, and core file sizes can be restricted with the ulimit command. Although this can be useful, it also has the potential to hamper problem diagnosis. If core files are truncated because the ulimit is set too low, it is nearly impossible to extract meaningful information from them. On some systems, the default ulimit settings almost always result in truncated core files. (A core file is an image of a process that the operating system creates automatically if the process terminates unexpectedly. Core files are vital in diagnosing many types of application problems, particularly crashes.)

Because core files are only produced when problems occur, problematic ulimit settings usually only become apparent when it is already too late; the JVM has died, taking all the information needed to diagnose the problem with it. Normally the only way to ensure that ulimits don't prevent complete core files from being produced is to guess that there could be a problem and manually check the ulimit settings. Unfortunately, this isn't ideal because it involves fairly obscure knowledge and lots of defensive system administration. The Health Center automates detection of this common serviceability problem. With the Health Center, there is no need to check ulimit settings explicitly — it issues a warning if the ulimits need adjustment.
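On Linux and AIX, you can check the current core file size limit with ulimit -c and, where permitted, remove it for the current shell with ulimit -c unlimited before launching the JVM.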


Example 3: Assessing system stability

Although crashes are unusual in Java environments, Java applications can still terminate unexpectedly. Such crashes have a number of causes. A common cause is that the Java code has called out through the Java Native Interface (JNI) to native code, and an unsafe memory access in the native code has triggered a general protection fault (GPF). Another common cause is that the Java application has exhausted the heap and cannot continue executing. It is also possible for the Java application to run out of native memory, which will also cause a crash (see Resources). Crashes caused by memory exhaustion (Java heap or native memory) are characterized by an OutOfMemoryError.

Diagnosing a memory leak

Most crashes cannot be predicted, but some can. In particular, the Health Center tries to predict crashes caused by filling up the Java heap. The Health Center cannot anticipate an OutOfMemoryError caused by an attempt to instantiate an infeasibly large object. However, most OutOfMemoryErrors are caused by memory leaks: references held to memory that is no longer required. This memory cannot be released by the garbage collector. If unneeded objects continue to accumulate in the heap, eventually there will be no room for needed objects, and an OutOfMemoryError will be thrown. Figure 15 shows a Health Center visualization of the heap used by a leaky application. The memory requirements of the application have been steadily increasing, and the Health Center provides a warning explaining that an eventual crash is likely.

Figure 15. Graph of suspected memory leak
A memory leak

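A minimal sketch of the usual leak pattern (the class, field, and sizes are illustrative): results are added to a static collection on every request and never removed, so they stay reachable forever and the used heap climbs much as in Figure 15:

    import java.util.ArrayList;
    import java.util.List;

    // Illustrative sketch: a classic Java memory leak.
    public class ResultCache {
        // Everything added here remains reachable, so the garbage collector
        // can never reclaim it.
        private static final List<byte[]> results = new ArrayList<byte[]>();

        public byte[] handleRequest() {
            byte[] result = new byte[10240];   // roughly 10KB per request
            results.add(result);               // added but never removed: the leak
            return result;
        }
    }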

Once the Health Center has identified the leak, how can it be fixed? The key is to figure out which objects are using the leaking memory. This kind of analysis is best done from a heap or system dump. A dump creates a record of every object in the heap, how much memory it's using, and which objects are referencing it. Holding references to objects that are no longer required is the most common cause of memory leaks in Java applications.

Heap analysis is nearly impossible without tooling support, and a number of excellent tools are available. Although the Health Center doesn't analyze dumps, another member of the diagnostic tooling family does. Like the Health Center, IBM Monitoring and Diagnostic Tools for Java - Memory Analyzer is available as a free download within ISA (see Resources). It provides a high-level summary of the heap contents, including identification of leak suspects. Often this alone is sufficient to pinpoint the cause of the leak and fix it. Figure 16 shows the Memory Analyzer in action:

Figure 16. ISA's Memory Analyzer
Memory Analyzer in ISA


For more complicated cases, the Memory Analyzer also provides powerful facilities for drilling down into the heap contents, including an object query language.


Conclusion

The Health Center provides a wide range of useful information about application execution, stability, and performance. It makes recommendations on GC usage, identifies hot methods, highlights areas of lock contention, and provides information on the environment the application is running in.

The Health Center is a valuable development tool that helps you iron out performance issues, memory leaks, and inefficient code before problems surface in production. Moreover, because of its exceptionally low overhead, the Health Center can also be used to solve problems on production systems. With the analysis and recommendations the Health Center offers, you needn't be a Java administration or performance-tuning expert to identify and fix problems.

Resources

Learn

Get products and technologies

Discuss
