Memory leak detection and analysis in WebSphere Application Server: Part 2: Tools and features for leak detection and analysis

Here is a primer on memory leaks in Java™ applications, with information on the motivation, scope, and usage for tools designed to address these issues within IBM® WebSphere® Application Server.

Indrajit Poddar (ipoddar@us.ibm.com), Advisory Software Engineer, IBM

Indrajit Poddar is a member of the Strategy, Technology, Architecture and Incubation team in the IBM Software Group Strategy division, where he is developing several integration proofs of concept for building composite business services. He has a master's degree in Computer Science and Engineering from Penn State University, and he has a bachelor's degree in Computer Science and Engineering from the Indian Institute of Technology, Kharagpur, India. He received an IBM Outstanding Technical Achievement Award in 2005 for his contributions to the Memory Dump Diagnostic for Java tool.



Robbie John Minshall (rjminsha@us.ibm.com), WebSphere Performance Development, IBM

Robbie John Minshall currently works on developing Standard Data Objects for IBM Software Group's Service Oriented Architecture. Originally from New Zealand, he graduated with honors from The Johns Hopkins University with a BS in Mathematics and a BS in Computer Science. Robbie's expertise includes application benchmarking and scalability, and autonomic agents for performance, health, and system monitoring in J2EE environments, including memory leak detection.



02 August 2006


Introduction

Memory leaks in enterprise applications cause a significant number of critical situations. The cost is a combination of time and money for analysis, expensive downtime in production environments, stress, and a loss of confidence in the application and frameworks.

Non-representative test environments, ineffective workload identification, and insufficient testing cycles allow memory leaks to slip through testing procedures undetected. Often companies are unable or unwilling to invest the significant time and money necessary to overcome these obstacles. The problem is one of education, culture, and finances. This article will not attempt to address those issues, but instead focuses on technological solutions that help deal with their consequences.

This article is a follow-up to our introductory article, Part 1: Overview of memory leaks. In Part 2, we describe the memory leak analysis and detection features in WebSphere Application Server V6.1 in greater detail, with some real-life case studies. This article presents a memory leak detection feature newly introduced in WebSphere Application Server V6.1, and an offline memory leak analysis tool called the Memory Dump Diagnostic for Java (MDD4J). The combination of these two features can be used to determine the root causes of memory leaks in Java and Java 2 Platform, Enterprise Edition (J2EE™) applications running in WebSphere Application Server.

This article is written for Java developers, application server administrators, and problem determination consultants working with applications deployed on the WebSphere Application Server.


What is a memory leak in Java?

Memory leaks occur in Java applications when objects hold references to other objects that are no longer needed. Although the Java virtual machine (JVM) has a built-in garbage collection mechanism (see Resources) that frees the programmer from explicit object de-allocation responsibilities, these lingering references prevent the garbage collector from freeing the memory. Such leaks manifest as increasing Java heap usage over time, with an eventual OutOfMemoryError when the heap is completely exhausted. This type of memory leak is referred to as a Java heap memory leak.


Fragmentation and native memory leaks

Memory leaks can also occur in Java due to failure to clean up native system resources like file handles, database connection artifacts, and so on, after they are no longer used. This type of memory leak is referred to as a native memory leak. These types of memory leaks manifest as increasing process sizes over time without any increase in the Java heap usage.

Although both Java heap memory leaks and native memory leaks eventually manifest as OutOfMemoryErrors, not all OutOfMemoryErrors are caused by Java heap leaks or native memory leaks. OutOfMemoryErrors can also be caused by a fragmented Java heap, due to the inability of the Java garbage collection process to free up any contiguous chunk of free memory for new objects during compaction. In this case, OutOfMemoryErrors can occur in spite of there being significant free Java heap. Fragmentation issues can occur in IBM's SDK Version 1.4.2 or earlier, due to the existence of pinned or dosed objects in the Java heap. Pinned objects are those that cannot be moved during heap compaction because of JNI (Java Native Interface) access to these objects. Dosed objects are those that cannot be moved during heap compaction due to references from the thread stacks. Fragmentation issues are often also exacerbated due to frequent object allocations of large sizes (exceeding 1 MB).

OutOfMemoryErrors due to fragmentation issues or due to native memory leaks are beyond the scope of this article. It is possible to distinguish native memory leaks and fragmentation issues from Java heap memory leaks by observing the Java heap usage over time. IBM Tivoli® Performance Viewer and verbose GC output (see Resources) can be used to make this distinction. Increasing usage of the Java heap leading to complete exhaustion indicates the existence of a Java heap memory leak, whereas for native memory leaks and fragmentation issues, the heap usage will not show significant increase over time. For native memory leaks, the process size will increase, and for fragmentation issues, there will be a significant amount of free heap at the time of the occurrence of the OutOfMemoryError.


Common causes for memory leaks in Java applications

As mentioned above, the common underlying cause of memory leaks in Java (Java heap memory leaks) is unintentional object references, arising from program logic errors, holding up unused objects in the Java heap. This section describes a number of common types of program logic error that lead to Java heap memory leaks.

Unbounded caches

A very simple example of a memory leak would be a java.util.Collection object (for example, a HashMap) that is acting as a cache but is growing without any bounds. Listing 1 shows a simple Java program demonstrating a basic memory-leaking data structure.

Listing 1. Example Java program leaking String objects into a static HashSet container object
import java.util.HashSet;

public class MyClass {
  static HashSet myContainer = new HashSet();
  public void leak(int numObjects) {
    for (int i = 0; i < numObjects; ++i) {
      String leakingUnit = new String("this is leaking object: " + i);
      myContainer.add(leakingUnit);
    }
  }
  public static void main(String[] args) throws Exception {
    { // subscope: myObj is unreachable once control leaves this block
      MyClass myObj = new MyClass();
      myObj.leak(100000); // One hundred thousand
    }
    System.gc();
  }
}

In the Java program shown in Listing 1, the class MyClass has a static reference to a HashSet named myContainer. In the main method of MyClass, there is a subscope (the inner block) within which an instance of MyClass is instantiated and its leak method is invoked. This results in the addition of one hundred thousand String objects to the myContainer container. After program control exits the subscope, the MyClass instance is garbage collected, because there are no references to it outside that subscope. However, the MyClass class object holds a static reference to the member variable myContainer. Because of this static reference, the myContainer HashSet persists in the Java heap even after the sole instance of MyClass has been garbage collected, and with it all the String objects it contains, holding up a significant portion of the Java heap until the program exits the main method. This program demonstrates a basic memory-leaking operation involving unbounded growth in a cache object. Most caches are implemented using the Singleton pattern, with a static reference to a top-level Cache class as shown in this example.

Un-invoked listener methods

Many memory leaks result from program errors that prevent cleanup methods from being invoked. The Listener pattern is commonly used in Java programs to implement methods for cleaning up shared resources when they are no longer required. For instance, J2EE programs often rely on the HttpSessionListener interface and its sessionDestroyed callback method to clean up any state stored in a user session when the session expires. Sometimes, due to a program logic error, the code responsible for invoking the listener may fail to do so, or the listener method may fail to complete due to an exception; either way, unused program state can be left lying around in the Java heap after it is no longer required.
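
As a minimal sketch of this pattern (the static cache and its contents here are hypothetical stand-ins, not taken from any of the case studies below), consider a listener whose sessionDestroyed method is the only place a per-session cache entry is released:

import java.util.HashMap;
import java.util.Map;
import javax.servlet.http.HttpSessionEvent;
import javax.servlet.http.HttpSessionListener;

public class SessionCleanupListener implements HttpSessionListener {
  // Hypothetical per-session cache keyed by session ID; entries in this
  // static map are exactly the kind of state that leaks if sessionDestroyed
  // never runs.
  private static final Map sessionState = new HashMap();

  public void sessionCreated(HttpSessionEvent se) {
    sessionState.put(se.getSession().getId(), new Object[1000]); // stand-in for real state
  }

  public void sessionDestroyed(HttpSessionEvent se) {
    // If this callback is never invoked, or an earlier statement throws,
    // the entry stays reachable from the static map after the session expires.
    sessionState.remove(se.getSession().getId());
  }
}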

Infinite loops

Some memory leaks occur due to program errors in which an infinite loop in the application code allocates new objects and adds them to a data structure accessible from outside the loop's scope. Infinite loops of this type can sometimes arise from multithreaded access to a shared, unsynchronized data structure. These memory leaks manifest as fast-growing leaks: the verbose GC data reports a sharp drop in free heap space over a very short time leading up to an OutOfMemoryError. For this type of memory leak, it is important to analyze a heap dump taken within the short span of time in which free memory is observed to be dropping quickly. Analysis results from two memory leak cases involving infinite loops observed by IBM Support are discussed in the sections titled Case study 3 and Case study 4.
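
The sketch below (hypothetical code, not from the case studies) illustrates the shared, unsynchronized access pattern; the unbounded loops are made explicit here for brevity, whereas in the field cases the looping was accidental (for example, a corrupted HashMap bucket chain causing a put or get to spin forever while the map keeps growing):

import java.util.HashMap;
import java.util.Map;

public class UnsafeSharedMap {
  // Program error: the map is shared across threads with no synchronization
  static final Map shared = new HashMap();

  public static void main(String[] args) {
    for (int t = 0; t < 2; t++) {
      new Thread(new Runnable() {
        public void run() {
          // Each thread allocates new objects and adds them to the shared
          // structure; concurrent, unsynchronized resizes can corrupt the
          // map so that later operations never terminate.
          for (int i = 0; ; i++) {
            shared.put(new Integer(i), "value-" + i);
          }
        }
      }).start();
    }
  }
}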

While it might be possible to identify the memory-leaking data structure by analyzing heap dumps, identifying the memory-leaking code that is stuck in an infinite loop is not straightforward. The offending method can be identified by looking at the thread stacks of all the threads in a thread dump taken while the free memory is dropping quickly. IBM SDK implementations generate a Java core file along with the heap dump; this file contains the thread stacks of all active threads and can be used to identify methods and threads that might be stuck in an infinite loop.

Too many session objects

Many OutOfMemoryErrors occur due to inappropriate configuration of the maximum heap size needed to support peak user loads. A simple example would be a J2EE application that uses in-memory HttpSession objects to store user session information. If no limit is set on the number of session objects that can be held in memory, then many session objects can accumulate during peak user load. This can lead to OutOfMemoryErrors that reflect improper configuration rather than a true memory leak.
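
For illustration (a hypothetical servlet, assuming only the standard servlet API), the following shows how per-user state accumulates one HttpSession at a time; with no cap on live sessions, peak load alone can exhaust the heap:

import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpSession;

public class ProfileServlet extends HttpServlet {
  protected void doGet(HttpServletRequest req, HttpServletResponse resp)
      throws ServletException, IOException {
    // A session (and its state) is created for every new user
    HttpSession session = req.getSession(true);
    if (session.getAttribute("profile") == null) {
      // Stand-in for real per-user state; N concurrent users cost roughly
      // N times this footprint until their sessions time out
      session.setAttribute("profile", new byte[64 * 1024]);
    }
    resp.getWriter().println("session " + session.getId());
  }
}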


A WebSphere solution

Traditional memory leak technologies are based upon the idea that you know that you have a memory leak and would like to identify the root cause. Techniques vary but invariably involve heap dump analysis, attaching Java Virtual Machine Profiler Interface (JVMPI) or Java Virtual Machine Tools Interface (JVMTI) agents, or the use of byte code insertion to track insertions and deletions into collections. These analysis mechanisms are quite sophisticated, though they have significant performance burdens and are not suited for consistent use in production environments.

The problem

Memory leaks in enterprise applications cause a significant number of critical situations. The cost of such problems is a combination of time and money for analysis, expensive downtime in production environments, stress, and a loss of confidence in the application and frameworks.

The typical analysis solution is to move the application onto an isolated test environment where the problem can be recreated and analysis performed. The cost associated with memory leaks is magnified by the difficulty of reproducing them in these test environments.

Traditional redundancy methods, such as clustering, can only help to a certain extent. Memory leaks will propagate throughout cluster members: as an affected application server's response rate slows, workload management techniques cause requests to be routed to healthier servers, which can result in coordinated application server crashes.

Components of cost

A typical memory leak scenario reveals that a major contributor to the cost of memory leaks in enterprise applications is that they are not identified until the problem is critical. Unless the administrator has the skill set, time, and resources to monitor memory trends, users are often unaware that they have a problem until application performance is crippled and the application server is unresponsive to administrative requests. Generally, the cost associated with memory leaks has three primary contributors: test, detection, and analysis.

WebSphere Application Server has identified detection and analysis of memory leaks as two different problems with related but independent solutions. Unfortunately there is no simple technological solution to address the costs associated with sufficient testing, and that topic will not be addressed in this article.

Separating detection from analysis

The problem with traditional technologies is that they attempt to perform both detection and analysis. This results in solutions that perform poorly, or that involve technologies or techniques unsuitable for many production environments, such as JVMPI agents or byte code insertion.

By isolating the problem of detection from analysis, we were able to deliver a lightweight production-ready memory leak detection mechanism within WebSphere Application Server V6.0.2. This solution uses inexpensive, universally available statistics to monitor memory usage trends and provide early notification of memory leaks. This gives the administrator time to prepare appropriate backup solutions and to analyze the cause of the problem without the expensive and frustrating problems associated with reproduction in test environments.

While generating HeapDumps is expensive and not recommended while an application server is under heavy production load, HeapDumps do not need to be generated while load is active. Administrators can set up load balancing, or generate HeapDumps at low-usage times, to avoid short periods of poor performance.

To facilitate this analysis, WebSphere Support delivered the Memory Dump Diagnostic for Java (MDD4J), a heavyweight offline memory leak analysis tool that incorporates multiple mature technologies into a single user interface.

To bridge the gap between detection and analysis, we have provided an automated facility to generate HeapDumps on IBM JDKs. This mechanism generates multiple heap dumps, coordinated so that enough memory has leaked between them to facilitate comparative analysis using MDD4J. Because the generation of HeapDumps is expensive, this capability is disabled by default, and an MBean operation is provided to enable it at an appropriate time.


Memory leak detection

Lightweight memory leak detection in WebSphere Application Server is designed to provide early detection of memory problems in test and production environments. It has minimal performance impact and does not require the attachment of additional agents or use byte code insertion. It is not designed to provide analysis of the source of the problem, though it is designed to work in conjunction with offline analysis tools, including MDD4J.

Algorithm

If the state of an application server and the workload is stable, then memory usage patterns should be relatively stable.

Figure 1. Verbose GC plot showing free memory (green) and used memory (red) for a memory leaking application

Enabling verbose GC is the first step in a problem determination procedure for debugging memory leaks. For instructions on enabling verbose GC on the IBM Developer Kit, refer to the Diagnostic Guide for the IBM Developer Kit (see Resources). Support and customers alike process and chart verbose GC statistics to determine if a memory leak was the cause of a failure (see Resources). If the free memory after a GC cycle is consistently decreasing, then there is a high chance of a memory leak. The chart in Figure 1 is an example of the charted free memory after a GC cycle in an application with a memory leak (the chart was produced using an internal IBM tool). The leak is very obvious to the eye, but unless you were actively monitoring the data, you would not have been aware of it until the server crashed.

Our memory leak detection mechanism essentially automates this process of looking for consistently downward trends in free memory after a GC cycle. We cannot assume that verbose GC information is available, and JVMPI is far too expensive for production (and requires an attached agent). We are therefore limited to PMI data, which makes Java API calls to get free memory and total memory statistics. Verbose GC gives free memory statistics directly after a GC cycle; PMI data does not. We approximate the free memory after a garbage collection cycle by using algorithms that analyze the variance of the free memory statistics.

Leaks can be very fast or incredibly slow, so we analyze memory trends over both short and long intervals. The period of time for the shortest intervals is not fixed, but is derived from the variance of the free memory statistics.

Since we are running within a production server, and are trying to detect memory leaks (not create them), we have to store a very limited amount of data. Raw and summarized datapoints that are no longer needed are discarded in order to keep our memory footprint to a minimum.

Periods are analyzed by looking for downward trends in the approximated/summarized free memory statistics. The configuration of the rule dictates how strictly these conditions are applied, though it ships with a good set of defaults that should be universally applicable.

In addition to downward trends in approximated memory after a garbage collection cycle, we look at situations where the average free memory after garbage collection is below certain thresholds. Such a situation is either a sign of a memory leak or of running an application on an application server with too few resources.
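
A minimal sketch of this idea follows (illustrative only; the actual WebSphere rule uses variance-driven interval selection, summarized data points, and configurable thresholds that are not shown here). It samples free memory periodically, keeps a bounded window of raw data points, and flags a consistently negative least-squares slope:

public class FreeMemoryTrend {
  private final long[] samples; // bounded window: older data points are discarded
  private int count = 0;

  public FreeMemoryTrend(int window) { // window size of at least 2
    samples = new long[window];
  }

  // Called periodically, for example from a timer thread
  public void sample() {
    samples[count % samples.length] = Runtime.getRuntime().freeMemory();
    count++;
  }

  // Least-squares slope over the window, oldest sample first;
  // a negative slope means free memory is shrinking
  private double slope() {
    int n = samples.length;
    double sumX = 0, sumY = 0, sumXY = 0, sumXX = 0;
    for (int i = 0; i < n; i++) {
      double y = samples[(count + i) % n]; // walk the circular buffer in time order
      sumX += i;
      sumY += y;
      sumXY += i * y;
      sumXX += (double) i * i;
    }
    return (n * sumXY - sumX * sumY) / (n * sumXX - sumX * sumX);
  }

  public boolean leakSuspected() {
    return count >= samples.length && slope() < 0;
  }
}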

iSeries

OS/400®, or iSeries, introduces some unique scenarios. iSeries machines are often configured with an effective free memory pool size. This pool size dictates the amount of memory that is available for the JVM. When the Java heap exceeds this value, the DASD (disk) is used to accommodate the heap. Such a solution invariably results in terrible performance, though the administrator may be unaware of the problem since the application server remains responsive even though it is crippled.

If the Java heap is going to be expanded onto DASD, we issue an alert notifying the administrator that this is about to happen, and that either the effective memory pool size is too small, resources are too few, or there is a memory leak.

Expanding heap

Java heaps are commonly configured with a minimum and a maximum heap size. While the heap is expanding, analyzing free memory trends is very problematic. We therefore avoid downward trend analysis while the heap is expanding and instead monitor whether the heap will soon run out of resources. This is accomplished by monitoring whether the heap size is consistently increasing, whether free memory after GC cycles stays within a certain threshold of the heap size (thereby pushing the heap to expand), and by projecting whether, if current trends continue, the JVM will soon run out of resources. If such a scenario is identified, we notify the user of the potential problem so that they may monitor the situation or set up contingency plans.

HeapDump generation

Many analysis tools, including MDD4J, analyze heap dumps to find the root cause of a memory leak. On the IBM JDK, HeapDumps generated as a result of OutOfMemoryErrors are often used for such analysis. If you wish to be more proactive, you need to generate HeapDumps at appropriate times for analysis. Timing is very important, because HeapDumps generated at inappropriate times lead to false analysis results. For example, if HeapDumps are generated at the start of the workload, caches that are still filling up will often be identified as a memory leak.

In conjunction with our memory leak detection mechanism, WebSphere Application Server provides a facility to generate multiple heap dumps coordinated with memory leak trends. This ensures that heap dumps are taken after evidence of the memory leak is apparent, and with enough memory leakage to give the best chance of a valid analysis result.

Automated heap dump generation is disabled by default; it can be enabled, or initiated at an appropriate time, using an MBean operation.

In addition to the automated heap dump generation utility, heap dumps may also be generated manually by using wsadmin (see the WebSphere Application Server Information Center) or by sending a kill -3 signal (on Unix® platforms) after setting some environment variables (see the Diagnostic Guide in Resources).
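
On IBM SDKs, heap dumps can also be triggered programmatically. The following is a minimal sketch assuming the com.ibm.jvm.Dump class shipped with the IBM JDK (an IBM extension, not part of the standard Java API):

public class TriggerDumps {
  public static void main(String[] args) {
    // IBM JDK extensions; these calls are not available on Sun JDKs
    com.ibm.jvm.Dump.HeapDump(); // writes a heap dump (for example, a .phd file)
    com.ibm.jvm.Dump.JavaDump(); // writes a javacore file with all thread stacks
  }
}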

Benefits

Administrators are able to run lightweight memory leak detection in test and production environments and receive early notification of memory leaks. This empowers administrators to set up contingency plans, analyze the problem while it is recreatable, and diagnose results prior to losing application responsiveness or suffering an application server crash.

The result is a significant reduction in the cost associated with memory leaks in enterprise applications.

Inherent limitations

The memory leak detection rule was designed in accordance with a simple philosophy: use freely available data and make the necessary approximations to provide reliable notification of the presence of memory leaks. Due to the inherent limitations of the data analyzed and the approximations required, there are existing solutions that use better data and more sophisticated algorithms, and these should achieve more accurate results (though not without a heavy performance cost). However, we can say that, though simple, our implementation is cheap, uses universally available statistics, and detects memory leaks.

Autonomic manager integration

Lightweight memory leak detection is completely configurable and is designed to interact with advanced autonomic managers or custom JMX clients. IBM WebSphere Extended Deployment is an example of such a relationship.

WebSphere Extended Deployment abstracts WebSphere Application Server topology and deploys applications appropriately to react to changing workloads while maintaining application performance criteria. It also incorporates health management policies. The WebSphere Extended Deployment memory health policy uses WebSphere Application Server memory leak detection capabilities to identify when an application server has a memory leak.

WebSphere Extended Deployment provides a number of policies that configure memory leak detection. One example policy would react to a memory leak notification by taking multiple heap dumps for analysis (using workload management to maintain the performance of the application). Another policy might simply monitor for the application server's memory levels becoming critical and restart the application server before it is crippled.


Memory leak analysis for production systems

Identifying the root cause of a Java memory leak requires two steps:

  1. Identify where the memory leak is. Identify the objects, the unintentional references, the classes and objects holding those unintentional references, and the objects to which the unintentional references are pointing.

  2. Identify why the leak is happening. Identify the source code method (program logic) that is responsible for not releasing those unintentional references at the appropriate point in the program.

The Memory Dump Diagnostic for Java tool aids in the process of determining where the memory leak is occurring in the application. However, the tool does not identify the faulty source code responsible for causing the memory leak. After the leaking data structure classes and packages are identified with the aid of this tool, you can use a debugger, or add specific trace statements to your logging, to identify the faulty source code method and make the necessary changes to the application code to resolve the memory leak.
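
For example, once MDD4J identifies a leak container such as myContainer from Listing 1, temporary trace instrumentation of the following kind (hypothetical code, shown here applied to the leak method from Listing 1) can expose which code path grows the container without bound:

public void leak(int numObjects) {
  for (int i = 0; i < numObjects; ++i) {
    myContainer.add(new String("this is leaking object: " + i));
    // Temporary diagnostic: log the container size periodically and print a
    // stack trace to reveal the method responsible for the growth
    if (myContainer.size() % 10000 == 0) {
      System.err.println("myContainer size = " + myContainer.size());
      new Throwable("growth trace").printStackTrace();
    }
  }
}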

The analysis techniques are CPU, memory, and disk space intensive; hence, the analysis mechanism is implemented as an offline tool. This mechanism is particularly suitable for large applications running in production or in stress test environments. The tool can be used to analyze, offline, memory dumps obtained manually or produced in conjunction with lightweight memory leak detection.

The Memory Dump Diagnostic for Java tool is targeted at the following roles, with the objective of meeting these associated goals:

  • System administrators

    Which component is leaking (data structures in the customer's applications or inside WebSphere Application Server)?

    Upon analysis, the package names and class names of the objects identified in the leak candidate list can identify the component responsible for the memory leak without requiring deep knowledge of the application code.

  • Application developers

    What data structures are actually leaking and are not valid caches?

    Memory leaking data structures differ from non-leaking data structures only in that leaking data structures grow in size without any bounds, whereas non-leaking data structures grow only within specific bounds. MDD4J provides tools to help developers confirm whether a suspect data structure is actually a memory leak or an appropriately growing data structure.

    What is causing the data structure to leak?

    After confirming a memory leaking data structure, the next question that arises is which classes, objects, and object references are causing the memory leaking objects to stay in memory beyond their expected lifecycle? The Memory Dump Diagnostic for Java tool helps to browse and navigate all the object references in the heap in a tree view while showing all the parent objects of any selected object. This helps identify potential unintentional object references causing memory leaks.

    What data types and data structures are causing a large footprint?

    There are many cases in which OutOfMemoryErrors occur not due to a memory leak, but due to configuration issues leading to excessive consumption of the Java heap. In these cases, it is often desired to detect the chief contributors to the footprint of the Java application to apportion blame to different components. The Memory Dump Diagnostic for Java tool helps identify the data structures that contribute significantly to the Java heap and the ownership relationship between these contributors. This helps the application developer understand the contribution to the Java heap by different application components.

This tool supports IBM Portable Heap Dump (.phd), IBM Text, HPROF Text, and SVC Heap Dump formats. For a more detailed listing of formats and supported JDK versions, see the Appendix.

Overview of technique

The Memory Dump Diagnostic for Java tool provides analysis functions for common formats of memory dumps from the Java virtual machine (JVM) that runs WebSphere Application Server on various IBM and non-IBM platforms. The analysis of memory dumps is targeted toward identifying regions or collections of data structures within the Java heap that might be root causes of memory leaks. The tool displays the contents of the memory dump in a graphical format, while highlighting regions that are identified as memory leak suspects. The graphical user interface provides browsing capabilities for verifying suspected memory leak regions, and for understanding the data structures that comprise these leaking regions.

Two main types of analysis function are provided by this tool: single memory dump analysis and comparative analysis.

  • Single dump analysis is most commonly used with memory dumps that are triggered automatically by the IBM Developer Kit, Java Technology Edition, upon OutOfMemoryErrors. This type of analysis uses a heuristic to identify suspected data structures having a container object (for example, a HashMap object with a single array of HashMap$Entry objects) with a large number of child objects. This heuristic is quite effective in detecting leaking Java collection objects that use an array internally to store the contained objects, and it has proven effective in a large number of memory leak cases handled by IBM Support.

    In addition to finding drop suspects, the single dump analysis also identifies aggregated data structures in the object reference graph (defined later) that are the largest contributors to the footprint of the Java heap.

  • Comparative analysis is used between two memory dumps taken during the run of a memory leaking application (that is, while the free Java heap memory is dropping). For analysis purposes, the memory dump that is triggered early in the run of the leaking application is referred to as the baseline memory dump.  The memory dump triggered after the leaking application has run for some time to allow for the growth of the leak is referred to as the primary memory dump.  In memory leak situations, it is expected that the primary memory dump contains a much larger number of objects occupying a much larger Java heap than the baseline memory dump. For better analysis results, it is recommended that the trigger point of the primary memory dump be separated from the trigger point of the baseline memory dump by a large amount of growth in total consumed heap size.

    The comparative analysis identifies a set of data structures experiencing significant growth in the number of instances of constituent data types. These suspect data structures are identified by categorizing all objects within each heap dump into different regions (or equivalence classes) based on similarity of ownership (that is, the chain of references leading to the objects in the object reference graph). The categorization is achieved by employing pattern matching techniques on the data types of objects in the ownership context of each object in the dumps. The regions found in each dump are matched and compared across the primary and baseline dump. Regions identified in the comparative analysis are characterized by the:

    • Leak container: The object which is holding all the objects with a large number of instances in memory; for example, the HashSet object in the memory leak example in Listing 1.

    • Leaking unit: The object type of a representative object which is growing or present in large numbers within a region; for example, the HashMap$Entry objects which are holding the leaking String objects in the memory leak example in Listing 1.

    • Leak root: This is a representative object in the chain of object references holding the leak container in the heap. This is often a class object which holds a static reference; for example, the MyClass object in the memory leak example in Listing 1. Alternatively, this can also be an object that is rooted in the Java stack or has a native reference holding this object in memory.

    • Owner chain: This is the set of objects in the chain of object references from the leak root to the leaking unit. The data types and package names of objects in the owner chain help to identify the application component responsible for the memory leak.

The analysis results are shown in an interactive Web-based user interface with the following features:

  • List a summary of analysis results, heap contents, size, and growth.
  • List suspected data structures, data types, and packages contributing to the growth in heap usage for comparative analysis, and to the large size of the heap for single dump analysis.
  • Ownership context view showing ownership relationship between major contributors to the footprint and significant constituent data types of the summarized set of major footprint contributors.
  • Browsing capabilities in an interactive tree view of the contents of the heap dump showing all incoming (only one reference shown in the tree, remaining ones shown separately) and outgoing references for any object in the heap and child objects sorted according to reach size.
  • Navigation capabilities from suspect lists to the ownership context and contents view to the browse view.
  • Tabulated views of all the objects and datatypes in the memory dump with filters and sorted columns.

Case study 1: Memory leak analysis for the MyClass memory leak example

A comparative analysis of a pair of heap dumps from MyClass (example code shown in Listing 1) shows the following leak suspects in Figure 2.

Figure 2. Suspects for MyClass memory leak example

The Suspects tab of the analysis results lists memory leak suspects in four tables. The Data Structures That Contribute Most to Growth table lists data structures which are identified by the comparative analysis techniques described above. Each row of the table identifies a single memory leak suspect data structure:

  • Leaking Unit Object Type - lists the data type of the leak container object.
  • Growth - lists the growth observed in the size of this data structure/region in the heap in between the primary and baseline heap dump.
  • Size - lists the size of this data structure/region in the primary heap dump.
  • Contribution to Heap - lists the size of the data structure/region as a percentage of the total heap size of the primary dump.

Often there are multiple data structure suspects, and the likelihood of a suspect being a memory leak can be estimated from these columns. In this case, there is only one suspect, and the fact that this data structure accounts for 84 percent of the total heap size of the primary dump identifies it as a very likely suspect.

The second table, Data Structures with Large Drops in Reach Size, lists data structures identified by the single dump analysis on the primary heap dump. Each row in the table identifies a data structure with a potential container object that has a large number of child objects. Both the first and second table can point to the same suspect. If suspects identified in the first and second table are related, then the corresponding rows in both the tables are highlighted.

The third table, Object Types that contribute most to Growth in Heap Size, lists different data types that have experienced a large growth in the number of instances between the primary and baseline dump. These data types are not categorized into different data structures or regions based on their ownership context; rather, these are the top-most growing data types for the whole heap. Again, if a particular data type has a large number of instances in a selected data structure or region, then that data type row is highlighted.

The fourth table, Packages that contribute most to Growth in Heap Size, lists different Java package names for data types that have experienced a large growth in the number of instances between the primary and baseline dump. The application component which is responsible for the memory leak is often identified by the package name and class name of the data types which are part of growing regions. This table identifies suspect package names with the largest growth, which can help identify the responsible application component for the memory leak.

A single heap dump (often the heap dump generated automatically with the OutOfMemory error) can also be analyzed with MDD4J. Figure 3 shows the results of the Suspects tab when the primary heap dump from the example in Listing 1 is analyzed by itself.

(There are only three tables in this case in the Suspects tab. There are no data structure suspects because comparative analysis is not performed. The Object Types that contribute most to Heap Size and Packages that contribute most to Heap Size tables do not show any growth statistics but only the total number of instances in the primary dump.)

Figure 3. Single dump analysis result for the MyClass memory leak example from Listing 1

After selecting a data structure in the Suspects tab, you can visit the Browse tab to see the chain of object references holding the leak container in the heap, as shown in Figure 4.

Figure 4. Browse suspects for MyClass memory leak example

In this example, it can be seen that there is a chain of references starting from the class object with the name MyClass to a HashSet to a HashMap to an array of HashMap$Entry objects with a very large number of HashMap$Entry child objects. Each HashMap$Entry object holds a String object. This describes the data structure created in the memory leak example shown in Listing 1.

The tree view in this tab shows all the object references in the heap dump, except in the cases where an object has more than one parent object. The parent table in the left panel shows all the parent objects of any selected object in the tree. Any row in the parent table can be selected to expand the tree to the location of the selected parent object. The left panel also shows other details for any selected object in the tree; for example, the size of the object, the number of children, the total reach size, and so on.

Figure 5. Ownership context for MyClass memory leak example

The Ownership Context and Contents tab helps answer the question of what are the major contributors to the footprint of the heap in the primary dump (Figure 5). It also shows the ownership relationships between the identified major contributors and the constituent data types for each of them. In this example, the MyClass node has been identified as a major contributor in the Ownership Context graph shown in the left panel. The right panel lists the data types that contribute significantly to this node. The HashMap$Entry object, of which there is one instance for each element in the HashSet, is shown in this set.

The analysis results are also stored in a text file with the name AnalysisResults.txt. The text analysis results can be viewed from a link in the Summary tab, as well as accessed from the corresponding analysis results directory in the file system. Listing 2 shows a snippet from the AnalysisResults.txt file, which shows the results of the analysis for the MyClass memory leak example.

Listing 2. Textual analysis results for MyClass memory leak example
Suspected memory leaking regions:
Region Key:0,Leak Analysis Type:SINGLE_DUMP_ANALYSIS,Rank:1.0
Region Size:13MB, !RegionSize.DropSize!13MB
Owner chain - Dominator tree:
MyClass, class0x300c0110, reaches:13MB), LEAK_ROOT
|java/util/HashSet, object0x3036d840, reaches:13MB), LEAK_ROOT_TO_CONTAINER
|-java/util/HashMap, object0x3036d8a0, reaches:13MB), LEAK_ROOT_TO_CONTAINER
|--java/util/HashMap$Entry, array0x30cf0870, reaches:13MB), LEAK_CONTAINER
|---Leaking unit:
|----java/util/HashMap$Entry, object0x3028ad18, reaches:480 bytes)
|----java/util/HashMap$Entry, object have grown by 72203 instances
Region Key:2,Leak Analysis Type:COMPARATIVE_ANALYSIS,Rank:1.0
Region Size:12MB, Growth:12MB, 300001 instances
Owner chain - Dominator tree:
MyClass, class0x300c0110, reaches:13MB), LEAK_ROOT
|java/util/HashSet, object0x3036d840, reaches:13MB), LEAK_ROOT_TO_CONTAINER
|-java/util/HashMap, object0x3036d8a0, reaches:13MB), LEAK_ROOT_TO_CONTAINER
|--java/util/HashMap$Entry, array0x30cf0870, reaches:13MB), LEAK_ROOT_TO_CONTAINER
|---java/util/HashMap$Entry, object0x30e88898, reaches:256 bytes), LEAK_CONTAINER
|----Leaking unit:
|-----java/util/HashMap$Entry, object have grown by 1 instances

Case study 2: Analysis results for memory leak due to un-invoked listener call back method

Figure 6 shows the Explore Context and Contents tab for a memory leak case involving a defect found during system testing in IBM WebSphere Portal. This defect occurred because some session state was not being cleaned up when the session object was invalidated. The Ownership Context graph in this example shows that the MemorySessionContext node is the largest contributor to the footprint of the heap, which is expected because MemorySessionContext is the WebSphere object that stores all the in-memory session data.

Figure 6. Ownership context for a memory leak involving un-invoked listener methods

To find the root cause of the memory leak more specifically, it is necessary to see the Browse tab in Figure 7, where you can see a very large number of LayoutModelContainer objects, which are WebSphere Portal Server objects stored in the user session. After looking closely at the data structure and the number of LayoutModelContainer objects, it was possible to infer that the LayoutModelContainer objects were not being removed when they were no longer required, and hence that the session invalidation listener code was not being invoked properly. It was later discovered that the root cause was a WebSphere Application Server bug related to session invalidation when multiple clones are present. This issue was rectified soon afterwards.

Figure 7. Browse view for leaking WebSphere Portal LayoutModelContainer objects

Case Study 3: Analysis results from a memory leak due to an infinite loop

Figure 8 shows the Suspects tab from the analysis result of two heap dumps taken from a memory leak case involving an infinite loop. The symptoms in the verbose GC logs showed a very fast drop in the available free heap space in a very short time. Analysis of a heap dump taken while the free heap was dropping was critical to understanding the root cause of the problem. The OutOfMemoryError heap dump did not contain the memory leaking data structure, because that data structure was rooted in the Java stack, which was unwound before the heap dump was generated.

As can be seen from the Suspects tab, there are a very large number of instances of objects from the package org.apache.poi.hssf.usermodel, and an unusually large number of instances of the class org.apache.poi.hssf.usermodel.HSSFRow.

Figure 8. Suspects for a memory leak in Apache Jakarta POI application

Figure 9 shows the Browse tab in this analysis result. It can be observed that there is a chain of references starting from an object of the type org.apache.poi.hssf.usermodel.HSSFWorkbook, which has an ArrayList containing a very large number (20,431) of HSSFSheet objects.

Figure 9. Browse leaking Apache Jakarta POI HSSFSheet objects

Further analysis showed that the HSSFSheet objects were being created in a method that was stuck in an infinite loop, and were being added to an HSSFWorkbook object that was referenced from the Java stack. The thread dumps taken at the same time as the primary heap dump showed two Java thread stacks in the same method that was creating the HSSFSheet objects. Inspection of the Java source code (from the open source Apache project Jakarta POI) revealed some unsynchronized multi-threaded code access patterns, which were fixed in a subsequent release. From this analysis, it was possible to narrow down the root cause of the memory leak to the Jakarta POI application component.

Case study 4: Example of a memory leak due to an infinite loop

Figure 10 shows another example of a memory leak due to an open source application component: com.jspsmart.upload.SmartUpload. From the left panel, you can see that there are an unusually large number of com.jspsmart.upload.File objects that are pointing to the SmartUpload object.

Figure 10. Browse memory leaking jspsmart SmartUpload File objects

Case study 5: Analysis results from a memory leak involving a large number of JMSConnection objects

Figure 11 shows a memory leak case involving a large number of JMSTopicConnection objects.

Figure 11. Suspects showing leaking JMS connection artifacts
Figure 12. Ownership context for leaking JMS connection objects

From the Ownership Context graph in Figure 12, you can see that these JMSTopicConnectionHandle objects are owned by another node, itself a significant contributor to the Java heap, which contains the PartnerLogData class. In addition, the PartnerLogData class has an unusually large number of XARecoveryWrapper objects. Further investigation revealed the existence of a WebSphere Application Server bug that was causing unused XARecoveryWrapper objects to remain in memory. These XARecoveryWrapper objects were in turn holding up a large number of JMSTopicConnection objects in the Java heap, and these JMSTopicConnection objects were also holding up a significant amount of native heap resources. Thus, this problem manifested as a native memory leak with its root cause in the Java heap.

Case study 6: Analysis results showing WebSphere objects which are not really leaking

Figure 13. In-memory HTTP session artifacts in WebSphere

Figure 13 shows a memory leak analysis suspect pointing at the WebSphere MemorySessionContext object. The MemorySessionContext object has a reference to com.ibm.ws.webcontainer.httpsession.SessionSimpleHashMap, leading to instances of com.ibm.ws.webcontainer.httpsession.MemorySessionData objects. These objects are WebSphere Application Server implementations of in-memory HTTP session objects. They can be present in large numbers in the heap of a J2EE application that uses HTTP sessions to store user session state in memory, and they do not always signify a memory leak. A large number of objects of these types can simply mean that too many sessions are currently active due to a heavy user load; an OutOfMemory error under these circumstances can easily be circumvented by either increasing the maximum heap size or setting a limit on the maximum number of live sessions kept in memory at any time. On the other hand, a large number of objects of these types could also signify a deeper memory leak in which application objects held by these session objects are actually leaking. So you cannot simply assume that the memory leak is in WebSphere Application Server whenever WebSphere-specific classes show up in the analysis results.


Comparing available analysis tools

There are two kinds of Java memory leak detection tools currently available on the market. The first type of tool is an online tool that attaches to a running application and derives Java heap information from the application, either by instrumenting the Java application or by instrumenting the JDK implementation itself. Examples of this nature are Wily LeakHunter, Borland Optimizeit, JProbe Weblogic Memory Leak Detector, and so on. Although these tools can identify the individual types of objects that are increasing in number over a period of time, they do not help to identify the composite data structure of which these objects are part. To understand the root cause of a memory leak, it is necessary to identify not only individual leaking objects at a lower granularity, but also what is holding on to the leaking objects, and look at the whole data structure causing the memory leak at a larger granularity. In addition, profiling techniques used in some of these tools add overhead to normal application processing time, which make them unsuitable for production usage. Furthermore, application instrumentation techniques modify application behavior, which may also not be desirable.

Another set of tools includes the HAT tool from SUN Microsystems®, which also analyzes Java heap dumps. The HAT tool produces statistics for data types that have a large number of instances in a single dump, and can also compare two heap dumps to identify data types that have increased in number. Again, what is missing is a description of the leaking data structure.

HeapRoots is an experimental console-based tool for analyzing IBM JDK heap dumps similar to the HAT tool, but it does not pinpoint the root cause of a memory leak. Memory Dump Diagnostic for Java (MDD4J) improves upon basic analysis features provided in HeapRoots by adding comparative and single dump analysis for memory leak root cause detection, and also provides an interactive graphical user interface.

Benefits and limitations

The lack of scalable, low overhead memory leak analysis tools makes it hard to deal with memory leak issues in production or stress test environments.

The MDD4J tool is designed to address this gap. Analyzing heap dumps offline, in the MDD4J tool running within the IBM Support Assistant process, enables resource-intensive pattern matching algorithms to be applied to comparatively analyze the dumps and detect root causes of the memory leak. These pattern matching algorithms seek to identify aggregated data structures (grouped together by similarity of ownership structure) that are growing the most between the memory dumps. This approach not only identifies low level objects experiencing growth, but also identifies the higher level data structures of which the various leaking objects are part. This helps to answer the question of what is leaking at a higher level of granularity than ubiquitous low level objects such as Strings.

In addition, the tool also provides footprint analysis, which identifies a summarized set of major contributors to the size of the Java heap, their ownership relationships, and their significant constituent data types. The ownership relationships, along with the browsing capabilities, also help to answer the question of what is holding on to the leaking objects in memory, thus causing the leak. The data types in the ownership context and contents also help attribute the leak to a particular high level component of the memory leaking application, which helps apportion responsibility to the correct development team for detailed analysis.

It is also important to point out that the tool only points out suspects which may or may not be actual memory leaks. This is because leaking data structures and valid caches of objects are often indistinguishable.

Another gap is that it is not possible for the tool to identify the source code in the memory leaking application that is causing the memory leak to occur. To provide that information, it is necessary to capture allocation stack traces for every object allocation, which is very expensive and is also not available in most formats of memory dumps.


Conclusion

When used in conjunction with lightweight memory leak detection, the Memory Dump Diagnostic for Java tool provides a complete production system that combines the benefits of early notification within production environments with state of the art offline analysis results.


Appendix: Supported HeapDump formats and JVM versions

The following formats of memory dumps are supported by the Memory Dump Diagnostic for Java tool:

  1. IBM Portable Heap Dump (.phd) format (for WebSphere Application Server Versions 6.x, 5.1 on most platforms)
  2. IBM Text heap dump format (for WebSphere Application Server Versions 5.0 and 4.0 on most platforms)
  3. HPROF heap dump format (for WebSphere Application Server on the Solaris® and HP-UX platforms)
  4. SVC Dumps (WebSphere on the IBM zSeries)

Table 1 provides a list of WebSphere versions, JDK versions, and Java memory dump formats supported on different platforms hosting the WebSphere Application Server JVM.

Table 1. Applicable platforms and versions
Platform | WebSphere version | JDK version | Java memory dump format
AIX®, Windows®, Linux® | 6.1 | IBM J9 SDK 1.5 | Portable Heap Dump (PHD)
AIX®, Windows®, Linux® | 6.1 | IBM J9 SDK 1.5 (64-bit) | Portable Heap Dump (PHD)
AIX®, Windows®, Linux® | 6.0.2 | IBM J9 SDK 1.4.2 (64-bit) | Portable Heap Dump (PHD)
AIX®, Windows®, Linux® | 6.0 - 6.0.2 | IBM SDK 1.4.2 | Portable Heap Dump (PHD)
AIX®, Windows®, Linux® | 5.1 | IBM SDK 1.4.1 | IBM Text Heap Dump
AIX®, Windows®, Linux® | 5.0 - 5.0.2 | IBM SDK 1.3 | IBM Text Heap Dump
AIX®, Windows®, Linux® | 4.0 | IBM SDK 1.3 | IBM Text Heap Dump
Solaris®, HP® | 6.1 | SUN JDK 1.5 | HPROF (ASCII)
Solaris®, HP® | 6.1 | SUN JDK 1.5 (64-bit) | HPROF (ASCII)
Solaris®, HP® | 6.0.2 | SUN JDK 1.4.2 (64-bit) | HPROF (ASCII)
Solaris®, HP® | 6.0 - 6.0.2 | SUN JDK 1.4 | HPROF (ASCII)
Solaris®, HP® | 5.1 | SUN JDK 1.4 | HPROF (ASCII)
Solaris®, HP® | 5.0 - 5.0.2 | SUN JDK 1.3 | HPROF (ASCII)
Solaris®, HP® | 4.0 | SUN JDK 1.3 | HPROF (ASCII)
z/OS® | 6.1 | IBM J9 SDK 1.5 | SVC/PHD
z/OS® | 6.1 | IBM J9 SDK 1.5 (64-bit) | SVC/PHD
z/OS® | 6.0.2 | IBM SDK 1.4.2 (64-bit) | SVC/PHD
z/OS® | 6.0 - 6.0.2 | IBM SDK 1.4.2 | SVC/PHD
z/OS® | 5.1 | IBM SDK 1.4.1 | SVC
z/OS® | 5.0 - 5.0.2 | IBM SDK 1.3 | SVC
OS/400 | 6.1 | IBM SDK 1.5 | PHD


Acknowledgments

The authors would like to thank Daniel Julin and Stan Cox for reviewing the paper, and Scott Shekerow for editing the contents.

Resources
