IBM Support

Why are some Java objects alive?

How To


Summary

"Why are some Java objects alive?" is one of the most important questions in Java heap analysis. By the end of this article, you should be able to identify all of the objects keeping any suspect objects alive using heap dumps and the free Eclipse Memory Analyzer Tool (MAT).

Steps

This article will demonstrate how to use the Merge Shortest Paths to GC Roots query in the Eclipse Memory Analyzer Tool (MAT) to find the reasons why objects are alive.

One of the main reasons to understand why objects are alive is to find the causes of a Java OutOfMemoryError (OOM). If the objects retaining a large part of the Java heap were not alive, they would be garbage, so they would be eligible for garbage collection and the Java OOM would not have happened. Therefore, the proximate causes of a Java OOM (and potentially, indirectly, of native OOMs) are whatever is keeping those objects alive. Once you identify such "root" objects and the paths from them to the live objects, you may ask the owners of those classes, or of the classes on the paths from the roots to the suspects, for next steps to resolve the OOM. Alternatively, if the issue is not a leak of objects, the Java heap size may be increased, thread pool sizes and/or caches may be decreased, and/or workload may be distributed across additional processes; those options are outside the scope of this article.

Let's first define what it means for an object to be alive (a.k.a. live) or dead (a.k.a. garbage). There's no way to define these terms without defining references and reachability, which involves some subtle and slightly confusing concepts, so let's dive in.

Starting with the basics, a Java heap is a directed graph of objects and their references:

[Figure: Basic Java heap]

The blue boxes represent live garbage collection roots (a.k.a. GC roots or heap roots). GC roots are only specific kinds of objects and references, such as system classloaders and classes, threads, thread stack locals, JNI references, etc. (in general, a GC root is referenced from outside the Java heap, for example from native code or from a thread stack). For example, an ordinary String object is not a GC root simply because it exists: it is live only if a path leads to it from a GC root, for example from a thread stack frame that holds it in a local variable, or through the fields of other live objects.

At the simplest level, a directed arrow in the picture above represents a reference from one object to another object (the non-directed lines between the GC roots simply represent that the GC roots are part of the same graph). For example, imagine that the class of Object1 is the following:

      public class Person {
          private String firstName;
          private String lastName;
      }

In the picture above, Object1's firstName and lastName fields reference instances of java.lang.String (Object1a and Object1b, respectively).

At the simplest level, garbage collection starts at the live GC roots and walks all reachable paths to referenced objects; this is called the mark phase. Any objects reachable from live GC roots are marked (as are the live GC roots themselves), and any unmarked objects that remain are garbage. The sweep phase of garbage collection may then sweep such garbage objects away from the heap (not every garbage collection necessarily sweeps away all garbage). In the first picture at the top of this article, GarbageObject1 is not reachable from any of the GC roots, so it may be swept away by the next garbage collection. Note that some garbage collectors are very sophisticated and mark and sweep only subsets of the heap at any one time to reduce worst-case garbage collection pause times.
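To make the mark phase concrete, here is a minimal Java sketch under simplifying assumptions: objects are modeled as named nodes (names such as Root1 are hypothetical) and references as a map from each object to the objects it references. Real collectors work with object headers, mark bits, and incremental or concurrent algorithms; this only illustrates the reachability idea.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of the mark phase: objects are named nodes, references are directed edges. */
final class MarkPhase {

    /** Returns every object reachable from the given roots (the roots themselves included). */
    static Set<String> mark(Map<String, List<String>> references, Set<String> roots) {
        Set<String> marked = new HashSet<>(roots);
        Deque<String> toVisit = new ArrayDeque<>(roots);
        while (!toVisit.isEmpty()) {
            String current = toVisit.pop();
            for (String referent : references.getOrDefault(current, List.of())) {
                if (marked.add(referent)) { // first visit: mark it and walk its references later
                    toVisit.push(referent);
                }
            }
        }
        return marked;
    }

    /** The "sweep" view: every known object that was not marked is garbage. */
    static Set<String> garbage(Set<String> allObjects, Set<String> marked) {
        Set<String> result = new HashSet<>(allObjects);
        result.removeAll(marked);
        return result;
    }
}
```

For example, with Root1 referencing Object1, Object1 referencing Object1a and Object1b, and an unreferenced GarbageObject1 that itself references Object1a, mark returns Root1, Object1, Object1a, and Object1b, and garbage returns only GarbageObject1: a reference from a garbage object does not keep anything alive.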

There are different types of reachability. Let's start with the simplest one, strongly reachable, which we have been implicitly discussing above. The java.lang.ref package documentation defines it as follows:

An object is strongly reachable if it can be reached by some thread without traversing any reference objects. A newly-created object is strongly reachable by the thread that created it.

In other words, an object is strongly reachable if at least one path leads to it from a GC root without traversing a reference object, that is, an instance of a subclass of java.lang.ref.Reference such as SoftReference, WeakReference, or PhantomReference.
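The definition can be sketched the same way, assuming a toy graph in which every edge is tagged either as an ordinary strong reference or as the referent of a java.lang.ref.Reference object. The class, enum, and object names are hypothetical; the only difference from a plain mark phase is that non-strong edges are not followed.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of strong reachability; all names here are hypothetical. */
final class StrongReachability {

    /** How one object refers to another. */
    enum Kind { STRONG, SOFT, WEAK, PHANTOM }

    /** An outgoing edge: the referenced object and the kind of reference. */
    static final class Edge {
        final String target;
        final Kind kind;

        Edge(String target, Kind kind) {
            this.target = target;
            this.kind = kind;
        }
    }

    /** Objects reachable from the roots by following STRONG edges only. */
    static Set<String> stronglyReachable(Map<String, List<Edge>> graph, Set<String> roots) {
        Set<String> seen = new HashSet<>(roots);
        Deque<String> toVisit = new ArrayDeque<>(roots);
        while (!toVisit.isEmpty()) {
            for (Edge edge : graph.getOrDefault(toVisit.pop(), List.of())) {
                // A SOFT, WEAK, or PHANTOM edge models a Reference object's referent: not followed.
                if (edge.kind == Kind.STRONG && seen.add(edge.target)) {
                    toVisit.push(edge.target);
                }
            }
        }
        return seen;
    }
}
```

In the SoftReference variant of the Person class that follows, a thread that reaches a Person makes both the Person and its SoftReference object strongly reachable, while the byte[] held as the SoftReference's referent is only softly reachable.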

Now let's discuss the other types of reachability. Why might java.lang.ref.Reference be useful? The simplest example is a java.lang.ref.SoftReference. SoftReferences are great for transient caches because they may be used to keep an object around "softly": if the JVM really needs the space and an object is only softly reachable, the garbage collector may clear the reference and treat the object as garbage. For example, let's extend our Person class above to have a SoftReference to a byte array that represents the person's profile picture:

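      import java.lang.ref.SoftReference;

      public class Person {
          private String firstName;
          private String lastName;
          // The picture is held only softly: the GC may clear it under memory pressure.
          private SoftReference<byte[]> profilePicture = new SoftReference<byte[]>(null);

          public synchronized byte[] getProfilePicture() {
              // A strong local reference keeps the picture alive while this method uses it.
              byte[] pic = profilePicture.get();
              if (pic == null) {
                  // Never loaded, or cleared by the GC: reload it and cache it softly again.
                  // DatabaseLoader is an application class that is not shown in this article.
                  pic = DatabaseLoader.loadProfilePicture(firstName, lastName);
                  profilePicture = new SoftReference<byte[]>(pic);
              }
              return pic;
          }
      }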

The above design caches the result of the database lookup of a person's profile picture in a SoftReference field when getProfilePicture is called. If there is a subsequent call to getProfilePicture on the same Person instance, and the Java heap is not under memory pressure, the database lookup is avoided. At any point, if the byte array is only softly reachable, the garbage collector may decide to clear the SoftReference, and the next call will then redo the database lookup. Here's what the object graph might look like when the SoftReference is set.
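The same caching pattern can be factored into a small reusable holder. This is a sketch, not a library API: SoftCache and loads() are hypothetical names, and the java.util.function.Supplier stands in for the DatabaseLoader lookup in the Person example.

```java
import java.lang.ref.SoftReference;
import java.util.function.Supplier;

/** Hypothetical softly-cached value, generalizing the Person.getProfilePicture() pattern. */
final class SoftCache<T> {
    private final Supplier<T> loader;             // stands in for the expensive lookup
    private SoftReference<T> ref = new SoftReference<>(null);
    private int loads;                            // how many times the loader ran (for illustration)

    SoftCache(Supplier<T> loader) {
        this.loader = loader;
    }

    synchronized T get() {
        T value = ref.get();     // strong local: the value cannot be cleared while we use it
        if (value == null) {     // never loaded, or cleared by the GC under memory pressure
            value = loader.get();
            loads++;
            ref = new SoftReference<>(value);
        }
        return value;
    }

    synchronized int loads() {
        return loads;
    }
}
```

The value is copied into a local variable before it is checked. Calling ref.get() a second time instead could return null if the GC cleared the reference in between.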

Another type of java.lang.ref.Reference is a java.lang.ref.PhantomReference. PhantomReferences are just a more flexible form of Java finalization. Java finalization and PhantomReferences may be used to process an object after it is essentially garbage but before it has been fully collected by the garbage collector. In general, finalization and PhantomReferences are discouraged because the processing of such objects is generally non-deterministic, which may lead to native memory leaks, native OOMs, and variability in garbage collection pause times. This is especially true with generational garbage collectors, as most modern garbage collectors are, because such objects may build up in the older generations, and garbage collection may not process them for a long time.

The final type of java.lang.ref.Reference is a java.lang.ref.WeakReference. WeakReferences are just like SoftReferences, except that a WeakReference is cleared eagerly, as soon as the garbage collector finds that its referent is only weakly reachable, whereas a SoftReference is generally not cleared as long as there is plenty of free Java heap.
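The behavior of the three reference types can be observed with a short sketch (the class and method names are hypothetical). While the referent is still strongly reachable, SoftReference.get() and WeakReference.get() return it, and PhantomReference.get() always returns null. What happens after the last strong reference is dropped depends on the JVM and its garbage collector, so the second method only requests a collection and reports whether the WeakReference was enqueued within a timeout; false is a valid outcome.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

final class ReferenceKinds {

    /** What each reference type returns while `target` is still strongly reachable. */
    static String whileStronglyReachable(Object target) {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        SoftReference<Object> soft = new SoftReference<>(target);
        WeakReference<Object> weak = new WeakReference<>(target, queue);
        PhantomReference<Object> phantom = new PhantomReference<>(target, queue);
        // `target` is still used below, so neither the soft nor the weak reference can be cleared.
        // PhantomReference.get() always returns null; it only signals through its queue.
        return "soft=" + (soft.get() == target)
                + " weak=" + (weak.get() == target)
                + " phantom=" + (phantom.get() == null);
    }

    /**
     * Creates an object that is only weakly reachable, requests a GC, and waits for the
     * WeakReference to be enqueued. System.gc() is only a request, so false is possible.
     */
    static boolean weakClearedWithin(long timeoutMillis) throws InterruptedException {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        WeakReference<Object> weak = new WeakReference<>(new Object(), queue);
        System.gc();
        return queue.remove(timeoutMillis) == weak;
    }
}
```

Calling whileStronglyReachable(new Object()) returns "soft=true weak=true phantom=true".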

Now that we have a clear picture of reachability and GC roots, we can use MAT to analyze a heap dump and find the paths to GC roots that are keeping suspect objects alive.

First, a caveat: this technique works well for IBM/OpenJ9 system dumps (core*.dmp) and HotSpot HPROF dumps, but it often works poorly for IBM/OpenJ9 PHDs (heapdump*.phd) because PHDs have limited information about GC roots. One of the most common types of GC root is a thread stack frame local, so although you may find your suspect rooted in such a GC root, a PHD won't tell you which thread that root is on, which can make it difficult to understand what that thread was doing. Therefore, in general, and particularly for OOMs, system dumps are preferred over PHDs because they have more accurate GC roots (note that system dumps, like HPROF dumps, include all Java memory contents, such as String and primitive values, so they should be treated as sensitive).

To process IBM/OpenJ9 system dumps and PHDs, Eclipse MAT must be extended with the free IBM DTFJ MAT plugin. We also recommend installing the free IBM Extensions for Memory Analyzer (IEMA), available from the same source, which provide various useful IBM product-specific queries and name resolvers.

After the heap dump is loaded in MAT and the suspects are identified, right-click on those suspects and click Merge Shortest Paths to GC Roots > exclude all phantom/weak/soft/etc. references. In general, we are not interested in phantom, weak, or soft reference paths to suspect objects, and this option excludes such paths. Here is an example result:
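Conceptually, the exclude option drops edges that go through the referent of a Soft/Weak/PhantomReference before looking for the shortest path. Here is a minimal sketch under that assumption; it is a toy breadth-first search over incoming references, not MAT's actual implementation, and the Ref class and object names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ShortestPathToRoot {

    /** One reference: `from` holds `to` in a field or array slot. */
    static final class Ref {
        final String from;
        final String to;
        final boolean viaReferent; // true if this is the referent of a Soft/Weak/PhantomReference

        Ref(String from, String to, boolean viaReferent) {
            this.from = from;
            this.to = to;
            this.viaReferent = viaReferent;
        }
    }

    /** Returns the shortest strong path root -> ... -> target, or an empty list if there is none. */
    static List<String> find(List<Ref> refs, Set<String> gcRoots, String target) {
        // Index incoming references, dropping reference-object referents (the "exclude" filter).
        Map<String, List<String>> holdersOf = new HashMap<>();
        for (Ref r : refs) {
            if (!r.viaReferent) {
                holdersOf.computeIfAbsent(r.to, k -> new ArrayList<>()).add(r.from);
            }
        }
        // Breadth-first search backwards from the target; the first root found is the nearest one.
        Map<String, String> towardTarget = new HashMap<>(); // holder -> object it leads to
        Set<String> visited = new HashSet<>(List.of(target));
        Deque<String> queue = new ArrayDeque<>(List.of(target));
        while (!queue.isEmpty()) {
            String current = queue.poll();
            if (gcRoots.contains(current)) {
                List<String> path = new ArrayList<>();
                for (String node = current; node != null; node = towardTarget.get(node)) {
                    path.add(node); // walks from the root down to the target
                }
                return path;
            }
            for (String holder : holdersOf.getOrDefault(current, List.of())) {
                if (visited.add(holder)) {
                    towardTarget.put(holder, current);
                    queue.add(holder);
                }
            }
        }
        return List.of();
    }
}
```

With the objects from the example output below, plus a hypothetical Cache root that reaches byte[0] only through a WeakReference, the unfiltered shortest path would be Cache → WeakReference → byte[0]. With the filter, find returns LauncherHelper → SimulateJavaOOM → ArrayList → Object[] → byte[0].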

  Class Name                                                   | Ref. Objects | Retained Heap
  -------------------------------------------------------------------------------------------
  class sun.launcher.LauncherHelper @ 0x766e4b6f8 System Class |           28 |         848 B
  '- appClass class SimulateJavaOOM @ 0x766e4b7c8              |           28 |       2.61 GB
     '- holder java.util.ArrayList @ 0x766e51c70               |           28 |       2.61 GB
        '- elementData java.lang.Object[33] @ 0x766e51c88      |           28 |       2.61 GB
           |- [0] byte[100000000] @ 0x6c0000000  ..............|            1 |      95.37 MB
           |- [1] byte[100000000] @ 0x6c5f5e110  ..............|            1 |      95.37 MB
           |- [2] byte[100000000] @ 0x6cbebc220  ..............|            1 |      95.37 MB
           [...]
           |- [26] byte[100000000] @ 0x75af8dba0  .............|            1 |      95.37 MB
           |- [27] byte[100000000] @ 0x760eebcb0  .............|            1 |      95.37 MB
  -------------------------------------------------------------------------------------------

In this example, we ran the query on a selection of byte arrays. The query results list the GC roots; expand the nodes until you reach your selected objects. Read the table the same way as the first picture in this article: each GC root has a reference to its child tree element, and so on, all the way down to the selected objects. Here, the shortest path to a GC root for these byte arrays starts at the LauncherHelper system class, which is a GC root. This class has a static field called appClass whose value is the SimulateJavaOOM class. The SimulateJavaOOM class has a static field called holder, which is an ArrayList. This ArrayList has a field called elementData, which is an Object[], and this Object[] references all of the byte arrays.

What this example demonstrates is that the shortest path to a GC root for the selected objects runs through a static class field. This is a common sign of some sort of cache (because a static field is available to multiple threads). The SimulateJavaOOM class itself is kept alive because of its nature: it is the application's main class, referenced from the LauncherHelper system class. There is effectively no way to unload this class because it was not loaded by a custom application classloader, so the SimulateJavaOOM class will be alive for the duration of the process. Therefore, the suspect is clearly the static ArrayList field called holder, which is the ultimate reason why the byte arrays are held.
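The article does not show the source of SimulateJavaOOM, but a program along the following lines would produce a heap of this shape. Treat it as a hypothetical reconstruction: only the class name and the static holder field come from the MAT output; the allocate helper and the loop are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reconstruction: a static list keeps every allocated array strongly reachable. */
final class SimulateJavaOOM {
    // Static field: reachable from the class, which lives as long as its classloader.
    static final List<byte[]> holder = new ArrayList<>();

    static void allocate(int count, int bytesEach) {
        for (int i = 0; i < count; i++) {
            holder.add(new byte[bytesEach]); // never removed, so never garbage
        }
    }

    public static void main(String[] args) {
        // Keeps adding 100,000,000-byte arrays until the heap is exhausted (OutOfMemoryError).
        while (true) {
            allocate(1, 100_000_000);
        }
    }
}
```

As a consistency check, OpenJDK's ArrayList grows its backing array from 10 to 15, 22, and then 33 slots, so 28 elements would sit in an Object[33], which matches the elementData entry in the output.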

You do not always need to understand the path to a GC root in such detail. In summary, you may provide the above output to the owners of all of the classes on that path, and those owners should be able to identify why these strong references exist and what next steps to take.

A very important column in this output is Ref. Objects. Because we are finding the shortest paths to GC roots for multiple objects, MAT combines these paths where possible and notes, at each node, how many of the selected objects are reached through it. In the above example, all 28 byte arrays share the same shortest path to a GC root down to the Object[]. If these suspects were in a more complicated object graph, you might find multiple GC roots and/or multiple paths down from the GC roots. The path with the largest number of objects is not necessarily the path that retains the most heap. If you selected a set of objects with widely different retained heaps, you may better understand which paths are most important by expanding the tree as completely as possible, clicking the calculator icon, and selecting Calculate Minimum Retained Size (quick approx.), which adds another column to the output.
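The idea behind the Ref. Objects column can be sketched as a prefix tree: each shortest path runs from a GC root down to one selected object, paths that share a prefix are merged, and each node counts the selected objects reached through it. This is a simplified model, not MAT's implementation; it identifies objects by hypothetical name strings, whereas MAT works with the objects themselves.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy merged-paths tree; each node counts the selected objects reached through it. */
final class MergedPaths {
    final String name;
    int refObjects;
    final Map<String, MergedPaths> children = new LinkedHashMap<>();

    MergedPaths(String name) {
        this.name = name;
    }

    /** Each path lists the objects from a GC root down to one selected object. */
    static MergedPaths merge(List<List<String>> paths) {
        MergedPaths top = new MergedPaths("<all GC roots>");
        for (List<String> path : paths) {
            top.refObjects++;
            MergedPaths node = top;
            for (String step : path) {
                // Reuse the child if an earlier path already went this way; otherwise branch.
                node = node.children.computeIfAbsent(step, MergedPaths::new);
                node.refObjects++;
            }
        }
        return top;
    }

    /** Appends the tree, indented by depth, with each node's count. */
    void print(String indent, StringBuilder out) {
        for (MergedPaths child : children.values()) {
            out.append(indent).append(child.name).append(" | ").append(child.refObjects).append('\n');
            child.print(indent + "  ", out);
        }
    }
}
```

Merging two paths that end in different byte arrays under the same Object[] gives that Object[] node a count of 2 with two children of 1 each, much like the 28 and 1 values in the output.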


It's important to note that this query shows only the shortest path to any GC root, which may not always be the most interesting or meaningful path. To explore all paths to GC roots for a single object, use the Path To GC Roots query instead, with the same exclude filter, and click the Fetch Next Paths button in the top right to fetch additional paths. This alternative query has an inverted view compared to the merge query above: it starts at the object, shows the incoming references to that object along all of the GC root paths, and the leaf nodes in the tree are the GC roots.
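The inverted view can be modeled as a depth-first walk over incoming references that starts at the selected object and stops at GC roots, which become the leaves. In this sketch, the names are hypothetical, the incoming map is assumed to already exclude reference-object referents, and the limit parameter only loosely mirrors fetching paths in batches; MAT's actual paging logic is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy "Path To GC Roots": collects up to `limit` paths from one object to GC roots. */
final class AllPathsToRoots {

    static List<List<String>> find(Map<String, List<String>> incoming, Set<String> gcRoots,
                                   String object, int limit) {
        List<List<String>> found = new ArrayList<>();
        List<String> current = new ArrayList<>();
        current.add(object);
        walk(incoming, gcRoots, current, limit, found);
        return found;
    }

    private static void walk(Map<String, List<String>> incoming, Set<String> gcRoots,
                             List<String> current, int limit, List<List<String>> found) {
        String last = current.get(current.size() - 1);
        if (gcRoots.contains(last)) {          // reached a GC root: a leaf of the inverted tree
            found.add(new ArrayList<>(current));
            return;
        }
        for (String holder : incoming.getOrDefault(last, List.of())) {
            if (found.size() >= limit) {       // enough paths for this batch
                return;
            }
            if (!current.contains(holder)) {   // skip cycles: a path never revisits an object
                current.add(holder);
                walk(incoming, gcRoots, current, limit, found);
                current.remove(current.size() - 1);
            }
        }
    }
}
```

Each returned list starts at the selected object and ends at a GC root, matching the object-first reading order of the Path To GC Roots view.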

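Let's take another example, in which the objects are rooted in a thread. Here, the byte arrays are held by an ArrayList that is a local variable (<Java Local>) in a stack frame of the main thread: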

  Class Name                                          | Ref. Objects | Retained Heap
  -----------------------------------------------------------------------------------
  java.lang.Thread @ 0xec81f930  main Thread          |            5 |     500.00 MB 
  '- <Java Local> java.util.ArrayList @ 0xec81f9c8    |            5 |     500.00 MB 
     '- elementData java.lang.Object[10] @ 0xf2c64848 |            5 |     500.00 MB
        |- [3] byte[104857600] @ 0xf2c64a60  .........|            1 |     100.00 MB
        |- [4] byte[104857600] @ 0xf9064ab0  .........|            1 |     100.00 MB
        |- [2] byte[104857600] @ 0xec864790  .........|            1 |     100.00 MB
        |- [1] byte[104857600] @ 0xe6416300  .........|            1 |     100.00 MB
        |- [0] byte[104857600] @ 0xe0016230  .........|            1 |     100.00 MB
  -----------------------------------------------------------------------------------
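A program like the following, which is a hypothetical sketch rather than the code behind this output, would produce a heap of this shape: the ArrayList is referenced only from a local variable of main(), so MAT reports the main thread as the root and the list as a <Java Local>. The class and method names are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical example: a list that is reachable only from the main thread's stack. */
final class StackLocalRoot {

    static List<byte[]> fill(int count, int bytesEach) {
        List<byte[]> local = new ArrayList<>(); // reachable only from stack frames that hold it
        for (int i = 0; i < count; i++) {
            local.add(new byte[bytesEach]);
        }
        return local;
    }

    public static void main(String[] args) throws InterruptedException {
        List<byte[]> list = fill(5, 104_857_600); // five 100 MB arrays, as in the output above
        Thread.sleep(Long.MAX_VALUE);             // keep the frame alive while a dump is taken
        System.out.println(list.size());          // uses `list` after the sleep, so it stays live
    }
}
```

Unlike the static holder field in the previous example, this list becomes garbage once main() no longer uses it. Running it as written needs a maximum heap larger than 500 MB (for example, -Xmx1g).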
Objects with field references

Document Location

Worldwide


Document Information

Modified date:
13 July 2021

UID

ibm11074993