In each column, The Support Authority discusses resources, tools, and other elements of IBM® Technical Support that are available for WebSphere® products, plus techniques and new ideas that can further enhance your IBM support experience.
Combining tools and experience
Problem determination and performance tuning of production level multi-tier enterprise Java™ applications is considered to be an extremely difficult task. Fortunately, there are many diagnostic tools available that can be used to detect the root cause of such problems. In this article, we share our experience detecting and diagnosing performance problems in production and open source benchmark applications through the presentation of three case studies based on actual situations. These case studies cover a wide spectrum of performance issues:
- Unavailability due to a memory leak in infrastructure code
- High system CPU due to bad developer code
- High CPU and memory usage caused by an application design issue that became aggravated by incorrect runtime policies and tuning.
Each case presents the problem and the way we solved it through analysis of verbose garbage collection (GC) logs and heapdumps using IBM tools such as Garbage Collection and Memory Visualizer (GCMV) and IBM Monitoring and Diagnostic Tools for Java - Memory Analyzer (MAT). Each of these case studies highlights the complexity associated with problem determination and performance analysis of enterprise Java applications. All the tools described are available as plugins in the IBM Support Assistant workbench.
Case study 1: Memory leak in infrastructure code
Our first case study involves an open source benchmark called RUBiS. RUBiS mimics an auction site. It comes with its own workload driver and has a web tier that connects to a database tier. It has been cited heavily in research papers for performance evaluation (see Resources).
While using RUBiS as a benchmark for our other research, we noticed that RUBiS would stop responding at the end of an experiment and had to be restarted for the next one. We were using the servlets version of RUBiS with the MySQL JDBC driver. We analyzed the verbose GC log of the application using GCMV under a constant load of 100 clients. (Verbose GC logs can be easily enabled either through the Java command line or through the management user interface of most commercial application servers.) The log revealed that the used heap kept growing, even at constant load (Figure 1). This seemed like a classic memory leak. However, as we discovered, the root cause was not as obvious as in most classic memory leaks.
Figure 1. GCMV: Increasing RUBiS heap usage at constant load
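Enabling verbose GC as described above typically takes only a JVM argument or two. For example, on the IBM JDK the options look roughly like this (flag names as documented for the IBM JDK; check your JVM's documentation for the exact syntax):

```
-verbose:gc
-Xverbosegclog:verbosegc.log
```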
For memory leak analysis, it is a good idea to understand some basic definitions first:
- Shallow size of an object is the amount of memory allocated to store the object itself, not taking into account the referenced objects.
- Retained size of an object is its shallow size plus the shallow sizes of the objects that are accessible, directly or indirectly, only from this object.
- Retained set of X is the set of objects that will be garbage collected if X is garbage collected.
- Dominator tree is a transformation of the cyclic object graph into a Keep-Alive tree where every node in the tree is directly responsible for keeping alive its children.
We took two heap dumps of this application within minutes of each other at constant load and compared them, first using MAT. (Heap dumps can be easily enabled in most commercial application servers.) A quick look at the object counts in the two heap dumps pointed us to a set of objects that had almost doubled in both count and size. Most of these objects were of primitive types, but a few that had grown the most were related to PreparedStatement calls in JDBC; the number of such objects almost doubled between the first and the second heap dump. A breakdown of the objects that grew the most is given in Figure 2. In most memory leaks, there are a few container objects whose size grows the most, and other object types whose count grows the most.
Figure 2. MAT: RUBiS objects with most growth between the two heapdumps
From the snapshot in Figure 2, you can see that the count of objects of type PreparedStatement increased the most. However, we still needed to find where in the code this count was increasing (in other words, where is this memory leak located in the code). One way to do this is to look at the dominator tree in MAT (Figure 3). You can clearly see that two servlets (ViewBidHistory and BrowseCategories) account for more than 50% of the used heap, and these two classes are likely to contain the code that is leaking memory.
Figure 3. MAT: Dominator tree with two servlets that account for more than 50% of the used heap
Because we had the source code of the application with us, we searched for the usage of PreparedStatement objects in the two servlets (ViewBidHistory and BrowseCategories) and found the pattern shown in Figure 4. Recall that a PreparedStatement object represents a precompiled SQL statement with IN parameters that can be set each time with the specified setter methods; this way, it can be reused to execute the SQL statement efficiently multiple times. However, the RUBiS code repeatedly created many "temporary" PreparedStatement objects inside a loop, but closed only the last instance that was created. The JDBC specification states that a JDBC driver should close all PreparedStatement objects associated with a Connection when that Connection is closed. In this case, the JDBC driver did not close the PreparedStatement objects at Connection close and maintained references to them, so they could not be garbage collected; their number kept increasing, resulting in a memory leak.
There are two ways to fix this kind of leak: either the statement close method is moved inside the loop, or the statement creation is moved above the loop. Both will fix the memory leak, but the latter is likely to give better performance. We found that 7 out of 22 servlets in RUBiS were leaking memory at various places due to this usage pattern. The resulting heap usage graph in GCMV at constant load after fixing the memory leak is given in Figure 5. Notice that the used heap remains almost constant, as expected.
Figure 4. RUBiS code that was the root cause for the memory leak
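To make the pattern concrete, here is a self-contained sketch (not the actual RUBiS source): PreparedStatement is replaced with a hypothetical counting stand-in so that the effect of closing only the last instance is visible.

```java
class LeakSketch {
    // Stand-in for PreparedStatement: counts instances left open.
    static class Statement implements AutoCloseable {
        static int open = 0;
        Statement() { open++; }
        @Override public void close() { open--; }
        void execute() { /* pretend to run SQL */ }
    }

    // RUBiS-style pattern: a new statement per iteration, only the last
    // one closed -- n-1 instances stay open (and, with a driver that
    // keeps references, are never garbage collected).
    static void leaky(int n) {
        Statement stmt = null;
        for (int i = 0; i < n; i++) {
            stmt = new Statement();
            stmt.execute();
        }
        if (stmt != null) stmt.close(); // closes only the last one
    }

    // Fix: create the statement once, above the loop, and reuse it.
    static void fixed(int n) {
        try (Statement stmt = new Statement()) {
            for (int i = 0; i < n; i++) {
                stmt.execute();
            }
        }
    }

    public static void main(String[] args) {
        leaky(100);
        System.out.println("open after leaky: " + Statement.open); // prints 99
        Statement.open = 0;
        fixed(100);
        System.out.println("open after fixed: " + Statement.open); // prints 0
    }
}
```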
This case study highlighted a memory leak scenario in infrastructure code where the leak happened due to a combination of bad usage as well as bad driver implementation. Furthermore, the root cause turned out to be something that is not usually perceived as a common cause of a leak.
Figure 5. GCMV: Constant RUBiS heap usage after fixing memory leak at constant load
Case study 2: High system CPU usage
Our second case study is from a telecom application in production (referred to hereafter as TelecomApp). This application served as the home page for millions of mobile phone users. Users could click and get a variety of multimedia content such as ring tones, wallpapers, live scores, and so on. In production, the main components of this application consist of a cluster of web portal nodes that connect to a cluster of back end content management system nodes, which in turn get their data from a database running on a large multicore machine.
The reported problem was very high CPU utilization at a low to moderate number of users. We collected detailed system logs using nmon. A quick look at the CPU logs revealed that the web portal node had high levels of system CPU (Figure 6).
Figure 6. Web portal node CPU graph from nmon before fixing the code
Verbose GC logging was enabled through the WebSphere Application Server admin console and we analyzed the collected verbose GC logs (by default, they go into native_stderr.log in WebSphere Application Server) using GCMV. The GC logs looked fine, except for a very high number of objects being queued for finalization -- close to 40,000 of them (see the dotted line in Figure 7). Recall that all objects that have a finalize() method and are found to be unreachable (dead) by the garbage collector are pushed onto a finalization queue. At some later point, the finalizer thread dequeues each object and calls its finalize() method. Too many objects being queued for finalization can cause two types of performance degradation:
- High CPU utilization, because of the finalizer thread's work, or
- High memory utilization, as all the objects that are dead but reachable from an object that is yet to be finalized cannot be reclaimed.
Figure 7. GCMV: Objects queued for finalization (dotted line = original code, solid line = after fixing code)
Figure 8. MAT: Histogram of objects referenced by finalizer in finalizer reference statistics
We enabled heap dump settings in WebSphere Application Server and took a heap dump in the staging environment to find out the type of objects that were being finalized. We used MAT to analyze the heap dump and looked at the histogram of objects referenced by finalizer in the finalizer reference statistics (Figure 8). Out of 38,469 objects queued for finalization, 35,626 were of type java.util.zip.ZipFile. We took thread dumps and analyzed them using IBM Thread and Monitor Dump Analyzer (also shipped with IBM Support Assistant), which pointed us to the code responsible for creating these java.util.zip.ZipFile objects (Figure 9).
Figure 9. IBM Thread and Monitor Dump Analyzer: Code responsible for creating java.util.zip.ZipFile
The code invoked the getResource(filename) method on the class loader to get a handle to the exact path of a particular properties file. This method essentially unzipped all JAR files in the classpath, one by one, to search for the given filename. This resulted in the creation of a lot of java.util.zip.ZipFile objects (each of which has a finalize method) and a lot of disk read activity. This unzipping of JAR files and the associated disk read activity was the reason behind the observed high system CPU. This code was repeated in several methods, all of which were on the critical path of various client requests. The steps taken to detect the root cause of this problem are shown in Figure 10.
Figure 10. Correlating output from various tools to resolve TelecomApp high system CPU problem
We fixed this problem by replacing the code that searched the entire classpath with the exact file location of the given file. Immediately, the system CPU went down (Figure 11). The number of objects queued for finalization went down as well (see the solid line in Figure 7). Notice the initial spike in the objects queued for finalization in both cases; this happens as a result of all the classes being loaded from various JARs at startup.
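The shape of the fix can be sketched as follows (ConfigLoader, the file name, and the caching detail are our illustration, not the TelecomApp source; the essential change is reading the properties file from its exact location instead of letting the class loader search every JAR):

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

class ConfigLoader {
    // Problematic pattern (per request, the class loader scans every JAR
    // on the classpath, creating finalizable java.util.zip.ZipFile objects):
    //   URL url = getClass().getClassLoader().getResource("app.properties");

    // Fix: read the file from its exact, known location -- and only once.
    private static volatile Properties cached;

    static Properties load(Path file) throws IOException {
        Properties p = cached;
        if (p == null) {
            p = new Properties();
            try (InputStream in = Files.newInputStream(file)) {
                p.load(in);
            }
            cached = p; // benign race: worst case the file is read twice
        }
        return p;
    }
}
```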
This case study highlighted a high system CPU usage scenario where the problem happened due to bad code. This case study also highlighted how output from different tools can be correlated to get to the root cause of a performance problem.
Figure 11. Web portal node CPU graph from nmon after fixing the code
Case study 3: High CPU and memory usage
Our third case study is from the same TelecomApp as the previous case study. In this case, we tried to solve the problem of high memory usage by the application coupled with high CPU utilization. This resulted in the application running out of heap memory after running for two or three days, and halting after generating a heap dump. The application used the default optthruput GC policy in the IBM JDK, which is aimed at optimal throughput.
First, we analyzed the production verbose GC log using GCMV and looked at the used heap. As shown in Figure 12, the used heap curve (pink line) shows that the heap always holds about 750 MB of data; used heap is lowest at night (between 1:30 am and 6:00 am). The logs also contained a very high number of mark stack overflow errors; this happens when there are too many live objects in the heap (or, more precisely, very deeply nested objects) and the stack that GC uses during the mark phase overflows. Figure 12 also shows two restarts, indicated by the sudden drops in heap size.
Figure 12. GCMV: Consistently high used heap (750 MB) in the TelecomApp
The verbose GC log also showed high pause times due to high compaction times. This pattern was very uniform throughout, and almost all high pause times were caused by high compaction times (Figure 13). Further analysis of the logs for these high pause times revealed allocation failures with reason code 16; reason code 16 compaction happens when the garbage collector is not able to increase the amount of free storage by at least 10%. The overhead due to garbage collection was around 13% (GC overhead = time spent in GC pauses / total execution time * 100). Figure 13 also shows that GC keeps freeing memory; however, the freed memory (green curve) keeps decreasing due to fragmentation, and eventually compaction is required (red lines). Notice that heap fragmentation can also lead to dark matter, which cannot be used for satisfying allocation requests: any heap space that lies between two allocated objects and is less than 512 bytes is not used by the JVM for allocation requests and is referred to as "dark matter."
Figure 13. TelecomApp: GC freed memory (green curve) keeps decreasing due to fragmentation and eventually compaction is required (red lines)
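The GC overhead figure quoted above follows directly from the formula in parentheses; a tiny sketch with made-up pause values:

```java
class GcOverhead {
    // GC overhead (%) = time spent in GC pauses / total execution time * 100
    static double overheadPercent(double pausedSeconds, double totalSeconds) {
        return pausedSeconds / totalSeconds * 100.0;
    }

    public static void main(String[] args) {
        // e.g. 468 seconds of cumulative pauses over a one-hour window
        // gives the roughly 13% overhead seen in this case study
        System.out.println(overheadPercent(468.0, 3600.0)); // prints 13.0
    }
}
```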
An analysis of the production heap (which got dumped on an out-of-memory error) showed a total of 1.2 GB of memory being used, out of which close to 585 MB of the retained heap was occupied by objects belonging to two classes, com.TelecomApp.cache.impl.CacheEntryImpl and com.TelecomApp.mcs.devices.InternalDeviceImpl (Figure 14).
Recall that the retained size of an object is its shallow size plus the shallow sizes of the objects that are accessible, directly or indirectly, only from this object. Table 1 shows the breakdown of the shallow heap by object type. Recall also that the shallow size of an object is the amount of memory allocated to store the object itself, not taking into account the referenced objects. You can see that the top contributors are all related to HashMaps: the char arrays and java.util.HashMap$Entry arrays contribute nearly 390 MB (close to 33% of the total heap size). This hints that both com.TelecomApp.cache.impl.CacheEntryImpl and com.TelecomApp.mcs.devices.InternalDeviceImpl consist of large HashMaps.
The application development team reconfirmed that the application uses a lot of caches internally to cache mobile-device-specific policies for faster response time. This explained why the heap usage never went down drastically. This seemed like an application design issue, since the memory footprint of the application was more than what the system could provide. However, the application team maintained that cached entries should automatically expire every 30 minutes or so, and they ruled out a redesign since it would have required considerable effort.
Figure 14. TelecomApp production heap dump break up (retained heap)
Table 1. TelecomApp production heap dump break up (shallow heap)

| Number of objects | Shallow heap (bytes) | Type |
| --- | --- | --- |
| 1,871,075 | 305,453,504 | array of char |
| 927,421 | 114,439,696 | array of java/util/HashMap$Entry |
| 639,453 | 52,688,552 | array of java/lang/Object |
| 92,922 | 49,771,728 | array of byte |
| 109,740 | 27,300,328 | array of int |
| 26,622 | 16,834,352 | array of com/TelecomApp/mcs/themes/StyleValue |
| 251,689 | 11,886,160 | array of com/TelecomApp/mcs/themes/PropertyValue |
| 340,836 | 9,177,496 | array of long |
| 95,246 | 8,393,000 | array of java/util/Hashtable$Entry |
| 281,817 | 7,144,296 | array of com/TelecomApp/styling/impl/engine/matchers/SimpleMatcher |
| 175 | 5,708,528 | array of org/apache/xpath/objects/XObject |
Verbose GC logs also pointed out that the application was using some large objects. We analyzed the heap dump for the shallow object sizes. There were only 30,000 objects greater than 1,024 bytes out of a total of close to 25 million objects; this constitutes almost 0.12% of the heap in terms of number of objects. However, the cumulative space occupied by these large objects was roughly 222 MB out of the total 1.2 GB (18.5%). The average object size was 51 bytes, while the median size was 32 bytes, which was also the mode. This indicated that the heap was composed of a very large number of small objects interspersed throughout the heap between a few large objects. A histogram of the shallow object sizes is given in Figure 15.
Figure 15. Histogram of the object sizes from TelecomApp production heap
In order to rule out any memory leaks, a long 10-hour run at low and constant load was conducted in the staging environment. The heap usage from the verbose GC log was almost constant throughout the experiment, which ruled out any memory leaks. However, the staging environment could not replicate the long pause times observed in the production system, probably because the load offered to the staging system was much lower.
Our analysis so far ruled out any memory leaks and the following characteristics became apparent:
- The heap was fragmented and resulted in high compaction times and long periods of sustained high CPU usage, as well as wasted heap space.
- The application had a mix of a few large and a lot of small objects.
- The application had some short-lived objects, and most of the heap seemed to consist of relatively long-living cached objects.
Garbage collection techniques to resolve these issues have been discussed in the literature. Given the above application characteristics, we considered two solutions:
- Use a generational and concurrent collector. This policy (known as the Gencon GC policy in the IBM JDK) is usually recommended for transactional applications that create a lot of short-lived objects. It divides the heap into a nursery (for new and short-lived objects) and a tenured space (for old objects). Since this policy copies live objects during a collection in the nursery (minor collection), it gets rid of fragmentation in the nursery. However, when the tenured space fills up, a global collection occurs and unreachable objects are collected, which can result in fragmentation in the tenured space. Ideally, we would like to move all the long-lived cached objects into the tenured space and leave the nursery only for short-lived objects.
- Use a pool of objects for the most frequently created objects. This resolves the problem of fragmentation by reducing creation and collection of short lived objects. However, this requires either creating a pool for frequently created objects in the application code or using the inbuilt pooling capability in the IBM JVM.
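The second option can be illustrated with a minimal pool (our simplified, single-threaded sketch, not the IBM JVM's built-in mechanism; a production pool would need bounding and thread safety):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

class ObjectPool<T> {
    private final Deque<T> free = new ArrayDeque<>();
    private final Supplier<T> factory;

    ObjectPool(Supplier<T> factory) { this.factory = factory; }

    // Reuse a pooled instance if one is available; otherwise create one.
    T acquire() {
        T obj = free.poll();
        return (obj != null) ? obj : factory.get();
    }

    // Return the instance for later reuse instead of letting it become
    // garbage (reducing allocation, collection, and fragmentation).
    void release(T obj) { free.push(obj); }

    int idle() { return free.size(); }
}
```

Pooling trades GC pressure for the bookkeeping of explicit release calls, which is one reason we preferred the GC-policy change on a production system.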
Figure 16. TelecomApp with modified GC policy (Gencon)
We tried both solutions, but realized that the first option was easier to implement on a production system. We first tried the generational and concurrent collector with default settings. The garbage collection policy can be easily specified on the Java command line for WebSphere Application Server; we specified the generational policy by using the flag -Xgcpolicy:gencon. Garbage collection overhead (percentage of time spent in GC pauses) went down from 13% to 1.18%. In the default gencon setting, the size of the nursery heap is set to 55-60 MB and the rest is used as the tenured space. When this setting was put into production, the high pause times went away as there was no compaction required (Figure 16). However, a narrow saw-tooth pattern in the used memory and free tenured space (Figure 16) also indicated that the default nursery size was too small. This resulted in too many minor collections, and objects being prematurely promoted into the tenured space. This also resulted in the tenured space filling up very soon and requiring a global collection.
The saw-tooth pattern revealed that around 400-550 MB of memory was being freed from the tenured space in each global collection. We used this as a yardstick to set the initial nursery size to 256 MB and the maximum nursery size to 512 MB. The initial and maximum nursery sizes can be easily controlled using the flags -Xmns and -Xmnx, respectively. When this setting was implemented in production, the GC overhead went down further, from 1.18% with the default Gencon settings to 0.8% (Table 2). The problem of high CPU usage for a sustained period of time (caused by frequent fragmentation and compaction under the default settings) was also resolved.
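Putting the pieces together, the tuned configuration amounts to a command line along these lines (IBM JDK flag syntax; the sizes are the values chosen above):

```
-Xgcpolicy:gencon
-Xmns256m
-Xmnx512m
```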
Table 2. TelecomApp: performance improvement through GC tuning
| GC policy | Time spent in GC pauses | Max pause time (seconds) | Mean pause time (seconds) |
| --- | --- | --- | --- |
| Optthruput GC policy | 13% | | |
| Gencon GC policy (default nursery size) | 1.18% | | |
| Gencon GC policy (increased nursery size) | 0.8% | | |
This case study highlighted a high CPU usage scenario where the problem happened due to a combination of application design and infrastructure settings (garbage collection policy settings). The application suffered high GC overheads due to heap fragmentation and repeated compaction, and this resulted in high CPU utilization.
This case study also illustrated the complexity associated with performance tuning: optimally tuning GC settings for an enterprise application that had an interesting mix of objects (large and small objects, long-living and short-living objects) can be non-trivial and difficult to automate.
In this article, we presented some of our experiences working with enterprise Java applications that exhibited various performance and scaling problems, and shared the insights we gained while determining the root causes. We diagnosed an availability problem with an open source benchmark and found the root cause to be a memory leak that occurred due to a usage error combined with a non-compliant JDBC driver implementation. We presented two other case studies from a production telecom application. One involved high CPU utilization caused by code that searched for a properties file in the entire classpath; this manifested itself as a high number of finalizable objects created as various JAR files were unzipped. The last case demonstrated various garbage collection issues that enterprise Java applications face; through our analysis we were able to recommend the correct GC policy and tune its settings.
Throughout this journey of fixing performance problems, the analysis of verbose GC logs and heap dumps using the various tools available in IBM Support Assistant proved valuable, providing us with vital clues about the probable root causes. Our experience also demonstrated that the underlying causes of these problems tend to be diverse, making it very difficult to provide an end-to-end root cause analysis engine for these kinds of problems.
The authors thank Rajeev Palanki and Chris Bailey for their valuable suggestions and guidance.
- RUBiS Homepage
- Garbage Collection and Memory Visualizer (GCMV)
- Memory Analyzer Tool (MAT)
- IBM Support Assistant (ISA)
- nmon performance
- Java Diagnostics Guide 1.4.2 - verbose GC output from a compaction
- Mash that trash – Incremental compaction in the IBM JDK Garbage Collector
- Retained and Shallow Heap
- Heap fragmentation with IBM 1.3.1 and 1.4.2 JVMs
- Enabling Verbose GC logging in WebSphere Application Server
- Enabling dumps in WebSphere Application Server