Java technology, IBM style, Part 2
Garbage collection policies
Use verbose GC and application metrics to optimize garbage collection
Part 1 of this introduction to GC policies detailed the different policies available in the latest version of the IBM Java runtime environment (JRE):
optthruput: Optimizes for throughput. This is the default policy.
optavgpause: Optimizes for the average GC pause.
gencon: Uses a generational concurrent style of collection.
subpool: Speeds up object allocation on systems with very large numbers of processors. This policy is only available on IBM pSeries® and zSeries® platforms, and we won't be discussing it further in this article.
In this article, we describe how to collect the information needed to choose a particular policy. Most of the time, the default policy works very well, and no change is necessary. However, the different policies have different characteristics, and for some applications, another policy works better. Before evaluating the different GC policies, though, you must decide on the performance characteristics that are important for a particular application.
Garbage collection and performance
When choosing a GC policy, it is important for you to consider that garbage collection can affect performance in several ways, both positive and negative. All GC policies, even concurrent ones, involve stopping application threads for some period of time. These pauses can cause a momentary lack of application responsiveness and reduce the time available to the application to do work. However, the right GC policy can also deliver significant performance advantages to an application.
Garbage collection can enhance application performance by improving object locality and the speed at which new objects can be allocated. Garbage collectors that rearrange objects in the heap by compaction or by performing a copying collection can move objects that access one another close to each other in the heap, and this can have a dramatic effect on the rate of data processing. Garbage collectors that group objects that are accessed at the same time together have a positive effect on application performance. Compacting and copying collectors can also increase the rate at which new objects can be allocated by keeping memory unfragmented. This eliminates the need to search through the available memory looking for a slot big enough to hold new allocations.
One obvious way to optimize application performance is to choose a GC policy that minimizes any negative impact of garbage collection pauses. A second, less obvious, way is to choose a policy that maximizes the benefits of garbage collection. For example, if many short-lived objects are being created, choosing the gencon policy to take advantage of its speedy allocation may give you the best performance.
Think about throughput, response times, and pause times
Before you can get the best performance out of an application, you need to think about what kind of performance characteristics you want. The performance of applications can be described in terms of two distinct properties: throughput and response time. Throughput represents the amount of data being processed by the system, and response time is the time taken by the application to process requests, from the receipt of the request to the completion of processing. For example, a Web application might be able to handle 1,000 requests per second (throughput) and turn each request around in 2 seconds (response time).
For some applications, only one of these performance characteristics is important. An interactive application like a text editor generally doesn't need to process large volumes of data, but it is important that it be responsive to user keystrokes. An overnight batch processing application, on the other hand, need not have quick response times, but its ability to process large volumes of data efficiently is critical. However, for most applications, the ideal performance is achieved when response times are low and throughput is high.
An application responds to some requests quickly and other requests more slowly, depending on what else is happening when the request comes in. For example, if garbage collection is underway or if there are lots of requests in the incoming queue, requests may take longer to be processed. It is important to think about both the mean response time and the maximum response time when deciding what performance characteristics are important to a given application. For example, if an interactive application usually responds to user requests in a few milliseconds (low mean response time) but occasionally pauses for ten seconds (high maximum response time), users are unlikely to be satisfied with the performance of the application.
It is important to note that GC pause times are not the same thing as application response times. Short pause times do not guarantee speedy application response times, and long pauses are also not necessarily going to cause high application response times. In very lightly loaded systems, pause times are an important component of response times, but the pause times become less and less important as the system becomes heavily loaded. Pause times are also more important when the heap is rarely collected than when it is frequently collected.
It is not possible to work out application response times by inspecting the verbose GC logs. Mean response times are quite closely related to application throughput and not highly correlated to GC pause times. The maximum response time is always longer than the longest GC pause, unless the application load is exceptionally low. However, in most applications, GC pauses make only a small contribution to the maximum pause time, as you'll see in our first case study.
Using the verbose GC logs
Verbose GC logs give insight into the operation of the garbage collector and can give hints about policy and parameter choices. The latest version of the IBM Developer Kit provides very detailed logs in an XML format. These logs show what the garbage collector has been doing, in great detail. The verbose GC log includes heap sizes, information about the division of the heap into its different areas, pause timings, object finalizations, and references cleared. You can enable verbose GC using one of two command-line options: either -verbose:gc or -Xverbosegclog:&lt;filename&gt;.
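For illustration, both options are shown below on a launch command; the class name MyApp and the log file name are placeholders, not names from the article:

```shell
# Send verbose GC output to stderr
java -verbose:gc MyApp

# Or write the XML log to a named file (IBM SDK option)
java -Xverbosegclog:gc.log MyApp
```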
Why look at verbose GC logs?
There are several reasons why you might want to enable verbose GC and inspect the logs. Your application might be experiencing long pauses or periods of unresponsiveness, and you might want to identify or rule out garbage collection as the cause of the pauses. Alternatively, if you're tuning your application to maximize performance, there might be some clues in the verbose GC about changes that could improve performance.
An example of verbose GC
Listing 1 shows an example section of verbose GC output for one collection, triggered by an allocation failure.
Listing 1. Sample verbose GC for one collection
<af type="tenured" id="1" timestamp="Sun Mar 12 19:12:55 2006" intervalms="0.000">
  <minimum requested_bytes="16" />
  <time exclusiveaccessms="0.025" />
  <tenured freebytes="23592960" totalbytes="471859200" percent="5" >
    <soa freebytes="0" totalbytes="448266240" percent="0" />
    <loa freebytes="23592960" totalbytes="23592960" percent="100" />
  </tenured>
  <gc type="global" id="3" totalid="3" intervalms="11620.259">
    <refs_cleared soft="0" weak="72" phantom="0" />
    <finalization objectsqueued="9" />
    <timesms mark="74.734" sweep="7.611" compact="0.000" total="82.420" />
    <tenured freebytes="409273392" totalbytes="471859200" percent="86" >
      <soa freebytes="385680432" totalbytes="448266240" percent="86" />
      <loa freebytes="23592960" totalbytes="23592960" percent="100" />
    </tenured>
  </gc>
  <tenured freebytes="409272720" totalbytes="471859200" percent="86" >
    <soa freebytes="385679760" totalbytes="448266240" percent="86" />
    <loa freebytes="23592960" totalbytes="23592960" percent="100" />
  </tenured>
  <time totalms="83.227" />
</af>
The <af> element shows what triggered the collection in this case: an allocation failure. Other possible elements are <con> for a concurrent collection and <sys> for a collection forced by a call to System.gc(). Concurrent collections only occur in the optavgpause and gencon policies. Forced garbage collections are not recommended, and so if <sys> elements are present in the verbose GC, consider rewriting the application to avoid interfering with the garbage collection routines, or disabling explicit invocation of garbage collection with the -Xdisableexplicitgc command-line parameter.
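Where the explicit calls cannot easily be removed from the code, the flag can be applied on the launch command (MyApp is a placeholder class name):

```shell
# Tell the IBM JVM to ignore System.gc() calls from the application
java -Xdisableexplicitgc MyApp
```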
The elements most likely to be of interest are the three copies of the
<tenured> element describing the
occupancy of the heap:
<tenured freebytes="23592960" totalbytes="471859200" percent="5" >
  <soa freebytes="0" totalbytes="448266240" percent="0" />
  <loa freebytes="23592960" totalbytes="23592960" percent="100" />
</tenured>
There are three copies of these elements, to show the state of the heap at
three important points in time. The first copy shows the state of the heap
before collection. The second copy, nested within the
<gc> element, represents the heap after
collection. This shows the amount of live data in the application at the
time of the collection. The final copy shows the amount of heap available
after the request that triggered the allocation was satisfied. The
difference between this and the amount of free space available immediately
after the collection may be more than the actual amount requested. The
memory manager allocates memory to threads in chunks to minimize
contention on the heap lock.
If the occupancy is consistently high even after the heap has been
collected, the maximum heap may be too small. It can be enlarged with the
-Xmx command-line option.
The <soa> and <loa> elements describe the heap used by the small and large object areas. The large object area is a small area of the heap reserved for large object allocations, while the small object area is the "normal" heap. All objects are initially allocated to the small object area, but if the small object area is full, objects larger than 64KB are allocated to the large object area. If the large object area is not required by the application (that is, if the application does not allocate any large objects), the memory management routine quickly shrinks the large object area down to nothing so that the whole heap is available for normal allocations.
The following line shows the size of the allocation request that triggered the allocation failure:
<minimum requested_bytes="16" />
If the amount of free space available in the small object area is much greater than the size of the allocation request and yet the request cannot be satisfied, the heap is fragmented. Consider starting with a very small heap or using the gencon policy. Small initial heaps reduce fragmentation by encouraging the collector to perform compactions early, when the heap is small and the cost of compaction is low. The gencon policy avoids fragmentation by using a copying collector that efficiently compacts the nursery area as a side-effect of every collection.
Logs collected using the
gencon policy also have
a line for the nursery. The nursery is the area of the heap that
holds all recently allocated objects.
<nursery freebytes="0" totalbytes="33554432" percent="0" />
If the memory requirements of your application are relatively stable, consider setting the minimum and maximum heap sizes to be equal to one another. This can improve application performance by avoiding the garbage collector unnecessarily shrinking the heap. Use the heap occupancies in the log to determine how big to make the heap. The fixed heap must be at least as big as the maximum memory usage of the application to avoid out-of-memory errors. If the heap is close to the memory usage of the application, the garbage collector has to collect the heap too often, and performance suffers.
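As a sketch of this suggestion, the heap can be fixed at launch time; the 512m figure and the class name MyApp are placeholder values, not recommendations from the article:

```shell
# Fix the heap size by setting minimum (-Xms) and maximum (-Xmx) equal;
# derive the actual value from the occupancy figures in your verbose GC log
java -Xms512m -Xmx512m MyApp
```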
There is no magic figure for the ideal heap occupancy, but a good initial
rule of thumb is to aim for a heap about one and a half times as large as
the maximum memory requirements of the system. In general, the bigger the
heap, the better the application performance, although there are
exceptions with certain systems or certain kinds of workloads. Shrinking
the heap can also be disabled by adding the command-line option
-Xmaxf=1. As discussed above, starting with a
very small heap and then allowing the garbage collector to expand can
reduce fragmentation and improve performance, so a certain amount of trial
and error is needed to decide whether the default setting, fixed heaps,
disabled shrinking, or small initial heaps are better for your application.
The gencon policy works best when its main
assumption -- that the majority of recently allocated objects do not
survive many garbage collections -- holds. If the nursery is very full
even after collection, too many objects are probably surviving
collections. Consider tuning the nursery size up or down so that
collections happen when there are fewer live objects in the nursery, or
consider changing to another GC policy.
References and finalizers
Weak, soft, and phantom references can improve the memory characteristics of your application by allowing flexible caching. However, like many things, it's best to use them in moderation. If the collector has to clear thousands of references every collection, pause times suffer. Consider whether your application really needs such a large number of references.
<refs_cleared soft="4" weak="28" phantom="0" />
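As an illustration of references used in moderation, here is a minimal soft-reference cache sketch; the SoftCache class and its method names are invented for this example and are not from the article's application:

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

// Minimal sketch: a cache whose values the garbage collector may
// reclaim under memory pressure. Every live SoftReference adds to the
// reference-processing work of each collection, so keep the count modest.
class SoftCache {
    private final Map<String, SoftReference<byte[]>> cache =
            new HashMap<String, SoftReference<byte[]>>();

    void put(String key, byte[] value) {
        // The collector is free to clear this reference before an
        // OutOfMemoryError would otherwise be thrown.
        cache.put(key, new SoftReference<byte[]>(value));
    }

    byte[] get(String key) {
        SoftReference<byte[]> ref = cache.get(key);
        return (ref == null) ? null : ref.get(); // null once cleared
    }
}
```

A cleared entry simply behaves as a cache miss, so the caller recomputes or reloads the value.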
Unlike references, which are best used in moderation, finalizers have essentially no use in a well-written application, with the exception of tidying up native resources. If you see objects queued for finalization in the verbose GC logs, try to rewrite the application to eliminate the use of finalizers.
<finalization objectsqueued="5" />
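One way to eliminate a finalizer, sketched under the assumption of a class that owns a native resource (NativeBuffer and its methods are invented names), is an explicit release method that callers invoke deterministically:

```java
// Sketch: explicit, deterministic cleanup instead of a finalize() method.
// A finalizer would run at some unpredictable point after the object
// becomes unreachable, adding work to garbage collection; an explicit
// release() runs exactly when the caller is done with the resource.
class NativeBuffer {
    private boolean released = false;

    void release() {
        if (!released) {
            released = true;
            // Free the underlying native resource here.
        }
    }

    boolean isReleased() {
        return released;
    }
}
```

Callers typically pair the allocation with a try/finally block so release() runs even when an exception is thrown.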
Also likely to be of interest in the GC log is the time information. For
global collections, the breakdown of time taken for each of the mark,
sweep, and compact phases is given as a nested <timesms> element in the <gc> element:
<timesms mark="74.734" sweep="7.611" compact="0.000" total="82.420" />
Compactions are rare events, and so the compact time is almost always zero.
If many compactions are occurring, consider expanding the maximum heap
size so that there is more headroom in the heap, shrinking the minimum
heap size so that the cost of early compactions is reduced, or switching to the gencon policy. The sweep time should be significantly shorter than the mark time. If the sweep times are very long, consider reducing the size of the heap or switching to the gencon policy.
The very last element of each collection element is a
<time> element, which records how much
time was taken by the garbage collection:
<time totalms="83.227" />
However, this bears less relation to application performance, and even to application response times, than you might initially think. Application load, garbage collection concurrency, locality, and allocation efficiencies can all influence application performance. The first case study illustrates how pause times, throughput, and response times are related in a typical application.
Case study 1: Know what you want
In this case study, we examine how the choice of garbage collection policy affects application performance with a fairly typical three-tiered client-server application.
The number of active threads pushing work into the application is increased
every two minutes. Each thread has associated data; so, as the number of
threads increases, the memory consumption of the application also
increases. The application was run on a computer with four processors, and
the number of active threads was varied from one to eight. Once there are
more threads than processors, the threads must compete with one another
for processor time, and some threads will always be waiting, so the
application effectiveness is reduced. This is reflected in lowered
throughput, higher mean response times, and much higher maximum response
times. Figures 1, 2, 3, and 4 show graphs of the
pause times, throughput, and response times when the application is run
with the optthruput and optavgpause policies. The halfway point of the
graphs is the point at which the system becomes overloaded.
Figure 1 shows the application pause times for each garbage collection,
using the times given in the
<timesms mark="74.734" sweep="7.611" compact="0.000" total="82.420" />
element of the verbose GC log. As expected, the
optthruput pauses (blue line) are much longer
than the optavgpause pauses (green line). The
optthruput pauses get longer as the amount of
live data in the heap increases, while the
optavgpause pauses remain more constant.
Figure 1. The pause times
Figure 2 shows the measured throughput for the application. Again, as
expected, there is a trade-off between optimizing for pauses and
optimizing for throughput: The throughput is slightly higher in the
optthruput policy (blue line) than it is in the
optavgpause policy (green line), no matter how
heavy the application load. Therefore, if throughput is your most
important consideration, you should use the
optthruput policy instead of the
optavgpause policy. The
optthruput policy has better throughput than
the optavgpause policy for any application, for the reasons described in Part 1 of this series (see Related topics for a link).
Figure 2. Application throughput
Mean response times follow the same trend as the throughput in this case,
with the mean response times always being better with the
optthruput policy, as shown in Figure 3. The
blue line is the
optthruput policy, and the
green line is the optavgpause policy.
Therefore, if mean response times are the most important concern, the
optthruput policy should be used instead of the
optavgpause policy, even though mean pauses are
smaller with the optavgpause policy.
Figure 3. Mean response times
Finally, Figure 4 shows the maximum response times for both policies. The
blue line is the
optthruput policy, and the
green line is the
optavgpause policy. When the
system is underloaded, the maximum response times are better with the
optavgpause policy. However, once the system
becomes overloaded, the
optthruput policy gives
quicker response times. Notice that the maximum response times are close
to the pause times when the system is underloaded. The garbage collection
pause times are the most significant delays any threads encounter. In this
case, the optavgpause policy gives the
best maximum response times. When the system is overloaded, on the other
hand, threads need to wait for other threads as well as for the garbage
collector, and the maximum response times increase sharply for both
policies. Because the throughput is higher with the
optthruput policy, threads see any threads
ahead of them in the system get out of the way and release control more
quickly, and so the maximum response time is better.
If the system is very heavily loaded or if the heap is smaller so that
collections happen more frequently, the
optthruput policy may not give better maximum
response times. If garbage collections are running very frequently, a
particularly unfortunate thread that gets stuck behind another thread in a
queue may also have to wait for several garbage collections before it gets
to the head of the queue. This would give a very long maximum pause in the
optthruput policy and a shorter maximum pause in the optavgpause policy.
Figure 4. Maximum response times
Tables 1 and 2 summarize the pause times, response times, and throughput for the two policies, for the lightly loaded and overloaded cases:
Table 1. Lightly loaded

| | optavgpause | optthruput |
| --- | --- | --- |
| Pause time | 12 ms | 101 ms |
| Maximum response time | 35 ms | 100 ms |
| Mean response time | 0.069 ms | 0.063 ms |
| Throughput | 22,300 transactions/second | 24,600 transactions/second |
Table 2. Heavily loaded

| | optavgpause | optthruput |
| --- | --- | --- |
| Pause time | 14 ms | 155 ms |
| Maximum response time | 724 ms | 593 ms |
| Mean response time | 0.13 ms | 0.11 ms |
| Throughput | 28,000 transactions/second | 31,900 transactions/second |
As the tables show, if throughput is your most important concern, you
should use the
optthruput policy. If response
time is more important than throughput, the choice is less obvious. When
the application is lightly loaded, the best response times are achieved
when garbage collection pauses are minimized with the
optavgpause policy. However, if the number of
threads is greater than the number of processors, the application has been
overloaded, and the
optthruput policy gives the
best maximum response times in such a case. (Other applications will have
different criteria for when they become overloaded.)
In any system where work is queuing up, the response time of the system is likely to be dominated by the queueing time, rather than any garbage collection pauses. Many other factors, including I/O pauses, database wait times, Web service response times, network delays, and any other external interaction may also contribute to the response time. For this reason, pause times are not necessarily a good indicator of the expected response times. The degree to which the pause times will affect response times is also determined by the heap size and frequency of garbage collection. In smaller heaps, when garbage must be collected very frequently, the pause times have more of an effect on application response times than they do in very large heaps, with infrequent collections.
The gencon policy was not shown in this case study because for this workload, it gave pause times greater than the optavgpause pause times and throughput poorer than the optthruput throughput. Therefore, for this workload, gencon would not be a good choice no matter what the performance criteria were. However, for other workloads, the gencon policy can give a very good combination of low pause times and high throughput. For some workloads, gencon gives both better pause times than the optavgpause policy and better throughput than the optthruput policy.
Case study 2: Getting it all -- sometimes -- with gencon
Our second case study illustrates a more complex application, with a more complex performance criterion. The application includes an application server and a multipart J2EE application sitting on top of that server. It is driven from a different machine. The important performance requirement for the application is throughput, but there's also a minimum response time criterion. Only transactions that complete within a given time are included in the throughput count, so the performance metric is effectively a combined throughput and mean response time metric.
Figure 5 shows a plot of the pause times for this second application. The
green line is the
gencon policy, and the blue
line is the optthruput policy.
Figure 5. Pause times
When the application has been running for a few minutes, some of the long-lived objects will have survived enough collections to be promoted into the tenured area. This means that they no longer need to be flipped on every nursery collection, and the amount of data that survives each nursery collection is lower. Because there are fewer objects to flip, the pause time goes down.
The spike in the
gencon pause times happens when
the tenured area runs out of room, and a non-concurrent collection of the
whole heap is required. This collection is slower than a normal
collection, but such a collection happens very infrequently. The
application run shown was 10 or 15 minutes long. The tenured heap does not
need collection often, and it is usually collected concurrently.
Figure 6 shows the performance scores for the two policies. The
gencon policy (green line) has a 4 percent
advantage over the
optthruput policy (blue line).
Figure 6. Throughput
Table 3 summarizes the pause time and throughput results for this
application. On both measures, the
gencon policy outperforms the
optthruput policy. The
gencon policy does better for two reasons. The
first, obvious, one is that because the performance metric takes account
of both response time and throughput, the
gencon policy's combination of good throughput
and good response times is likely to be a winner. The second reason is
harder to quantify. The pattern of object creation and object death in
this application benefits from the
gencon policy's absence of fragmentation and rapid collections of sparsely
populated nurseries. Other applications with similar performance criteria
but different patterns of object creation might perform best with one of the other policies.
Table 3. Response times and throughput for the second case study

| | optthruput | gencon |
| --- | --- | --- |
| Pause time | 630 ms | 424 ms |
| Throughput (meeting response time criteria) | 10,100 transactions/second | 10,500 transactions/second |
Getting the most out of your application
The first step in optimizing the performance of your application is to decide on the performance characteristics that are important to you. You may choose to tune for throughput, or response times, or some combination of the two. Once you've defined what your goal is, you can start measuring your application performance, experimenting with different policies, and looking at the verbose GC logs for hints.
As the two case studies showed, every application is different, and you may need to try a few different things before you find the combination of heap sizes, optional parameters, and garbage collection policy that work best for your particular application and system.
We hope this article and the one previous to it have helped you better understand how garbage collection works in the IBM SDK. Future articles will look at other aspects of the IBM implementation of Java technology, including class sharing and debugging, monitoring, and profiling capabilities.
- "Garbage collection policies, Part 1," Mattias Persson (developerWorks, May 2006): The first part of this series introduced the different GC policies and discussed their general characteristics.
- Diagnostics Guide: Get more details on verbose GC and instructions on adjusting garbage collection parameters in the IBM implementation of Java technology. (In PDF format.)
- Java SDKs: Download the SDKs for AIX, Linux, and z/OS, among other IBM developer kits for Java technology, from this page.
- IBM Development Package for Eclipse: Develop, test, and run your Java applications with this ready-to-run Java development environment.
- WebSphere Everyplace Micro Environment: A production-ready, run-time environment, tested and certified to meet J2ME specifications.