Java technology, IBM style: Garbage collection policies, Part 2

Use verbose GC and application metrics to optimize garbage collection

The previous installment in this series introduced the different garbage collection (GC) policies available in the IBM® implementation of version 5.0 of the Java™ runtime and discussed their general characteristics. In this article, series contributor Mattias Persson is joined by Holly Cummins to present a quantitative approach to choosing a policy, with some examples. They describe what you should consider in making a choice, show how to get guidance from the verbose GC logs, and present two case studies.


Mattias Persson, Staff Software Engineer, IBM

Mattias Persson works in IBM Software Group Development in the United Kingdom, specializing in Java platform performance and scalability. He has been with IBM for four years and holds an MSc in computer science from Vaxjo University in Sweden. He is a J2EE-Certified Architect and a Principal Certified Lotus Professional. In his spare time, he can often be seen on his mountain bike, going up and down the hills north of the IBM Hursley site.



Dr. Holly Cummins, Software Engineer, IBM

Holly Cummins is a developer in the Java Technology Centre in IBM United Kingdom. She has been with IBM for four years and holds a DPhil in quantum computation. She has a passion for mathematical models, obscure typesetting languages, and impractical footwear; a growing distaste for Brussels sprouts and weakly typed scripting languages with ambiguous syntax; and very poor houseplant-keeping skills.



16 May 2006


About the series

The Java technology, IBM style series takes a look at the latest releases of the IBM implementations of the Java platform. You'll learn how IBM has implemented some of the advances built into version 5.0 of the Java platform, and find out how to use some of the value-added features built into IBM's new releases.

Please contact the authors individually with comments or questions about their articles. To comment on the series as a whole, you may contact series lead Chris Bailey. For more on the concepts discussed here and links where you can download the latest IBM releases, see Resources.

Part 1 of this introduction to GC policies detailed the different policies available in the latest version of the IBM Java runtime environment (JRE):

  • optthruput: Optimizes for throughput. This is the default policy.
  • optavgpause: Optimizes for the average GC pause.
  • gencon: Uses a generational concurrent style of collection.
  • subpool: Speeds up object allocation on systems with very large numbers of processors. This policy is only available on IBM pSeries® and zSeries® platforms, and we won't be discussing it further in this article.

In this article, we describe how to collect the information needed to choose a particular policy. Most of the time, the default policy works very well, and no change is necessary. However, the different policies have different characteristics, and for some applications, another policy works better. Before evaluating the different GC policies, though, you must decide on the performance characteristics that are important for a particular application.

Garbage collection and performance

When choosing a GC policy, it is important for you to consider that garbage collection can affect performance in several ways, both positive and negative. All GC policies, even concurrent ones, involve stopping application threads for some period of time. These pauses can cause a momentary lack of application responsiveness and reduce the time available to the application to do work. However, the right GC policy can also deliver significant performance advantages to an application.

Locality

Locality is the effect of physical memory location on the speed of object access. Objects that are stored spatially close to recently accessed objects can be accessed very quickly because of the way processors load data from memory.

Garbage collection can enhance application performance by improving object locality and the speed at which new objects can be allocated. Garbage collectors that rearrange objects in the heap by compaction or by performing a copying collection can move objects that access one another close to each other in the heap, and this can have a dramatic effect on the rate of data processing. Garbage collectors that group objects that are accessed at the same time together have a positive effect on application performance. Compacting and copying collectors can also increase the rate at which new objects can be allocated by keeping memory unfragmented. This eliminates the need to search through the available memory looking for a slot big enough to hold new allocations.

One obvious way to optimize application performance is to choose a GC policy that minimizes any negative impact of garbage collection pauses. A second, less obvious, way is to choose a policy that maximizes the benefits of garbage collection. For example, if many short-lived objects are being created, choosing the gencon policy to take advantage of its speedy allocation may give you the best performance.


Think about throughput, response times, and pause times

Before you can get the best performance out of an application, you need to think about what kind of performance characteristics you want. The performance of applications can be described in terms of two distinct properties: throughput and response time. Throughput represents the amount of data being processed by the system, and response time is the time taken by the application to process requests, from the receipt of the request to the completion of processing. For example, a Web application might be able to handle 1,000 requests per second (throughput) and turn each request around in 2 seconds (response time).

Throughput

Throughput is the amount of data processed by an application. Throughput must be measured with an application-specific metric.

For some applications, only one of these performance characteristics is important. An interactive application like a text editor generally doesn't need to process large volumes of data, but it is important that it be responsive to user keystrokes. An overnight batch processing application, on the other hand, need not have quick response times, but its ability to process large volumes of data efficiently is critical. However, for most applications, the ideal performance is achieved when response times are low and throughput is high.

An application responds to some requests quickly and other requests more slowly, depending on what else is happening when the request comes in. For example, if garbage collection is underway or if there are lots of requests in the incoming queue, requests may take longer to be processed. It is important to think about both the mean response time and the maximum response time when deciding what performance characteristics are important to a given application. For example, if an interactive application usually responds to user requests in a few milliseconds (low mean response time) but occasionally pauses for ten seconds (high maximum response time), users are unlikely to be satisfied with the performance of the application.
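As a minimal sketch of how these application-level metrics might be gathered, the class below accumulates per-request latencies and reports the mean response time, the maximum response time, and the throughput over a measurement window. The class name ResponseStats and its methods are hypothetical and are not part of any IBM tool; the caller is assumed to measure each latency from receipt of the request to completion of processing.

```java
// Hypothetical helper: summarizes request latencies recorded by the application itself.
final class ResponseStats {
    private long count;
    private long totalNanos;
    private long maxNanos;

    /** Records one completed request; latency is completion time minus receipt time. */
    synchronized void record(long latencyNanos) {
        count++;
        totalNanos += latencyNanos;
        if (latencyNanos > maxNanos) {
            maxNanos = latencyNanos;
        }
    }

    /** Mean response time in milliseconds, or 0 if nothing has been recorded. */
    synchronized double meanMillis() {
        return count == 0 ? 0.0 : (totalNanos / 1e6) / count;
    }

    /** Maximum response time in milliseconds. */
    synchronized double maxMillis() {
        return maxNanos / 1e6;
    }

    /** Throughput in requests per second over the given measurement window. */
    synchronized double throughputPerSecond(long windowNanos) {
        return windowNanos <= 0 ? 0.0 : count / (windowNanos / 1e9);
    }
}
```

Tracking the maximum alongside the mean matters because a single ten-second outlier barely moves the mean of thousands of requests but is exactly what an interactive user notices.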

Response time

Response time is the latency of the application -- that is, how quickly it answers incoming requests. Both average and maximum numbers can be of interest.

It is important to note that GC pause times are not the same thing as application response times. Short pause times do not guarantee speedy application response times, and long pauses are also not necessarily going to cause high application response times. In very lightly loaded systems, pause times are an important component of response times, but the pause times become less and less important as the system becomes heavily loaded. Pause times are also more important when the heap is rarely collected than when it is frequently collected.

It is not possible to work out application response times by inspecting the verbose GC logs. Mean response times are quite closely related to application throughput and not highly correlated to GC pause times. The maximum response time is always longer than the longest GC pause, unless the application load is exceptionally low. However, in most applications, GC pauses make only a small contribution to the maximum response time, as you'll see in our first case study.

Pause time

Pause time is the duration of time when the garbage collector has paused all application threads to collect the heap. Pause time is not the same thing as response time.

Using the verbose GC logs

Verbose GC logs give insight into the operation of the garbage collector and can give hints about policy and parameter choices. The latest version of the IBM Developer Kit provides very detailed logs in an XML format. These logs show what the garbage collector has been doing, in great detail. The verbose GC log includes heap sizes, information about the division of the heap in the gencon policy, pause timings, object finalizations, and references cleared. You can enable verbose GC using one of two command-line options: either -verbose:gc or -Xverbosegclog:filename.

Why look at verbose GC logs?

There are several reasons why you might want to enable verbose GC and inspect the logs. Your application might be experiencing long pauses or periods of unresponsiveness, and you might want to identify or rule out garbage collection as the cause of the pauses. Alternatively, if you're tuning your application to maximize performance, there might be some clues in the verbose GC about changes that could improve performance.

An example of verbose GC

Listing 1 shows an example section of verbose GC for one collection in the optthruput policy:

Listing 1. Sample verbose GC for one collection
<af type="tenured" id="1" timestamp="Sun Mar 12 19:12:55 2006" intervalms="0.000">
  <minimum requested_bytes="16" />
  <time exclusiveaccessms="0.025" />
  <tenured freebytes="23592960" totalbytes="471859200" percent="5" >
    <soa freebytes="0" totalbytes="448266240" percent="0" />
    <loa freebytes="23592960" totalbytes="23592960" percent="100" />
  </tenured>
  <gc type="global" id="3" totalid="3" intervalms="11620.259">
    <refs_cleared soft="0" weak="72" phantom="0" />
    <finalization objectsqueued="9" />
    <timesms mark="74.734" sweep="7.611" compact="0.000" total="82.420" />
    <tenured freebytes="409273392" totalbytes="471859200" percent="86" >
      <soa freebytes="385680432" totalbytes="448266240" percent="86" />
      <loa freebytes="23592960" totalbytes="23592960" percent="100" />
    </tenured>
  </gc>
  <tenured freebytes="409272720" totalbytes="471859200" percent="86" >
    <soa freebytes="385679760" totalbytes="448266240" percent="86" />
    <loa freebytes="23592960" totalbytes="23592960" percent="100" />
  </tenured>
  <time totalms="83.227" />
</af>

The opening <af> element shows what triggered the collection in this case: an allocation failure. Other possibilities are <con> for a concurrent collection or <sys> for a collection forced by System.gc(). Concurrent collections only occur in the optavgpause or gencon policies. Forced garbage collections are not recommended, and so if <sys> elements are present in the verbose GC, consider rewriting the application to avoid interfering with the garbage collection routines or disabling explicit invocation of garbage collection with the -Xdisableexplicitgc command-line parameter.
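One way to check a log for forced collections is to count how often each trigger element appears. The sketch below does this with plain string scanning; the class name TriggerCounts is hypothetical, and the approach assumes only that each collection opens with an element named af, con, or sys as described above.

```java
// Hypothetical helper: counts opening elements with a given name in verbose GC text.
final class TriggerCounts {
    /** Counts occurrences of the opening tag, for example "af", "con", or "sys". */
    static int count(String verboseGcLog, String tag) {
        String open = "<" + tag;
        int n = 0;
        int from = 0;
        while ((from = verboseGcLog.indexOf(open, from)) >= 0) {
            int end = from + open.length();
            if (end < verboseGcLog.length()) {
                char c = verboseGcLog.charAt(end);
                // Require a delimiter after the name so that "<sys" does not
                // also match a longer element name that merely starts with it.
                if (Character.isWhitespace(c) || c == '>' || c == '/') {
                    n++;
                }
            }
            from = end;
        }
        return n;
    }
}
```

A nonzero result for count(log, "sys") suggests that something in the application or its libraries is calling System.gc().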

Heap occupancy

The elements most likely to be of interest are the three copies of the <tenured> element describing the occupancy of the heap:

   <tenured freebytes="23592960" totalbytes="471859200" percent="5" >
    <soa freebytes="0" totalbytes="448266240" percent="0" />
    <loa freebytes="23592960" totalbytes="23592960" percent="100" />
  </tenured>

There are three copies of these elements, to show the state of the heap at three important points in time. The first copy shows the state of the heap before collection. The second copy, nested within the <gc> element, represents the heap after collection. This shows the amount of live data in the application at the time of the collection. The final copy shows the amount of heap available after the request that triggered the collection was satisfied. The difference between this and the amount of free space available immediately after the collection may be more than the actual amount requested, because the memory manager allocates memory to threads in chunks to minimize contention on the heap lock.

If the occupancy is consistently high even after the heap has been collected, the maximum heap may be too small. It can be enlarged with the -Xmx command-line option.
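The occupancy after collection can be read out of the log programmatically. The following sketch parses a single <af> fragment with the standard JAXP DOM parser and computes the percentage of the tenured area still occupied after the collection, using the <tenured> element nested inside <gc>. The class name TenuredOccupancy is hypothetical, and the method assumes it is handed one well-formed <af> element rather than a whole log file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper: reads post-collection occupancy from one <af> fragment.
final class TenuredOccupancy {
    /** Returns the percentage of the tenured area occupied immediately after collection. */
    static double occupiedPercentAfterGc(String afXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(afXml.getBytes(StandardCharsets.UTF_8)));
            // The copy nested inside <gc> describes the heap right after collection.
            Element gc = (Element) doc.getElementsByTagName("gc").item(0);
            Element tenured = (Element) gc.getElementsByTagName("tenured").item(0);
            long free = Long.parseLong(tenured.getAttribute("freebytes"));
            long total = Long.parseLong(tenured.getAttribute("totalbytes"));
            return 100.0 * (total - free) / total;
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a usable <af> fragment", e);
        }
    }
}
```

For the collection in Listing 1, this gives an occupancy of roughly 13 percent, consistent with the 86 percent free reported in the log.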

The nested <loa> and <soa> elements describe the heap used by the large and small object areas. The large object area is a small area of the heap reserved for large object allocations, while the small object area is the "normal" heap. All objects are initially allocated to the small object area, but if the small area is full, objects larger than 64KB are allocated to the large object area. If the large object area is not required by the application (that is, if the application does not allocate any large objects), the memory management routine quickly shrinks the large object area down to nothing so that the whole heap is available for "normal" allocations.

The following line shows the size of the allocation request that triggered the allocation failure:

<minimum requested_bytes="16" />

If the amount of free space available in the small object area is much greater than the size of the allocation request and yet the request cannot be satisfied, the heap is fragmented. Consider starting with a very small heap or using the gencon policy. Small initial heaps reduce fragmentation by encouraging the collector to perform compactions early, when the heap is small and the cost of compaction is lower. The gencon policy avoids fragmentation by using a copying collector that efficiently compacts the nursery area as a side-effect of every collection.
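The fragmentation test described above can be expressed as a simple comparison between the free space in the small object area before the collection and the size of the failed request. In this sketch the class name and the factor parameter are assumptions; the text says only "much greater", so the caller has to decide what multiple counts as suspicious.

```java
// Hypothetical helper: flags allocation failures that look like fragmentation.
final class FragmentationCheck {
    /**
     * Returns true if the small object area had at least `factor` times the
     * requested size free when the allocation failed. The factor is an
     * arbitrary threshold chosen by the caller, not a value from the JVM.
     */
    static boolean looksFragmented(long soaFreeBytes, long requestedBytes, long factor) {
        if (requestedBytes <= 0 || factor <= 0) {
            return false;
        }
        // Divide rather than multiply to avoid overflow on large requests.
        return soaFreeBytes / factor >= requestedBytes;
    }
}
```

In Listing 1 the small object area had zero bytes free before the collection, so that allocation failure reflects a genuinely full heap rather than fragmentation.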

Logs collected using the gencon policy also have a line for the nursery. The nursery is the area of the heap that holds all recently allocated objects.

<nursery freebytes="0" totalbytes="33554432" percent="0" />

Occupancy

Occupancy describes how much of the heap is occupied by live memory at any given time. Applications with stable occupancies over long periods of time are good candidates for fixed heap sizes.

If the memory requirements of your application are relatively stable, consider setting the minimum and maximum heap sizes to be equal to one another. This can improve application performance by avoiding the garbage collector unnecessarily shrinking the heap. Use the heap occupancies in the log to determine how big to make the heap. The fixed heap must be at least as big as the maximum memory usage of the application to avoid out-of-memory errors. If the heap is close to the memory usage of the application, the garbage collector has to collect the heap too often, and performance suffers.

There is no magic figure for the ideal heap occupancy, but a good initial rule of thumb is to aim for a heap about one and a half times as large as the maximum memory requirements of the system. In general, the bigger the heap, the better the application performance, although there are exceptions with certain systems or certain kinds of workloads. Shrinking the heap can also be disabled by adding the command-line option -Xmaxf=1. As discussed above, starting with a very small heap and then allowing the garbage collector to expand can reduce fragmentation and improve performance, so a certain amount of trial and error is needed to decide whether the default setting, fixed heaps, disabled shrinking, or small initial heaps are better for your application.
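The one-and-a-half-times rule of thumb is easy to turn into a starting point for a fixed heap. The sketch below takes the peak live data observed in the logs (total bytes minus free bytes after collection) and formats matching -Xms and -Xmx options. The class and method names are hypothetical, and the result is only an initial value for experimentation, not a recommendation.

```java
// Hypothetical helper: derives a starting fixed heap size from observed live data.
final class HeapSizing {
    private static final long MB = 1024L * 1024L;

    /** Returns 1.5 times the peak live data, rounded up to whole megabytes. */
    static long suggestedHeapMegabytes(long peakLiveBytes) {
        long bytes = peakLiveBytes + peakLiveBytes / 2; // 1.5x rule of thumb
        return (bytes + MB - 1) / MB;
    }

    /** Formats equal minimum and maximum heap options for a fixed-size heap. */
    static String fixedHeapOptions(long peakLiveBytes) {
        long mb = suggestedHeapMegabytes(peakLiveBytes);
        return "-Xms" + mb + "m -Xmx" + mb + "m";
    }
}
```

For example, an application whose live data peaks at 100MB would start with a 150MB fixed heap under this rule.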

The gencon policy works best when its main assumption -- that the majority of recently allocated objects do not survive many garbage collections -- holds. If the nursery is very full even after collection, too many objects are probably surviving collections. Consider tuning the nursery size up or down so that collections happen when there are fewer live objects in the nursery, or consider changing to another GC policy.

References and finalizers

Weak, soft, and phantom references can improve the memory characteristics of your application by allowing flexible caching. However, like many things, it's best to use them in moderation. If the collector has to clear thousands of references every collection, pause times suffer. Consider whether your application really needs such a large number of references.

<refs_cleared soft="4" weak="28" phantom="0"  />
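As an illustration of the flexible caching that reference objects allow, here is a minimal sketch of a cache that holds its values through soft references, so the collector may clear them when memory is needed. The SoftCache class is hypothetical and deliberately simple; each entry is one more reference for the collector to process, which is why a cache like this is best kept modest in size.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

// Hypothetical cache whose values can be reclaimed by the garbage collector.
final class SoftCache<K, V> {
    private final Map<K, SoftReference<V>> entries = new HashMap<K, SoftReference<V>>();

    synchronized void put(K key, V value) {
        entries.put(key, new SoftReference<V>(value));
    }

    /** Returns the cached value, or null if it is absent or has been cleared. */
    synchronized V get(K key) {
        SoftReference<V> ref = entries.get(key);
        if (ref == null) {
            return null;
        }
        V value = ref.get();
        if (value == null) {
            entries.remove(key); // the collector cleared it; drop the stale entry
        }
        return value;
    }

    synchronized int size() {
        return entries.size();
    }
}
```

Callers must always be prepared for get to return null and recompute or reload the value.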

Unlike references, which are best used in moderation, finalizers have essentially no use in a well-written application, with the exception of tidying up native resources. If you see objects queued for finalization in the verbose GC logs, try to rewrite the application to eliminate the use of finalizers.

<finalization objectsqueued="5" />

Pause times

Also likely to be of interest in the GC log is the time information. For global collections, the breakdown of time taken for each of the mark, sweep, and compact phases is given as a nested element in the <gc> element:

<timesms mark="74.734" sweep="7.611" compact="0.000" total="82.420" />

Compactions are rare events, and so the compact time is almost always zero. If many compactions are occurring, consider expanding the maximum heap size so that there is more headroom in the heap, shrinking the minimum heap size so that the cost of early compactions is reduced, or switching to the gencon policy. The sweep time should be significantly shorter than the mark times. If the sweep times are very long, consider reducing the size of the heap or switching to the gencon policy.
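These checks on the phase timings can be automated. The sketch below pulls the attributes out of a <timesms> line with a regular expression and exposes two simple tests: whether a compaction occurred, and whether the sweep took at least as long as the mark. The PhaseTimes class is hypothetical, and the second test is a crude assumption standing in for "significantly shorter"; a real check would probably use a ratio tuned to the application.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: holds the phase timings from one <timesms> element.
final class PhaseTimes {
    private static final Pattern ATTR = Pattern.compile("(\\w+)=\"([0-9.]+)\"");

    final double mark;
    final double sweep;
    final double compact;
    final double total;

    private PhaseTimes(double mark, double sweep, double compact, double total) {
        this.mark = mark;
        this.sweep = sweep;
        this.compact = compact;
        this.total = total;
    }

    /** Parses a line such as the <timesms> element shown above; missing attributes default to 0. */
    static PhaseTimes parse(String timesmsLine) {
        double mark = 0, sweep = 0, compact = 0, total = 0;
        Matcher m = ATTR.matcher(timesmsLine);
        while (m.find()) {
            String name = m.group(1);
            double value = Double.parseDouble(m.group(2));
            if (name.equals("mark")) mark = value;
            else if (name.equals("sweep")) sweep = value;
            else if (name.equals("compact")) compact = value;
            else if (name.equals("total")) total = value;
        }
        return new PhaseTimes(mark, sweep, compact, total);
    }

    /** True if this collection spent any time compacting. */
    boolean compacted() {
        return compact > 0.0;
    }

    /** True if the sweep was not shorter than the mark, which the text treats as a warning sign. */
    boolean sweepNotShorterThanMark() {
        return sweep >= mark;
    }
}
```

Applied to the sample line, neither test fires: there was no compaction, and the sweep took about a tenth of the mark time.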

The very last element of each collection element is a <time> element, which records how much time was taken by the garbage collection:

<time totalms="83.227" />

However, this bears less relation to application performance, and even to application response times, than you might initially think. Application load, garbage collection concurrency, locality, and allocation efficiencies can all influence application performance. The first case study illustrates how pause times, throughput, and response times are related in a typical application.


Case study 1: Know what you want

In this case study, we examine how the choice of garbage collection policy affects application performance with a fairly typical three-tiered client-server application.

The number of active threads pushing work into the application is increased every two minutes. Each thread has associated data; so, as the number of threads increases, the memory consumption of the application also increases. The application was run on a computer with four processors, and the number of active threads was varied from one to eight. Once there are more threads than processors, the threads must compete with one another for processor time, and some threads will always be waiting, so the application effectiveness is reduced. This is reflected in lowered throughput, higher mean response times, and much higher maximum response times. Figures 1, 2, 3, and 4 show graphs of the pause times, throughput, and response times when the application is run with the optthruput and optavgpause policies. The halfway point of the graphs is the point at which the system becomes overloaded.

Pause times

Figure 1 shows the application pause times for each garbage collection, using the times given in the <timesms mark="74.734" sweep="7.611" compact="0.000" total="82.420" /> element of the verbose GC log. As expected, the optthruput pauses (blue line) are much longer than the optavgpause pauses (green line). The optthruput pauses get longer as the amount of live data in the heap increases, while the optavgpause pauses remain more constant.

Figure 1. The pause times

Throughput

Figure 2 shows the measured throughput for the application. Again, as expected, there is a trade-off between optimizing for pauses and optimizing for throughput: The throughput is slightly higher in the optthruput policy (blue line) than it is in the optavgpause policy (green line), no matter how heavy the application load. Therefore, if throughput is your most important consideration, you should use the optthruput policy instead of the optavgpause policy. The optthruput policy has better throughput than the optavgpause policy for any application, for the reasons described in Part 1 of this series (see Resources for a link).

Figure 2. Application throughput

Response times

Mean response times follow the same trend as the throughput in this case, with the mean response times always being better with the optthruput policy, as shown in Figure 3. The blue line is the optthruput policy, and the green line is the optavgpause policy. Therefore, if mean response times are the most important concern, the optthruput policy should be used instead of the optavgpause policy, even though mean pauses are smaller with the optavgpause policy.

Figure 3. Mean response times

Finally, Figure 4 shows the maximum response times for both policies. The blue line is the optthruput policy, and the green line is the optavgpause policy. When the system is underloaded, the maximum response times are better with the optavgpause policy. However, once the system becomes overloaded, the optthruput policy gives quicker response times. Notice that the maximum response times are close to the pause times when the system is underloaded. The garbage collection pause times are the most significant delays any threads encounter. In this environment, the optavgpause policy gives the best maximum response times. When the system is overloaded, on the other hand, threads need to wait for other threads as well as for the garbage collector, and the maximum response times increase sharply for both policies. Because the throughput is higher with the optthruput policy, threads see any threads ahead of them in the system get out of the way and release control more quickly, and so the maximum response time is better.

If the system is very heavily loaded or if the heap is smaller so that collections happen more frequently, the optthruput policy may not give better maximum response times. If garbage collections are running very frequently, a particularly unfortunate thread that gets stuck behind another thread in a queue may also have to wait for several garbage collections before it gets to the head of the queue. This would give a very long maximum response time in the optthruput policy and a shorter maximum response time in the optavgpause policy.

Figure 4. Maximum response times

Summary

Tables 1 and 2 summarize the pause times, response times, and throughput for the two policies, for the lightly loaded and overloaded cases:

Table 1. Lightly loaded

                         optavgpause                  optthruput
Pause time               12 ms                        101 ms
Maximum response time    35 ms                        100 ms
Mean response time       0.069 ms                     0.063 ms
Throughput               22,300 transactions/second   24,600 transactions/second

Table 2. Heavily loaded

                         optavgpause                  optthruput
Pause time               14 ms                        155 ms
Maximum response time    724 ms                       593 ms
Mean response time       0.13 ms                      0.11 ms
Throughput               28,000 transactions/second   31,900 transactions/second

As the tables show, if throughput is your most important concern, you should use the optthruput policy. If response time is more important than throughput, the choice is less obvious. When the application is lightly loaded, the best response times are achieved when garbage collection pauses are minimized with the optavgpause policy. However, if the number of threads is greater than the number of processors, the application has been overloaded, and the optthruput policy gives the best maximum response times in such a case. (Other applications will have different criteria for when they become overloaded.)

In any system where work is queuing up, the response time of the system is likely to be dominated by the queueing time, rather than any garbage collection pauses. Many other factors, including I/O pauses, database wait times, Web service response times, network delays, and any other external interaction may also contribute to the response time. For this reason, pause times are not necessarily a good indicator of the expected response times. The degree to which the pause times will affect response times is also determined by the heap size and frequency of garbage collection. In smaller heaps, when garbage must be collected very frequently, the pause times have more of an effect on application response times than they do in very large heaps, with infrequent collections.

The gencon policy was not shown in this case study because for this workload, it gave pause times greater than the optavgpause pause times and throughput poorer than the optthruput throughput. Therefore, for this workload, gencon would not be a good choice no matter what the performance criteria were. However, for other workloads, the gencon policy can give a very good combination of low pause times and high throughput. For some very transactional workloads, gencon gives both better pause times than the optavgpause policy and better throughput than the optthruput policy.


Case study 2: Getting it all -- sometimes -- with gencon

Our second case study illustrates a more complex application, with more complex performance criteria. The application includes an application server and a multipart J2EE application sitting on top of that server. It is driven from a different machine. The important performance requirement for the application is throughput, but there's also a response time criterion. Only transactions that complete within a given time are included in the throughput count, so the performance metric is effectively a combined throughput and mean response time metric.

Pause times

Figure 5 shows a plot of the pause times for this second application. The green line is the gencon policy, and the blue line is the optthruput policy:

Figure 5. Pause times

When the application has been running for a few minutes, some of the long-lived objects will have survived enough collections to be promoted into the tenured area. This means that they no longer need to be flipped on every nursery collection, and the amount of data that survives each nursery collection is lower. Because there are fewer objects to flip, the pause time goes down.

The spike in the gencon pause times happens when the tenured area runs out of room, and a non-concurrent collection of the whole heap is required. This collection is slower than a normal collection, but such a collection happens very infrequently. The application run shown was 10 or 15 minutes long. The tenured heap does not need collection often, and it is usually collected concurrently.

Throughput

Figure 6 shows the performance scores for the two policies. The gencon policy (green line) has a 4 percent advantage over the optthruput policy (blue line).

Figure 6. Throughput

Table 3 summarizes the pause time and throughput results for this application. On both measures, the gencon policy outperforms the optthruput policy. The gencon policy does better for two reasons. The first, obvious, one is that because the performance metric takes account of both response time and throughput, the gencon policy's combination of good throughput and good response times is likely to be a winner. The second reason is harder to quantify. The pattern of object creation and object death in this application benefits from the gencon policy's absence of fragmentation and rapid collections of sparsely populated nurseries. Other applications with similar performance criteria but different patterns of object creation might perform best with the optthruput policy.

Table 3. Pause times and throughput for the second case study

                                              optthruput                   gencon
Pause time                                    630 ms                       424 ms
Throughput (meeting response time criteria)   10,100 transactions/second   10,500 transactions/second

Getting the most out of your application

The first step in optimizing the performance of your application is to decide on the performance characteristics that are important to you. You may choose to tune for throughput, or response times, or some combination of the two. Once you've defined what your goal is, you can start measuring your application performance, experimenting with different policies, and looking at the verbose GC logs for hints.

As the two case studies showed, every application is different, and you may need to try a few different things before you find the combination of heap sizes, optional parameters, and garbage collection policy that work best for your particular application and system.

We hope this article and the one previous to it have helped you better understand how garbage collection works in the IBM SDK. Future articles will look at other aspects of the IBM implementation of Java technology, including class sharing and debugging, monitoring, and profiling capabilities.

Resources

Learn

  • "Garbage collection policies, Part 1," Mattias Persson (developerWorks, May 2006): The first part of this series introduced the different GC policies and discussed their general characteristics.
  • Diagnostics Guide: Get more details on verbose GC and instructions on adjusting garbage collection parameters in the IBM implementation of Java technology. (In PDF format.)
  • The developerWorks Java zone: Browse all Java content.
