Skip to main content

skip to main content

developerWorks  >  Java technology  >

Eye on performance: Trash talk

Do you know what your Garbage Collector is up to?

developerWorks
Document options

Document options requiring JavaScript are not displayed


Rate this page

Help us improve this content


Level: Intermediate

Jack Shirazi (jack@JavaPerformanceTuning.com), Director, JavaPerformanceTuning.com
Kirk Pepperdine (kirk@JavaPerformanceTuning.com), CTO, JavaPerformanceTuning.com

21 May 2004

Does your application generate out-of-memory errors on a regular basis? Do your users experience response times that are somewhat erratic? Does your application become unresponsive for fairly lengthy durations? Does your application performance appear sluggish? If you answered yes to any of these questions, then you could very well have a problem with garbage collection. Stay tuned while Jack Shirazi and Kirk Pepperdine of JavaPerformanceTuning.com explain how to identify garbage collection problems, and in doing so, help you to answer the question: Do you know what you garbage collector is up to?

For many developers, memory management is at best considered a distraction from the main task of developing business logic -- until the business logic fails to perform as expected or as it had in during testing. When this happens, you need to know what went wrong and why, which means understanding how your application interacts with the underlying computing resources and, in particular, memory. The best way to understand how an application is utilizing memory is to observe the garbage collector in action.

Why is my application stuttering?

The biggest single performance headache in the Java virtual machine is the mutual exclusivity of running GC concurrently with application threads. For the garbage collector to do its job, it needs to prevent all other threads from accessing the heap space (memory) that it is working on for some period of time. The period of time when this occurs is known as the "stop-the-world" phase of GC, and, as the name implies, your application will come to a screaming halt while the garbage collector toils away. Fortunately, this pause is often so short to as to not be noticeable, but you could easily imagine how having an application pause at seemingly random points in time for seemingly random and lengthy durations could have a devastating effect on application responsiveness and throughput.

But GC is only one reason why your application may sputter and stall. So how can you determine if GC responsible for causing these problems? To answer that question, we need to measure how hard the garbage collector is working, and continue to take these measurements as we make changes in the system so that we might quantitatively determine if our changes are having the desired effect.

How much memory do I need?

It is a generally accepted belief that adding more memory to a system will solve many performance problems. While this principle often holds true with JVMs, too much of a good thing can be harmful to performance. So the trick is to give a Java application as much memory as it needs and no more. The question is, how much does your application need? As is the case with application stutters, we need to observe garbage collection behavior to see if it is working harder than it needs to. These observations will tell us if the changes we are making are having the desired effect.



Back to top


Measuring GC activity

The standard way to produce GC logs is to use the -verbose:gc flag. Once set, the garbage collector produces a summary of what it has done every time it runs, which is generally written to the console (either via standard out or standard error). Many VMs support an option that allows verbose GC output to be directed to a file. For example, with Sun's 1.4 JVM, you can use the switch -Xloggc:filename to write the GC output to a file. For the HP JVM, you would use the -Xverbosegc=file switch. In this article, we will be examining verbose GC output captured from the Sun 1.4.2 and IBM 1.4.1 JVMs.

One of the biggest advantages of using this method to monitor memory utilization is that it places very little drag on the performance of the application. Unfortunately, this solution is not perfect in that these log files can get extremely large and maintaining them may require you to restart the JVM. Having said this, the technique is still viable in a production environment as it can help diagnose performance problems that only manifest themselves in that environment.

A deeper view of GC

The output produced by the -verbose:gc flag varies depends upon the JVM vendor, and different garbage collector options will report on information specific to its implementation. For example, the output produced by IBM JVMs is far more verbose than that produced by Sun JVMs, whereas Sun's output is better suited to be read by tools. That said, each GC log communicates approximately the basic information -- how much memory had been consumed, how much memory was recovered, how long the GC cycle took, and were any there other actions taken during collection. From these basic measurements, we can fetter out details that will help us to better understand what is going on. The statistics that we will calculate are shown below:

  • Duration of the runtime under consideration
  • Total number of collections
  • Frequency of collections
  • Longest time spent in collection
  • Total time spent in collection
  • Average time spent in collection
  • Average interval between collections
  • Total number of bytes allocated
  • Average number of bytes allocated per collection, per second
  • Total number of bytes recovered
  • Average number of bytes recovered per collection, per second

By understanding pause times, we can begin to understand if GC is partially or wholly responsible for an application becoming unresponsive. One way to achieve this is to correlate the GC activity in the verbose GC logs against other logs (such as the request backlog in Web server logs) collected by the system. It is almost certain that the longest GC pause will result in a visible drop in overall system responsiveness, so it is useful to know when responsiveness drops so we can correlate GC activity to application throughput.

Another possible point of contention is the rate at which heap memory is being allocated and recovered, known as churn. Applications that produce a large number of objects that are released almost immediately typically suffer from churn. A high rate of churn places a lot of stress on the garbage collector, creating more contention for memory resources, which in turn can lead to longer pauses or the dreaded OutOfMemoryError.

The way to understand if your application is experiencing either of these issues is to measure the total amount of time taken by GC over the total runtime under consideration. With this calculation we can to begin to understand if GC is working much harder than it should be. Let's derive the formulas needed to make these determinations.



Back to top


The Sun GC log record

Listing 1 is an example of the log records produced by Sun 1.4.2_03 JVM running with the default mark-and-sweep collector with the -Xloggc:filename. As you can see, the entry is a very concise record of what each pass did.


Listing 1. GC log records using the -Xloggc:filename flag
69.713: [GC 11536K->11044K(12016K), 0.0032621 secs]
69.717: [Full GC 11044K->5143K(12016K), 0.1429698 secs]
69.865: [GC 5958K->5338K(11628K), 0.0021492 secs]
69.872: [GC 6169K->5418K(11628K), 0.0021718 secs]
69.878: [GC 6248K->5588K(11628K), 0.0029761 secs]
69.886: [GC 6404K->5657K(11628K), 0.0017877 secs]

The first thing you might notice is that every log record is printed in an enclosing set of square brackets. Other GC algorithms, such as the parallel collector, may break down some of the values into finer bits of information. When that occurs, the value being refined is replaced with the details enclosed in a nested set of square brackets, which makes it easier for tools to process verbose GC output.

We start our investigation by looking at the record tagged 69.713 found in Listing 1. The tag is a timestamp in seconds and milliseconds since the JVM was started. In this instance, the JVM had been running for 69.713 seconds before this GC cycle started. The fields from left to right are: the type of collection that was performed, heap utilization before GC, heap utilization after GC, total heap capacity, and the duration of the GC event. From this description we can see that the first GC event was a minor collection. Before GC started, 11536 Kb of heap space was in use. Upon completion, there were 11044 Kb in use, the capacity of heap was 12016 Kb and the entire collection took .0032621 seconds. The next event, a full GC, occurred at 69.717 seconds or 0.003 seconds after the start of the preceding minor GC event. Note that if you add the duration of the minor GC event to its start time, you'll see that it finishes less than 1ms before the start of the full GC. Thus we can conclude that the minor collection failed to recover enough space, and that this failure triggered the full GC. To the application, this would appear as a single event that lasted 0.1462319 seconds. Let's move on to define how we can calculate the other values.

Parameterization of a GC log record

We start the analysis by parameterizing the values found in each GC log record:

R(n) = T(n): [ <GC> HB->HE(HC), D]

nIndex of record in the list, 1 being the first and m being the last
R(n)GC record
T(n)Time at which the nth GC occurred
HBAmount of heap used before GC
HEamount of heap used after GC
HCtotal amount of heap space
Dduration of the GC cycle

With these definitions we can now derive formulas that we can use to calculate the values described earlier.



Back to top


Fundamental values

The first value that we will calculate is the overall duration of the runtime covered in the logs. If we are to consider every record, we would just look at the timestamp for the last record. Because Listing 1 represents only a portion of the total log record, we need to subtract the first timestamp from the last. Though this number is precise enough for this example, to be absolute one needs to add in the duration of the last GC. The reason for this is the timestamp is taken at the start of GC and the record represents things that happened after the timestamp was recorded.

The remaining values are calculated by taking sums of the appropriate values in the records. It is interesting to note that bytes recovered can be calculated by examining the relationships of the measures within a record while the bytes allocated can be calculated by examining the relationship of the measures between successive records. For example, if we consider the record pair found at timestamp 69.872 and 69.878, we can calculate the number of bytes allocated in the young generation by subtracting the amount of memory consumed after GC in the first record by the number of bytes consumed before GC in the second record: 6248 Kb - 5418 Kb = 830 Kb. The formulations for other values are shown in Table 1 below.

To find the longest GC pause we need only inspect the duration timings and look for the maximal value of D(n) (duration at record n).

Table 1. Statistic formulas

StatisticCalculation (time units adjusted to seconds)
Runtime durationRT = (T(M) + D(M)) - T(1)
Total number of minor collectionsTMC = Sum( R(n)) where GC(n) = GC
Total number of full collectionsTFC = Sum( R(n)) where GC(n) = Full
Collection frequency (minor)CFM = TMC / RT
Collection frequency (full)CFF = TFC / RT
Time in collection (longest minor)MAX(D(n)) for all n where GC(n) = GC
Time in collection (longest full)MAX(D(n)) for all n where GC(n) = Full
Time in minor collection (total)TTMC = Sum(D(n)) for all n where GC(n) = GC
Time in full collection (total)TTFC Sum(D(n)) for all n where GC(n) = Full
Time in collection (total)TTC = TTMC + TTFC
Time in minor collection (average)ATMC = TTMC / RT
Time in full collection (average)ATFC = TTFC / RT
Time in collection (average)ATC = TTC / RT
Interval between collections (average)Sum( T(n+1) - T(n)) / (TMC + TFC) for all n
Bytes allocated (total)TBA = Sum(HB(n+1) - HE(n)) for all n
Bytes allocated (per second)TBA / RT
Bytes allocated (per collection)TBA / (TMC + TFC)
Bytes recovered minor collection (total)BRM = Sum(HB(n) - HE(n)) where GC(n) = GC
Bytes recovered full collection (total)BRF = Sum(HB(n) - HE(n)) where GC(n) = Full
Bytes recovered (total)BRT = BRM + BRF
Bytes recovered (per minor collection)BRPM = BRM / TMC
Bytes recovered (per full collection)BRPF = BRF / TMF
Bytes recovered (minor per second)BRP = BRM / TTMC
Bytes recovered (full per second)BRF = BRF / TTFC

As you can see from the formulae, we often need to consider full and minor GC separately. Minor GCs are fundamentally different from full GCs, and are generally at least one order of magnitude faster than the latter. We can see this from a quick inspection of Listing 1 -- the longest minor is 50 times faster than the full.

Table 2 below shows the application of the formulas in Table 1 against the values in Listing 1.

Table 2. Investigation of Sun GC logs

StatisticCalculation (time units adjusted to seconds)
Runtime duration(69.886 + 0.0017877) - 69.713 = 0.1747877
Total number of minor collections5
Total number of full collections1
Collection frequency (minor)5 / 0.1747877 = 28.6 per second
Collection frequency (full)1 / 0.1747877 = 5.27 per second
Time on collection (longest minor)0.0032621
Time in collection (longest full)0.1429698
Time in minor collection (total)0.0123469
Time in full collection (total)0.1429698
Time in collection (total)0.1553167
Time in minor collection (average)7.1%
Time in full collection (average)81.8%
Time in collection (average)88.9%
Interval between collections (average).173/5=0.0346
Bytes allocated (total)3292
Bytes allocated (per second)18834 Kb/second
Bytes allocated (per collection)549 Kb
Bytes recovered minor collection (total)3270 Kb
Bytes recovered full collection (total)5901 Kb
Bytes recovered (total)5901 + 3270 = 9171 Kb
Bytes recovered (per minor collection)3270/5 = 654
Bytes recovered (per full collection)5901/1 = 5901
Bytes recovered (minor per second)3270/0.0123469 = 264843 Kb/second
Bytes recovered (full per second)5901/0.1429698 = 41274K/second

Table 2 contains a considerable amount of information that was derived from these seemingly simple logs. Depending upon the problems that you are seeing, you may not need to calculate all of these values, as some will be more useful than the others. In the case where you are experiencing long periods of time when your application becomes unresponsive, then you will want to focus on GC duration and count.



Back to top


The IBM GC log record

Unlike the Sun logs, the IBM logs are extremely verbose. Even so, you still need a guide to fully understand the information that is being offered. Listing 2 is an extract from one such verbose:gc log file.


Listing 2. IBM JVM verbose:gc output
<AF[31]: Allocation Failure. need 528 bytes, 969 ms since last AF>
<AF[31]: managing allocation failure, action=1 (0/97133320) (1082224/1824504)>
  <GC(31): GC cycle started Wed Feb 25 23:08:41 2004
  <GC(31): freed 36259000 bytes, 37% free (37341224/98957824), in 569 ms>
  <GC(31): mark: 532 ms, sweep: 37 ms, compact: 0 ms>
  <GC(31): refs: soft 0 (age >= 32), weak 0, final 2, phantom 0>
<AF[31]: managing allocation failure, action=3 (37341224/98957824)>
  <GC(31): need to expand mark bits for 116324864-byte heap>
  <GC(31): expanded mark bits by 270336 to 1818624 bytes>
  <GC(31): need to expand alloc bits for 116324864-byte heap>
  <GC(31): expanded alloc bits by 270336 to 1818624 bytes>
  <GC(31): need to expand FR bits for 116324864-byte heap>
  <GC(31): expanded FR bits by 544768 to 3637248 bytes>
  <GC(31): expanded heap by 17367040 to 116324864 bytes, 47% free, ratio:0.417>
<AF[31]: completed in 812 ms>

<AF[32]: Allocation Failure. need 528 bytes, 1765 ms since last AF>
<AF[32]: managing allocation failure, action=1 (0/115394264) (930600/930600)>
  <GC(32): GC cycle started Wed Feb 25 23:08:43 2004
  <GC(32): freed 54489184 bytes, 47% free (55419784/116324864), in 326 ms>
  <GC(32): mark: 292 ms, sweep: 34 ms, compact: 0 ms>
  <GC(32): refs: soft 0 (age >= 32), weak 0, final 0, phantom 0>
<AF[32]: completed in 328 ms>

<AF[33]: Allocation Failure. need 528 bytes, 1686 ms since last AF>
<AF[33]: managing allocation failure, action=1 (0/115510592) (814272/814272)>
  <GC(33): GC cycle started Wed Feb 25 23:08:45 2004
  <GC(33): freed 56382392 bytes, 49% free (57196664/116324864), in 323 ms>
  <GC(33): mark: 285 ms, sweep: 38 ms, compact: 0 ms>
  <GC(33): refs: soft 0 (age >= 32), weak 0, final 18, phantom 0>
<AF[33]: completed in 324 ms>

There are three GC log records in Listing 2. I will resist the urge to offer a full explanation and instead offer the excellent article "Sensible Sanitation" written by Sam Borman (see Resources). For our purposes, we need to derive the same type of information as we did with the logs from the Sun JVM. What is nice is that we have a number of these calculations already prepared for us. For example, if we look at AF[31] (allocation failure for event 31), we will see the intervals between GC, the amount of memory recovered, and the duration of the event. From these numbers we can calculate the other needed values.



Back to top


What the numbers mean

How you look at these numbers really depends upon the outcome you are trying to achieve. In many server applications, it comes down to reducing pause times, which entails reducing the duration and number of full collections that occur. Next month, we'll explore how this information was used to tune a real application that was suffering from exactly this problem.



Resources



About the authors

Jack Shirazi is the Director of JavaPerformanceTuning.com and author of Java Performance Tuning, 2nd Edition (O'Reilly).


Kirk Pepperdine is the Chief Technical Officer at Java Performance Tuning.com and has been focused on Object technologies and performance tuning for the last 15 years. Kirk is a co-author of the book Ant Developer's Handbook (MacMillan).




Rate this page


Please take a moment to complete this form to help us better serve you.



 


 


Not
useful
Extremely
useful
 


Share this....

digg Digg this story del.icio.us del.icio.us Slashdot Slashdot it!



Back to top