 | Level: Intermediate Jack Shirazi (jack@JavaPerformanceTuning.com), Director, JavaPerformanceTuning.com Kirk Pepperdine (kirk@JavaPerformanceTuning.com), CTO, JavaPerformanceTuning.com
21 May 2004 Does your application generate out-of-memory errors on a regular basis? Do your users experience response times that are somewhat erratic? Does your application become unresponsive for fairly lengthy durations? Does your application performance appear sluggish? If you answered yes to any of these questions, then you could very well have a problem with garbage collection. Stay tuned while Jack Shirazi and Kirk Pepperdine of JavaPerformanceTuning.com explain how to identify garbage collection problems, and in doing so, help you to answer the question: Do you know what you garbage collector is up to?
For many developers, memory management is at best considered a
distraction from the main task of developing business logic -- until
the business logic fails to perform as expected or as it had in during
testing. When this happens, you need to know what went wrong and why,
which means understanding how your application interacts with the
underlying computing resources and, in particular, memory. The best
way to understand how an application is utilizing memory is to observe
the garbage collector in action.
Why is my application stuttering?
The biggest single performance headache in the Java virtual machine
is the mutual exclusivity of running GC concurrently with application
threads. For the garbage collector to do its job, it needs to prevent
all other threads from accessing the heap space (memory) that it is
working on for some period of time. The period of time when this
occurs is known as the "stop-the-world" phase of GC, and, as the name
implies, your application will come to a screaming halt while the
garbage collector toils away. Fortunately, this pause is often so
short to as to not be noticeable, but you could easily imagine how
having an application pause at seemingly random points in time for
seemingly random and lengthy durations could have a devastating effect
on application responsiveness and throughput.
But GC is only one reason why your application may sputter and
stall. So how can you determine if GC responsible for causing these
problems? To answer that question, we need to measure how hard the
garbage collector is working, and continue to take these measurements
as we make changes in the system so that we might quantitatively
determine if our changes are having the desired effect.
How much memory do I need?
It is a generally accepted belief that adding more memory to a
system will solve many performance problems. While this principle
often holds true with JVMs, too much of a good thing can be harmful to
performance. So the trick is to give a Java application as much memory
as it needs and no more. The question is, how much does your
application need? As is the case with application stutters, we need to
observe garbage collection behavior to see if it is working harder
than it needs to. These observations will tell us if the changes we
are making are having the desired effect.
Measuring GC activity
The standard way to produce GC logs is to use the
-verbose:gc flag. Once set, the garbage collector
produces a summary of what it has done every time it runs, which is
generally written to the console (either via standard out or standard
error). Many VMs support an option that allows verbose GC output to be
directed to a file. For example, with Sun's 1.4 JVM, you can use the
switch -Xloggc:filename to write the GC output to a file. For the HP
JVM, you would use the -Xverbosegc=file switch. In this
article, we will be examining verbose GC output captured from the Sun
1.4.2 and IBM 1.4.1 JVMs.
One of the biggest advantages of using
this method to monitor memory utilization is that it places very
little drag on the performance of the application. Unfortunately, this
solution is not perfect in that these log files can get extremely
large and maintaining them may require you to restart the JVM. Having
said this, the technique is still viable in a production environment
as it can help diagnose performance problems that only manifest
themselves in that environment.
A deeper view of GC
The output produced by the -verbose:gc flag varies
depends upon the JVM vendor, and different garbage collector options
will report on information specific to its implementation. For
example, the output produced by IBM JVMs is far more verbose than that
produced by Sun JVMs, whereas Sun's output is better suited to be read
by tools. That said, each GC log communicates approximately the basic
information -- how much memory had been consumed, how much memory was
recovered, how long the GC cycle took, and were any there other
actions taken during collection. From these basic measurements, we can
fetter out details that will help us to better understand what is
going on. The statistics that we will calculate are shown below:
- Duration of the runtime under consideration
- Total number of collections
- Frequency of collections
- Longest time spent in collection
- Total time spent in collection
- Average time spent in collection
- Average interval between collections
- Total number of bytes allocated
- Average number of bytes allocated per collection, per second
- Total number of bytes recovered
- Average number of bytes recovered per collection, per second
By understanding pause times, we can begin to understand if GC is
partially or wholly responsible for an application becoming
unresponsive. One way to achieve this is to correlate the GC activity
in the verbose GC logs against other logs (such as the request backlog
in Web server logs) collected by the system. It is almost certain that
the longest GC pause will result in a visible drop in overall system
responsiveness, so it is useful to know when responsiveness drops so
we can correlate GC activity to application throughput.
Another possible point of contention is the rate at which heap
memory is being allocated and recovered, known as
churn. Applications that produce a large number of objects
that are released almost immediately typically suffer from churn. A
high rate of churn places a lot of stress on the garbage collector,
creating more contention for memory resources, which in turn can lead
to longer pauses or the dreaded OutOfMemoryError.
The way to understand if your application is experiencing either of
these issues is to measure the total amount of time taken by GC over
the total runtime under consideration. With this calculation we can to
begin to understand if GC is working much harder than it should
be. Let's derive the formulas needed to make these determinations.
The Sun GC log record
Listing 1 is an example of the log records produced by Sun 1.4.2_03
JVM running with the default mark-and-sweep collector with the
-Xloggc:filename. As you can see, the entry is a very
concise record of what each pass did.
Listing 1. GC log records using the -Xloggc:filename flag
69.713: [GC 11536K->11044K(12016K), 0.0032621 secs]
69.717: [Full GC 11044K->5143K(12016K), 0.1429698 secs]
69.865: [GC 5958K->5338K(11628K), 0.0021492 secs]
69.872: [GC 6169K->5418K(11628K), 0.0021718 secs]
69.878: [GC 6248K->5588K(11628K), 0.0029761 secs]
69.886: [GC 6404K->5657K(11628K), 0.0017877 secs]
|
The first thing you might notice is that every log record is printed
in an enclosing set of square brackets. Other GC algorithms, such as
the parallel collector, may break down some of the values into finer
bits of information. When that occurs, the value being refined is
replaced with the details enclosed in a nested set of square brackets,
which makes it easier for tools to process verbose GC output.
We start our investigation by looking at the record tagged 69.713
found in Listing 1. The tag is a timestamp in seconds and milliseconds
since the JVM was started. In this instance, the JVM had been running
for 69.713 seconds before this GC cycle started. The fields from left
to right are: the type of collection that was performed, heap
utilization before GC, heap utilization after GC, total heap capacity,
and the duration of the GC event. From this description we can see
that the first GC event was a minor collection. Before GC started,
11536 Kb of heap space was in use. Upon completion, there were
11044 Kb in use, the capacity of heap was 12016 Kb and the entire
collection took .0032621 seconds. The next event, a full GC, occurred
at 69.717 seconds or 0.003 seconds after the start of the preceding
minor GC event. Note that if you add the duration of the
minor GC event to its start time, you'll see that it finishes less
than 1ms before the start of the full GC. Thus we can conclude that the
minor collection failed to recover enough space, and that this failure
triggered the full GC. To the application, this would appear as a
single event that lasted 0.1462319 seconds. Let's move on to define
how we can calculate the other values.
Parameterization of a GC log record
We start the analysis by parameterizing the values found in each GC log record:
R(n) = T(n): [ <GC> HB->HE(HC), D]
n | Index of record in the list, 1 being the first and m being the last | R(n) | GC record | T(n) | Time at which the nth GC occurred | HB | Amount of heap used before GC | HE | amount of heap used after GC | HC | total amount of heap space | D | duration of the GC cycle |
With these definitions we can now derive formulas that we can use
to calculate the values described earlier.
Fundamental values
The first value that we will calculate is the overall duration of
the runtime covered in the logs. If we are to consider every record,
we would just look at the timestamp for the last record. Because Listing
1 represents only a portion of the total log record, we need to
subtract the first timestamp from the last. Though this number is
precise enough for this example, to be absolute one needs to add in
the duration of the last GC. The reason for this is the timestamp is
taken at the start of GC and the record represents things that
happened after the timestamp was recorded.
The remaining values are calculated by taking sums of the
appropriate values in the records. It is interesting to note that
bytes recovered can be calculated by examining the relationships of
the measures within a record while the bytes allocated can be
calculated by examining the relationship of the measures between
successive records. For example, if we consider the record pair found
at timestamp 69.872 and 69.878, we can calculate the number of bytes
allocated in the young generation by subtracting the amount of memory
consumed after GC in the first record by the number of bytes consumed
before GC in the second record: 6248 Kb - 5418 Kb = 830 Kb. The
formulations for other values are shown in Table 1 below.
To find the longest GC pause we need only inspect the duration timings and look for the maximal value of D(n) (duration at record n).
Table 1. Statistic formulas
| Statistic | Calculation (time units adjusted to seconds) | | Runtime duration | RT = (T(M) + D(M)) - T(1) | | Total number of minor collections | TMC = Sum( R(n)) where GC(n) = GC | | Total number of full collections | TFC = Sum( R(n)) where GC(n) = Full | | Collection frequency (minor) | CFM = TMC / RT | | Collection frequency (full) | CFF = TFC / RT | | Time in collection (longest minor) | MAX(D(n)) for all n where GC(n) = GC | | Time in collection (longest full) | MAX(D(n)) for all n where GC(n) = Full | | Time in minor collection (total) | TTMC = Sum(D(n)) for all n where GC(n) = GC | | Time in full collection (total) | TTFC Sum(D(n)) for all n where GC(n) = Full | | Time in collection (total) | TTC = TTMC + TTFC | | Time in minor collection (average) | ATMC = TTMC / RT | | Time in full collection (average) | ATFC = TTFC / RT | | Time in collection (average) | ATC = TTC / RT | | Interval between collections (average) | Sum( T(n+1) - T(n)) / (TMC + TFC) for all n | | Bytes allocated (total) | TBA = Sum(HB(n+1) - HE(n)) for all n | | Bytes allocated (per second) | TBA / RT | | Bytes allocated (per collection) | TBA / (TMC + TFC) | | Bytes recovered minor collection (total) | BRM = Sum(HB(n) - HE(n)) where GC(n) = GC | | Bytes recovered full collection (total) | BRF = Sum(HB(n) - HE(n)) where GC(n) = Full | | Bytes recovered (total) | BRT = BRM + BRF | | Bytes recovered (per minor collection) | BRPM = BRM / TMC | | Bytes recovered (per full collection) | BRPF = BRF / TMF | | Bytes recovered (minor per second) | BRP = BRM / TTMC | | Bytes recovered (full per second) | BRF = BRF / TTFC |
As you can see from the formulae, we often need to consider full
and minor GC separately. Minor GCs are fundamentally different from
full GCs, and are generally at least one order of magnitude faster
than the latter. We can see this from a quick inspection of Listing 1
-- the longest minor is 50 times faster than the full.
Table 2 below shows the application of the formulas in Table 1 against the values in Listing 1.
Table 2. Investigation of Sun GC logs
| Statistic | Calculation (time units adjusted to seconds) | | Runtime duration | (69.886 + 0.0017877) - 69.713 = 0.1747877 | | Total number of minor collections | 5 | | Total number of full collections | 1 | | Collection frequency (minor) | 5 / 0.1747877 = 28.6 per second | | Collection frequency (full) | 1 / 0.1747877 = 5.27 per second | | Time on collection (longest minor) | 0.0032621 | | Time in collection (longest full) | 0.1429698 | | Time in minor collection (total) | 0.0123469 | | Time in full collection (total) | 0.1429698 | | Time in collection (total) | 0.1553167 | | Time in minor collection (average) | 7.1% | | Time in full collection (average) | 81.8% | | Time in collection (average) | 88.9% | | Interval between collections (average) | .173/5=0.0346 | | Bytes allocated (total) | 3292 | | Bytes allocated (per second) | 18834 Kb/second | | Bytes allocated (per collection) | 549 Kb | | Bytes recovered minor collection (total) | 3270 Kb | | Bytes recovered full collection (total) | 5901 Kb | | Bytes recovered (total) | 5901 + 3270 = 9171 Kb | | Bytes recovered (per minor collection) | 3270/5 = 654 | | Bytes recovered (per full collection) | 5901/1 = 5901 | | Bytes recovered (minor per second) | 3270/0.0123469 = 264843 Kb/second | | Bytes recovered (full per second) | 5901/0.1429698 = 41274K/second |
Table 2 contains a considerable amount of information that was
derived from these seemingly simple logs. Depending upon the problems
that you are seeing, you may not need to calculate all of these
values, as some will be more useful than the others. In the case where
you are experiencing long periods of time when your application
becomes unresponsive, then you will want to focus on GC duration and
count.
The IBM GC log record
Unlike the Sun logs, the IBM logs are extremely verbose. Even so,
you still need a guide to fully understand the information that is
being offered. Listing 2 is an extract from one such
verbose:gc log file.
Listing 2. IBM JVM verbose:gc output
<AF[31]: Allocation Failure. need 528 bytes, 969 ms since last AF>
<AF[31]: managing allocation failure, action=1 (0/97133320) (1082224/1824504)>
<GC(31): GC cycle started Wed Feb 25 23:08:41 2004
<GC(31): freed 36259000 bytes, 37% free (37341224/98957824), in 569 ms>
<GC(31): mark: 532 ms, sweep: 37 ms, compact: 0 ms>
<GC(31): refs: soft 0 (age >= 32), weak 0, final 2, phantom 0>
<AF[31]: managing allocation failure, action=3 (37341224/98957824)>
<GC(31): need to expand mark bits for 116324864-byte heap>
<GC(31): expanded mark bits by 270336 to 1818624 bytes>
<GC(31): need to expand alloc bits for 116324864-byte heap>
<GC(31): expanded alloc bits by 270336 to 1818624 bytes>
<GC(31): need to expand FR bits for 116324864-byte heap>
<GC(31): expanded FR bits by 544768 to 3637248 bytes>
<GC(31): expanded heap by 17367040 to 116324864 bytes, 47% free, ratio:0.417>
<AF[31]: completed in 812 ms>
<AF[32]: Allocation Failure. need 528 bytes, 1765 ms since last AF>
<AF[32]: managing allocation failure, action=1 (0/115394264) (930600/930600)>
<GC(32): GC cycle started Wed Feb 25 23:08:43 2004
<GC(32): freed 54489184 bytes, 47% free (55419784/116324864), in 326 ms>
<GC(32): mark: 292 ms, sweep: 34 ms, compact: 0 ms>
<GC(32): refs: soft 0 (age >= 32), weak 0, final 0, phantom 0>
<AF[32]: completed in 328 ms>
<AF[33]: Allocation Failure. need 528 bytes, 1686 ms since last AF>
<AF[33]: managing allocation failure, action=1 (0/115510592) (814272/814272)>
<GC(33): GC cycle started Wed Feb 25 23:08:45 2004
<GC(33): freed 56382392 bytes, 49% free (57196664/116324864), in 323 ms>
<GC(33): mark: 285 ms, sweep: 38 ms, compact: 0 ms>
<GC(33): refs: soft 0 (age >= 32), weak 0, final 18, phantom 0>
<AF[33]: completed in 324 ms>
|
There are three GC log records in Listing 2. I will resist the
urge to offer a full explanation and instead offer the excellent
article "Sensible Sanitation" written by Sam Borman (see Resources). For
our purposes, we need to derive the same type of information as we did
with the logs from the Sun JVM. What is nice is that we have a number
of these calculations already prepared for us. For example, if we look
at AF[31] (allocation failure for event 31), we will see the intervals
between GC, the amount of memory recovered, and the duration of the
event. From these numbers we can calculate the other needed
values.
What the numbers mean
How you look at these numbers really depends upon the outcome you
are trying to achieve. In many server applications, it comes down to
reducing pause times, which entails reducing the duration and number
of full collections that occur. Next month, we'll explore how this
information was used to tune a real application that was suffering
from exactly this problem.
Resources
About the authors
 | |  | Kirk Pepperdine is the Chief Technical Officer at Java Performance Tuning.com and has been focused on Object technologies and performance tuning for the last 15 years. Kirk is a co-author of the book Ant Developer's Handbook (MacMillan). |
Rate this page
|  |