Garbage collection in WebSphere Application Server V8, Part 1
Generational as the new default policy
This content is part # of # in the series: Garbage collection in WebSphere Application Server V8, Part 1
This content is part of the series:Garbage collection in WebSphere Application Server V8, Part 1
Stay tuned for additional content in this series.
Garbage collection (GC) is an integral part of the Java Virtual Machine (JVM) as it collects unused Java heap memory so that the application can continue allocating new objects. The effectiveness and performance of the GC play an important role in application performance and determinism. The IBM JVM provided with IBM WebSphere Application Server V8 (on supported platforms) provides four different GC policy algorithms:
Each of these algorithms provides different performance and deterministic
qualities. In addition, the default policy in WebSphere Application Server
V8 has changed from
-Xgcpolicy:optthruput to the
-Xgcpolicy:gencon policy. Let’s take a look at each of these
policies and see what this change in the default policy means.
The garbage collector
Different applications naturally have different memory usage patterns. A computationally intensive number crunching workload will not use the Java heap in the same way as a highly transactional customer-facing interface. To optimally handle these different sorts of workloads, different garbage collection strategies are required. The IBM JVM supports several garbage collection policies to enable you to choose the strategy that best fits your application
The parallel mark-sweep-compact collector: optthruput
The simplest possible garbage collection technique is to continue allocating until free memory has been exhausted, then stop the application and process the entire heap. While this results in a very efficient garbage collector, it means that the user program must be able to tolerate the pauses introduced by the collector. Workloads that are only concerned about overall throughput might benefit from this strategy.
The optthruput policy
-Xgcpolicy:optthruput) implements this strategy (Figure 1).
This collector uses a parallel mark-sweep algorithm. In a
nutshell, this means that the collector first walks through the set of
reachable objects, marking them as live data. A second pass then sweeps
away the unmarked objects, leaving behind free memory than can be used for
new allocations. The majority of this work can be done in parallel, so the
collector uses additional threads (up to the number of CPUs by default) to
get the job done faster, reducing the time the application remains paused.
Figure 1. Application and collector CPU usage: optthruput
The problem with a mark-sweep algorithm is that it can lead to fragmentation (Figure 2). There might be lots of free memory, but if it is in small slices interspersed with live objects then no individual piece might be large enough to satisfy a particular allocation.
The solution to this is compaction. In theory, the compactor slides all the live objects together to one end of the heap, leaving a single contiguous block of free space. This is an expensive operation because every live object might be moved, and every pointer to a moved object must be updated to the new location. As a result, compaction is generally only done when it appears to be necessary. Compaction can also be done in parallel, but it results in a less efficient packing of the live objects -- instead of a single block of free space, several smaller ones might be created.
Figure 2. Heap fragmentation
The concurrent collector: optavgpause
For applications that are willing to trade some overall throughput for
shorter pauses, a different policy is available. The
-Xgcpolicy:optavgpause) attempts to do as much GC work as
possible before stopping the application, leading to shorter pauses
(Figure 3). The same mark-sweep-compact collector is used, but much of the
mark and sweep phases can be done as the application runs. Based on the
program's allocation rate, the system attempts to predict when the next
garbage collection will be required. When this threshold approaches, a
concurrent GC begins. As application threads allocate
objects, they will occasionally be asked to do a small amount of GC work
before their allocation is fulfilled. The more allocations a thread does,
the more it will be asked to help out. Meanwhile, one or more background
GC threads will use idle cycles to get additional work done. Once all the
concurrent work is done, or if free memory is exhausted ahead of schedule,
the application is halted and the collection is completed. This pause is
generally short, unless a compaction is required. Because compaction
requires moving and updating live objects, it cannot be done concurrently.
Figure 3. Application and collector CPU usage: optavgpause
The generational collection: gencon
It has long been observed that the majority of objects created are only
used for a short period of time. This is the result of both programming
techniques and the type of application. Many common Java idioms create
helper objects that are quickly discarded; for example
StringBuffer/StringBuilder objects, or Iterator objects.
These are allocated to accomplish a specific task, and are rarely needed
afterwards. On a larger scale, applications that are transactional in
nature also tend to create groups of objects that are used and discarded
together. Once a reply to a database query has been returned, then the
reply, the intermediate state, and the query itself are no longer needed.
This observation lead to the development of generational garbage collectors. The idea is to divide the heap up into different areas, and collect these areas at different rates. New objects are allocated out of one such area, called the nursery (or newspace). Since most objects in this area will become garbage quickly, collecting it offers the best chance to recover memory. Once an object has survived for a while, it is moved into a different area, called tenure (or oldspace). These objects are less likely to become garbage, so the collector examines them much less frequently. For the right sort of workload the result is collections that are faster and more efficient since less memory is examined, and a higher percentage of examined objects are reclaimed. Faster collections mean shorter pauses, and thus better application responsiveness.
IBM's gencon policy (
offers a generational GC ("gen-") on top of the concurrent one described
above ("-con"). The tenure space is collected as described above, while
the nursery space uses a copying collector. This
algorithm works by further subdividing the nursery area into
allocate and survivor spaces (Figure
4). New objects are placed in allocate space until its free space has been
exhausted. The application is then halted, and any live objects in
allocate are copied into survivor. The two spaces then swap roles; that
is, survivor becomes allocate, and the application is resumed. If an
object has survived for a number of these copies, it is moved into the
tenure area instead.
Figure 4. Gencon in action
In theory, this means that half of the nursery space (that is, the survivor space) is unused at any point in time. In practice, the amount of memory set aside for survivor space is adjusted on the fly depending on the percentage of objects that survive each collection. If the majority of new objects are getting collected -- which is the expected case -- then the dividing line between allocate and survivor is tilted, increasing the amount of data that can be allocated before a collection is required.
This style of collector has an important benefit: by moving live objects
with each collection, the nursery area is implicitly compacted with each
collect. This results in the largest possible block of free space, but
also tends to move objects that are closely related (for example, a
String and its
char data) into nearby memory
locations. This improves the performance characteristics of the system
memory cache, and, in turn, the application itself.
The cost of a nursery garbage collection is relative to the amount of data that survives (Figure 5). Because the expectation is that most objects will be garbage, a nursery collection generally results in a very brief pause. While the majority of objects should be collected quickly, some will not. This means that over time, the tenure area will fill up with long-lived objects and a garbage collection of the entire heap will be required. Most of the techniques described above for the concurrent collector still apply here. Marking of the tenure area will run concurrently as required while allocations and collections happen in the nursery. Sweeping of the tenure area is not done concurrently under gencon, but as part of the main tenure collection.
Figure 5. Application and collector CPU usage: gencon
The region-based collector: balanced
A new garbage collection policy has been added in WebSphere Application
Server V8. This policy, called balanced
-Xgcpolicy:balanced), expands on the notion of having
different areas of the heap. It divides the heap into a large number of
regions, which can be dealt with individually. The details of region-based
garbage collection in general, and the balanced policy in particular, will
be discussed in Part 2.
Tuning heap settings for non-generational collectors
The first step to tuning the heap size for any application is to run your
application using the default heap settings, which will enable you to
gauge the out-of-the-box performance. At this point, if the heap is
consistently less than 40% free or the GC pauses are taking more than 10%
of the total run time, you should consider increasing the heap size. The
minimum and maximum heap sizes can be modified by specifying
The time spent in GC pauses for the mark and sweep phases of a garbage collection are based on the number of live objects on the heap. As you increase the heap size on a consistent workload, the mark and sweep phases will continue to take approximately the same length of time. So, by increasing the heap size the interval between GC pauses will increase, which will give the application more time to execute.
If the GC is performing compaction phases due to fragmentation issues, increasing the heap size might help alleviate the long pauses due to compaction. Compaction phases tend to significantly increase GC pause times, so if they are a regular occurrence tuning the heap settings can improve the application performance.
Fixed vs variable sized heap
Using a variable size heap enables the GC to use only the OS resources required by the application for the heap. As the application heap requirements change, the GC can respond by expanding or contracting the heap. The GC can only contract contiguous blocks of memory from the end of the heap, so a compaction might be required to contract the heap. The actual contraction and expansion phases are very quick and will not noticeably increase the GC pause times. By setting the maximum heap size larger than required for normal operations, the application will be able to handle extra workloads by expanding the heap.
Applications with consistent heap requirements might see GC pause time improvements by using a fixed heap size.
Tuning generational GC
When tuning for generational garbage collection, the simplest approach is to treat the nursery space as a new Java heap area, in addition to the Java heap area used in the non-generational case. The Java heap for the non-generational case therefore becomes the tenured heap.
This approach is conservative: the expectation is that the occupancy of the tenured heap will drop as a result of introducing the nursery, but it provides a safe starting point, especially in the case of a migration from a non-generational policy. When the occupancy of the tenure heap after global (full) collections can be monitored, the size can then be adjusted as described earlier:
-Xmn<size>sets the initial and maximum size of the nursery, effectively setting both
-Xmns<size>sets the initial size of the nursery to the specified value.
-Xmnx<size>sets the maximum size of the nursery to the specified value.
The size of the nursery heap should be fixed, and as a result only one of
-Xmn is required. Therefore, you only need to
understand how to correctly size the nursery heap.
Sizing the nursery heap
To correctly size the nursery, you first need to consider the mechanism that nursery collections use and the secondary characteristics that occur as a result:
- Nursery collections work by copying data from allocate to survivor spaces. Copying data is a relatively expensive, time consuming task. As a result, the duration of a nursery collect is dominated by the amount of data being copied. That isn't to say that there is no effect from the number of objects being copied and the size of the nursery itself, but that these are relatively minor in comparison to the cost of copying the actual data. As a result, the duration of the nursery collect is proportional to the amount of data being copied.
- Only a finite and fixed amount of data is "live" in any given collection. Once an application has completed startup and fully populated its caches, and so on, the amount of "live" data that needs to be copied in the nursery heap is fixed by the amount of work that is being done at that point in time. In a system that processes transactions, the amount of live data that needs to be copied will be equivalent to one set of live transactions. For example, if you have configured your application server with 50 WebContainer threads enabling 50 concurrent transactions to occur, then the amount of live data will be that associated with those 50 transactions.
This means that the duration of the nursery collect is set by the size of the data associated with the number of concurrent transactions occurring at the time of the collect, and not the size of the nursery. This also means that as the size of the nursery is made larger, there is an increase in time between nursery collects, without an increase in the duration of the collect. In effect, as the nursery is made larger, the overall time spent in garbage collection drops.
Figure 6 shows that if the size of the nursery is less than the live data associated with one set of transactions, and thereby the time between nursery collects is less than one transaction, then data has to be copied multiple times.
Figure 6. Average number of times data is copied versus time between nursery collects
As the nursery size is expanded and the time between nursery collects increases, you do less copying on average, and the overhead of garbage collection drops.
Limitations on the nursery heap size
There are no direct limitations on the size of the nursery heap that are imposed by the IBM garbage collector or JVM; in fact, there are cases where the nursery is being set to sizes in the order of 10s and even 100s of gigabytes. There are, however, limitations imposed by the operating system that the Java process has to adhere to in terms of virtual memory and process address space, as well as the availability of sufficient physical memory (RAM). The operating system restrictions for each platform for a 32bit process are shown in Figure 7.
Figure 7. 32bit address spaces by operating system
The restrictions on 64bit process are much, much larger. With addressable memory in the range of hundreds to billions of gigabytes, it is the available physical memory (RAM) limitation that becomes much more important.
Putting the two together
As discussed above, the simplest approach is to treat the nursery as an
additional memory space. However, both the nursery and tenured heaps are
in fact allocated as a single continuous segment of memory, the size of
which is controlled by the
-Xmx setting. If only the
-Xmx setting is used, 25% of the
-Xmx value is
used for the maximum nursery size, and the size of the nursery is
permitted to grow and shrink within that 25%. This gives the layout for
the Java heap shown in Figure 8.
Figure 8. Default heap layout
You should, however, fix the nursery size at a large value to minimise time spent in garbage collection, and enable the tenured heap to resize itself according to occupancy to build in resilience. Therefore, the preferred layout for the Java heap is as shown in Figure 9.
Figure 9. Recommended heap layout
In order to achieve this layout, the individual values for the minimum and maximum heap size for both the nursery and tenured spaces should be set, with minimum and maximum nursery size settings equal to each other, and the minimum and maximum tenured space sizes set to different values.
For example, if you want to have a 256MB nursery heap size, and a tenured heap between 756MB and 1024MB, the values would be:
-Xmns256M -Xmnx256M -Xmos756M -Xmox1024M
Migrating to generational
Since the default GC policy has changed from optthruput to gencon in
WebSphere Application Server V8, previously chosen tuning parameters might
need to be adjusted. The primary issue is changing the heap sizes to
compensate for the nursery. A program that previously ran fine under
optthruput with 1G of heap (i.e.
-Xmx1G) might be unhappy
running with only 768M of tenure space and 256M of nursery space. The
techniques described above will help to choose new heap parameters.
There are other less obvious situations where gencon may display different behaviour.
Because classes are generally long-lived objects, they are allocated directly into tenure space. As a result, class unloading can only be done as part of a tenure collection. If an application relies heavily on short-lived class loaders, and nursery collections can keep up with any other allocated objects, then tenure collections might not happen very frequently. This means that the number of classes and class loaders will continue increasing, which can increase the pressure on native memory and lead to very long tenure collections when they do happen, because there is so much class unloading work to be done.
If this issue becomes a problem, there are two solutions. The first is to
encourage additional tenure collections in the presence of large amounts
of class loaders. The command line option
-Xgc:classUnloadingKickoffThreshold=<number> tells the
system that a concurrent tenure collection be started every time
<number> new class loaders have been created. So, for
will start a concurrent tenure collect whenever a nursery collect notices
that 100 new class loaders have been created since the last tenure
collection. The second solution is to change to one of the other GC
A similar issue can arise with reference objects (for example, subclasses
java.lang.ref.Reference) and objects with
finalize() methods. If one of these objects survives long
enough to be moved into tenure space before becoming unreachable, it could
be a long time before a tenure collection runs and "realizes" that the
object is dead. This can become a problem if these objects are holding on
to large or scarce native resources. We've dubbed this an "iceberg"
object: it takes up a small amount of Java heap, but below the surface
lurks a large native resource invisible to the garbage collector. As with
real icebergs, the best tactic is to steer clear of the problem wherever
possible. Even with one of the other GC policies, there is no guarantee
that a finalizable object will be detected as unreachable and have its
finalizer run in a timely fashion. If scarce resources are being managed,
manually releasing them wherever possible is always the best strategy.
The default policy should provide adequate performance for most workloads, but it may not be the ideal choice for a particular application.
An application that operates like a "batch job" sets up its initial state and loads the data on which it will operate. Most of these objects will live for the duration of the job, with only a few additional objects being created as the job runs. This sort of workload fits the optthruput model since the expectation is that there will be very little garbage until the task is complete. In a very similar case, jobs that complete very quickly or allocate very few objects might be able to run without requiring a garbage collection with an appropriate sized heap. In these cases, the minimal overhead of the optthruput collector makes a good choice.
By contrast, a transactional application is continually creating and discarding groups of objects. In this context, the term "transaction" can be very literal -- consider a database update or an e-commerce purchase -- or used in the far broader sense of a discrete unit of work. To take some examples, serving a web page can be considered a transaction. The client submits a URL, the server computes the content of the page and sends it to the client. Once the client has received the page, the server can discard the computed data.
To stretch the definition a little further, consider a standard user interface. The user clicks on the Save button and the system opens a file dialog so the user can navigate the filesystem and choose a location for the document. Once the user has dismissed the dialog, all of that intermediate state becomes unnecessary. Even some batch jobs are really transactional under the surface. A task that is creating thumbnails for a collection of large image files might appear to be a single large batch job, but internally it is processing images separately, each one forming a transaction of sorts. For any of these kinds of workloads, the gencon model should provide benefits.
The optavgpause model sits somewhere in the middle. Applications that have a lot of long-lived data that changes slowly as the program runs might fit this model. This is an unusual pattern for a workload; generally, long lived data is either almost never changed or changed frequently. That said, a system with a slowly evolving data set that does not create many intermediate objects mighty benefit from this policy. Programs that are unable to run efficiently under gencon due to one of the problems discussed above might benefit from the concurrent nature of optavgpause.
This article presented a brief description of the garbage collections strategies available in the Java Virtual Machine in WebSphere Application Server V8. While the default setting should perform well in most cases, some tuning might be required to get the best performance. By matching the GC policy used to the type of workload and choosing suitable heap parameters, the impact of garbage collection on the application can be reduced.
Part 2 of this series will introduce a new region-based garbage collection strategy, called balanced, which is designed to improve scalability when deploying on large 64bit multi-core systems. The article will cover the motivation behind the new technology and the performance improvements it provides, as well as hints and tips on tuning this new option.
- Garbage collection definition on Wikipedia
- Java technology, IBM style: Garbage collection policies, Part 1
- Java technology, IBM style: Garbage collection policies, Part 2
- Download the IBM SDKs for Java
- WebSphere Application Server V8 product information