Balancing on Floating Garbage
Thomas_Ireton
With more and more folks thinking about using the balanced garbage collection algorithm on their WebSphere application servers (-Xg
The description of balanced garbage collection sounds great: shorter and more consistent garbage collection (GC) cycles. Who wouldn't want that? The other garbage collection algorithms can have some very long GC cycles, especially if the heap is large. And heaps are getting larger, which makes consistent GC cycles more important. Also, class unloading leaves a lot to be desired in the gencon policy, so any improvement there can only be good. But, as with lunch, you don't get these benefits for free.
The main idea is to garbage collect only part of the heap on each GC cycle, rather than the entire heap. This is called a Partial Garbage Collection (PGC). The balanced algorithm splits the heap into many small regions, and the regions the JVM decides to collect on a given cycle are called the collection set. Collecting only some of the regions makes the cycle quicker, but the cost is that objects that are no longer referenced can be left on the heap, still using memory. Unused objects in regions outside the collection set are obviously left on the heap. In addition, if any of those unused objects reference objects in regions that are in the collection set, the referenced objects are not collected either, even though they are in the collection set. This is where my new favorite term comes in: floating garbage, meaning objects that are no longer used but are still on the heap, using heap memory, after a garbage collection. This is different from the previous Java garbage collection algorithms, which examine the whole of the area they are collecting (in generational, the entire nursery on a nursery collection and the entire heap on a Full GC), so very little unused data is left behind after a cycle.
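To make the mechanism concrete, here is a toy Java model of a single partial GC. It is a sketch under simplifying assumptions, not how the JVM actually implements balanced collection. The class and method names are hypothetical, each object lives in exactly one region, and objects outside the collection set are never examined. Those outside objects are therefore kept, and anything they reference is treated as reachable, which is roughly the role a remembered set plays. In the sample heap, the dead object x sits outside the collection set and still references c, so c survives along with x and y.

```java
import java.util.*;

// Toy model of one partial GC (PGC) on a region-based heap. Hypothetical
// simplification: objects in regions outside the collection set are never
// examined, so they are kept and their references act as extra roots.
public class FloatingGarbageDemo {

    // An object, the region it lives in, and the names of objects it references.
    record Obj(String name, int region, List<String> refs) {}

    // Marks everything reachable from the given starting objects.
    static Set<String> trace(Map<String, Obj> heap, Collection<String> starts) {
        Deque<String> work = new ArrayDeque<>(starts);
        Set<String> marked = new TreeSet<>();
        while (!work.isEmpty()) {
            String name = work.pop();
            if (marked.add(name)) {
                work.addAll(heap.get(name).refs());
            }
        }
        return marked;
    }

    // Objects that survive a PGC: traced from the real roots plus every object
    // outside the collection set, whether or not that object is still live.
    static Set<String> survivors(Map<String, Obj> heap, Set<String> roots,
                                 Set<Integer> collectionSet) {
        List<String> starts = new ArrayList<>(roots);
        for (Obj o : heap.values()) {
            if (!collectionSet.contains(o.region())) {
                starts.add(o.name());
            }
        }
        return trace(heap, starts);
    }

    // Floating garbage: survivors that a full trace from the roots would not reach.
    static Set<String> floatingGarbage(Map<String, Obj> heap, Set<String> roots,
                                       Set<Integer> collectionSet) {
        Set<String> floating = new TreeSet<>(survivors(heap, roots, collectionSet));
        floating.removeAll(trace(heap, roots));
        return floating;
    }

    static Map<String, Obj> sampleHeap() {
        Map<String, Obj> heap = new HashMap<>();
        heap.put("a", new Obj("a", 0, List.of()));     // live: referenced by a root
        heap.put("b", new Obj("b", 0, List.of()));     // dead, in the collection set
        heap.put("c", new Obj("c", 0, List.of()));     // dead, but referenced by dead "x"
        heap.put("x", new Obj("x", 1, List.of("c")));  // dead, outside the collection set
        heap.put("y", new Obj("y", 1, List.of()));     // dead, outside the collection set
        return heap;
    }

    public static void main(String[] args) {
        Map<String, Obj> heap = sampleHeap();
        Set<String> roots = Set.of("a");
        Set<Integer> collectionSet = Set.of(0);   // only region 0 is collected

        Set<String> reclaimed = new TreeSet<>(heap.keySet());
        reclaimed.removeAll(survivors(heap, roots, collectionSet));
        System.out.println("Reclaimed: " + reclaimed);
        System.out.println("Floating garbage: " + floatingGarbage(heap, roots, collectionSet));
    }
}
```

Running it reports that only b is reclaimed, while c, x and y are left on the heap as floating garbage.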
The way to work around floating garbage is to make the heap larger, so it can hold the objects that are in use as well as the unused ones. The trick is working out how much larger the heap needs to be, which basically means estimating how much floating garbage there is going to be. The balanced algorithm juggles many variables: how many regions to collect at each partial GC, the age of each region, how often all the regions are examined during a concurrent mark phase, how many objects in the collection set are referenced from objects outside it, and so on. As a result, there is no practical formula for how much floating garbage there will be at any particular time, and without that number we can't compute a heap size. Different applications, different user loads, and even different requests affect how many objects are allocated and how long they stay alive. That makes even the normal heap usage almost impossible to compute in advance.
We have to start with an estimate and then see how it performs, using iterative feedback. We repeat the procedure with different heap sizes until a good value is “tuned in”. Fortunately, the verbose garbage collection trace gives us a lot of information about heap use and lots of details about each garbage collection cycle. We can run load tests that simulate actual load, then analyze the verbose garbage collection trace to decide whether the heap needs to be increased, is fine as it is, or can be reduced. Take care to also simulate a burst of user activity. There is no use having a heap that works well most of the time but performs poorly when a lot of users are all doing stuff at once.
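One round of that feedback loop can be sketched as a simple decision rule. Everything here is an assumption for illustration: the class name, the 70% and 40% occupancy thresholds, the 25% and 10% step sizes, and the sample peak values. The peak values stand in for the highest post-GC heap occupancy you would read from the verbose GC trace of a load test that includes a burst phase.

```java
// Hypothetical decision rule for one round of the load-test feedback loop.
// The 70% / 40% thresholds and the step sizes are illustrative assumptions,
// not IBM recommendations.
public class HeapTuningStep {

    enum Action { INCREASE, KEEP, DECREASE }

    // peakOccupancyMb: highest heap occupancy seen after a GC cycle during the
    // load test (burst phase included), as read from the verbose GC trace.
    static Action decide(long peakOccupancyMb, long maxHeapMb) {
        double ratio = (double) peakOccupancyMb / maxHeapMb;
        if (ratio > 0.70) return Action.INCREASE;
        if (ratio < 0.40) return Action.DECREASE;
        return Action.KEEP;
    }

    // Maximum heap size to try on the next load-test run.
    static long nextHeapMb(long maxHeapMb, Action action) {
        return switch (action) {
            case INCREASE -> maxHeapMb + maxHeapMb / 4;   // grow by 25%
            case DECREASE -> maxHeapMb - maxHeapMb / 10;  // shrink by 10%
            case KEEP -> maxHeapMb;
        };
    }

    public static void main(String[] args) {
        long heapMb = 4096;
        // Made-up peak post-GC occupancy from three successive load-test runs.
        long[] peaks = {3200, 3600, 3500};
        for (long peak : peaks) {
            Action a = decide(peak, heapMb);
            System.out.println(heapMb + "m heap, peak " + peak + "m -> " + a);
            heapMb = nextHeapMb(heapMb, a);
        }
    }
}
```

With these inputs the heap grows from 4096m to 5120m, then to 6400m, where the occupancy falls inside the "keep" band and tuning stops.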
There is still the concept of a nursery in balanced GC: a subset of regions into which new objects are allocated. These nursery regions (called age 0 regions) are collected in every partial GC. Other regions may also be included in a partial GC, but the age 0 regions are always in the collection set. Because of this, you don't want to make the nursery too large, or the partial GC cycles will take too long and you won't see any performance benefit. But you don't want to make it too small either, or you will get a flood of partial GC cycles. The verbose garbage collection trace tells us how often the partial GCs occur and how long they take, so we can “tune in” a good nursery size (-Xmn).
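As a rough back-of-the-envelope check on the "too small" end, assume that a partial GC starts each time the age 0 regions fill up and that the allocation rate stays roughly constant. The time between PGCs is then approximately the nursery size divided by the allocation rate. The class name and the 200 MB/s rate below are hypothetical, and a real rate would come from measuring your own application. The estimate only covers frequency. It says nothing about how long each PGC takes, which grows with the nursery size, so the verbose GC trace remains the real judge.

```java
// Back-of-the-envelope estimate, assuming a PGC starts each time the age 0
// (nursery) regions fill up and the allocation rate is roughly constant.
public class PgcFrequency {

    // Approximate seconds between partial GCs for a nursery size and allocation rate.
    static double secondsBetweenPgcs(double nurseryMb, double allocMbPerSec) {
        return nurseryMb / allocMbPerSec;
    }

    public static void main(String[] args) {
        double allocMbPerSec = 200.0;   // hypothetical measured allocation rate
        for (double nurseryMb : new double[] {64, 256, 1024}) {
            double interval = secondsBetweenPgcs(nurseryMb, allocMbPerSec);
            System.out.printf("-Xmn%dm: one PGC every %.2f s (%.1f per minute)%n",
                    (long) nurseryMb, interval, 60.0 / interval);
        }
    }
}
```

Quadrupling the nursery cuts the number of PGCs by roughly four times. Each of those PGCs now has four times as many age 0 regions to process, though, which is exactly the trade-off described above.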
If you are currently using gencon on a 64-bit JVM, a good starting point is a maximum heap size 50% larger than your current one, with the nursery kept the same size as it is now. If you are moving from a 32-bit JVM, or are not currently using gencon, start with double your current heap size and a 256m nursery. Then start tuning, tuning, tuning. Remember, these are just starting points! They will need to be adjusted to make sure you are getting good garbage collection performance.
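The starting-point rules above can be captured in a few lines. This is only a sketch. The class and method names are hypothetical, sizes are in megabytes, and the target is assumed to be a 64-bit JVM running with -Xgcpolicy:balanced. For the 32-bit or non-gencon case, the current nursery size is simply ignored.

```java
// Encodes the starting-point rules of thumb for moving to the balanced policy.
// Hypothetical helper; sizes are in megabytes and a 64-bit target JVM is assumed.
public class BalancedStartingPoint {

    static String startingOptions(long currentHeapMb, long currentNurseryMb,
                                  boolean currentlyGencon64Bit) {
        long heapMb;
        long nurseryMb;
        if (currentlyGencon64Bit) {
            heapMb = currentHeapMb + currentHeapMb / 2;  // 50% larger max heap
            nurseryMb = currentNurseryMb;                // keep today's nursery size
        } else {
            heapMb = currentHeapMb * 2;                  // double the current heap
            nurseryMb = 256;                             // start with a 256m nursery
        }
        return "-Xgcpolicy:balanced -Xmx" + heapMb + "m -Xmn" + nurseryMb + "m";
    }

    public static void main(String[] args) {
        System.out.println(startingOptions(4096, 512, true));   // 64-bit gencon today
        System.out.println(startingOptions(1536, 384, false));  // 32-bit JVM today
    }
}
```

For example, a 64-bit gencon setup with a 4096m heap and a 512m nursery would start at -Xmx6144m -Xmn512m, and a 32-bit setup with a 1536m heap would start at -Xmx3072m -Xmn256m.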
Also note that any time there is a significant application change or change in user load, you will likely have to repeat the heap size tuning procedure, because the memory use will be different.
You may be thinking that Java is already quite the memory pig, so why would you want to waste even more memory? And you may be right. Balanced GC is not for every application. But if you have a large amount of physical RAM and are seeing long garbage collection cycles, you may want to throw even more memory at the problem and try balanced. It is the classic memory/performance trade-off. And besides, memory is getting cheaper all the time.