Calls from other sub-components drives object allocation to one of the allocation interfaces, such as stCacheAlloc, stAllocObject, stAllocArray, or stAllocClass. The allocation interfaces allocate a given amount of storage from the heap, but have different parameters and semantics. The stCacheAlloc routine is specifically designed to deliver optimal allocation performance for small objects. Objects are allocated directly from a thread local allocation buffer that the thread has previously allocated from the heap. A new object is allocated from the end of this cache without the need to grab the heap lock, so it is very efficient. Objects allocated through the stAllocObject and stAllocArray interfaces will, if small enough (currently 512 bytes), also be allocated from the cache.
This section discusses objects and the heap, or storage, aspect of GC.
Figure 1 shows the layout of an object on the heap. The parts of the object are explained below the figure.
Figure 1. Object layout

- size + flags
-
The size + flags slot is 4 bytes on 32 bit architecture and 8 bytes on 64 bit architecture. The main purpose of this slot is to contain the length of the object. As all objects start on an 8 byte boundary, and the size is divisible by 8, the bottom 3 bits are not used for the size; we use them for flags to indicate different states of the object. As the size of objects is limited, the top 2 bits can be used for flags. Note that the mptr, not this slot, is grained on an 8 byte boundary. The flags are as follows:
- Bit 1 is the swapped bit that is only used during compaction. Bit 1 is also the NotYetScanned bit that is only used during mark stack overflow. In addition, Bit 1 is also the multi-pinned bit used to indicate that this object has been pinned multiple times. During a GC, the multi-pinned bit will be removed and restored to allow other uses of this overloaded bit.
- Bit 2 is the dosed bit. The dosed bit is set on if the object is referenced from the stack or registers. The object cannot be moved in this GC cycle; we cannot fix up the reference as it may not be a real reference but simply an integer that happens to have the same value as an object on the heap.
- Bit 3 is the pinned bit. Pinned objects usually cannot be moved because they are referenced from outside the heap. Examples include
ThreadandClassClassobjects. - Bit 31 in 32-bit architecture, or bit 63 in 64-bit architecture, is the flat locked contention (flc) bit used by the locking (LK) component.
- Bit 32 in 32-bit architecture, or bit 64 in 64-bit architecture, is the hashed bit used to denote an object that has returned its hashed value. This is required, because the hash value is the address of the object and you need to maintain this if you move the object.
- mptr
- The mptr slot is 4 bytes on 32-bit architecture and 8 bytes on 64-bit architecture. The mptr slot, not the size + flags, is grained on an 8 byte boundary. The mptr has one of two functions:
- If the mptr slot is not an array, the mptr points to the method block, from which you can get to the class block. The class block tells you which class an object is an instantiation of. The class loader allocates the method block and class block, neither of which are in the heap.
- If the mptr slot is an array, the mptr contains the number of array entries in this object.
- locknflags
- This slot is 4 bytes on 32-bit architecture and 8 bytes on 64-bit architecture, although only the lower 4 bytes are used. Used mainly to contain data for the LK component when locking, the locknflags also has some flags. The following flags are of interest:
- Bit 2 is the array flag. If this bit is on, the object is an array and the mptr field will contain a count of the number of elements in the array.
- Bit 4 is the hashed and moved bit. If this bit is on, it tells us that this object has been moved after it was hashed, and the hash value is in the last slot of the object.
- Object data
- This is where the object data starts, and the layout is object-dependent.
The size + flags, mptr, and locknflags are sometimes collectively known as the header.
The heap is a contiguous piece of storage obtained from the operating system at JVM initialization. Figure 2 shows the layout of the heap.
Figure 2. The heap

Heapbase is the address of the start of the heap, and heaptop is the address of the end of the heap. Heaplimit is the address of the top of the currently used part of the heap and can expand and shrink. The size from heapbase to heaptop is controlled by the -Xmx option. If this option is not specified, the default will apply as follows:
| Java Release 1.2.2 and 1.3.0 | Release 1.3.1 |
|---|---|
| For Windows platforms, half the real storage with a minimum of 16M and a maximum of 2G-1 | For Windows platforms, half the real storage with a minimum of 16M and a maximum of 2G-1 |
| For all other platforms, 64M | For OS/390 and AIX platforms, 64M |
| For Linux platforms, half the real storage with a minimum of 16M and a maximum of 512M-1 |
The initial size from heapbase to heaplimit is controlled by the -Xms option. If this option is not specified, the default will apply as follows:
| Java Release 1.2.2 and 1.3.0 | Release 1.3.1 |
|---|---|
| For Windows platforms, 4M | For Windows, AIX, and Linux platforms, 4M |
| For all other platforms, 1M | For OS/390, 1M |
For the majority of applications, the default settings will work well. The heap expands until it reaches a steady state. The heap then remains in this state, which should give a heap occupancy (or the amount of live data on the heap at any given time) of 70%. At this level, the frequency of GCs and the pause time of GCs should be at an acceptable level.
For some applications, the default settings might not give the best results. Below are some problems that may occur, and some suggested actions to take. You can use verbosegc to help monitor the heap.
| Problem | Suggested action |
|---|---|
| Frequency of GCs is too high until the heap reaches a steady state | Solution: use verbosegc to determine the size of the heap at a steady state and set -Xms to this value. |
| The heap is fully expanded and the occupancy level is greater than 70% | Solution: Increase the -Xmx value so the heap is not more than 70% occupied. For best performance, try to make sure that the heap never pages. The maximum heap size should, if possible, fit in physical memory. |
| At 70% occupancy the frequency of GCs is too great. | Solution: Change the setting of -Xminf. The default is 0.3, which will try to maintain 30% free space by expanding the heap. A setting of 0.4 increases this free space target to 40%, reducing the frequency of GCs. |
| Pause times are too long. | Solution: Try using -Xgcpolicy:optavgpause (introduced in 1.3.1), which reduces pause times and makes them more consistent as the heap occupancy rises. There is a cost to pay in throughput. This cost varies and will be about 5%. |
The following tips work well in practice:
- Make sure the heap never pages; the maximum heap size must fit in physical memory.
- Avoid finalizers -- you can never be guaranteed when a finalizer will run, and often they lead to problems. If you do use finalizers, try to avoid allocating objects in the
finalizermethod. Averbosegctrace shows if finalizers are being called. - Avoid compaction -- a
verbosegctrace will show if compaction is occurring. Compaction is usually caused by requests for large memory allocations. Analyze requests for large memory allocations and avoid them if possible; if they are large arrays, try to split them into smaller pieces.
Figure 3 shows the heap in relation to the allocbits and markbits, which are two-bit vectors that indicate the state of objects on the heap. As all objects on the heap start on an 8-byte boundary, both vectors have 1 bit to represent 8 bytes of the heap, so each of these vectors is one 64th of the heap.
Figure 3. The heap with allocbits and markbits

As objects are allocated in the heap, a bit is set on in allocbits to indicate the start of the object. This process tells us where allocated objects are, but it doesn't tell us whether the object is alive or not. During the mark phase, a bit is set on in markbits to indicate the start of a live object. Figure 4 shows two objects on the heap with allocbits on for both objects. During the mark phase, Object 2 was found to be referenced, but Object 1 was not; so we have a markbit on for Object 2. Object 1 will be collected during the sweep phase.
Figure 4. Some objects in the heap

The system heap was introduced in Release 1.3.0, and contains only objects whose life-expectancy is the life of the JVM. The objects in this heap are the class objects for system and shareable middleware and application classes. The system heap is never garbage-collected because all objects in it will either be reachable during the lifetime of the JVM, or (in the case of shareable application classes) have been selected to be reused during the lifetime of the JVM. Figure 5 shows the system heap layout.
Figure 5. The system heap

The system heap is a chain of noncontiguous pieces of storage. The initial size of the system heap is 128k in 32-bit architecture, and 8M in 64-bit architecture. If this fills, then the system heap will obtain another extent and chain the extents together.
Figure 6 shows the free list chain. The head of the list is in global storage and points to the first free chunk on the heap. Each chunk of free storage has a size field and a pointer that points to the next free chunk. The free chunks are in address order. The last free chunk has a NULL pointer.
Figure 6. The free list

Two types of allocation, heap lock and cache, are discussed in this section.
Heap lock allocation occurs when the allocation request is greater than 512 bytes, or when the allocation will not fit into the existing cache. As the name implies, heap lock allocation requires a lock and should be avoided, if possible, by using the cache. Below is some pseudo code for heap lock allocation in Release 1.3.1.
If size greater than 512 or enough space in cache try cacheAlloc return if OK HEAP_LOCK Do forever If there is a big enough chunk on freelist Take it goto Gotit else manageAllocFailure If any error goto GetOut End do Gotit: Initialise object GetOut: HEAP_UNLOCK |
First check the size of the allocation request, and if it is less than 512 or will fit in the existing cache, try to allocate using cache allocation. If you do not use cache allocation, or if the application failed to find free space, then you will get the HEAP_LOCK. The application now searches the freelist for a chunk of free storage big enough to satisfy the allocation request. If it finds free storage, it takes it and initializes the object, returning any remaining free storage to the freelist. If the remaining free storage is less than 512 bytes plus the header size (12 on 32-bit architecture and 24 on 64-bit architecture), you don't put it on the freelist. These small pieces of storage that are not on the freelist are known as dark matter, and will be recovered when the objects adjacent to them become free or when you compact the heap. If you can't find a big enough chunk of free storage, you now have an allocation failure and need to perform a GC. If the GC creates enough free storage, then you'll search the freelist again and pick up the free chunk. If GC did not find enough free storage, you will return out of memory. The HEAP_LOCK is released after we have either allocated the object or failed to find enough free space.
Hints was introduced in Release 1.3.0. In certain scenarios, in large heaps where the freelist has lots of small free spaces in an application that is allocating a lot of larger allocations, the heap lock allocation scheme has a problem. It always starts at the beginning and searches most of the long list to find a freespace big enough to satisfy an allocation. The quick freelist hint algorithm was introduced to solve this problem.
For all heap lock allocation attempts that walk the freelist, the following data is collected:
- A search count of the number of chunks on the freelist examined before finding a freespace large enough to fit the desired allocation of size n.
- The size of the largest freespace chunk in the freelist preceding the freespace used for allocation is also recorded. (In other words, the largest chunk not large enough to satisfy the request.)
When a freespace capable of satisfying the allocation is found and the search count is larger than 20, you should create a new active hint pointing into the freelist.
Active hints can now be used to start searching the freelist, at a point other than the beginning, depending on the size of the allocation request. Hints are dynamically updated as chunks are allocated from the freelist.
Cache allocation is specifically designed to deliver optimal allocation performance for small objects. Objects are allocated directly from a thread local allocation buffer that the thread has previously allocated from the heap. A new object is allocated from the end of this cache without the need to grab the heap lock, and hence it is very efficient. The criteria for using cache allocation has changed over the releases, as follows:
- 1.2.2 and 1.3.0 -- Use cache allocation if the size of the object is less than 384.
- 1.3.1 -- Use cache allocation if the size of the object is less than 512, or the object will fit in the current cache block.
Figure 7 below shows a cache block on the heap.
Figure 7. A cache block on the heap

The cache block is sometimes called a thread local heap (TLH). When allocating a TLH for a thread, we go through heap locked allocation and reserve a part of the heap that will be used exclusively by a single thread. All cache allocations can then be made into the TLH without the need for any locks. Note that the allocbit is not on for the TLH. We set allocbits for a TLH when the TLH is full or when a GC occurs. To increase the efficiency of allocating a TLH, we always take the next free chunk on the free list up to a maximum size, which has changed with releases as follows:
- 1.2.2 and 1.3.0 -- 6K
- 1.3.1 -- Variable from 2K to 164K, depending on the usage of the TLH
Figure 8 shows some objects that have been allocated in the TLH. Notice that objects are allocated from the back of the TLH, which can be done more efficiently than allocating from the front. There are still no allocbits set; they will be set for all objects in the TLH when the cache is full or when a GC occurs.
Figure 8. Some objects in a cache

Stay tuned for garbage collection
The next article in this series describes how garbage collection works. It reviews the functions of the mark, sweep, compact collector, and how parallel mark and parallel sweep work. The article also discusses how the heap expands and shrinks, and describes the new Concurrent Mark collector introduced in Release 1.3.1.
- Part 2, Garbage collection
- Part 3,
verbosegcand command-line parameters - Garbage Collection - Algorithms for Automatic Dynamic Memory Management, by Richard Jones and Rafael Lins (John Wiley & Son Ltd, 1996, ISBN:0-471-94148-4).
- Read Part 2, Garbage collection (developerWorks, August 2002) and Part 3, verbosegc and command-line parameters (developerWorks, September 2002) in this series.
- Find more Java resources on the developerWorks Java technology zone, which includes links to IBM developer kits and runtime environments.
- Read the introductory article, Living with the Garbage Collector (developerWorks, January 2002).
Sam Borman joined IBM in 1984 after working at seven other companies as a programmer and systems programmer in the UK, New Zealand, and France. He worked as a developer, and later as a development manager, in CICS. In 1990 he returned to a technical career working on CICSPlex/SM and later on DirectTalk. In 1999 he joined the Java Technology Centre where he took ownership of Garbage Collection. You can contact Sam Borman at sambo@uk.ibm.com.
Comments (Undergoing maintenance)

