When IBM Java runs in 64-bit mode with a maximum heap size of less than 25GB, Compressed References technology (-Xcompressedrefs) is enabled by default (defaults may differ on older versions of Java on some operating systems): http://pic.dhe.ibm.com/infocenter/java7sdk/v7r0/topic/com.ibm.java.zos.71.doc/diag/appendixes/cmdline/Xcompressedrefs.html
This option will "decrease the size of Java objects and make more effective use of the available space. The result is less frequent garbage collection and improved memory cache utilization." (http://pic.dhe.ibm.com/infocenter/java7sdk/v7r0/topic/com.ibm.java.lnx.71.doc/user/garbage_compressed_refs.html)
There are important implications to compressed references related to native OutOfMemoryErrors:
When you are using compressed references, the following structures are allocated in the lowest 4 GB of the address space: Classes, Threads, Monitors. Additionally, the operating system and native libraries use some of this address space. Small Java heaps are also allocated in the lowest 4 GB of the address space. Larger Java heaps are allocated higher in the address space.
Native memory OutOfMemoryError exceptions might occur when using compressed references if the lowest 4 GB of address space becomes full, particularly when loading classes, starting threads, or using monitors. You can often resolve these errors with a larger -Xmx option to put the Java heap higher in the address space.
A command-line option can be used with -Xcompressedrefs to allocate the heap you specify with the -Xmx option, in a memory range of your choice. This option is -Xgc:preferredHeapBase=<address>, where <address> is the base memory address for the heap. In the following example, the heap is located at the 4GB mark, leaving the lowest 4GB of address space for use by other processes.
-Xgc:preferredHeapBase=0x100000000
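The 0x100000000 value in the example above is simply 4GB (4 x 2^30 bytes) written in hexadecimal. As a minimal sketch, the following helper formats the option for a heap base given in gigabytes. HeapBaseOption and forGigabytes are hypothetical names used only for illustration; they are not part of any IBM API.

```java
// Hypothetical helper (not an IBM API): builds the -Xgc:preferredHeapBase
// option string for a heap base address expressed in gigabytes.
final class HeapBaseOption {
    private static final long BYTES_PER_GB = 1L << 30;

    static String forGigabytes(long gigabytes) {
        if (gigabytes < 0) {
            throw new IllegalArgumentException("heap base must not be negative");
        }
        long address = gigabytes * BYTES_PER_GB;
        // Long.toHexString yields lowercase hex digits without a 0x prefix.
        return "-Xgc:preferredHeapBase=0x" + Long.toHexString(address);
    }

    public static void main(String[] args) {
        // 4GB is the base address used in the example above.
        System.out.println(forGigabytes(4));
    }
}
```

Running main with the 4GB value prints the same option string shown in the example.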
The first key point is that some maximum heap sizes below 4GB may cause the Java heap to be placed in the 0-4GB address space range (when possible). Compressed references technology works by compressing and decompressing pointers at runtime using bit shift arithmetic (ftp://public.dhe.ibm.com/software/webserver/appserv/was/WAS_V7_64-bit_performance.pdf). However, if the Java heap fits under 4GB, then these extra instructions are not required. In one benchmark, when the Java heap moved above the 0-4GB range, there was a relative throughput decrease of ~2.5% (ftp://public.dhe.ibm.com/software/webserver/appserv/was/WAS_V7_64-bit_performance.pdf). Note that this 2.5% effect was not measured under ceteris paribus conditions, because the heap was moved by increasing the heap size rather than by using -Xgc:preferredHeapBase. When you use -Xgc:preferredHeapBase (or, alternatively, increase the maximum heap size), you are deliberately forcing the JVM to take this performance hit in order to give more space under 4GB to the native class, thread, and monitor data structures and so avoid Native OutOfMemoryErrors (NOOMs).
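To illustrate the bit shift arithmetic in general terms, here is a minimal sketch of how a 64-bit address can be stored in 32 bits and restored again. It is not the JVM's actual implementation: the class and method names are hypothetical, and the shift of 3 (which assumes 8-byte object alignment) is an illustrative assumption rather than a value taken from the IBM documentation. With a shift of 0, which only works for addresses below 4GB, the stored value is the address itself and no shifting is needed.

```java
// Illustrative sketch only; not the JVM's real encoding.
final class CompressedRefSketch {

    // Store a 64-bit address in 32 bits by dropping 'shift' low-order zero bits.
    static int compress(long address, int shift) {
        long alignmentMask = (1L << shift) - 1;
        if (address < 0 || (address & alignmentMask) != 0) {
            throw new IllegalArgumentException("address is negative or not aligned");
        }
        long compressed = address >>> shift;
        if (compressed > 0xFFFFFFFFL) {
            throw new IllegalArgumentException("address does not fit in 32 bits at this shift");
        }
        return (int) compressed;
    }

    // Restore the 64-bit address: treat the int as unsigned, then shift back.
    static long decompress(int reference, int shift) {
        return (reference & 0xFFFFFFFFL) << shift;
    }

    public static void main(String[] args) {
        long aboveFourGb = 0x100000008L;       // 4GB + 8 bytes
        int ref = compress(aboveFourGb, 3);    // assumed shift of 3
        System.out.println(Long.toHexString(decompress(ref, 3)));

        long belowFourGb = 0x7FFFFFF8L;        // under 4GB: shift 0 is enough
        System.out.println(compress(belowFourGb, 0) == belowFourGb);
    }
}
```

In this sketch, an address above 4GB cannot be represented with a shift of 0, which mirrors why a heap placed above the 0-4GB range pays for the extra shift instructions.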
The second key point is that native class, thread, and monitor data structures must all be allocated below 4GB when using compressed references. The operating system and other native allocations may further limit the available space under 4GB, so if you continue to get native OutOfMemoryErrors even with the Java heap allocated above the 0-4GB range, then you must address the number and size of the class, thread, and monitor data structures. In many cases, the cause is a class, classloader, or thread leak, which you can investigate with various tools, but it is easiest to start by analyzing the javacore from the NOOM. If there are no leaks, then there may be other ways to reduce these data structures, such as reducing reflection inflation, using shared classes, etc. (see http://www-01.ibm.com/support/docview.wss?uid=swg27039764&aid=1).
One option to avoid these problems and NOOMs is to disable compressed references entirely; however, some benchmarks show a 10-20% relative throughput decrease when doing so: "Analysis shows that a 64-bit application without CR yields only 80-85% of 32-bit throughput but with CR yields 90-95%. Depending on application requirements, CR can improve performance up to 20% over standard 64-bit." (ftp://public.dhe.ibm.com/software/webserver/appserv/was/WAS_V7_64-bit_performance.pdf). You may be able to recover some of this drop by increasing L2/L3 processor cache sizes or efficiency (using processor sets). Disabling compressed references will also increase Java heap usage dramatically, by up to 70% (because each pointer doubles in size, the same Java object reference takes more of the Java heap).
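Before changing any of these settings, it can help to confirm which ones a JVM was explicitly started with. The sketch below scans the JVM input arguments using the standard java.lang.management API. It makes several assumptions: the class and method names are hypothetical, the disabling option is taken to be -Xnocompressedrefs (the text above does not name it), and only the standalone -Xgc:preferredHeapBase=<address> form is recognized, not a comma-separated -Xgc list. It reports only explicit options and cannot tell you what the JVM chose by default.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper: reports compressed-references options passed explicitly.
final class CompressedRefsArgs {
    private static final String HEAP_BASE_PREFIX = "-Xgc:preferredHeapBase=";

    static String describe(List<String> jvmArgs) {
        String mode = "default (not set explicitly)";
        String heapBase = "not set";
        for (String arg : jvmArgs) {
            if (arg.equals("-Xcompressedrefs")) {
                mode = "explicitly enabled";
            } else if (arg.equals("-Xnocompressedrefs")) { // assumed option name
                mode = "explicitly disabled";
            } else if (arg.startsWith(HEAP_BASE_PREFIX)) {
                heapBase = arg.substring(HEAP_BASE_PREFIX.length());
            }
        }
        return "compressed references: " + mode + "; preferred heap base: " + heapBase;
    }

    public static void main(String[] args) {
        // getInputArguments() returns the options passed to this JVM at startup.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println(describe(jvmArgs));
    }
}
```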
Update: TechNote published: http://www-01.ibm.com/support/docview.wss?uid=swg21660890