This is the third article in the 5-part series about Java on AIX Performance Tuning. If you have not done so already, we strongly recommend that you review Part 1 of this series before proceeding further.
This article concentrates on tuning the various types of memory structures involved (Java heap, native heap, stacks), and looks at ways to size the system optimally.
The first section offers general tips that apply to most situations, along with a quick reference to tools that are useful for detecting and investigating memory bottlenecks. The second section describes various types of applications and how they can be tuned; it relies on your knowledge of the application to decide which tips are best for you. The third section describes the tips themselves. The article concludes with a preview of the next article in the series.
This article deals with making your application scale to a larger number of threads, or have a larger heap, or both. You may also want to make your application more "civilized", by forcing it to use resources in a constrained manner. This is especially important in environments where the system is being shared between multiple applications.
When an AIX Java application is attempting to scale, there are several resource bottlenecks that can come into play. Java heap is only one of them, and in most cases you can simply switch to a higher heap size to scale higher. But there are three other memory areas that play important parts in determining the footprint and scalability of a Java application.
The first important area of memory other than the Java heap is the native heap. The article "Getting more memory in AIX for your Java applications" describes how to monitor and size the native heap. The Java heap is managed by the Garbage Collector, but the Garbage Collector does not touch the native heap. So, if you see the native heap growing steadily, the cause may be, for example, a JNI allocation with no matching deallocation.
The second area is the native stack, specified with
-Xss. This is allocated for each thread, and is not based on usage, so be careful before specifying
-Xss2m if you are planning to run a hundred threads; it would consume 200 MB of native memory, 2 MB for each thread. This becomes a limiting factor especially when going for a higher number of threads, and the recommended workaround is to use a smaller value instead of the default. More information on this topic is provided in the SDK Guide accompanying the JVM.
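The arithmetic is worth sketching. The thread count and stack sizes below are illustrative values for this calculation, not recommendations:

```shell
# Native memory reserved for thread stacks is roughly threads * -Xss,
# regardless of how much stack each thread actually uses.
# 100 threads at -Xss2m reserve 200 MB up front.
THREADS=100
STACK_KB=2048                               # -Xss2m expressed in KB
echo "$((THREADS * STACK_KB / 1024)) MB reserved with -Xss2m"

# Dropping to -Xss256k cuts the same 100 threads to 25 MB.
STACK_KB=256
echo "$((THREADS * STACK_KB / 1024)) MB reserved with -Xss256k"
```

This is why a smaller-than-default stack size is the usual workaround when scaling to many threads, provided the application's deepest call chains still fit.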
Finally, the third area is the Java stack, which is controlled via
-Xoss. The value specified with
-Xoss is the upper limit, so specifying a larger value does not have as drastic an effect as with
-Xss. Note that because of JIT compilation, you would need to tweak
-Xss for most of your needs, and
-Xoss can usually be left untouched. One significant place where you would need to adjust
-Xoss would be if you are running out of JNI references, for example.
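A launch line that adjusts both stack switches might look like the sketch below. The sizes and the class name MyApp are assumptions for illustration, not recommendations:

```shell
# Smaller per-thread stacks for a highly threaded application:
# 100 threads at -Xss512k reserve ~50 MB instead of the 200 MB
# that -Xss2m would reserve.  -Xoss is left at a modest cap since
# it is an upper limit, not an up-front allocation.
java -Xss512k -Xoss512k -Xmx512m MyApp
```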
We will not concentrate on any of the above memory areas, since usually a need to tweak them arises as part of debugging, not performance tuning. But if you run into situations where you are running out of resources, you now have three more areas you can examine. You must be aware of these areas especially when you are pushing your system to its limits.
Speaking of limits, you should ensure that the
ulimit settings do not become a bottleneck during sizing exercises. In an ideal world, we would recommend setting some
ulimit values to unlimited, but this needs to be evaluated against the risk of a process consuming all local resources. For a performance tuning exercise, you can start by setting
ulimit values to unlimited, and once you reach the desired goals, you should set these to finite values.
You can check the current
ulimit settings using the
ulimit -a command, and at least the following three commands should be run, as the user account that will launch Java:
ulimit -m unlimited
ulimit -d unlimited
ulimit -f unlimited
In some cases, you may not be permitted to perform the above operation, because the user account running Java does not have sufficient hard limits assigned to it. Please refer to the SDK guides, at IBM developer kits for AIX, Java technology edition, accompanying the JVM, for the required settings.
Note that GC-related problems normally show up as CPU-intensive issues, so you should review Part 2 of this series as well, if you are encountering GC-related issues. Also, if RMI is triggering GC, this is discussed in Part 4.
The rest of this section provides a quick introduction to some common tools, and to how to detect Java-specific problems. For more details, please refer to AIX 5L Performance Tools Handbook and Understanding IBM eServer pSeries Performance and Sizing.
vmstat was covered in Part 2 of this series. The important part to learn from the
vmstat output is whether paging is occurring, and whether it is being triggered by the heap size being larger than the available physical memory. Any scalability benefits you get from a larger heap will, in most cases, be defeated by the severe performance degradation caused by heap paging, so paging should be avoided.
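A minimal sketch of this check, run against captured output rather than a live system. The sample lines and column positions below assume a typical AIX vmstat layout (pi and po as the sixth and seventh columns); verify them against the header line of your own output:

```shell
# Flag paging activity in captured vmstat output: the "pi" (page-in)
# and "po" (page-out) columns should stay at or near zero.
# On a live system you would pipe `vmstat 5` into the awk script instead.
cat > vmstat.sample <<'EOF'
 r  b   avm   fre  re  pi  po  fr  sr  cy  in   sy  cs us sy id wa
 1  1 235013 5432   0  12  48 120 300   0 450 2350 680 25 10 55 10
 1  0 235013 5630   0   0   0   0   0   0 430 2100 650 24  9 60  7
EOF
awk 'NR > 1 && ($6 > 0 || $7 > 0) { print "paging detected: pi=" $6 " po=" $7 }' vmstat.sample
```

If this fires consistently while the Java process is under load, shrink the heap or add physical memory before tuning anything else.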
svmon is the most useful tool at your disposal when monitoring a Java process, especially native heap. The article "When segments collide" gives examples of how to use
svmon -P <pid> -m to monitor the native heap of a Java process on AIX. But there is another variation,
svmon -P <pid> -m -r, that is very effective in identifying native heap fragmentation. The
-r switch prints the address range in use, so it gives a more accurate view of how much of each segment is in use. As an example, look at the partially edited output below:
    Pid Command          Inuse      Pin     Pgsp  Virtual 64-bit Mthrd LPage
  10556 java            681613     2316     2461   501080      N     Y     N

   Vsid      Esid Type Description              LPage  Inuse   Pin  Pgsp Virtual
  22ac4         9 mmap mapped to sid b1475          -      0     0     -       -
  21047         8 mmap mapped to sid 30fe5          -      0     0     -       -
  126a2         a mmap mapped to sid 91072          -      0     0     -       -
  7908c         7 mmap mapped to sid 6bced          -      0     0     -       -
  b2ad6         b mmap mapped to sid b1035          -      0     0     -       -
  b1475         - work                              -  65536     0   282   65536
  30fe5         - work                              -  65536     0   285   65536
  91072         - work                              -  65536     0    54   65536
  6bced         - work                              -  65536     0   261   65536
  b1035         - work                              -  45054     0     0   45054
                    Addr Range: 0..45055
  e0f9f         5 work shmat/mmap                   -  48284     0     3   48284
  19100         3 work shmat/mmap                   -  46997     0   463   47210
  c965a         4 work shmat/mmap                   -  46835     0   281   46953
  7910c         6 work shmat/mmap                   -  37070     0     0   37070
                    Addr Range: 0..50453
  e801d         d work shared library text          -   9172     0     0    9220
                    Addr Range: 0..30861
  a0fb7         f work shared library data          -    105     0     1     106
                    Addr Range: 0..2521
  21127         2 work process private              -     50     2     1      51
                    Addr Range: 65300..65535
  a8535         1 pers code,/dev/q109waslv:81938    -     11     0     -       -
                    Addr Range: 0..11
If you have read the article "Getting more memory in AIX for your Java applications", you should be able to tell that the above configuration is using LDR_CNTRL=MAXDATA=0x40000000. The "Inuse" column prints values in 4K pages, so a segment (256 MB in size) will show a maximum value of 65536 in this column. As the output shows, this particular application was using a lot of native heap; no range is printed for segments 3 through 5 because, even though their "Inuse" counts are below 65536, those segments are fully allocated. Also, if
-r were not used, one might assume that segment 6, with an "Inuse" value of 37070, is only 56% full. But as you can tell, the actual address range in use for segment 6 is 0..50453; in other words, the segment is almost 77% full. This can have a significant effect on the sizing of the application.
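The percentages above come straight from the page counts; a quick sketch of the calculation, using the numbers from this output:

```shell
# A 256 MB segment holds 65536 pages of 4 KB.  For segment 6 above,
# "Inuse" is 37070 pages but the address range runs 0..50453,
# i.e. 50454 pages have been touched.
PAGES_PER_SEG=65536
INUSE=37070
RANGE_END=50453

echo "utilization by Inuse:      $((INUSE * 100 / PAGES_PER_SEG))%"
echo "utilization by Addr Range: $(((RANGE_END + 1) * 100 / PAGES_PER_SEG))%"
```

This prints 56% and 76% (the "almost 77%" in the text), which is the gap the -r switch exposes.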
Another interesting part to note is that the Java heap fragmentation is not visible through
svmon. Segments 7 through A are shown to be completely utilized (the value of "Inuse" is 65536 for the corresponding SIDs), and segment B is using the first 45056 pages. This just tells you that the heap is around 1100 MB in size, but there are easier ways (namely, looking at the command-line argument to Java!) to find out this information.
Tuning Java heap is covered in Fine-tuning Java Garbage Collection Performance. On Java versions before 1.4, you may need to tweak some environment settings to go beyond 1 GB of heap, and this is explained in "Getting more memory in AIX for your Java applications".
If you learn how to interpret verbosegc output, and how to use
svmon, it should be sufficient for most of the performance monitoring work for any Java application on AIX.
There are several articles available that discuss GC tuning, so plenty of information is available, depending on how deeply you wish to understand this topic. The Diagnostics page also points to a document that summarizes how GC works in IBM Java, which should be very useful, especially for developers. We stick to the characteristics-based tuning format for the following sections, but if you wish to investigate any particular tip further, you should refer to the "IBM Garbage Collection and Memory Allocation techniques" paper accessible at http://www-106.ibm.com/developerworks/java/jdk/diagnosis/.
Now we will look at various characteristics of typical applications. You should locate the behavior that resembles that of your application (either by design or through observation) and apply the corresponding tips. Unless specifically stated, the term "heap" refers to the Java heap.
For most applications, the decision to use a fixed vs. variable heap is easy to make. Any application with a more or less finite heap requirement can use MEM001, while any application that can see a periodic surge in heap usage would do better with MEM002. But if your application grows gradually, or if you are looking at the performance impact of a variable heap size, MEM001 may still be viable. The Diagnostics guides at IBM developer kits - diagnosis documentation have a good section on how to size the Java heap. As "Fine-tuning Java Garbage Collection Performance" mentions, the rule to follow is: allocate as much heap as your application needs, but no more. But before you decide to use MEM001, see the section about Heap Recycling.
If your application heap needs to grow aggressively, MEM003 would be useful. The kind of applications that benefit from aggressive memory growth are the ones that see peaks in memory demands at particular times during the day. If tuned properly, the number of heap expansions will be reduced, since each expansion will grow the heap by a larger amount.
On the other hand, if you would like to control the rate of heap growth, see MEM004. This is useful when you see that the heap expansion is quite large, and would like to control the expansion delta. This kind of a situation is rare, since Java will normally expand the heap based on well-defined rules and will not grow the heap aggressively unless you use MEM003 first. But it is useful to know that MEM004 is available if needed.
If your application heap should only grow and never shrink, you can force that by using MEM005. The GC cycles monitor heap usage and shrink the heap when the allocated heap is larger than the current needs; after all, that is the idea of using a variable-sized heap. But sometimes you may notice that a heap shrinkage is followed soon after by an expansion, due to an increase in heap requirements. In this case, you can ask the JVM not to shrink the heap.
If your application generates a lot of temporary objects, MEM001 can help if you can set a small enough heap. The idea is that if your heap grows to 200 MB and then triggers a 200 ms GC cycle, versus if it grows to 1 GB and then triggers a 1500 ms GC cycle, you are much better off working with a smaller heap since the application does not need a larger footprint anyway. This, of course, assumes that the amount of heap needed for long-term allocation is satisfied by the maximum heap size at all times. This tip is the same as CPU012, but we are concentrating on the application footprint now, not just application performance.
Normally a fixed-size heap works fine, even if allocations are quite aggressive. But if these temporary objects are usually quite large, the heap can become fragmented quickly, which can lead to spurious OOMs. Another scenario where MEM001 is a bad idea is when fragmentation is caused by pinned objects. If objects have to be pinned in the heap, it helps if they are allocated as low in the heap as possible. But Java will not collect garbage unless it has to, so as long as heap is available, no GC cycle will be triggered, and that can mean the pinned objects end up allocated in a way that causes fragmentation.
Newer versions of Java do a much better job of managing fragmentation, and new switches are available to tweak the pinned cluster size (see -Xk and -Xp). But if you are facing heap fragmentation, MEM002 may be all you need. In many cases, the heap expansion and shrinkage eliminates the holes in heap better than a compaction cycle does. Heap fragmentation can affect the scalability of an application severely, and MEM002 is a good tweak to use in these cases.
Here are some quick pointers based on verbosegc output. Refer to "Fine-tuning Java Garbage Collection Performance" for more details.
- If you are using MEM002 and observe too much GC at application stabilization, see MEM007. You may also want to try MEM001 and see if that helps.
- If you are observing that the heap is expanding or shrinking too often, you can eliminate both shrinkage and expansion by using MEM001. MEM005 will eliminate any shrinkage of heap.
- If the Mark times in the GC cycle are too high, you should try MEM006. High Mark times can also be caused by "Mark Stack Overflow".
- If GC cycles are being triggered by something other than Allocation Failures (for example, explicit System.gc() calls), use MEM007. Distributed GC calls are not affected by this setting, though.
- If there is excessive compaction being indicated in verbosegc, it may point to an inadequately sized heap.
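Several of the pointers above reduce to scanning verbosegc output for outliers. A minimal sketch, run here against captured sample lines: the line format below is only illustrative of IBM 1.4.x verbosegc output, so check the field positions against your own log before relying on them:

```shell
# Flag GC cycles whose pause exceeds one second in a verbosegc log.
# The "in <N> ms" tail is the pause; with this layout it lands in
# awk fields 8 ("in") and 9 (the millisecond count).
cat > verbosegc.sample <<'EOF'
<GC(11): freed 12905400 bytes, 42% free (54182312/128974848), in 63 ms>
<GC(12): freed 2450104 bytes, 8% free (10321920/128974848), in 1540 ms>
EOF
awk '$8 == "in" && $9 > 1000 { print "long GC pause: " $9 " ms" }' verbosegc.sample
```

The same pattern (filter on one field, print another) works for watching the percent-free column for a heap that never stabilizes.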
If your application's native code makes several small requests, you may get a performance gain by using MEM008. But the most important part of native heap tweaking is to make sure that each and every native heap allocation is matched with the appropriate deallocation. You can use
svmon to monitor the native heap, and you may want to set IBM_JAVA_MMAP_JAVA_HEAP=true in order to more clearly distinguish between Java and Native heaps. Other than making sure that the application doesn't exhaust the native heap at runtime, there is generally not much performance tuning to be done for the native heap.
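A sketch of that monitoring setup; the heap size and the class name MyApp are placeholders:

```shell
# Map the Java heap with mmap so its segments are distinguishable from
# native-heap segments in svmon output.
export IBM_JAVA_MMAP_JAVA_HEAP=true
java -Xmx512m MyApp &
svmon -P $! -m -r     # the Java heap now shows up as mmap'ed regions
```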
The text below refers to command-line arguments to Java (specified before the class/jar file names) as "switches". For example, the line
java -mx2g hello has a single switch, -mx2g.
Create a fixed size Java heap by specifying the same value for both the initial (
-Xms) and maximum (
-Xmx) sizes of Java heap. The value specified should be high enough not to result in an OOM, while low enough not to increase the GC cycle time significantly.
Note: Specifying a fixed size heap means you will not be able to use
-Xminf/-Xmaxf/-Xmine/-Xmaxe for fine-tuning the GC characteristics. Fixed size heaps are also prone to fragmentation in many cases.
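As a sketch, MEM001 for an application whose working set fits comfortably in 512 MB might look like this; the size and the class name MyApp are assumptions for illustration:

```shell
# MEM001: fixed-size heap -- initial (-Xms) and maximum (-Xmx) identical,
# so the heap never expands or shrinks.
java -Xms512m -Xmx512m MyApp
```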
Create a variable size Java heap by specifying different values for
-Xms and -Xmx, or by specifying just
-Xmx. The values specified should be high enough to avoid OOM, but adjusted to avoid excessive shrinkage/expansion of the heap.
Note: Having a variable size heap can have a severe performance effect if not properly tuned.
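A corresponding MEM002 sketch, again with illustrative values:

```shell
# MEM002: variable-size heap -- start small, let the JVM expand toward
# the maximum only when the application actually needs the space.
java -Xms64m -Xmx512m MyApp
```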
Use a value for
-Xmine that is higher than the default (1 MB). This will allow the minimum expansion for Java heap to be more aggressive. For example,
-Xmine5m will allow the heap to grow 5 MB (or more, up to
-Xmaxe) at a time.
-Xmine is only one of the several factors that come into play when heap expansion occurs. Refer to the Diagnostics Guides for more information on heap expansion.
Use a value for
-Xmaxe that is different from the default (0). This will force the maximum expansion for Java heap to be kept within the specified limit. For example,
-Xmaxe2m will force the heap to grow no more than 2 MB in one cycle.
-Xmaxe is only one of the several factors that come into play when heap expansion occurs. Refer to the Diagnostics Guides at IBM developer kits - diagnosis documentation for more information on heap expansion.
To disable heap shrinkage, use
-Xmaxf1, which sets the maximum free heap percentage to 100%.
Note: This will force the heap to only grow, and has no use if the heap is fixed in size.
Use the switch
-Xgcpolicy:optavgpause to enable Concurrent Mark.
Note: For CPU-intensive applications, this setting may have an effect on performance.
Use the switch
-Xdisableexplicitgc to disable any
System.gc() calls from triggering a GC.
Note: Make sure your application functionality is not affected by this switch.
Set the environment variable:
MALLOCTYPE=buckets
to switch to a buckets-based model for the native heap.
Note: Java heap allocation is not affected by this switch. Also, if not properly used, this can affect the performance of the application.
This article has shown you how to use AIX tools for Java performance monitoring, and provided a list of common tweaks that can be applied to optimize the application's memory usage. The next article in the series talks about "Network and Disk I/O tweaking for Java applications on AIX".
- Read other parts in the Maximizing Java performance on AIX series:
IBM developer kits for AIX, Java technology edition at http://www.ibm.com/developerworks/java/jdk/aix/service.html
IBM developer kits - diagnosis documentation at http://www.ibm.com/developerworks/java/jdk/diagnosis/
AIX Performance PMR Data Collection Tools at ftp://ftp.software.ibm.com/aix/tools/perftools/perfpmr/
AIX 5L Performance Tools Handbook at http://publib-b.boulder.ibm.com/Redbooks.nsf/RedbookAbstracts/SG246039.html
Understanding IBM eServer pSeries Performance and Sizing at http://publib-b.boulder.ibm.com/Redbooks.nsf/RedbookAbstracts/SG244810.html
Fine-tuning Java garbage collection performance at http://www.ibm.com/developerworks/library/i-gctroub/
Getting more memory in AIX for your Java applications at http://www.ibm.com/developerworks/eserver/articles/aix4java1.html
AIX 5.2 performance tools update, Part 1 at http://www.ibm.com/developerworks/eserver/articles/Keung_AIXPerf.html
AIX 5.2 Performance Tools update, Part 2 at http://www.ibm.com/developerworks/eserver/articles/AIX5.2PerfTools.html
AIX 5.2 Performance Tools update: Part 3 at http://www.ibm.com/developerworks/eserver/articles/AIX5.2_performancetoolsupdatepart3.html
Amit Mathur works in the IBM Solutions Development group, working primarily with IBM ISVs on the enablement and performance of their applications on IBM eServer platforms, and fostering self-sufficiency among ISVs and customers through education and articles on developerWorks. Amit has more than fourteen years of experience in leading software support and development in C/C++, Java, and databases on UNIX and Linux platforms. He holds a Bachelor of Engineering degree in Electronics and Telecommunication from India. You can reach Amit at email@example.com.
Sumit Chawla leads the Java Enablement initiative for IBM eServer (for AIX, Windows, and Linux platforms), assisting Independent Software Vendors for IBM Servers. Sumit has a Master of Science degree in Computer Science, with almost 10 years of experience in the IT industry, and is certified by IBM as an Application Architect. He is a frequent contributor to the developerWorks eServer zone. You can contact him at firstname.lastname@example.org.