This is the conclusion of the 5-part series on Java Performance. The first article in the series laid the foundation for performance tuning, and parts 2, 3 and 4 looked at various bottlenecks that can affect a system's scalability and throughput. This article covers two important topics that were not covered previously, and provides case studies and references.
A frequently asked question (FAQ) is about translating Sun-specific command-line switches to IBM-specific switches. Also, any serious performance tuning exercise, like benchmarks, cannot ignore system-wide tuning. We touch upon these topics briefly in the next section.
This is followed by a few case studies that attempt to illustrate how the tools and tips described in the series are applied to solve problems in the field. The emphasis is on understanding and learning how to use the tools and techniques mentioned in the series.
The article, and the series, concludes with a recap of useful references.
This section talks about translating Sun Java configuration to IBM Java configuration, and system-wide tuning for AIX applications. The scope of both of these topics is quite vast, so we only touch on them briefly.
If you have an application that has been tuned for Sun Java, and you are attempting to migrate your application to AIX (or, for that matter, any platform running IBM Java), you may already have done the hard work. Understanding the application characteristics is half the battle. You can use the characteristics-based tuning tips explained in Part 2 and Part 3 based on the understanding obtained by the tuning exercise with Sun Java.
However, we frequently receive queries about how to translate specific Sun Java command-line switches to equivalent IBM Java command-line switches. These switches almost always relate to Garbage Collection, as a well-tuned GC is essential to any Java-based application's performance. A mapping between Sun and IBM switches is difficult because of the difference in JVM architecture. IBM Java does not contain a Generational Garbage Collector, and does not understand any command-line switches that start with -XX. The IBM Java "Sovereign" architecture is also not based on the Sun HotSpot architecture. The easiest, and in most cases the quickest, way is to throw away all Sun-specific settings when running your application on IBM platforms, and carry out the fine-tuning as needed. But if you are curious about how some Sun switches map to IBM switches, read on.
The table below attempts to translate Sun Java GC command-line switches to equivalent IBM switches. This mapping is based on the functionality of Sun switches as described in the Sun-specific article "Tuning Garbage Collection with the 1.4.2 Java™ Virtual Machine". You should attempt to use this table only for the very specific purpose of locating an equivalent (or close) switch for IBM Java. This is not meant to replace the tuning exercise, as even heap size requirements can be quite different. For general GC tuning tips with IBM Java, as well as for information on these and other GC switches for IBM Java, please refer to Fine-tuning Java garbage collection performance. The creation of this table did not involve any performance-related testing, but was based entirely on the documented use of the Sun switches in the reference quoted above.
| Sun switch | Equivalent IBM switch | Notes |
|---|---|---|
| -Xms, -Xmx | -Xms, -Xmx | These parameters, and their meaning, remain unchanged. You may still need to do heap sizing. |
| | None | These switches can simply be removed, as they are for generational GC, which doesn't apply to IBM Java. |
| -XX:MinHeapFreeRatio, -XX:MaxHeapFreeRatio | -Xminf, -Xmaxf | Heap expansion/shrinkage is controlled by other factors, not just these switches. |
| -verbose:gc, -XX:+PrintGCDetails | -verbose:gc | The IBM Java verbosegc trace format is quite different from the Sun format. More detailed tracing can be enabled as needed, but in most cases the default verbosegc traces are sufficient. |
| -XX:+UseParallelGC, -Xincgc, -XX:+AggressiveHeap | None | These are various types of Garbage Collectors supported by Sun. These do not apply to IBM Java. |
| -XX:+UseConcMarkSweepGC -XX:+UseParNewGC | -Xgcpolicy:optavgpause | The Concurrent Low-pause collector is close to the IBM Concurrent Mark in intent (but not necessarily in design). |
| -XX:+CMSParallelRemarkEnabled | None | Not applicable for IBM Java. |
| -XX:ParallelGCThreads | -Xgcthreads | Changing this setting is not advisable, at least for IBM Java. |
| -Dsun.rmi.dgc.client.gcInterval, -Dsun.rmi.dgc.server.gcInterval | -Dsun.rmi.dgc.client.gcInterval, -Dsun.rmi.dgc.server.gcInterval | See NIO003 in Part 4. |
As the above table demonstrates, translating the switches in most cases will involve simply discarding the Sun switches for IBM platforms. This makes the GC tuning exercise for IBM Java quite painless, while still providing superior performance.
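As a rough illustration of how mechanical this translation is, the sketch below rewrites a Sun-style argument list using a subset of the rows in the table above. The class name `GcSwitchTranslator` and its methods are hypothetical and not part of any IBM or Sun tool. The sketch assumes that the Sun free-ratio switches take a percentage while -Xminf and -Xmaxf take a fraction, and it drops -XX:ParallelGCThreads rather than mapping it, following the table's advice to leave that setting alone.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: rewrites Sun GC switches using rows of the mapping table. */
class GcSwitchTranslator {

    /** Returns the IBM-side argument list for the given Sun-side arguments. */
    static List<String> translate(List<String> sunArgs) {
        List<String> ibmArgs = new ArrayList<>();
        for (String arg : sunArgs) {
            if (arg.equals("-XX:+UseConcMarkSweepGC")) {
                // Closest IBM policy in intent, per the table.
                ibmArgs.add("-Xgcpolicy:optavgpause");
            } else if (arg.startsWith("-XX:MinHeapFreeRatio=")) {
                ibmArgs.add("-Xminf" + percentToFraction(arg));
            } else if (arg.startsWith("-XX:MaxHeapFreeRatio=")) {
                ibmArgs.add("-Xmaxf" + percentToFraction(arg));
            } else if (arg.startsWith("-XX:") || arg.equals("-Xincgc")) {
                // No IBM equivalent (or not advisable to carry over): drop it.
                continue;
            } else {
                // -Xms, -Xmx, -D properties and everything else pass through unchanged.
                ibmArgs.add(arg);
            }
        }
        return ibmArgs;
    }

    /** Assumption: Sun ratios are percentages (40), IBM values are fractions (0.4). */
    private static String percentToFraction(String arg) {
        int percent = Integer.parseInt(arg.substring(arg.indexOf('=') + 1));
        return String.valueOf(percent / 100.0);
    }

    public static void main(String[] args) {
        System.out.println(translate(Arrays.asList(args)));
    }
}
```

For example, passing `-Xms256m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:MinHeapFreeRatio=40 -Xincgc` yields `[-Xms256m, -Xgcpolicy:optavgpause, -Xminf0.4]`. The output is only a starting point for the tuning exercise, not a substitute for it.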
Using AIX tools like vmo has a system-wide effect, so a thorough coverage of these tools is beyond the scope of the current series. See the Resources section for more information on these tools.
But for most multi-tiered applications and especially benchmarks, system-wide tuning is unavoidable. There are several excellent resources available for AIX performance tuning that you can consider. To get an idea of the kind of tuning normally required, you can look at actual published benchmarks. If you look at a recent SPECjbb2000 result for IBM Java on AIX, say http://www.spec.org/osg/jbb2000/results/res2003q3/jbb2000-20030624-00194.html, the OS tunings are listed as follows:
Operating system tunings
- vmo -r -o lgpg_regions=256 -o lgpg_size=16777216
- setsched -S rr -P 40 -p $$
- schedtune -t 400 -F 1
- vmtune -S 1
Warning: These settings must not be applied without carefully understanding the consequences, as an improper use of these settings can actually worsen the system performance.
So what do the above settings do? Referring to AIX documentation, you can quickly get a better understanding of what each of these settings is doing. Let us examine each of these in turn.
SPINLOOPTIME controls the number of times the system will retry a busy lock before yielding to another process. Since the default is 40, a higher value tells the system that it should try a little bit longer for the lock to be freed up. On multiprocessor systems, it results in better performance since a busy lock retry is cheaper than a process context switch.
The vmo line sets up the size and number of large pages (here, 256 regions of 16 MB each). If you look at the Java command-line switches, you will see that -Xlp is being used. This enables large page support in Java. If you have a memory-intensive application, you can experiment with large pages to see if it helps. More information on this topic is available in the SDK Guide accompanying Java.
The setsched line invokes a script, not an AIX command. The script calls the thread_setsched kernel service to select fixed-priority round-robin scheduling, with a fixed priority of 40 for the Java process.
The schedtune command is also a script from AIX 5.2 onwards; it maps the passed parameters to the new schedo command. The above line changes the time slice for fixed-priority threads to 400 ticks. It also forces the fixed-priority threads to reside in the global run queue.
The vmtune command translates the call to an equivalent vmo call, and the above line enables pinning of shared memory segments.
So you can see that the system was switched to large pages and a fixed-priority scheduler to get record numbers with the SPECjbb2000 benchmark. Can you use these same settings in your application? Probably not, but you are now in a position to examine these commands and the scheduling policies, and experiment to suit your application characteristics. This is the next step in performance tuning.
In this section we look at a few examples, taken from actual issues handled by the Java service team. These examples should give you a good idea of how to approach performance tuning, and how to use various tools to gather information that can be used for a tuning exercise. Note that the cases for this section were not chosen based on how frequently the problem is encountered in the field. The emphasis is on understanding how to use the various tools and techniques discussed in the series to locate and correct performance issues.
The reported issue was that the Java-based application's response time was unacceptable. Using topas, and then vmstat, it was seen that Java was the application consuming the most CPU. Using tprof, the functions that showed up had GC-related terms in them (for example, localMark, which is used in the Mark phase), so this indicated a possible issue with Java heap sizing.
Looking at GC logs confirmed that the heap was expanding very often. This, combined with multiple allocation failures in quick succession and a very full heap, resulted in the Java application spending a lot of time just trying to locate a free chunk, not finding it, expanding the heap, and then satisfying only the current allocation request.
This was fixed by specifying a larger value for -Xmine, forcing the heap to grow faster (see Tip MEM003 in Part 3). The result was that a single expansion avoided multiple potential allocation failures.
The first step, using AIX tools, confirmed this to be a Java-related issue. The second step, guided by the fact that AIX tools indicated this to be a problem related to GC, could concentrate on GC logs directly. The third step used the available tuning parameters to break the unusual cycle that the application was getting into, allowing the excessive CPU time to be recovered.
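When verbosegc logs are not at hand, a coarse view of the same symptom (a nearly full heap that keeps expanding) can be obtained from inside the application. The following is a minimal sketch using the standard `Runtime` API; the class `HeapSnapshot` and its method names are hypothetical, and the one-second sampling interval is an arbitrary choice. It complements the GC logs rather than replacing them.

```java
/** Hypothetical helper that samples the Java heap through the standard Runtime API. */
class HeapSnapshot {
    final long usedBytes;       // committed heap minus free space
    final long committedBytes;  // heap currently obtained from the OS
    final long maxBytes;        // upper limit, normally set by -Xmx

    HeapSnapshot(long usedBytes, long committedBytes, long maxBytes) {
        this.usedBytes = usedBytes;
        this.committedBytes = committedBytes;
        this.maxBytes = maxBytes;
    }

    /** Takes a sample of the running JVM's heap. */
    static HeapSnapshot take() {
        Runtime rt = Runtime.getRuntime();
        long committed = rt.totalMemory();
        return new HeapSnapshot(committed - rt.freeMemory(), committed, rt.maxMemory());
    }

    /** Percentage of the committed heap that is in use. */
    double occupancyPercent() {
        return 100.0 * usedBytes / committedBytes;
    }

    /** True when the committed heap grew between two samples, that is, the heap expanded. */
    boolean expandedSince(HeapSnapshot earlier) {
        return committedBytes > earlier.committedBytes;
    }

    public static void main(String[] args) throws InterruptedException {
        HeapSnapshot previous = take();
        for (int i = 0; i < 5; i++) {
            Thread.sleep(1000);
            HeapSnapshot now = take();
            System.out.printf("occupancy=%.1f%% committed=%d expanded=%b%n",
                    now.occupancyPercent(), now.committedBytes, now.expandedSince(previous));
            previous = now;
        }
    }
}
```

Repeated samples that show high occupancy together with frequent small expansions would be consistent with the pattern described in this case study.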
Another interesting scenario was raised as a performance issue, with CPU being busy most of the time. Looking at verbosegc, we could see that a GC cycle was being called a bit too frequently, resulting in the application spending most of its time doing just GC.
The verbosegc traces showed that most of the GC activity was caused by multiple allocations of very large objects, roughly 10 MB or more in size. These were fragmenting the heap, making the GC cycles longer and thus affecting the performance. But from the verbosegc output alone, the customer could not say what these objects were.
The easiest way to locate the culprit would have been to analyze the heapdump using the HeapRoots tool. But there was another twist to the situation: the large objects were not surviving the GC cycle. So the heapdump did not show any objects of such size.
This is a classic example of how profiling can be a very useful ally for locating and correcting problems in application sources. The Java Virtual Machine Profiler Interface (JVMPI) makes this problem trivial. For this particular example, we used a variation of the method described at Using JVMPI to Identify Large Memory Allocations, and were able to quickly identify the code that was doing this allocation.
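JVMPI agents are written in native code, so the method used in this case is not reproduced here. As a much simpler application-level stand-in (not the JVMPI technique itself), the sketch below records the calling method whenever a buffer at or above a size threshold is requested. The class `LargeAllocationTracker` is hypothetical, and the approach only sees allocations that are routed through it.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a profiler: notes who asks for large buffers. */
class LargeAllocationTracker {
    private final int thresholdBytes;
    private final List<String> callSites = new ArrayList<>();

    LargeAllocationTracker(int thresholdBytes) {
        this.thresholdBytes = thresholdBytes;
    }

    /** Allocates a byte array, recording the caller when the size reaches the threshold. */
    synchronized byte[] allocate(int sizeBytes) {
        if (sizeBytes >= thresholdBytes) {
            StackTraceElement[] stack = new Throwable().getStackTrace();
            // stack[0] is this method; stack[1] is the code that requested the buffer.
            String caller = stack.length > 1
                    ? stack[1].getClassName() + "." + stack[1].getMethodName()
                    : "unknown";
            callSites.add(sizeBytes + " bytes from " + caller);
        }
        return new byte[sizeBytes];
    }

    /** Returns a copy of the recorded large-allocation call sites. */
    synchronized List<String> callSites() {
        return new ArrayList<>(callSites);
    }
}
```

A profiler-based approach remains preferable in practice, because it observes every allocation without requiring changes to the application sources.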
As the final case study for this article, we discuss a scenario that showed up looking like a simple sizing problem. An attempt was being made to scale the application to 1,000 users, and the application would run out of Java heap. Calculating the heap requirements based on the number of users, the heap size was increased from 1 GB to 1.5 GB.
But this triggered out-of-memory (OOM) errors that were not caused by the Java heap. The Java heap would show enough free space, but the application logs would show that an OOM occurred. Using svmon, it was seen that only around 4 segments, or 1 GB, were being used for native heap, and the fourth segment seemed to be almost empty (see "Balancing Memory" in "Getting more memory in AIX for your Java applications").
To dig further, the command-line switch -verbose:jni was added. The extra messages printed as a result of this switch revealed that the global JNI reference pool was getting exhausted, which is very rare. The global JNI reference pool is large enough to ensure that most normal applications never come close to exhausting it.
For some time we tried to work around the problem by increasing the number of JNI references (if you specify a higher -Xoss value, it increases the limit in a proportional manner). But this only delayed the inevitable, and the severe Java heap fragmentation caused by the large number of pinned JNI references did not help either.
A closer look at the application design revealed the true cause: an unbounded number of threads being created by the application. As the tests proceeded, the threads would wait for finalizers, and since finalizers are not predictable, there would be a large number of these threads waiting to release their JNI references. The only feasible solution in this case was to change the application code to correct these two problems. Once the application replaced the unbounded threads with a thread pool, and replaced finalizers wherever possible, the sizing work completed with flying colors.
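The two code changes described above can be sketched as follows. This is an illustration under stated assumptions rather than the customer's actual fix: it uses `java.util.concurrent` and try-with-resources, which are available only in Java releases newer than those discussed in this series, and `BoundedWorkers` and `NativeHandle` are hypothetical names, with `NativeHandle` standing in for any object that holds a JNI reference.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class BoundedWorkers {

    /** Hypothetical stand-in for an object that holds a native (JNI) resource. */
    static final class NativeHandle implements AutoCloseable {
        static final AtomicInteger OPEN = new AtomicInteger();

        NativeHandle() {
            OPEN.incrementAndGet();
        }

        @Override
        public void close() {
            // Released deterministically here instead of in a finalizer.
            OPEN.decrementAndGet();
        }
    }

    /** Runs taskCount requests on at most poolSize threads; returns how many completed. */
    static int runRequests(int taskCount, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < taskCount; i++) {
            pool.execute(() -> {
                // The handle is closed as soon as the request finishes.
                try (NativeHandle handle = new NativeHandle()) {
                    completed.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("requests did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return completed.get();
    }
}
```

With a fixed pool, the number of threads (and therefore the number of handles that can be open at once) stays bounded by `poolSize`, no matter how many requests arrive.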
This case shows how you can sometimes end up trying to hit a moving target. An issue reported as Java heap exhaustion eventually turned out to be a design problem. Balancing the Java and Native heaps is usually a critical part of performance tuning, but in this case it was not sufficient. Having so many tools and techniques at your disposal gives you a much broader picture, allowing you to make informed decisions about what to tune.
This article concludes the series. We hope you find this series a valuable guide in maximizing the performance of your Java applications on AIX.
The authors thank Ashok Ambati, Rajesh Jeyapaul, Sharad Ballal, Roger Leuckie and Mark Bluemel for their input and advice on these articles. A special note of thanks goes to John Tesch, whose collection of information on AIX Java performance was a major source of inspiration to the series.
- Read other parts in the Maximizing Java performance on AIX series:
- IBM developer kits for AIX, Java technology edition at http://www.ibm.com/developerworks/java/jdk/aix/service.html
- IBM developer kits - diagnosis documentation at http://www.ibm.com/developerworks/java/jdk/diagnosis/
- AIX Performance PMR Data Collection Tools at ftp://ftp.software.ibm.com/aix/tools/perftools/perfpmr/
- AIX 5L Performance Tools Handbook at http://publib-b.boulder.ibm.com/Redbooks.nsf/RedbookAbstracts/SG246039.html
- Understanding IBM eServer pSeries Performance and Sizing at http://publib-b.boulder.ibm.com/Redbooks.nsf/RedbookAbstracts/SG244810.html
- Fine-tuning Java garbage collection performance at http://www.ibm.com/developerworks/library/i-gctroub/
- Getting more memory in AIX for your Java applications at http://www.ibm.com/developerworks/eserver/articles/aix4java1.html
- AIX 5.2 performance tools update, Part 1 at http://www.ibm.com/developerworks/eserver/articles/Keung_AIXPerf.html
- AIX 5.2 Performance Tools update, Part 2 at http://www.ibm.com/developerworks/eserver/articles/AIX5.2PerfTools.html
- AIX 5.2 Performance Tools update: Part 3 at http://www.ibm.com/developerworks/eserver/articles/AIX5.2_performancetoolsupdatepart3.html
Amit Mathur works in the IBM Solutions Development group, working primarily with IBM ISVs on the enablement and performance of their applications on IBM eServer platforms, and helping ISVs and customers become self-sufficient through education and articles on developerWorks. Amit has more than fourteen years' experience leading software support and development in C/C++, Java and databases on UNIX and Linux platforms. He holds a Bachelor of Engineering degree in Electronics and Telecommunication from India. You can reach Amit at email@example.com.
Sumit Chawla leads the Java Enablement initiative for IBM eServer (for AIX, Windows, and Linux platforms), assisting Independent Software Vendors for IBM Servers. Sumit has a Master of Science degree in Computer Science, with almost 10 years of experience in the IT industry, and is certified by IBM as an Application Architect. He is a frequent contributor to the developerWorks eServer zone. You can contact him at firstname.lastname@example.org.