Level: Advanced
Dana Triplett (dana@orange-atelier.com), Freelance writer
01 Dec 2001
IBM's contributions to improvements in Java performance are the result of the combined efforts of research teams around the world. Working in research and development labs in Austin, Haifa, Hawthorne, Hursley, Poughkeepsie, Rochester, Tokyo, and Toronto, these teams are currently tackling new challenges that spring from evolving e-business and Web services technologies. In this interview, IBM Distinguished Engineer Robert Berry addresses the rapidly growing importance of the server environment, the need to integrate the JVM with middleware, and the advent of very large hardware configurations, and explains how all of these factors are making new demands on Java performance.
Robert Berry joined IBM in 1987, a few years after receiving a Ph.D. in Computer Sciences from the University of Texas in Austin. He has been a part of performance-related research and development from his first years with the company, and became actively involved with Java performance around the time the Java language first emerged. He was appointed an IBM Distinguished Engineer in 2000, and is currently on assignment at IBM's Software Development Laboratory in Hursley, England, helping to grow the Java performance team. In this interview, I asked him about the process of refining Java performance, the history of performance improvements, and some of the current, hot areas of Java performance research at IBM. (Also, be sure to check out the sidebar "Evolving performance of the Java Virtual Machine," a timeline of performance improvements.)
Setting the performance agenda
Triplett: When working to resolve JVM performance problems, how do you and your colleagues agree on a target for performance?
Berry: This is a hard, important, and varied problem. It is not appropriate for the performance team alone to decide what acceptable levels of performance might be. Ideally, our customers tell us what they need, [so] the targets are developed outside of the performance group. They are based on a variety of objectives.
A prime objective is to ensure that any release of the JVM delivers improved performance over prior releases (we are committed to ensuring that our customers do not experience a performance degradation from one release to the next). So, we have a key set of measurements, based on a combination of standard and internal benchmarks, that we track from release to release.
A further and sometimes more stringent driver for objectives comes from competitive implementations. For example, for several years we pursued performance leadership with our Windows JVM implementation. This meant that as new SPECjvm98 numbers arrived from competing platforms and JVM vendors, we worked to ensure that our JVM (when coupled with the appropriate Intel hardware) delivered superior results. In general, this effort served the Java community rather well because it continually raised the bar on core JVM performance.
Another benchmark, SPECjbb2000, is used by many vendors (IBM, Hewlett-Packard, Sun) in the high-end server space as a key indicator of server competitiveness. To some extent this is a measure of the maturity of Java. For certain platforms, our performance targets are strongly influenced by competitive publication of SPECjbb2000 scores. A recent press release highlights our world-record results on SPECjbb2000.
For new technology (for example, the Reset mode, or our new garbage-collection policy for reduced pause times in JVM 1.3.1), we arrive at a target based on what drove the technology in the first place. For example, with JVM 1.3.0 we knew that the cost to start a JVM from scratch was many tens of millions of instructions. The resettable mode was introduced to avoid this cost and allow a clean JVM to be presented to each transaction. So we knew we had to effectively trim that JVM creation path down to some small percentage of a single transaction. In the case of our new garbage-collection policy we had requirements from our customers that said sustained multisecond pause times were unsatisfactory.
When we release a JVM implementation on a new platform, we have a blank slate to work with -- but again there is usually some near-reference based on other platforms with similar processing power or equivalent operating system (OS).
The role of industry benchmarks
Triplett: How helpful are the current industry benchmarks (such as VolanoMark and SPEC) to you in your research?
Berry: These have been enormously helpful. VolanoMark is based on the real-world software that implements the Volano chat server, and it has been very helpful to us in looking at the performance of applications employing large-scale threading and heavy use of sockets. The former characteristic has stretched JVMs on certain platforms and helped to reveal certain underlying OS constraints such as threading limits, underlying scheduling problems, and other resource limits that are thread-related. The latter has been important in ensuring that we deliver TCP/IP functionality through the JVM interfaces with as little overhead as possible. IBM OS platforms with exceptional threading and TCP/IP implementations have tended to do very well with this benchmark, especially our AIX/PowerPC offerings.
The SPECjvm98 benchmark continues to be valuable for exercising the JIT's ability to generate good code in a variety of situations.
SPECjbb2000, loosely based on the famous TPC-C benchmark for transaction throughput, has been very helpful in driving our work. It stresses threading (though not to the same degree as Volano), heavy object allocation, and a significant amount of synchronization (locking). As we have worked to address these areas of opportunity, more and more of the time is now being spent in JITted code, and so this benchmark is now very helpful there as well. We have also developed internal variants of this benchmark to exercise our heap management software (garbage collection); these variants stress the heap much more than SPECjbb2000 and allow us to explore the resilience of our existing garbage-collection policies, as well as develop and refine new policies for applications driving very large heaps and heaps that are particularly full.
There is a new benchmark emerging, ECPerf, that we expect to be another significant influence in the future. This benchmark exercises Java in the context of a more complete end-to-end scenario. For example, it introduces the relatively new technologies of Enterprise JavaBeans, which have not otherwise been a part of Java benchmarks. It is not a simple benchmark to configure and execute, but it will be influential.
Measuring up in the real world
Triplett: In addition to VolanoMark, what kinds of real-world applications are used to measure and improve Java performance?
Berry: We regularly receive code from our customers and this has been very helpful in identifying areas where attention is needed. Another frequent source of input is user-written microbenchmarks. These are small tests that have been written to isolate a particular type of behavior (for example, time to allocate an object of a certain size). On certain competitive occasions, we find that our JVM technology (or more correctly, our entire middleware and server platform) is being compared directly against a competitor's offering. It is not uncommon for these cases to include some small set of microbenchmarks that are used to help distinguish one JVM implementation from another. Although these are not complete applications, they tend to be developed from abstracting out key known or anticipated behaviors (for example, significant use of sockets or heavy use of locking).
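Editor's Note: As a rough illustration of the kind of microbenchmark Berry describes (time to allocate an object of a certain size), here is a minimal sketch. It is not code from IBM or from any customer; the class name, method name, sizes, and iteration counts are all assumptions chosen for the example, and it assumes a modern JDK. A naive timing loop like this is easily distorted by JIT warm-up and garbage collection, so its numbers should be treated as indicative at best.

```java
import java.util.Arrays;

/** Naive allocation microbenchmark; all names and sizes are illustrative only. */
final class AllocationMicrobenchmark {
    // Written on every iteration so the JIT is less likely to discard the allocation.
    static volatile byte[] sink;

    /** Returns the median time, in nanoseconds, per allocation of a byte[size]. */
    static double medianNanosPerAllocation(int size, int iterations, int rounds) {
        double[] perAllocation = new double[rounds];
        for (int r = 0; r < rounds; r++) {
            long start = System.nanoTime();
            for (int i = 0; i < iterations; i++) {
                sink = new byte[size];
            }
            long elapsed = System.nanoTime() - start;
            perAllocation[r] = (double) elapsed / iterations;
        }
        // The median is less sensitive than the mean to a round interrupted by GC.
        Arrays.sort(perAllocation);
        return perAllocation[rounds / 2];
    }

    public static void main(String[] args) {
        // Warm-up pass so the measured rounds are more likely to run compiled code.
        medianNanosPerAllocation(64, 100_000, 5);
        for (int size : new int[] {16, 256, 4096}) {
            System.out.printf("byte[%d]: %.1f ns per allocation%n",
                    size, medianNanosPerAllocation(size, 100_000, 11));
        }
    }
}
```

Timing a batch of allocations and dividing by the count, rather than timing each allocation, keeps the cost of reading the clock from dominating the result.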
What takes priority?
Triplett: Is it possible to prioritize your research group's goals for performance improvements in terms of, say, performance relative to scalability, performance relative to security, or performance relative to maintainability, speed, size, etc.?
Berry: In general, the focus for JVM performance has been toward higher levels of performance (throughput) and scalability -- but every situation is unique and offers different tradeoffs. The OS/390 area, with its legacy of mission-critical software, brings significant focus on qualities of service such as reliability and isolation. Other transactional environments (for example, Servlets, JSPs) do not have the same history of demands; indeed their focus is more performance-oriented. There is a sense that in the future, these newer environments may mature in their demands for higher reliability; indeed, as more and more business is based on this middleware, the implications of failure will grow.
Security versus performance
Triplett: How do security requirements, especially in the development of enterprise applications, affect performance? What can we do about it?
Berry: Security continues to be a challenge for performance. (The converse is true as well, of course.) In the early days of Java 2, we explored the cost of security from several perspectives. One finding was that security checks on file access required that file names be converted to a particular standard format. This was very expensive. (Editor's Note: This problem and its resolution are detailed in the sidebar "Evolving performance of the Java Virtual Machine.") We also found that the installation of a security manager imposed a significant overhead (greater than 30 percent) on the execution of an important benchmark.
In the Resettable mode JVM we considered the use of a security manager to implement some of the constraints we needed to enforce (for example, loading of native libraries and access to the file system). However, this approach was considered too expensive, so a different, JVM-internal mechanism was invented to avoid this overhead.
So, I think the net here is that we know it is expensive, and we expect to be doing more work in this area. We need more customer experience with their use of security policies, and I think it's fair to say we'd welcome any input from our customers on this -- as with any other aspect of IBM JVM performance.
Current areas of research
Triplett: In closing, could you cite which areas of Java performance are currently getting the most attention in the IBM labs?
Berry: There are a number of areas we're currently working on.
Scalable JVM
We began work on this a few years ago. It is not driven exclusively by performance issues, but they certainly play a key role. We sought to bring the advantages of Java to our 390 platform customers -- customers with substantial investments in IMS, DB2, CICS. [These are] customers with strong requirements for the combination of high reliability and high performance. These two objectives are not easily jointly satisfied. Two key ideas resulted from this work:
- Resetting the JVM between transactions
- Sharing resources across JVM instances
The former is important in situations demanding isolation between transactions. The latter deals with reducing footprint (memory use) and reducing startup time for JVMs that share resources. [Sharing resources across JVM instances] requires restructuring the JVM to allow for classes that are loaded by one JVM to be used by another, and also for JITted code to be shared. All of this must be done subject to correct operation for each individual JVM (for example, each JVM must still initialize classes it uses in the right order), so sharing is not quite so simple as just pooling all classes and JITted code into one big, shared area. Perhaps the biggest benefit comes at startup where we don't physically have to load all of the classes that are needed to get the JVM up and running.
The performance effort surrounding the scalable JVM has been substantial. First, it helped to create and shape the direction of the design and implementation. We assisted in defining reasonable objectives for the first release (to be coupled with CICS's first deployment of Enterprise JavaBeans support), but we have also been very much involved in the effort to ensure that it delivers on these objectives. This has required some invention in the areas of instrumentation, measurement, and analysis. Some areas requiring such invention included:
- Measuring JVM reset time: This was a challenge because reset is meant to be a very short operation (for example, single milliseconds in duration). It is hard to reliably measure such an event, and even harder to diagnose it if it takes longer than expected.
- Predicting the benefits of class sharing: This required special instrumentation and tooling to correlate class load and JITting events.
- Measuring JVM memory use: Memory use has not received much attention in Server JVM implementations. We sought to gain a better understanding of which aspects of the JVM were contributing to large memory use. This effort was also driven by a need to enhance the benefits to footprint by sharing.
- Understanding object allocation in the scalable JVM: The design of the scalable JVM delivers transaction isolation through the creation of split heaps, with one heap dedicated to the objects created by a transaction. This design allows for a simpler cleanup (reset) at the end of each transaction because ideally all transactional objects are in a separate heap and can be quickly discarded. This is one area where high performance can be delivered. However, there are many subtle issues in getting this right, and part of the performance effort was focused on this topic.
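Editor's Note: The reason a dedicated transaction heap can be "quickly discarded" can be sketched with a toy model. The code below is not IBM's implementation and does not manage real Java objects; it is a hypothetical bump-pointer arena (the class and method names are invented for this example) in which allocation advances an offset and reset simply moves the offset back to zero, so the cost of cleanup does not depend on how many allocations the transaction made.

```java
/** Toy model of a per-transaction heap: bump-pointer allocation, constant-time reset. */
final class TransactionArena {
    private final byte[] memory; // stands in for the transaction's heap space
    private int top;             // next free offset

    TransactionArena(int capacityBytes) {
        memory = new byte[capacityBytes];
    }

    /** Reserves size bytes and returns their starting offset, or -1 if they do not fit. */
    int allocate(int size) {
        if (size < 0 || size > memory.length - top) {
            return -1;
        }
        int offset = top;
        top += size;
        return offset;
    }

    int bytesInUse() {
        return top;
    }

    /** Discards every transactional allocation at once, without visiting any of them. */
    void reset() {
        top = 0;
    }

    public static void main(String[] args) {
        TransactionArena arena = new TransactionArena(1024);
        arena.allocate(100);
        arena.allocate(200);
        System.out.println("in use before reset: " + arena.bytesInUse()); // 300
        arena.reset();
        System.out.println("in use after reset: " + arena.bytesInUse());  // 0
    }
}
```

One plausible example of the subtle issues Berry mentions: in a model like this, anything outside the arena that still holds an offset after reset now refers to discarded space, so a real design has to detect or prevent such references.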
64-bit JVMs
IBM has led the field with the development of JVM support for a variety of 64-bit platforms. These have introduced some interesting challenges:
- Some of the 64-bit environments are quite new (for example, Itanium from Intel), and these bring new challenges to our JIT technology. We have made great progress here, but more is needed to exploit the promise of this new technology.
- 64-bit JVMs mean the opportunity for very, very large heaps. This has driven our work in large-heap garbage collection.
Large heap garbage collection
Garbage collection is an extremely complex area. It has been studied in academia for decades, and there are now many distinct implementations in the field. All of IBM's JVMs employ a variant of the so-called Stop-the-World collector. This collector technology requires the JVM to stop all active threads (that is, threads running user programs that allocate objects in the course of their execution) and collect garbage in the heap. This collection activity typically consists of three phases:
- Marking live objects (a live object is an object that is still reachable by program code; that is, it is not garbage).
- Sweeping up the garbage and coalescing it into large areas of free space.
- Compacting the remaining live objects to create a yet larger area of free space.
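Editor's Note: The three phases above can be sketched in a few dozen lines. This is a teaching toy, not a description of IBM's collector: the heap is modeled as an array of slots, the class and method names are invented for this example, and because the toy objects hold ordinary Java references to one another, compaction does not need the pointer fix-up that a real collector must perform when it moves objects.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy stop-the-world collector over an array of slots; purely illustrative. */
final class ToyHeap {
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>(); // outgoing references
        boolean marked;
        Obj(String name) { this.name = name; }
    }

    final Obj[] slots;

    ToyHeap(int size) { slots = new Obj[size]; }

    /** Places a new object in the first free slot; returns null if the heap is full. */
    Obj allocate(String name) {
        for (int i = 0; i < slots.length; i++) {
            if (slots[i] == null) {
                slots[i] = new Obj(name);
                return slots[i];
            }
        }
        return null;
    }

    /** Phase 1: mark every object reachable from the roots. */
    void mark(List<Obj> roots) {
        Deque<Obj> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Obj o = pending.pop();
            if (!o.marked) {
                o.marked = true;
                pending.addAll(o.refs);
            }
        }
    }

    /** Phase 2: free unmarked slots and clear marks; returns the number of slots freed. */
    int sweep() {
        int freed = 0;
        for (int i = 0; i < slots.length; i++) {
            if (slots[i] == null) continue;
            if (slots[i].marked) {
                slots[i].marked = false; // ready for the next collection
            } else {
                slots[i] = null;
                freed++;
            }
        }
        return freed;
    }

    /** Phase 3: slide survivors to the front; returns the index where free space begins. */
    int compact() {
        int next = 0;
        for (int i = 0; i < slots.length; i++) {
            if (slots[i] != null) {
                Obj live = slots[i];
                slots[i] = null;
                slots[next++] = live;
            }
        }
        return next;
    }
}
```

Even in this toy, each phase walks either the live objects or the whole slot array, which suggests why pause time grows with heap size when all application threads are stopped for the duration.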
The downside of this approach is that the larger the heap, the longer the active threads must be suspended. Unfortunately, in a server transactional environment such long pauses may be unacceptable. Because 64-bit environments present the opportunity for maximum heap sizes to be vastly larger than those afforded in 32-bit environments, it is critical that some attention be paid to this challenge. IBM has refined this technology substantially over the past two years, and most recently has been focused on improving the performance of this approach in large heaps. This has driven significant innovation in areas of marking, sweeping, and compacting the heap. It has resulted in a new understanding of heap structure and dynamics.
Core JVM performance
We're addressing three concerns here:
- Object allocation
- Monitor (locking) performance
- Character conversion performance
Editor's Note: To learn more about IBM's work in the area of Java performance, be sure to read the sidebar "Evolving performance of the Java Virtual Machine" in which Robert provides a detailed timeline of performance improvements to the JVM, from JDK 1.0.2 through JDK 1.4.