Level: Introductory
Brian Goetz (brian@quiotix.com), Principal consultant, Quiotix Corp
17 Jun 2003
JavaOne always hosts many interesting technical sessions, but this year's show was overflowing with quality sessions; in fact, several had to be repeated to accommodate the demand. developerWorks columnist Brian Goetz took time out of his schedule to distill some of the more salient concepts from sessions on performance management.
Concurrency Utilities: Multithreading Made Easy
One of the most popular technical sessions was Doug Lea's talk on the util.concurrent concurrency utilities package. This widely used package of concurrency tools is going to form the basis of the java.util.concurrent package in J2SE 1.5, which is being specified by JSR 166.
The goal of the concurrency utilities package is, simply and audaciously, to do for concurrent applications what the collections framework did for data structures. The primitives for concurrency built into the Java language -- synchronization, wait, and notify -- are quite low level and notoriously difficult to use. Let's hope the java.util.concurrent package will contain a sufficiently rich set of concurrency building blocks so that most server-side programmers will never need to use wait and notify at all.
JSR 166, which is specifying the java.util.concurrent package, includes:
- Utility classes for locking
- Thread pools
- Atomic variable management
- Thread coordination
- Task scheduling
- Semaphores
- Mutexes
JSR 166 also contains several new concurrent collections classes, such as LinkedBlockingQueue and ConcurrentHashMap. To make these implementations more efficient and flexible, there are also JVM changes to support high-performance compare-and-swap operations and nanosecond-granularity timing.
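As a rough illustration of what such building blocks allow, here is a minimal sketch that counts words on a thread pool without any explicit synchronized, wait, or notify. It is written against java.util.concurrent as it later shipped and uses Java 8 syntax; the class name WordCountSketch and the sample input are made up for illustration, and the API discussed in the 2003 session may have differed in detail.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical example class; not part of JSR 166 itself.
class WordCountSketch {

    // Counts occurrences of each word, one pool task per word.
    static Map<String, Integer> count(List<String> words, int threads) {
        ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String word : words) {
            // merge() updates a single key atomically, so no explicit lock is needed.
            pool.execute(() -> counts.merge(word, 1, Integer::sum));
        }
        pool.shutdown(); // accept no new tasks; already queued tasks still run
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IllegalStateException(e);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count(Arrays.asList("a", "b", "a"), 2));
    }
}
```

The thread pool handles task hand-off and the concurrent map handles safe updates, which is the kind of work that would otherwise require hand-written wait and notify logic.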
Programming Puzzlers
Another popular session was "Programming Puzzlers," where speakers Josh Bloch and Neal Gafter donned Sun-logoed overalls to play "Click and Hack, the Type-It Brothers." This year, they offered ten programming puzzlers, which illustrated gotchas and common misperceptions pertaining to the use of:
- java.util.Random
- The distinction between overriding and overloading
- The Reflection API
- The vagaries of character encoding and locale dependence
- Shadowing and hiding of class names
- Order of initialization for static initializer blocks and final fields
- The lexical behavior of the compiler
A good time was had by all, with Click and Hack leaving the audience with their classic closing advice, "Don't code like my brother."
The Black Art of Benchmarking
One excellent session on performance tuning and management was "The Black Art of Benchmarking," given by Timothy Cramer, Tom Marble, and Menasse Zaudou of the Sun performance management group. This talk focused on three areas:
- The difficulty of writing microbenchmarks
- The importance of integrating performance measurement, monitoring, and management into the development and deployment cycle
- How to effectively evaluate benchmark data using statistical methods
Writing microbenchmarks for the Java platform is notoriously difficult, and interpreting the results is even more so. The timing of JIT compilation (and code invalidation) is nondeterministic, which can significantly bias test results. The HotSpot tuning parameters are not optimized for microbenchmarks. HotSpot is designed to start out slow and get faster, achieving a balance of fast startup and fast long-term performance, but most microbenchmarks don't ever get to "long term." While it is necessary to "warm up" the JIT before beginning measurement, it's not nearly enough to ensure an accurate benchmark. The reason: the JVM may also choose to "deoptimize" or invalidate JIT-generated code when new classes are loaded, which can make two test loops with identical steady-state performance appear to have dramatically different runtimes, even after the JIT is "warmed up."
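The warm-up step described above might look like the following minimal sketch. The class WarmupSketch, the work method, and the iteration counts are hypothetical choices for illustration, and, as the paragraph warns, warming up is necessary but does not by itself make the numbers trustworthy.

```java
// Hypothetical harness for illustration only; not a substitute for a real benchmark tool.
class WarmupSketch {

    // Results are written here so the JIT cannot discard the work as dead code.
    static volatile long sink;

    // Stand-in for the code under test.
    static long work(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i * 31L;
        }
        return sum;
    }

    // Runs the task unmeasured first, then returns one timing sample per measured run.
    static long[] measure(Runnable task, int warmupRuns, int measuredRuns) {
        for (int i = 0; i < warmupRuns; i++) {
            task.run(); // gives the JIT a chance to compile the hot path
        }
        long[] nanos = new long[measuredRuns];
        for (int i = 0; i < measuredRuns; i++) {
            long start = System.nanoTime();
            task.run();
            nanos[i] = System.nanoTime() - start;
        }
        return nanos;
    }

    public static void main(String[] args) {
        long[] samples = measure(() -> sink = work(1_000_000), 20, 10);
        for (long sample : samples) {
            System.out.println(sample + " ns");
        }
    }
}
```

Returning every sample rather than a single total keeps the run-to-run variation visible, which matters because a later class load can still invalidate compiled code in the middle of the measured runs.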
Another common error in writing and interpreting benchmarks is that they are not run on a wide enough variety of systems and configurations, and often they are run on the development system (to help make a programming decision) rather than on the target deployment system. This could cause a developer to choose an implementation strategy that has inferior performance characteristics on the deployment system, because the processor, memory, or cache configuration is different from the development system.
Interpreting benchmark results is as difficult as constructing effective benchmarks. Not only must you ensure that your tests run enough iterations of the code being tested, but multiple runs may return different results. It is important to take multiple samples and use statistical methods to determine whether a performance regression or improvement is present. One simple statistical test you can use to determine whether two samples are drawn from the same population (which can help to decide whether a code change had any performance impact) is the "t" test, which can be found in any elementary statistics text. Determining whether a code change made a difference in performance is surprisingly difficult to "eyeball." Using statistical methods is much more reliable.
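To make the "t" test concrete, here is a minimal sketch that computes the t statistic for two sets of timing samples. The session summary does not say which variant was recommended; this sketch assumes Welch's form, which does not require equal variances. The class name TTestSketch and the timings in main are invented for illustration and are not measurements from the talk.

```java
// Hypothetical helper; Welch's variant of the t test is an assumption of this sketch.
class TTestSketch {

    static double mean(double[] xs) {
        double sum = 0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    // Unbiased sample variance (divides by n - 1).
    static double sampleVariance(double[] xs) {
        double m = mean(xs);
        double squares = 0;
        for (double x : xs) {
            squares += (x - m) * (x - m);
        }
        return squares / (xs.length - 1);
    }

    // t = (mean(a) - mean(b)) / sqrt(var(a)/n_a + var(b)/n_b)
    static double welchT(double[] a, double[] b) {
        double va = sampleVariance(a) / a.length;
        double vb = sampleVariance(b) / b.length;
        return (mean(a) - mean(b)) / Math.sqrt(va + vb);
    }

    // Welch-Satterthwaite approximation of the degrees of freedom.
    static double welchDegreesOfFreedom(double[] a, double[] b) {
        double va = sampleVariance(a) / a.length;
        double vb = sampleVariance(b) / b.length;
        double numerator = (va + vb) * (va + vb);
        double denominator = va * va / (a.length - 1) + vb * vb / (b.length - 1);
        return numerator / denominator;
    }

    public static void main(String[] args) {
        // Made-up timings in milliseconds for the code before and after a change.
        double[] before = {102.1, 99.8, 101.5, 100.9, 103.2};
        double[] after = {98.7, 99.1, 97.9, 100.2, 98.4};
        System.out.println("t  = " + welchT(before, after));
        System.out.println("df = " + welchDegreesOfFreedom(before, after));
    }
}
```

The absolute value of t is then compared with the critical value from a t table at the computed degrees of freedom; only if it exceeds that value is there statistical evidence that the change affected performance.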
Finally, the team stressed the importance of integrating performance measurement into the development process, as well as including performance monitoring as part of the deployment process. Performance goals -- whether latency, throughput, peak capacity, average capacity, or memory footprint -- should be part of the requirements process, performance measurement should be part of the testing process, and performance monitoring should be part of the deployment process.
Performance Myths Exposed
Few people know more about JIT internals and low-level Java performance than Dr. Cliff Click (no relation to "Click" of "Click and Hack" fame), who architected the HotSpot JIT. In "Performance Myths Exposed," Dr. Click looked at several common myths about performance, including:
- Uncontended synchronization is slow
- Making methods or classes final significantly assists the JIT in inlining
- Exiting a loop through an exception can improve performance by obviating the end-of-loop test
- Object pooling can significantly improve performance
Dr. Click showed, with credible data, that these old saws no longer apply to modern JVMs, and some of them are in fact downright hazardous to performance.
As stated earlier, writing microbenchmarks to measure the effect of these so-called optimizations is exceedingly difficult, and few people are as qualified to do so as Dr. Click. He analyzed these cases on seven JVMs from IBM, Intel, and Sun -- using both Intel and SPARC platforms.
The most surprising result of his experiments was that most so-called performance hacks don't help, and some can seriously hurt. Clean code that follows common usage patterns generally shows far better behavior on modern JVMs than code laden with tweaks designed to "help" the JIT or garbage collector. More often than not, this well-intentioned assistance has the unfortunate effect of undermining many common JIT optimizations, resulting in slower -- not faster -- code. Because many optimizations involve a trade-off between performance and maintainability, it is best to err on the side of maintainability, unless there is a significant, long-term performance advantage to be gained. Many common performance recommendations only produce a performance improvement on the particular JVM from which they were derived, so tweaked code might see a short-term performance improvement, but with a later JVM, the same code might run slower than the "obvious" version.
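One of the myths listed above, exiting a loop through an exception, can be pictured with the following sketch, which puts the "obvious" loop next to the hack. The class and method names are hypothetical, and the code shows only what the two styles look like; it makes no measurement and is not taken from the session.

```java
// Hypothetical example contrasting the plain idiom with the exception-exit hack.
class LoopExitSketch {

    // The "obvious" version: a bounded loop the JIT readily recognizes.
    static long sumPlain(int[] data) {
        long total = 0;
        for (int i = 0; i < data.length; i++) {
            total += data[i];
        }
        return total;
    }

    // The hack: drop the end-of-loop test and let the bounds check end the loop.
    // Shown only to illustrate the myth; this style is not recommended.
    static long sumViaException(int[] data) {
        long total = 0;
        try {
            for (int i = 0; ; i++) {
                total += data[i];
            }
        } catch (ArrayIndexOutOfBoundsException e) {
            // Reaching the end of the array is reported as an exception.
        }
        return total;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3};
        System.out.println(sumPlain(data));
        System.out.println(sumViaException(data));
    }
}
```

Both methods compute the same sum, but the second hides its termination condition inside exception handling, so it is harder to read and maintain in exchange for a benefit the session reported as a myth.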
Dr. Click also offered a look at some optimizations that current HotSpot compilers do not perform, but which might be included in upcoming versions of the JVM, such as escape analysis (determining that a reference to an object does not escape the local context, in which case it can be allocated on the stack instead, or even not allocated at all, if its fields can be hoisted into registers). It is often surprising just how much optimization can be done by a sophisticated dynamic compiler on code that appears at first glance to be inherently expensive.
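A minimal sketch of what "does not escape" means in source terms follows. The Point class and both methods are hypothetical, and whether a given JVM actually applies the optimization depends on its version and configuration.

```java
// Hypothetical example of escaping and non-escaping objects.
class EscapeSketch {

    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // Neither Point is returned, stored in a field, or passed to other code,
    // so no reference escapes this method: a candidate for escape analysis.
    static int distanceSquared(int x1, int y1, int x2, int y2) {
        Point a = new Point(x1, y1);
        Point b = new Point(x2, y2);
        int dx = a.x - b.x;
        int dy = a.y - b.y;
        return dx * dx + dy * dy;
    }

    static Point lastCreated;

    // This Point escapes twice: through the static field and the return value.
    static Point createAndRemember(int x, int y) {
        Point p = new Point(x, y);
        lastCreated = p;
        return p;
    }

    public static void main(String[] args) {
        System.out.println(distanceSquared(0, 0, 3, 4));
        System.out.println(createAndRemember(1, 2).x);
    }
}
```

In distanceSquared the two temporary objects look expensive at first glance, yet a compiler that proves they never escape is free to work with their fields directly.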
About the author
Brian Goetz has been a professional software developer for the past 15 years. He is a Principal consultant at Quiotix, a software development and consulting firm located in Los Altos, California. See Brian's published and upcoming articles in popular industry publications.