Enhance performance with class sharing

Explore the newest class sharing features in the IBM JRE

The latest release of the IBM® JRE for Java™ SE 6 enhances IBM's class sharing feature first introduced in Version 5. In this article, performance analysts Adam Pilkington and Graham Rawson detail the changes, which include improvements in application startup times and memory utilisation.

Adam Pilkington, Software Engineer, IBM

Adam Pilkington is a Java Performance Analyst within the IBM Java Technology Centre, focusing on WebSphere Application Server performance with Java 6. Prior to joining IBM in 2006, he was a J2EE technical architect for a large UK financial services organisation. He holds a degree in Mathematics and Computer Science.



Graham Rawson, Software Engineer, IBM

Graham Rawson is a Java Performance Analyst within the IBM Java Technology Centre and leads the Java Performance team based at the IBM Laboratory in Hursley, England. Graham has been working for IBM for 24 years in a variety of roles supporting CICS Transaction Server and Java technology, both developed at Hursley. Graham holds a B.Sc. (Hons) in Chemistry from the University of East Anglia (Norwich, England) and a Certificate in Software Engineering from the University of Oxford (England).



30 September 2008


Enhancements to the shared classes infrastructure, first introduced in Version 5 of the IBM JRE for the Java platform SE, improve performance for your Java applications in the areas of startup and memory footprint. In this article, we review the changes and provide examples demonstrating the benefits, using Eclipse and Apache Tomcat as our client- and server-side operating environments. We provide some installation instructions so that you can try things out for yourself, but you should be familiar with both applications, as well as with IBM's class sharing. If you are not familiar with IBM's shared classes feature, we recommend that you start with the article "Java Technology, IBM style: Class sharing," which explains the basic concepts involved.

You can download the implementations of the IBM JRE for Java 6 for Linux® and AIX® now if you'd like to follow along when we get to the example. While the Windows® implementation is not available as a separate download, it is provided as a prebuilt Eclipse download. Note that IBM registration (free) is required.

What's new in IBM's shared classes?

IBM JRE for Java 5 gave you the ability to share classes between JVMs with a cache. In IBM JRE for Java 6, you can make the cache persistent and can use it to share compiled code. The method of storing these cached items has been made more efficient as well.

Shared classes

The ability to share classes between Java virtual machines (JVMs) was first introduced in IBM JRE for Java 5 and continues to be supported and enhanced in Java 6. When classes are loaded by the JVM, they can be placed in a cache. When subsequent requests for that class are made, these requests are satisfied from the cache if possible rather than loading the class from a corresponding JAR file.

You can control the maximum size of the cache by using the command line option in Listing 1, but note that this maximum size may be constrained by operating system restrictions on shared memory:

Listing 1. Command line option to set the maximum cache size
Running java -X will show the following option ...

Arguments to the following options are expressed in bytes.
Values suffixed with "k" (kilo) or "m" (mega) will be factored accordingly.
:
-Xscmx<x>       set size of new shared class cache to <x>
:
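
To make the suffix rule in Listing 1 concrete, here is a minimal Python sketch that converts a size argument to bytes. The helper name size_to_bytes is ours, and we assume that "k" and "m" are binary multiples (1024 and 1024 * 1024), which the help text does not spell out.

```python
# Assumption: "k" means 1024 bytes and "m" means 1024 * 1024 bytes.
MULTIPLIERS = {"k": 1024, "m": 1024 * 1024}


def size_to_bytes(value: str) -> int:
    """Convert a size argument such as '16m', '512k', or '4096' to bytes."""
    text = value.strip().lower()
    if text and text[-1] in MULTIPLIERS:
        return int(text[:-1]) * MULTIPLIERS[text[-1]]
    return int(text)  # no suffix: the value is already expressed in bytes


for argument in ("4096", "512k", "16m"):
    print(f"-Xscmx{argument} requests {size_to_bytes(argument)} bytes")
```

The size that the operating system actually grants may be smaller, as noted above.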

Ahead of Time (AOT) code storage

A JVM typically compiles Java methods to native code while the program executes. This native code is generated every time the program is run. The IBM JRE for Java 6 SR1 JVM introduces the ability to compile Java methods using Ahead of Time compilation technology to create native code that can not only be used in the current JVM but can also be stored into a shared class cache. Another JVM launched with a shared class cache that was populated with AOT code during previous invocations of Java programs can use the AOT code stored in the cache to reduce startup time. The reduction comes from saving the time needed for compilation and from faster execution of methods as AOT code. AOT code is native code and generally executes faster than interpreted code (although it is not likely to run as fast as JIT-generated code).

The minimum and maximum amount of the shared class cache that can be occupied by AOT code can be defined using command line options, as shown in Listing 2. If you do not specify a maximum amount of AOT code that can be stored, then the default setting is to use the entire cache. This won't result in the entire cache being filled with AOT code, however, as AOT code can be generated only from classes already in the cache.

Listing 2. Command line option to control the size of cached AOT code
Running java -X will show the following options ...

Arguments to the following options are expressed in bytes.
Values suffixed with "k" (kilo) or "m" (mega) will be factored accordingly.
:
-Xscminaot<x>   set minimum shared classes cache space reserved for AOT data to <x>
-Xscmaxaot<x>   set maximum shared classes cache space allowed for AOT data to <x>
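
As an illustration of how the options in Listings 1 and 2 combine, the following Python sketch assembles a JVM option list from byte values. The function name, the cache name "demo", and the sanity checks (minimum not above maximum, neither above the cache size) are our assumptions, not documented JVM behaviour.

```python
def shared_cache_options(name, cache_bytes, min_aot_bytes=None, max_aot_bytes=None):
    """Build JVM options for a named shared class cache with optional AOT limits.

    All sizes are in bytes, which the options accept when no suffix is given.
    """
    for limit in (min_aot_bytes, max_aot_bytes):
        # Assumed sanity check: an AOT limit larger than the cache makes no sense.
        if limit is not None and limit > cache_bytes:
            raise ValueError("AOT limit exceeds the cache size")
    if (min_aot_bytes is not None and max_aot_bytes is not None
            and min_aot_bytes > max_aot_bytes):
        raise ValueError("minimum AOT space exceeds maximum AOT space")

    options = [f"-Xshareclasses:name={name}", f"-Xscmx{cache_bytes}"]
    if min_aot_bytes is not None:
        options.append(f"-Xscminaot{min_aot_bytes}")
    if max_aot_bytes is not None:
        options.append(f"-Xscmaxaot{max_aot_bytes}")
    return options


# Hypothetical example: a 16MB cache in which AOT code may use at most 4MB.
print(" ".join(shared_cache_options("demo", 16 * 1024 * 1024,
                                    max_aot_bytes=4 * 1024 * 1024)))
```

Leaving both AOT limits unset corresponds to the default described above, where AOT code may use the entire cache.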

Figure 1 illustrates how the cache space becomes occupied by shared classes and AOT code and how the cache space settings can control how large a share of the total available space is used by these:

Figure 1. Sample shared classes cache composition
Cache composition

We cover AOT code in more detail later in this article.

Class compression

To use the shared classes cache most efficiently, the JVM uses compression technology to increase the number of classes that can be stored. Class compression is done automatically and cannot be changed through the use of command line options.

Persistent caches

The shared classes cache in the IBM JRE for Java 5 was implemented using shared memory segments, which allowed JVMs to share the same cache, but the cache could not persist beyond a reboot of the operating system. This meant that the first JVM that was launched after a reboot had to rebuild the cache. In Java 6, the default implementation of the cache has been changed to use a memory mapped file. This gives cache persistence across an operating system restart.
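
The persistence comes from a general operating system facility rather than anything specific to the JVM. The Python sketch below is only an illustration of that facility, not the JVM's implementation: data written through one mapping of a file is still there when the file is mapped again later. The function name and the use of a temporary file are our own choices.

```python
import mmap
import os
import tempfile


def write_then_reopen(payload: bytes) -> bytes:
    """Write payload through one memory mapping, then read it back through a new one."""
    fd, path = tempfile.mkstemp()
    try:
        try:
            os.ftruncate(fd, len(payload))  # a mapping needs a non-empty file
            with mmap.mmap(fd, len(payload)) as first:
                first[:] = payload
                first.flush()  # push the mapped pages back to the file
        finally:
            os.close(fd)
        # A later mapping (here in the same process) sees the stored bytes.
        with open(path, "r+b") as handle:
            with mmap.mmap(handle.fileno(), 0) as second:
                return bytes(second[:])
    finally:
        os.remove(path)


print(write_then_reopen(b"cached class data"))
```

A shared memory segment, by contrast, has no backing file, which is why the Java 5 cache had to be rebuilt after a reboot.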

AOT in detail

The AOT compiler is an additional compilation mechanism that is new in IBM JRE for Java 6. In previous versions of the IBM JRE, a Java method can be executed either by interpreting each of the individual Java bytecodes that compose the method or as native machine code that has been compiled and optimised by a JRE component called the Just-in-Time (JIT) compiler. The JIT compiles code dynamically: compilation occurs when the method is actually run, and the compilation techniques employed depend on analysis of the method during execution.

What is AOT code?

AOT code is a native code version of a Java method produced by AOT compilation. Unlike JIT compilation, AOT compilation does not employ optimisations based on the dynamic analysis of an executing Java method. Typically, a method will execute faster as AOT-compiled native code than as interpreted Java byte code but not as fast as JIT-compiled native code.

The primary objective of AOT compilation is to accelerate application startup by providing a precompiled version of Java methods. Loading these precompiled AOT methods from the shared class cache is a much faster way to make a native code version of a Java method available for execution than generating JIT-compiled code. Rapid loading of AOT-compiled code allows the JVM to spend less time interpreting Java methods before a native code version is made available. An AOT-compiled method is also subject to JIT compilation, so after initial execution as AOT code, it may be optimised further by the JIT.

AOT as part of shared classes

The generated AOT code is stored in a region of the shared cache. Any other JVM using that shared class cache can subsequently execute that method as AOT code without incurring the cost of compilation.

This implementation differs from the real-time JVM where the compilation of AOT code is performed by a utility (jxeinajar) and stored in a jar file, as described in "Real-time Java, Part 1: Using the Java language for real-time systems."

The AOT code executed by a JVM is not shared, but copied out from the shared class cache. There is no direct footprint benefit because each JVM still has a copy of the AOT executable but there are memory and CPU savings from being able to reuse this code rather than repeat the compilation.

AOT diagnostics

Three command-line settings are available to help you understand what methods have been AOT-compiled by your applications and how much space these methods occupy in the shared class cache:

  • -Xjit:verbose: Use this command to report any AOT compilations performed by the JIT.
  • -Xshareclasses:verboseAOT: Use this command to report any AOT code read from or stored into the shared class cache.
  • java -Xshareclasses:printAllStats: Use this command to list the shared class cache statistics, including AOT code stored and the space occupied.

Listing 3 shows the output from the first invocation of a Tomcat server after clearing the shared classes cache and applying the runtime options -Xjit:verbose and -Xshareclasses:verboseAOT:

Listing 3. Applying -Xjit:verbose and -Xshareclasses:verboseAOT
+ (AOT cold) java/lang/StringBuilder.append(Ljava/lang/String;)Ljava/lang/StringBuilder; 
Storing AOT code for ROMMethod 0x02359850 in shared cache... Succeeded.
+ (AOT cold) sun/misc/URLClassPath$JarLoader.ensureOpen()V @ 0x0147BF9C-0x0147C106 Q_SZ=3
Storing AOT code for ROMMethod 0x023CBFC4 in shared cache... Succeeded.
+ (AOT cold) java/util/jar/JarFile.getEntry(Ljava/lang/String;)Ljava/util/zip/ZipEntry; 
Storing AOT code for ROMMethod 0x023CE38C in shared cache... Succeeded.

After the initialisation of the Tomcat server, the shared class cache statistics, obtained with the command java -Xshareclasses:printAllStats, showed these methods were stored in the shared class cache (Listing 4 is an excerpt of the full listing):

Listing 4. Using java -Xshareclasses:printAllStats displays the methods stored in the shared class cache
1: 0x43469B8C AOT: append
	for ROMClass java/lang/StringBuilder at 0x42539178.
1: 0x43469634 AOT: ensureOpen
	for ROMClass sun/misc/URLClassPath$JarLoader at 0x425AB758.
1: 0x434693A8 AOT: getEntry
	for ROMClass java/util/jar/JarFile at 0x425ADAD8.

As shown in Listing 5, a subsequent invocation of a Tomcat server using the shared class cache will find these methods already AOT-compiled and will simply load these from the cache rather than repeat the compilation:

Listing 5. Finding and loading AOT-compiled methods
Finding AOT code for ROMMethod 0x02359850 in shared cache... Succeeded.
(AOT load) java/lang/StringBuilder.append(Ljava/lang/String;)Ljava/lang/StringBuilder;
Finding AOT code for ROMMethod 0x023CBFC4 in shared cache... Succeeded.
(AOT load) sun/misc/URLClassPath$JarLoader.ensureOpen()V
Finding AOT code for ROMMethod 0x023CE38C in shared cache... Succeeded.
(AOT load) java/util/jar/JarFile.getEntry(Ljava/lang/String;)Ljava/util/zip/ZipEntry;
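
When the verbose output runs to thousands of lines, a small script can summarise it. The Python sketch below counts the "Storing" and "Finding" messages in the format shown in Listings 3 and 5. The function name is ours, and because the listings show only successful operations, treating any result other than "Succeeded" as a failure is an assumption.

```python
import re

# Matches lines such as:
# Storing AOT code for ROMMethod 0x02359850 in shared cache... Succeeded.
AOT_CACHE_LINE = re.compile(
    r"^(Storing|Finding) AOT code for ROMMethod (0x[0-9A-Fa-f]+) "
    r"in shared cache\.\.\. (\w+)\."
)


def summarise_aot_cache_activity(lines):
    """Count AOT methods stored into and found in the shared class cache."""
    summary = {"stored": 0, "found": 0, "failed": 0}
    for line in lines:
        match = AOT_CACHE_LINE.match(line.strip())
        if not match:
            continue  # compilation reports and other output are ignored
        action, _address, result = match.groups()
        if result != "Succeeded":
            summary["failed"] += 1  # assumed: anything else is a failure
        elif action == "Storing":
            summary["stored"] += 1
        else:
            summary["found"] += 1
    return summary


first_run = [
    "+ (AOT cold) java/lang/StringBuilder.append(Ljava/lang/String;)Ljava/lang/StringBuilder; ",
    "Storing AOT code for ROMMethod 0x02359850 in shared cache... Succeeded.",
    "Storing AOT code for ROMMethod 0x023CBFC4 in shared cache... Succeeded.",
]
print(summarise_aot_cache_activity(first_run))
```

Running it over the output of a first and a second invocation should show the counts moving from stored to found.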

AOT compilation makes heuristic decisions to select candidate methods that will improve future startup time. So it's possible that a future invocation of an application may also cause some additional methods to be AOT-compiled.

A method that has been AOT-compiled may also be JIT-compiled if it meets the necessary recompilation criteria. However, the aim of AOT compilation is to select methods required at application startup and the aim of JIT compilation is to optimise frequently used methods, so it is possible that a method that is AOT-compiled may not be subsequently invoked sufficiently to trigger a JIT compilation.

Listing 6 is an excerpt from the output using -Xjit:verbose for an execution of the SPECjbb2005 benchmark. This excerpt contains a report of AOT compilations of two methods: com/ibm/security/util/ObjectIdentifier.equals and java/math/BigDecimal.multiply. The first is not subject to further JIT compilations, but the more frequently used java/math/BigDecimal.multiply is JIT-compiled twice, eventually to an optimisation level of hot.

SPECjbb2005 doesn't have a lengthy startup phase, so only a few AOT methods are compiled. Note that AOT compilations are performed at a cold optimisation level, which reflects the overall AOT goal to accelerate application startup.

Listing 6. Optimisations reported when using -Xjit:verbose
+ (AOT cold) com/ibm/security/util/ObjectIdentifier.equals(Ljava/lang/Object;)
Storing AOT code for ROMMethod 0x118B8AF4 in shared cache... Succeeded.

+ (AOT cold) java/math/BigDecimal.multiply(Ljava/math/BigDecimal;)Ljava/math/BigDecimal; 
Storing AOT code for ROMMethod 0x119D3C60 in shared cache... Succeeded.
+ (warm) java/math/BigDecimal.multiply(Ljava/math/BigDecimal;)Ljava/math/BigDecimal; 
+ (hot) java/math/BigDecimal.multiply(Ljava/math/BigDecimal;)Ljava/math/BigDecimal;
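
To follow how individual methods progress through optimisation levels, the compilation lines can be grouped by method. This Python sketch assumes the "+ (level) method" line format seen in Listing 6. The function name is ours.

```python
import re
from collections import defaultdict

# Matches lines such as: + (warm) java/math/BigDecimal.multiply(...)...
COMPILE_LINE = re.compile(r"^\+ \(([^)]+)\) (\S+)")


def optimisation_history(lines):
    """Map each method to the optimisation levels reported for it, in order."""
    history = defaultdict(list)
    for line in lines:
        match = COMPILE_LINE.match(line.strip())
        if match:
            level, method = match.groups()
            history[method].append(level)
    return dict(history)


listing_6 = [
    "+ (AOT cold) com/ibm/security/util/ObjectIdentifier.equals(Ljava/lang/Object;)",
    "Storing AOT code for ROMMethod 0x118B8AF4 in shared cache... Succeeded.",
    "+ (AOT cold) java/math/BigDecimal.multiply(Ljava/math/BigDecimal;)Ljava/math/BigDecimal; ",
    "Storing AOT code for ROMMethod 0x119D3C60 in shared cache... Succeeded.",
    "+ (warm) java/math/BigDecimal.multiply(Ljava/math/BigDecimal;)Ljava/math/BigDecimal; ",
    "+ (hot) java/math/BigDecimal.multiply(Ljava/math/BigDecimal;)Ljava/math/BigDecimal;",
]
for method, levels in optimisation_history(listing_6).items():
    print(method, "->", ", ".join(levels))
```

Methods whose history stops at "AOT cold" were never recompiled by the JIT during that run.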

The shared class statistics produced by the command java -Xshareclasses:printAllStats list each AOT-compiled method and each cached shared class. We can use the summary data to understand whether the shared class cache is correctly sized. For example, Listing 7 shows that the total cache size of 16776844 bytes was only 40 percent occupied, with 5950936 bytes used by 1668 ROMClasses and 683772 bytes used by 458 AOT-compiled methods:

Listing 7. Cache details
base address       = 0x424DE000
end address        = 0x434D0000
allocation pointer = 0x4295E748

cache size         = 16776844
free bytes         = 9971656
ROMClass bytes     = 5950936
AOT bytes          = 683772
Data bytes         = 57428
Metadata bytes     = 113052
Metadata % used    = 1%

# ROMClasses       = 1668
# AOT Methods      = 458
# Classpaths       = 7
# URLs             = 0
# Tokens           = 0
# Stale classes    = 0
% Stale classes    = 0%

Cache is 40% full
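
The occupancy figure can be checked from the byte counts in Listing 7. The Python sketch below parses the "name = value" lines and recomputes the percentage. The function names are ours, and using integer division to match the reported 40% is an assumption about how the tool rounds.

```python
def parse_cache_stats(text):
    """Collect the purely numeric 'name = value' lines from the statistics summary."""
    stats = {}
    for line in text.splitlines():
        name, separator, value = line.partition("=")
        value = value.strip()
        if separator and value.isdigit():  # skips addresses and percentages
            stats[name.strip()] = int(value)
    return stats


def occupancy_percent(stats):
    """Percentage of the cache in use, rounded down (assumed rounding rule)."""
    used = stats["cache size"] - stats["free bytes"]
    return 100 * used // stats["cache size"]


listing_7 = """
cache size         = 16776844
free bytes         = 9971656
ROMClass bytes     = 5950936
AOT bytes          = 683772
Data bytes         = 57428
Metadata bytes     = 113052
Metadata % used    = 1%
"""
stats = parse_cache_stats(listing_7)
print("used bytes:", stats["cache size"] - stats["free bytes"])
print("occupancy :", occupancy_percent(stats), "percent")
```

For this sample, the ROMClass, AOT, Data, and Metadata byte counts add up exactly to the used space (cache size minus free bytes).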

Listing 8 shows the output produced by the -Xshareclasses:destroyAll option, which reports each cache that was destroyed. The output ends with the message Could not create the Java virtual machine. Don't be alarmed: the message is expected, because the JVM performs the requested cache operation and then exits rather than starting an application.

Listing 8. Destroying caches
Attempting to destroy all caches in cacheDir C:\...\javasharedresources\ 
 
JVMSHRC256I Persistent shared cache "eclipse" has been destroyed 
Could not create the Java virtual machine.

Measuring memory usage

There are a number of performance tools you can use to see benefits of shared classes with regard to memory usage. The specific tool you use depends upon the underlying operating system. One thing to remember when you're inspecting memory usage is that the cache is provided by a memory mapped file that allows its contents to be shared by multiple virtual machines. Any tool used to determine memory usage needs to be able to distinguish between shared memory (which can be accessed and shared by multiple JVMs) and private memory (which can be accessed only by a single JVM).

Virtual Address Dump Utility (Windows)

The Virtual Address Dump utility or vadump is part of the Microsoft® resource kit and can be used to provide information on the memory usage of an application, or in this case the shared classes cache. Vadump produces a lot of information, but we're just going to look at the reported figure for the working set size to provide us with a view of the memory usage of an application. The command vadump -os -p <pid> shows the working set for a given process ID.

The summary information produced contains a lot of information about the memory used by a process. To show the memory improvements achieved using shared classes, we focused on the Grand Total Working Set figure and how the Private, Shareable, and Shared contributions to this overall figure are affected by class data sharing. Listing 9 shows an example of the summary output from vadump. Shared classes are implemented as memory mapped files, so the memory they occupy is shown in the Mapped Data output row.

Listing 9. Example vadump output
vadump -os -p 5364
Category                        Total        Private Shareable    Shared
                           Pages    KBytes    KBytes    KBytes    KBytes
      Page Table Pages        29       116       116         0         0
      Other System             8        32        32         0         0
      Code/StaticData       2079      8316      5328       140      2848
      Heap                    87       348       348         0         0
      Stack                    4        16        16         0         0
      Teb                      1         4         4         0         0
      Mapped Data             95       380         0        24       356
      Other Data              61       244       240         4         0

      Total Modules         2079      8316      5328       140      2848
      Total Dynamic Data     248       992       608        28       356
      Total System            37       148       148         0         0
Grand Total Working Set     2364      9456      6084       168      3204
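
If you collect several vadump reports, it helps to turn the summary table into data. The Python sketch below assumes the column layout of Listing 9 (a category name followed by five numeric columns). The function and key names are ours.

```python
def parse_vadump_summary(text):
    """Return {category: {pages, total_kb, private_kb, shareable_kb, shared_kb}}."""
    rows = {}
    for line in text.splitlines():
        parts = line.split()
        # A data row ends with five integers; header lines do not.
        if len(parts) >= 6 and all(part.isdigit() for part in parts[-5:]):
            pages, total, private, shareable, shared = map(int, parts[-5:])
            rows[" ".join(parts[:-5])] = {
                "pages": pages,
                "total_kb": total,
                "private_kb": private,
                "shareable_kb": shareable,
                "shared_kb": shared,
            }
    return rows


listing_9 = """
Category                        Total        Private Shareable    Shared
                           Pages    KBytes    KBytes    KBytes    KBytes
      Code/StaticData       2079      8316      5328       140      2848
      Mapped Data             95       380         0        24       356
Grand Total Working Set     2364      9456      6084       168      3204
"""
for category, row in parse_vadump_summary(listing_9).items():
    parts_sum = row["private_kb"] + row["shareable_kb"] + row["shared_kb"]
    print(f"{category}: total {row['total_kb']} KB, parts sum to {parts_sum} KB")
```

In the sample, each row's total equals the sum of its Private, Shareable, and Shared columns, which is a useful check that the columns were read correctly.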

To determine the process ID needed in the vadump command you can use the Windows Task Manager:

  1. Open the Task Manager application and select the Processes tab.
  2. Locate the column called PID. (If this column is not shown, click View > Select Columns and select the PID check box, as shown in Figure 2.)
  3. Locate the name of the process that you want to profile and note the value in the PID column. This is the process ID that you need to pass to vadump.

Figure 2. Enabling process ID information in Task Manager
Showing process ID information

Using top to gauge memory usage on Linux

Many tools exist for Linux users wanting to check memory usage. The top command suits our purposes for showing the effect of shared classes. In order to make the output easier to read we will supply the process ID on the command line and run it in batch mode. Listing 10 shows the command line and an example of the data received:

Listing 10. Top command line and sample output
top -b -n 1 -p <pid>
	
top - 13:33:41 up 18 days,  9:30,  1 user,  load average: 0.00, 0.00, 0.00
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0% us,  0.0% sy,  0.0% ni, 100.0% id,  0.0% wa,  0.0% hi,  0.0% si
Mem:   8157972k total,   311312k used,  7846660k free,    56448k buffers
Swap:  2104472k total,        0k used,  2104472k free,   141956k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 7073 root      15   0 41616  13m 2228 S  0.0  0.2   5:43.70 X

The following values are of the most interest to us:

  • VIRT — Virtual Image (KB): The total amount of virtual memory used by the task. It includes all code, data, and shared libraries plus pages that have been swapped out.
  • RES — Resident size (KB): The non-swapped physical memory a task has used.
  • SHR — Shared Mem size (KB): The amount of shared memory used by a task. It simply reflects memory that could be potentially shared with other processes.
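
To compare these three values across runs, the process line of top's batch output can be parsed into numbers. The Python sketch below assumes the column order shown in the header of Listing 10 and assumes that top appends "m" or "g" to values it has scaled up from KB. Scaled values are rounded by top, so the converted figures are approximate. The function names are ours.

```python
# Assumption: top appends "m" or "g" to values it has scaled up from KB.
SCALE_TO_KB = {"m": 1024, "g": 1024 * 1024}


def field_to_kb(field: str) -> int:
    """Convert a top memory field such as '41616' or '13m' to KB."""
    suffix = field[-1].lower()
    if suffix in SCALE_TO_KB:
        return int(float(field[:-1]) * SCALE_TO_KB[suffix])
    return int(field)


def memory_columns(process_line: str) -> dict:
    """Pick VIRT, RES, and SHR out of one process line of top's batch output."""
    fields = process_line.split()
    # Column order assumed from the header in Listing 10:
    # PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    virt, res, shr = fields[4:7]
    return {
        "VIRT": field_to_kb(virt),
        "RES": field_to_kb(res),
        "SHR": field_to_kb(shr),
    }


sample = " 7073 root      15   0 41616  13m 2228 S  0.0  0.2   5:43.70 X"
print(memory_columns(sample))
```

The difference between RES and SHR gives a rough view of the memory that is private to the process.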

Configuring Eclipse to use the shared classes feature

To illustrate the footprint and startup improvements that can be realised using shared classes, we measure the impact on two real-world applications: Eclipse, which is representative of a client-side desktop application, and Apache Tomcat, which is representative of a server-side application.

As we noted at the beginning of the article, there is not a stand-alone installation of the IBM SDK for Java 6 and runtime for the Java platform for Windows at this time. If you're using Windows (as opposed to Linux or AIX) you will need to download the prebundled package for Eclipse.

If you're using Linux or AIX, download the stand-alone IBM SDK for Java 6 implementation and then download the version of Eclipse you require from the Eclipse project site (see Resources). Follow the Eclipse installation instructions to have Eclipse use the IBM SDK for Java 6.

After you install Eclipse, you need to perform the following additional steps:

  1. Enable class sharing for plug-ins. Install the OSGi plug-in adapter (see Resources) into the Eclipse plug-ins directory.
  2. Download SampleView.jar (see Download) and install it into the Eclipse plug-ins directory. This plug-in makes it easier to time the startup of Eclipse by hooking into the IBM JVM trace and writing out trace points when the view is initialised. We cover the use of the IBM JVM trace to provide startup statistics in the next section.
  3. Create two workspaces called workspace1 and workspace2. This allows you to launch two instances of Eclipse that point at different workspaces but share the same class cache.

You'll also need to set up Tomcat if you haven't already. Simply download the application from the Apache Tomcat Web site, unzip the package, and follow the instructions contained in the file running.txt.

Performance comparisons

Using the tools and applications described in the previous sections, we measure the performance gains provided by shared classes. Where possible, we isolate the shared classes feature by disabling other features, so that the results are easier to interpret.

Eclipse performance: Footprint

To check the footprint, we ran multiple Eclipse instances concurrently on the same Windows system by using different workspaces. We collected vadump data for comparison using Eclipse started in three different modes:

  • Eclipse is started as normal without any shared classes functionality enabled.
  • Eclipse is started for the first time with a cleared shared class cache.
  • A second Eclipse instance is started using the same shared class cache as above.

To enable shared classes with Eclipse, we need to create a new startup command line that contains the correct JVM options. Rather than just create a new shortcut, we use the batch file shown in Listing 11 to start Eclipse. It performs the following functions:

  • Takes a single command-line parameter of either 1 or 2, which corresponds to the workspaces created when configuring Eclipse.
  • Clears any shared class caches that already exist if workspace 1 is specified.
  • Prints the statistics for the cache after Eclipse terminates.

Listing 11. Batch file used to start Eclipse
@echo off
rem batch file to start Eclipse using the specified workspace
SET ECLIPSE_HOME=C:\java\eclipse\IBMEclipse\eclipse
SET JVM=C:\java\eclipse\IBMEclipse\ibm_sdk60\jre\bin\java.exe
SET WNAME=C:\java\eclipse\workspace%1
SET SC_OPTS=-Xshareclasses:name=eclipse,verbose 
SET VMARGS=%SC_OPTS%

echo Clearing shared classes cache
if %1==1 %JVM% -Xshareclasses:destroyAll

echo JVM version
%JVM% -version

echo Starting Eclipse
%ECLIPSE_HOME%\eclipse.exe -nosplash -data %WNAME% -vm %JVM% -vmargs %VMARGS%
%JVM% -Xshareclasses:name=eclipse,printStats

Listing 12 shows the vadump report for an Eclipse instance that doesn't share classes. The fields in the vadump output that we are most interested in are Shareable KBytes, Shared KBytes, and Grand Total Working Set: KBytes.

Listing 12. vadump output for Eclipse without shared classes
Category                        Total        Private Shareable    Shared
                           Pages    KBytes    KBytes    KBytes    KBytes
      Page Table Pages        54       216       216         0         0
      Other System            28       112       112         0         0
      Code/StaticData       4199     16796     11500      1052      4244
      Heap                  9400     37600     37600         0         0
      Stack                   98       392       392         0         0
      Teb                     21        84        84         0         0
      Mapped Data            130       520         0        36       484
      Other Data            5337     21348     21344         4         0

      Total Modules         4199     16796     11500      1052      4244
      Total Dynamic Data   14986     59944     59420        40       484
      Total System            82       328       328         0         0
Grand Total Working Set    19267     77068     71248      1092      4728

Listing 13 shows the output of vadump when using the batch file in Listing 11 to start Eclipse. Approximately 4MB of classes (reported as 4116 KBytes of Shareable Mapped Data) have been placed in the cache, which has increased the Mapped Data contribution to the working set by a corresponding amount (from 520 KBytes to 4620 KBytes). The Shareable column of the Mapped Data row shows that this memory has been marked as able to be shared by other processes. One thing to keep in mind when comparing outputs from vadump is that although they are taken when Eclipse has started, there will be some small variations in the figures reported.

Listing 13. vadump output and statistics for first time Eclipse is started using a shared class cache
Category                        Total        Private Shareable    Shared
                           Pages    KBytes    KBytes    KBytes    KBytes
      Page Table Pages        54       216       216         0         0
      Other System            28       112       112         0         0
      Code/StaticData       4256     17024     11676      1072      4276
      Heap                  8631     34524     34524         0         0
      Stack                  103       412       412         0         0
      Teb                     20        80        80         0         0
      Mapped Data           1155      4620         0      4116       504
      Other Data            5386     21544     21540         4         0

      Total Modules         4256     17024     11676      1072      4276
      Total Dynamic Data   15295     61180     56556      4120       504
      Total System            82       328       328         0         0
Grand Total Working Set    19633     78532     68560      5192      4780

Current statistics for cache "eclipse":


base address       = 0x42B0E000
end address        = 0x43B00000
allocation pointer = 0x42E0B958

cache size         = 16776844
free bytes         = 12005976
ROMClass bytes     = 4001256
AOT bytes          = 625428
Data bytes         = 57043
Metadata bytes     = 87141
Metadata % used    = 1%

# ROMClasses       = 1334
# AOT Methods      = 480
# Classpaths       = 4
# URLs             = 0
# Tokens           = 0
# Stale classes    = 0
% Stale classes    = 0%

Starting another instance of Eclipse and then running vadump on that instance, as shown in Listing 14, initially seems to show very little difference in memory usage. But upon closer inspection, you can see that 4MB of memory (reported as 4564 KBytes of Shared Mapped Data) is actually being shared with another process. Vadump (and Task Manager) count shared memory in the Grand Total Working Set for every process that uses it. So the second Eclipse instance has a 4MB lower footprint because it is sharing the class cache created and populated by the first Eclipse instance.

The results shown are for the startup of Eclipse with a minimal number of plug-ins installed. You can expect more classes to be placed in the shared class cache and a corresponding improvement in startup time if you have more plug-ins installed.

Listing 14. vadump output for a second Eclipse instance started using an existing shared class cache
Category                        Total        Private Shareable    Shared
                           Pages    KBytes    KBytes    KBytes    KBytes
      Page Table Pages        54       216       216         0         0
      Other System            29       116       116         0         0
      Code/StaticData       4254     17016     11676         0      5340
      Heap                  8684     34736     34736         0         0
      Stack                   98       392       392         0         0
      Teb                     20        80        80         0         0
      Mapped Data           1150      4600         0        36      4564
      Other Data            5261     21044     21040         4         0

      Total Modules         4254     17016     11676         0      5340
      Total Dynamic Data   15213     60852     56248        40      4564
      Total System            83       332       332         0         0
Grand Total Working Set    19550     78200     68256        40      9904
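
Because shared memory is counted in every process that uses it, one way to read these reports is to subtract the Shared column from the Grand Total Working Set. The Python sketch below applies that to the Grand Total rows of Listings 12 and 14. Treating "total minus Shared" as the memory attributable to a single process is our simplification, not a figure that vadump itself reports.

```python
def unshared_kb(total_kb: int, shared_kb: int) -> int:
    """Working set in KB that is not currently shared with another process."""
    return total_kb - shared_kb


# Grand Total Working Set rows: (Total KBytes, Shared KBytes)
without_cache = unshared_kb(77068, 4728)    # Listing 12, no shared classes
second_instance = unshared_kb(78200, 9904)  # Listing 14, existing cache

saving_kb = without_cache - second_instance
print(f"without shared classes : {without_cache} KB")
print(f"second cached instance : {second_instance} KB")
print(f"difference             : {saving_kb} KB ({saving_kb / 1024:.1f} MB)")
```

The difference is close to the 4MB of class data held in the cache, which is consistent with the explanation above.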

Eclipse performance: Startup

As well as a footprint improvement, sharing classes also decreases the startup time as classes are loaded from the cache rather than from disk. In addition, using AOT code from the cache helps reduce startup time. To time the startup of Eclipse, we use a custom view (described earlier in the article) that utilises the IBM JVM trace to write out messages when it loads. We also must modify the Eclipse startup batch file from Listing 11 to enable the JVM trace and to record the following trace events:

  • The initialisation of tracing: Tracing starts almost immediately after the JVM is launched but before any classes are loaded. We use this as the starting point for comparing startup times.
  • The sample view messages: The first message is written out when the view is initialised and is taken to indicate that Eclipse has started. We use this as the end point for comparing startup times.

Listing 15 shows the modified batch file. The additional JVM trace configuration is the TRACE_OPTS variable, which is appended to VMARGS:

Listing 15. Batch file to start Eclipse with tracing enabled
@echo off
rem batch file to time Eclipse startup
SET ECLIPSE_HOME=C:\java\eclipse\IBMEclipse\eclipse
SET WNAME=C:\java\eclipse\workspace%1
SET JVM=C:\java\eclipse\IBMEclipse\ibm_sdk60\jre\bin\java.exe
SET TRACE_OPTS=-Xtrace:iprint=tpnid{j9trc.0},iprint=SampleView
SET SC_OPTS=-Xshareclasses:name=eclipse,verbose 
SET VMARGS=%SC_OPTS% %TRACE_OPTS%

echo Clearing shared classes cache
if %1==1 %JVM% -Xshareclasses:destroyAll

echo JVM version
%JVM% -version

echo VM arguments
echo %VMARGS%

echo Starting Eclipse
%ECLIPSE_HOME%\eclipse.exe -nosplash -data %WNAME% -vm %JVM% -vmargs %VMARGS% 

%JVM% -Xshareclasses:name=eclipse,printStats

Listings 16 and 17 show the output when starting Eclipse without shared classes and then with them enabled. As you can see, startup is nearly 1 second faster (3.313 seconds versus 4.204 seconds), a reduction of roughly 21 percent. The timings shown when running with shared classes are for a second start of Eclipse, as the first one is used to populate the cache. With a bare-bones version of Eclipse, only 4MB of data is stored in the cache, so there is plenty of room for larger and more complex Eclipse-based applications to take advantage of class sharing to decrease their startup times.

Listing 16. Eclipse startup, no shared classes
09:47:55.296*0x41471300   j9trc.0         - Trace initialized for VM = 00096238
09:47:59.500 0x41471300SampleView.2         - Event id 1, text = Mark
09:47:59.500 0x41471300SampleView.0         > Entering getElements(Object parent)
09:47:59.500 0x41471300SampleView.1         < Exiting getElements(Object parent)

Startup = 4.204 seconds

Listing 17. Eclipse startup, shared classes enabled
09:30:40.171*0x41471300   j9trc.0         - Trace initialized for VM = 000962A8
[-Xshareclasses verbose output enabled] 
JVMSHRC158I Successfully created shared class cache "eclipse" 
JVMSHRC166I Attached to cache "eclipse", size=16777176 bytes
09:30:43.484 0x41471300SampleView.2         - Event id 1, text = Mark
09:30:43.484 0x41471300SampleView.0         > Entering getElements(Object parent)
09:30:43.484 0x41471300SampleView.1         < Exiting getElements(Object parent)

Startup = 3.313 seconds
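
The startup figures are simply the difference between the trace initialisation timestamp and the first SampleView timestamp. The Python sketch below reproduces that arithmetic from the timestamps in Listings 16 and 17. The function names are ours, and we assume both timestamps fall on the same day.

```python
from datetime import datetime

TRACE_TIME_FORMAT = "%H:%M:%S.%f"  # for example 09:47:55.296


def elapsed_seconds(start: str, end: str) -> float:
    """Seconds between two trace timestamps taken on the same day (assumed)."""
    begin = datetime.strptime(start, TRACE_TIME_FORMAT)
    finish = datetime.strptime(end, TRACE_TIME_FORMAT)
    return (finish - begin).total_seconds()


def reduction_percent(before: float, after: float) -> float:
    """Percentage by which 'after' is smaller than 'before'."""
    return (before - after) / before * 100


no_sharing = elapsed_seconds("09:47:55.296", "09:47:59.500")    # Listing 16
with_sharing = elapsed_seconds("09:30:40.171", "09:30:43.484")  # Listing 17

print(f"without shared classes: {no_sharing:.3f} s")
print(f"with shared classes   : {with_sharing:.3f} s")
print(f"reduction             : {reduction_percent(no_sharing, with_sharing):.1f}%")
```

Timings like these vary from run to run, so it is worth repeating the measurement several times before drawing conclusions.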

Tomcat performance: Footprint

So far we have seen how shared classes provide startup and footprint improvements in the client-side environment. These benefits can also be realised in the server-side environment. As noted earlier, we use Tomcat as our example application. Tomcat does not require any special steps in order to use the IBM JVM. The only step needed to take advantage of shared classes is to set appropriate values for the JAVA_OPTS environment variable, shown in Listing 18, which Tomcat uses to start its JVM with specific command-line options:

Listing 18. Setting JVM options for Tomcat
export JAVA_OPTS="-Xmx32m -Xms32m -Xshareclasses:name=tomcat,verbose"
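
To confirm from inside a running JVM that the option in Listing 18 was actually passed through, one option is to inspect the JVM's input arguments with the standard `java.lang.management` API. This is a sketch only: the class name is hypothetical, and the check shows that the option was supplied, not that a cache was successfully attached.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

final class ShareClassesCheck {
    // Returns the first -Xshareclasses option found in the JVM arguments, or null if none.
    static String findShareClassesOption(List<String> jvmArgs) {
        for (String arg : jvmArgs) {
            if (arg.startsWith("-Xshareclasses")) {
                return arg;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Arguments the JVM was started with, such as those supplied through JAVA_OPTS.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        String option = findShareClassesOption(jvmArgs);
        System.out.println(option == null
                ? "No -Xshareclasses option was passed to this JVM"
                : "JVM started with " + option);
    }
}
```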

To demonstrate the effect of shared classes across different platforms, we used the Linux versions of the IBM JVM and Tomcat.

As discussed earlier, the top command is a good tool for measuring the Tomcat footprint on Linux. For this example, we ran top when we started Tomcat without shared classes enabled (by removing "-Xshareclasses:name=tomcat,verbose" from the JAVA_OPTS environment variable) and again with shared classes enabled. Then, we started a second instance of Tomcat to demonstrate the difference in memory usage reported for the two Tomcat processes sharing the same class cache. Listings 19, 20, and 21 show the top output in each case. Listing 22 shows shared classes cache statistics where appropriate.

Listing 19. Tomcat footprint without shared classes
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.1% us,  0.0% sy,  0.0% ni, 99.9% id,  0.0% wa,  0.0% hi,  0.0% si
Mem:   8157972k total,  1727072k used,  6430900k free,   101152k buffers
Swap:  2104472k total,        0k used,  2104472k free,  1370944k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
24595 jbench    25   0 66744  54m 8400 S  0.0  0.7   0:03.71 java

Listing 20. Tomcat footprint with shared classes
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0% us,  0.0% sy,  0.0% ni, 99.9% id,  0.1% wa,  0.0% hi,  0.0% si
Mem:   8157972k total,  1728800k used,  6429172k free,   101152k buffers
Swap:  2104472k total,        0k used,  2104472k free,  1376084k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
24621 jbench    17   0 78440  56m  14m S  0.0  0.7   0:04.04 java

Listing 21. Footprint for 2 instances of Tomcat sharing the same class cache
Tasks:   2 total,   0 running,   2 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0% us,  0.0% sy,  0.0% ni, 100.0% id,  0.0% wa,  0.0% hi,  0.0% si
Mem:   8157972k total,  1766440k used,  6391532k free,   101152k buffers
Swap:  2104472k total,        0k used,  2104472k free,  1376084k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
24621 jbench    17   0 78440  56m  14m S  0.0  0.7   0:04.08 java 
24674 jbench    16   0 77600  51m  14m S  0.0  0.6   0:02.28 java

Listing 22. Current statistics for cache used by Tomcat
base address       = 0x76D0E000
end address        = 0x77D00000
allocation pointer = 0x77186268

cache size         = 16776852
free bytes         = 10085680
ROMClass bytes     = 5911028
AOT bytes          = 621280
Data bytes         = 57051
Metadata bytes     = 101813
Metadata % used    = 1%

# ROMClasses       = 1634
# AOT Methods      = 452
# Classpaths       = 6
# URLs             = 0
# Tokens           = 0
# Stale classes    = 0
% Stale classes    = 0%

Cache is 39% full
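
The figures in Listing 22 can be cross-checked with simple arithmetic: the used space is the cache size minus the free bytes, and it should equal the sum of the ROMClass, AOT, data, and metadata bytes. The sketch below uses a hypothetical helper class and assumes the reported percentage is truncated rather than rounded, which the output does not state.

```java
final class CacheUsage {
    // Bytes in use: total cache size minus free bytes.
    static long usedBytes(long cacheSize, long freeBytes) {
        return cacheSize - freeBytes;
    }

    // Whole-number percentage in use; integer division truncates (an assumption
    // about how the "Cache is N% full" line is derived).
    static long percentFull(long cacheSize, long freeBytes) {
        return 100 * usedBytes(cacheSize, freeBytes) / cacheSize;
    }

    public static void main(String[] args) {
        // Values copied from Listing 22.
        long cacheSize = 16776852L;
        long freeBytes = 10085680L;
        long romClass = 5911028L, aot = 621280L, data = 57051L, metadata = 101813L;

        long used = usedBytes(cacheSize, freeBytes);
        System.out.println("Used bytes:        " + used);
        System.out.println("Sum of components: " + (romClass + aot + data + metadata));
        System.out.println("Percent full:      " + percentFull(cacheSize, freeBytes));
    }
}
```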

When you first look at these results comparing Tomcat footprints without and with shared classes, it's hard to see the benefits of running shared classes as the memory usage figures have increased. However, if we dig a little deeper into these figures, the picture begins to change:

  • SHR increased by about 6MB, from 8400KB to 14MB. This increase corresponds to the data stored in the shared classes cache.
  • RES increased slightly, from 54MB to 56MB, due to the infrastructure required to support shared classes (object libraries and so on).
  • VIRT increased, from 66744KB to 78440KB, because it reflects the combined increases in SHR and RES.

When we launch a second instance of Tomcat and use top to examine the memory usage, we can see that the second instance (process 24674 in Listing 21) shows the same amount of shared memory (reported as 14MB SHR), but there is a 5MB reduction in the RES size (from 56MB to 51MB) as well as a reduction in virtual memory. As with vadump on Windows, top correctly identifies memory that has the potential to be shared, but it does not display the other processes that are actually attached to that shared memory. In this case, both Tomcat instances are using the same shared classes cache, so their overall footprint is reduced. The Tomcat servers in this test use less than half of the available cache. Listing 22 shows that 5911028 bytes of shareable ROMClass data were placed in the cache (a little under 6MB), which means that there is more scope to reduce footprint by sharing classes in the cache.
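
Because top counts the shared pages in the RES figure of every attached process, adding the RES values together counts the cache more than once. The following is a rough sketch of how to allow for that, under two loudly stated assumptions: that the cache-backed portion is the SHR increase seen between Listings 19 and 20, and that the rounded values top prints are close enough for an estimate. The class and method names are hypothetical.

```java
final class SharedFootprint {
    // Estimates combined resident memory (KB) for processes attached to one cache.
    // Each process's RES already includes the shared pages, so the shared portion
    // is subtracted once for every process after the first.
    static long combinedResidentKb(long[] resKb, long sharedCacheKb) {
        if (resKb.length == 0) {
            return 0;
        }
        long total = 0;
        for (long res : resKb) {
            total += res;
        }
        return total - (resKb.length - 1) * sharedCacheKb;
    }

    public static void main(String[] args) {
        // Approximate RES values from Listing 21; top rounds them to "56m" and "51m".
        long[] resKb = {56L * 1024, 51L * 1024};
        // Assumption: cache-backed share = SHR with sharing (14MB) minus SHR without (8400KB).
        long sharedCacheKb = 14L * 1024 - 8400;
        System.out.println("Estimated combined RES: "
                + combinedResidentKb(resKb, sharedCacheKb) + " KB");
    }
}
```

The result is only an estimate, since top does not report which pages are actually shared between the two processes.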

Tomcat performance: Startup

Enabling shared classes also provides a decrease in the Tomcat startup time. To measure startup time, we use the timings that are written into the log file catalina.out, which is located in <TOMCAT_HOME>/logs. To provide a baseline for further comparisons, Tomcat is started without shared classes enabled. Listing 23 shows the reported Tomcat startup time (the other log lines written out during the startup process have been omitted for clarity):

Listing 23. Tomcat startup times with shared classes disabled
24-Apr-2008 13:01:08 org.apache.catalina.startup.Catalina start
INFO: Server startup in 1138 ms
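
When comparing several runs, it can be convenient to pull the startup figure out of catalina.out programmatically. The following is a minimal sketch that matches the "Server startup in N ms" text shown in Listing 23; the class name and the -1 return convention are illustrative choices, not part of Tomcat.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CatalinaStartup {
    // Matches the message text shown in Listing 23.
    private static final Pattern STARTUP = Pattern.compile("Server startup in (\\d+) ms");

    // Returns the startup time in milliseconds, or -1 if the line holds no startup message.
    static long startupMillis(String logLine) {
        Matcher m = STARTUP.matcher(logLine);
        return m.find() ? Long.parseLong(m.group(1)) : -1L;
    }

    public static void main(String[] args) {
        System.out.println(startupMillis("INFO: Server startup in 1138 ms"));
    }
}
```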

These timings are then compared to those reported with shared classes enabled, which are shown in Listing 24:

Listing 24. Tomcat startup times with shared classes enabled with AOT code
24-Apr-2008 13:06:57 org.apache.catalina.startup.Catalina start
INFO: Server startup in 851 ms

As you can see, shared classes decrease the Tomcat startup time from 1138ms to 851ms, which represents a 25 percent reduction in startup time. These improvements come from a mixture of class sharing and AOT code usage. To see how much benefit is derived from AOT code you can disable its use with the -Xnoaot command-line option, as shown in Listing 25, and note the startup time:

Listing 25. Increased startup time without AOT
24-Apr-2008 13:03:50 org.apache.catalina.startup.Catalina start
INFO: Server startup in 950 ms

The increase shown in Listing 25, from 851ms to 950ms, demonstrates that the capability to store AOT code in the shared classes cache contributes a valuable part of the reduction in Tomcat startup time.
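
The three timings in Listings 23, 24, and 25 can be combined to see how the saving splits between class sharing and AOT code. The sketch below assumes, as the text describes, that the 950 ms run had shared classes enabled with AOT disabled; the class and method names are illustrative.

```java
final class StartupGain {
    // Percentage reduction going from a baseline time to an improved time.
    static double percentReduction(long baselineMs, long improvedMs) {
        return 100.0 * (baselineMs - improvedMs) / baselineMs;
    }

    public static void main(String[] args) {
        long noSharing = 1138;      // Listing 23: shared classes disabled
        long sharingWithAot = 851;  // Listing 24: shared classes with AOT code
        long sharingNoAot = 950;    // Listing 25: shared classes, AOT disabled

        long totalSaved = noSharing - sharingWithAot;  // saving from both features
        long aotSaved = sharingNoAot - sharingWithAot; // part attributable to AOT code
        System.out.printf("Total reduction: %.1f%%%n", percentReduction(noSharing, sharingWithAot));
        System.out.printf("Saved by AOT: %d ms of %d ms%n", aotSaved, totalSaved);
    }
}
```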

Summary

This article has demonstrated how shared classes can both improve the startup times of your Java applications and decrease the amount of memory they use. Specifically, we've shown you how to quantify the footprint and startup benefits of the shared classes support using Tomcat and Eclipse. Not all applications behave in the same way, so they won't all receive the same benefits. But even the simple configurations we used have shown a significant reduction in startup times.

Keep in mind that you'll gain the most benefit when you have several applications running the same level of IBM SDK, as they have the most to share and have the most to gain. But even a single application will derive some benefit in startup time if it's using a shared class cache.

We've also shown you how to allow for the double counting of shared memory by tools like vadump for Windows and top for Linux, so that you can measure the memory savings provided by class sharing more accurately. While these tools don't give a perfect view of memory usage, we've shown you how to read between the lines.


Download

Description          Name                  Size
Source code sample   j-sharedclasses.jar   6KB

