Repeatable measurements

It is important to understand that unless you dedicate hardware for a benchmark, the CPU that is used can vary each time that the benchmark is run. Achieving repeatable results can be difficult. This is true for benchmark comparisons and also for CPU usage comparisons after a CICS upgrade.

For more information about how CPU time can be affected by other address spaces in the LPAR and other LPARs on the central processor complex (CPC), see the IBM CICS Performance Series: Effective Monitoring for CICS Performance Benchmarks, REDP-5170.

The LPARs that support the CICS regions in all performance benchmarks that are described in this documentation use dedicated CPs. Although the CPs are dedicated, the L3 and L4 caches remain shared with CPs that other LPARs use. This situation is not ideal: those other CPs can invalidate data in the shared caches, which leads to CPU variation. Minimizing the magnitude of these external influences is a high priority in producing reliable performance benchmark results.

An automated measurement system runs the benchmarks and collects the performance data. This system runs overnight, during a period when no human users are allowed to access the LPAR. Automation reduces variation in results by ending unnecessary address spaces that could disrupt the measurements. Running overnight also minimizes disruption because that is the timeframe during which other LPARs on the CPC are least busy.

Repeatability for Java workloads

Java programs consist of classes, which contain Java™ bytecode that is platform-neutral, meaning that it is not specific to any hardware or operating system. At run time, the Java virtual machine (JVM) compiles Java bytecode into IBM z/Architecture instructions by using the just-in-time compiler (JIT) component.

Producing highly optimized z/Architecture instructions from Java bytecode requires processor time and memory. If all Java methods were compiled to the most aggressive level of optimization on first execution, application initialization times would be long and significant CPU time would be wasted optimizing methods that are used only during startup.

To balance application startup time against long-term performance, the JIT compiler optimizes the bytecode by using an iterative process. The JIT compiler maintains a count of the number of times that each Java method is called. When the call count of a method exceeds a JIT recompilation threshold, the JIT recompiles the method to a more aggressive level of optimization and resets the method invocation count. This process is repeated until the maximum optimization level is reached. Therefore, often-used methods are compiled soon after the JVM starts, and less-used methods are compiled much later, or not at all. The JIT compilation threshold helps the JVM to start quickly and still deliver good long-term performance.
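The counting mechanism can be sketched as follows. This is a simplified illustration, not the JIT's actual internals: the tier names, the threshold value, and the class are all hypothetical, and the real thresholds are internal to the JVM and vary by optimization level.

```java
// Illustrative sketch of tiered recompilation driven by invocation counts.
// Tier names and the threshold value are hypothetical; the real JIT's
// thresholds are internal to the JVM.
public class TieredCompilationSketch {
    // Optimization tiers, from least to most aggressive.
    static final String[] TIERS = {"interpreted", "warm", "hot", "scorching"};
    static final int RECOMPILE_THRESHOLD = 1000; // hypothetical count

    int tier = 0;      // current optimization level of the method
    int callCount = 0; // invocations since the last (re)compilation

    // Called on every invocation of the simulated Java method.
    void invoke() {
        callCount++;
        if (callCount >= RECOMPILE_THRESHOLD && tier < TIERS.length - 1) {
            tier++;        // recompile at a more aggressive level
            callCount = 0; // reset the method invocation count
        }
    }

    String currentTier() {
        return TIERS[tier];
    }

    public static void main(String[] args) {
        TieredCompilationSketch method = new TieredCompilationSketch();
        for (int i = 0; i < 2500; i++) {
            method.invoke();
        }
        // Tier advanced after 1000 calls and again after 2000 calls.
        System.out.println(method.currentTier()); // prints "hot"
    }
}
```

A method that is called rarely never crosses the threshold and stays at a low optimization level, which matches the behavior described above for less-used methods.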

For more information about the operation of the JIT compiler on z/OS, see "Enabling and disabling the JIT compiler" in the IBM SDK, Java Technology Edition documentation.

This process of progressively optimizing Java methods leads to a change over time in the amount of CPU that is consumed by otherwise identical transactions. The first time that a transaction runs in Java, the z/Architecture instructions that are produced by the JIT compiler are at a low optimization level. This low optimization results in a relatively high CPU cost to execute the Java methods.

As more transactions run, the Java method invocation counts increase and the JIT recompiles frequently called methods at a more aggressive level of optimization. This greater level of optimization produces methods that require less CPU to execute than before the recompilation took place, so the CPU that is required to execute the transaction decreases. This process is repeated several times during the lifetime of the JVM.

Figure 1 illustrates this process for a complex servlet workload.

Figure 1. Plot of CPU cost per transaction over time for a Java workload

Note that the vertical axis in Figure 1 uses a logarithmic scale. The first few invocations of the transaction show relatively high CPU usage. As the transaction is executed repeatedly, the JIT compiler optimizes the workload more aggressively, so the CPU cost per transaction decreases over time. The steps in the CPU cost per transaction value are events where high-use methods are further optimized. The frequent spikes in the CPU cost per transaction are the result of garbage collection events.

When you run benchmarks that use JVMs, ensure that the JIT compiler has fully optimized the most important Java methods in the workload before you start CPU measurements. To minimize the variability that the JIT compiler introduces, run the CICS Java workload at a constant transaction rate for a period of time, known as the warm-up time. After the workload has run at a steady state for the warm-up period, it is assumed that the JIT compiler will not optimize the workload further, and you can take CPU measurements.

The warm-up period for a workload is determined by producing a chart, such as the one in Figure 1. The warm-up time is the point at which the CPU cost per transaction ceases to show any improvements.
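Under the assumption that per-interval CPU-cost-per-transaction samples are available (for example, derived from SMF records), the point at which improvement ceases can also be estimated mechanically. The following sketch uses a hypothetical 2% tolerance; the helper name and threshold are illustrative choices, not part of any CICS or IBM SDK API.

```java
// Sketch: estimate the end of the JIT warm-up period from a series of
// CPU-cost-per-transaction samples, one per measurement interval.
// The workload is considered warmed up at the first interval after which
// no consecutive improvement exceeds the (hypothetical) tolerance.
public class WarmupDetector {

    // Returns the index of the first sample after which the relative
    // improvement between consecutive samples never exceeds tolerance,
    // or -1 if the workload never reaches a steady state.
    static int warmupEnd(double[] cpuPerTxn, double tolerance) {
        for (int i = 1; i < cpuPerTxn.length; i++) {
            boolean steady = true;
            for (int j = i; j < cpuPerTxn.length; j++) {
                double improvement =
                    (cpuPerTxn[j - 1] - cpuPerTxn[j]) / cpuPerTxn[j - 1];
                if (improvement > tolerance) {
                    steady = false;
                    break;
                }
            }
            if (steady) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Synthetic samples: steep improvement early, nearly flat later,
        // mimicking the shape of the curve in Figure 1.
        double[] samples = {100.0, 40.0, 15.0, 8.0, 7.9, 7.85, 7.84};
        int end = warmupEnd(samples, 0.02); // 2% tolerance
        System.out.println("Warm-up ends at interval " + end); // interval 4
    }
}
```

In practice the tolerance and the sampling interval must be chosen per workload, because garbage collection spikes like those in Figure 1 can cause transient regressions that a single-sample comparison would misread.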

Shutting down a JVM discards the JIT-compiled native code, so the iterative process of optimization begins again when the JVM is restarted. The ahead-of-time (AOT) compiler can persist generated native code across subsequent executions of the same program, with the primary goal of improving startup times. The AOT compiler generates native code dynamically while an application runs and caches that code in the shared data cache. Subsequent JVMs that execute the method can load and use the AOT code from the shared data cache without incurring the processor cost of compiling the method again.

Because AOT code must remain valid across different program executions, it cannot be optimized as aggressively as JIT-generated code, so AOT-generated code does not perform as well as JIT-generated code. However, AOT code usually performs better than interpreted code. For more information about the AOT compiler, see "The AOT compiler" in the IBM SDK, Java Technology Edition documentation.
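As a rough illustration, on the IBM SDK the shared data cache that holds AOT code is enabled with the -Xshareclasses option. The cache name, cache directory, size, and application class below are placeholders, not values from any benchmark described here; consult the IBM SDK documentation for the options that apply to your release.

```shell
# Enable the shared class/data cache so that AOT-compiled code persists
# across JVM restarts (IBM SDK / OpenJ9). All names are placeholders.
java -Xshareclasses:name=benchCache,cacheDir=/u/cics/shared \
     -Xscmx64m \
     com.example.BenchmarkApp
```

With the cache in place, a restarted JVM can load AOT code for previously compiled methods instead of starting the optimization process entirely from interpreted bytecode, which shortens the warm-up period described above.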