CPU scaling study

This study shows how the workload scales when the workload submission rate is increased while the number of available dedicated central processing units (CPUs) is scaled up.

Introduction to CPU Scaling

CPU scaling is a measure of how much more workload can be driven when CPU resources are increased. The workload can be increased by raising either the total number of transactions or the transaction rate. For this workload, the workload submission rate (the rate at which work is submitted to the J2EE middleware layer) has to be increased. However, a larger workload in this study also requires a larger database, so the whole environment must be scaled, not only the workload. Scaling the whole environment can affect performance in other ways than simply doing more work with the same data.

Maximizing CPU utilization

To determine the performance characteristics of the workload, measurements are taken using one, two, four, and eight dedicated CPUs on the WebSphere® system. For each configuration, a workload submission rate is chosen that is high enough to drive the CPUs to near full utilization. The results help characterize the scalability of the workload and provide a way to measure the performance differences between 64-bit WebSphere and 31-bit WebSphere running the same workload.

In all 64-bit WebSphere measurements, the JVM heap is set to 75% of the 8 GB of available memory. This is the optimum percentage derived from the study Heapsize for the 64-bit Java Virtual Machine. It works out to 64-bit WebSphere JVM heap settings of -Xms6144m -Xmx6144m. The DB2® LPAR is also configured with 8 GB of memory and runs with four configured CPUs for all of the tests.
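The heap arithmetic can be sketched in Python. `jvm_heap_flags` is a hypothetical helper written only for illustration. It reproduces the calculation described in the text (75% of 8 GB, expressed in megabytes) and is not part of WebSphere or its administrative tools.

```python
# Sketch: derive fixed JVM heap flags from the LPAR memory size,
# using the 75% heap-to-memory ratio reported in this study.
# jvm_heap_flags is a hypothetical helper, not a WebSphere API.


def jvm_heap_flags(lpar_memory_gb: int, heap_fraction: float = 0.75) -> str:
    """Return -Xms/-Xmx flags (in MB) for the given memory size and heap fraction."""
    heap_mb = int(lpar_memory_gb * 1024 * heap_fraction)
    # Using the same value for the initial and maximum heap gives a fixed-size heap.
    return f"-Xms{heap_mb}m -Xmx{heap_mb}m"


if __name__ == "__main__":
    # 8 GB * 1024 MB/GB * 0.75 = 6144 MB
    print(jvm_heap_flags(8))
```

Setting -Xms equal to -Xmx, as the study does, means that the JVM does not grow or shrink the heap at run time.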

10 Gb Ethernet chosen for highest workload

A workload submission rate of 600 was found to exceed the capacity of the 1 Gb Ethernet network. The resulting network saturation dampened throughput and created additional work for error handling. The tests at a submission rate of 600 are therefore run over a 10 Gb Ethernet network, to remove the effects of a network bottleneck from the results. A submission rate higher than 600 would have required a larger restructuring of the environment, because of the higher resource usage along the whole path from the clients to WebSphere and on to the database. This would have exceeded the scope of the study.

CPU Scaling

Dedicated CPUs are assigned to the WebSphere system being tested. The experiments use one, two, four, or eight dedicated CPUs. For each CPU count, the workload submission rate is adjusted until a CPU utilization close to or greater than 90% is observed. With eight CPUs dedicated to the WebSphere image and a workload submission rate of 600, the observed average CPU utilization was only approximately 77% (31-bit) to 80% (64-bit). Driving the utilization higher would have required a submission rate greater than 600. As explained in 10 Gb Ethernet chosen for highest workload, that rate would in turn have required different WebSphere or client tuning values.
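In Table 1, CPU utilization is reported as a sum over all CPUs, so 100% corresponds to one fully busy CPU. Dividing the sum by the number of CPUs gives the average per-CPU utilization that is compared with the 90% target. The following sketch performs that calculation. The helper name and the 85% flag threshold are assumptions chosen for illustration. They are not values from the study.

```python
# Table 1 CPU utilization, summed over all CPUs (100% = one fully busy CPU).
RUNS = [
    # (jvm, workload submission rate, number of CPUs, total CPU utilization %)
    ("31-bit", 110, 1, 87),
    ("64-bit", 110, 1, 97),
    ("31-bit", 190, 2, 175),
    ("64-bit", 190, 2, 189),
    ("31-bit", 350, 4, 353),
    ("64-bit", 350, 4, 371),
    ("31-bit", 600, 8, 612),
    ("64-bit", 600, 8, 641),
]

# Arbitrary illustration threshold, not a value taken from the study.
LOW_UTILIZATION_THRESHOLD = 85.0


def per_cpu_utilization(total_percent: float, cpus: int) -> float:
    """Average utilization of each CPU, in percent of one CPU."""
    return total_percent / cpus


for jvm, rate, cpus, total in RUNS:
    avg = per_cpu_utilization(total, cpus)
    note = "  <- well below the ~90% target" if avg < LOW_UTILIZATION_THRESHOLD else ""
    print(f"{jvm} rate={rate:3d} cpus={cpus}: {avg:5.1f}% per CPU{note}")
```

Only the two eight-CPU runs at a submission rate of 600 fall below the illustration threshold. This matches the roughly 77% to 80% utilization described above.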

Transaction scaling and response time measurements

The transaction rate is the throughput as reported by the client-side summary reports. The measured response time is the observed response time of a simulated Web operation (such as an online Web purchase or a Web browse operation), including the turnaround of the Web request from the WebSphere system after some business logic has completed. These response times are averaged with the response times for manufacturing operations. The performance of the DB2 subsystem is also reflected in this data. Table 1 summarizes these results, and Figure 1, Figure 2, and Figure 3 show them graphically.

With this workload, the throughput stays fairly constant relative to the workload submission rate, and performance degradation first shows up as increasing response times and CPU utilization.
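One way to check the claim about throughput is to divide the reported workload throughput by the workload submission rate for each run in Table 1. The following sketch does this. The percent signs in the throughput column are dropped, so only the reported numbers are compared. Reading the claim as a statement about this ratio is an interpretation, not a definition given in the study.

```python
from statistics import mean, pstdev

# (jvm, workload submission rate, reported workload throughput) from Table 1.
RESULTS = [
    ("31-bit", 110, 101),
    ("64-bit", 110, 100),
    ("31-bit", 190, 174),
    ("64-bit", 190, 174),
    ("31-bit", 350, 322),
    ("64-bit", 350, 321),
    ("31-bit", 600, 551),
    ("64-bit", 600, 550),
]

# Fraction of the submitted work that shows up as reported throughput.
ratios = [throughput / rate for _, rate, throughput in RESULTS]

for (jvm, rate, _), ratio in zip(RESULTS, ratios):
    print(f"{jvm} rate={rate:3d}: throughput/submission = {ratio:.3f}")
print(f"mean = {mean(ratios):.3f}, population std dev = {pstdev(ratios):.4f}")
```

Every ratio falls between about 0.91 and 0.92. The throughput therefore keeps pace with the submission rate from one CPU to eight CPUs.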

Table 1. CPU scaling study: WebSphere V6.1 CPU Scaling results for one, two, four, and eight CPUs at high CPU utilization
JVM     Workload submission rate  Workload throughput  Number of CPUs  CPU utilization  Response time (ms)
31-bit  110                       101%                 1               87%              432
64-bit  110                       100%                 1               97%              793
31-bit  190                       174%                 2               175%             385
64-bit  190                       174%                 2               189%             654
31-bit  350                       322%                 4               353%             361
64-bit  350                       321%                 4               371%             404
31-bit  600                       551%                 8               612%             562
64-bit  600                       550%                 8               641%             518
Figure 1. CPU scaling study: Workload transaction rates with 31-bit and 64-bit WebSphere
Figure 2. CPU scaling study: CPU utilization with 31-bit and 64-bit WebSphere
Figure 3. CPU scaling study: Response times with 31-bit and 64-bit WebSphere

Observations

The workload scales almost linearly for both 31-bit and 64-bit WebSphere Application Servers. The 31-bit version requires slightly less CPU than the 64-bit version at each workload level. The CPU utilization also scales almost linearly for both 31-bit and 64-bit WebSphere.
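The CPU comparison can be made more concrete by computing, for each row of Table 1, the CPU utilization spent per unit of reported throughput, together with how much more CPU the 64-bit server uses than the 31-bit server. The following sketch does both calculations. The metric and the helper name `cpu_per_throughput` are illustrative choices. They are not measures defined in the study, and the percent signs in the throughput column are dropped.

```python
# (workload submission rate, number of CPUs) ->
#     {jvm: (reported throughput, total CPU utilization %)} from Table 1.
TABLE = {
    (110, 1): {"31-bit": (101, 87), "64-bit": (100, 97)},
    (190, 2): {"31-bit": (174, 175), "64-bit": (174, 189)},
    (350, 4): {"31-bit": (322, 353), "64-bit": (321, 371)},
    (600, 8): {"31-bit": (551, 612), "64-bit": (550, 641)},
}


def cpu_per_throughput(throughput: float, cpu_percent: float) -> float:
    """CPU utilization (percent of one CPU) spent per unit of reported throughput."""
    return cpu_percent / throughput


for (rate, cpus), runs in TABLE.items():
    cost31 = cpu_per_throughput(*runs["31-bit"])
    cost64 = cpu_per_throughput(*runs["64-bit"])
    # Relative extra CPU consumed by the 64-bit server in the same configuration.
    extra = (runs["64-bit"][1] / runs["31-bit"][1] - 1) * 100
    print(f"rate={rate} cpus={cpus}: 31-bit {cost31:.3f}, 64-bit {cost64:.3f}, "
          f"64-bit uses {extra:.1f}% more CPU")
```

In every configuration, the 64-bit server uses more CPU than the 31-bit server. The relative gap shrinks as the configuration grows, from about 11% with one CPU to about 5% with eight CPUs.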

The response time behaves unexpectedly. It becomes shorter as the workload and the number of CPUs increase, and then increases again at the last scaling step, which has the highest workload. Here, the 31-bit WebSphere Application Server differs significantly from the 64-bit WebSphere Application Server. With one CPU, the 31-bit response time is much shorter (432 ms versus 793 ms), but the gap narrows as the system is scaled. At a submission rate of 600 with eight CPUs, the response time of the 64-bit WebSphere Application Server becomes shorter than that of the 31-bit server (518 ms versus 562 ms).
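The narrowing gap can be quantified as the percentage by which the 64-bit response time exceeds the 31-bit response time at each CPU count in Table 1. The following sketch computes that percentage. The helper name `gap_percent` is hypothetical, and a positive value means that the 64-bit server is slower.

```python
# Average response times in milliseconds from Table 1, keyed by number of CPUs.
RESPONSE_MS = {
    1: {"31-bit": 432, "64-bit": 793},
    2: {"31-bit": 385, "64-bit": 654},
    4: {"31-bit": 361, "64-bit": 404},
    8: {"31-bit": 562, "64-bit": 518},
}


def gap_percent(rt31: float, rt64: float) -> float:
    """Percent by which the 64-bit response time is longer (+) or shorter (-) than 31-bit."""
    return (rt64 - rt31) / rt31 * 100


gaps = {cpus: gap_percent(rt["31-bit"], rt["64-bit"]) for cpus, rt in RESPONSE_MS.items()}

for cpus, gap in gaps.items():
    print(f"{cpus} CPU(s): 64-bit vs 31-bit response time {gap:+.1f}%")
```

The gap falls at every scaling step and becomes negative only in the eight-CPU run, where the 64-bit server is faster.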

Conclusions

The very good linear scaling in throughput and CPU utilization makes this workload easy for a system administrator to scale. The difference between 31-bit WebSphere and 64-bit WebSphere is small. As seen in other studies, the more efficient garbage collection of the 64-bit version seems to compensate for the drawback of its larger memory addresses. See https://download.boulder.ibm.com/ibmdl/pub/software/dw/linux390/perf/ZSW03030-USEN-00.pdf.

At higher workload submission rates, the larger heap available to 64-bit WebSphere results in an improving response time curve. A more detailed analysis of garbage collection can be found in Comparing 64-bit WebSphere versus 31-bit WebSphere.

At the highest workload (eight CPUs, submission rate 600), CPU utilization stayed well below 90%. This points to an unidentified bottleneck, which might be the HiperSockets connection between the WebSphere Application Server and the database. Additional investigation would be required to determine the cause of this bottleneck. In the one-CPU run with 64-bit WebSphere, the high CPU utilization of 97% becomes critical for a system running with HiperSockets, and is very likely the reason for the high response time in that run.

Large heaps provide more space for both long-lived and newly created objects. The Generational Concurrent garbage collection option (-Xgcpolicy:gencon) appears to work very efficiently, even for this workload, which was designed to place a high load and high resource utilization on the WebSphere Application Server.