This article describes a series of performance benchmarks that demonstrate the scaling properties of a PowerVM virtual machine (VM) running WebSphere Portal on AIX. It briefly describes PowerVM CPU entitlements and how virtual CPUs are allocated. Then, it documents the WebSphere Portal benchmark environment. Finally, the results show how WebSphere Portal scales when either the CPU entitlement or the number of virtual CPUs (vCPUs) changes. Capped and uncapped configurations are also compared, along with some simple startup time measurements for various configurations.
See the following paragraphs for the main PowerVM concepts.
One of the unique features of PowerVM is the ability to give a VM an exact slice of CPU time. In PowerVM, this slice is called a CPU entitlement. Implicit in this concept is that an entitlement can be for less than one CPU. A VM with a fractional entitlement is termed a micropartition. One question this article answers is whether micropartitions scale linearly; that is, is there any overhead when running with fractional CPUs?
In addition to specifying the CPU entitlement, a VM must also specify the number of vCPUs it will use. These vCPUs are what the VM operating system sees, regardless of the underlying CPU entitlement.
A VM can have up to ten times as many vCPUs as its entitlement, rounded down, and vCPUs must be specified as whole numbers. For example, a VM with an entitlement of 1 can have up to 10 vCPUs, while a VM with an entitlement of 0.25 can have at most 2 vCPUs. In addition, because a single vCPU can represent at most one full CPU, a VM must have at least one vCPU for each whole CPU of entitlement. For example, it is not possible to have an entitlement of 2 and only 1 vCPU.
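The allocation rules above can be expressed as a small validity check. This is a minimal sketch, not an HMC or PowerVM API: the function name vcpu_range is hypothetical, and it assumes that the minimum vCPU count is the entitlement rounded up (one vCPU carries at most one full CPU) and that the maximum is ten times the entitlement, rounded down, as stated above.

```python
from __future__ import annotations

import math
from fractions import Fraction


def vcpu_range(entitlement: float | str) -> tuple[int, int]:
    """Return (min_vcpus, max_vcpus) for a shared-CPU entitlement.

    Hypothetical helper. Assumed rules: a vCPU carries at most one full
    CPU of entitlement (minimum = entitlement rounded up), and a VM may
    have at most ten vCPUs per CPU of entitlement (maximum = 10x, rounded
    down).
    """
    # Convert via str so that 0.1 becomes exactly 1/10, not a binary float.
    e = Fraction(str(entitlement))
    if e <= 0:
        raise ValueError("entitlement must be positive")
    min_vcpus = math.ceil(e)
    max_vcpus = math.floor(e * 10)
    if max_vcpus < min_vcpus:
        raise ValueError(f"entitlement {entitlement} is too small for any vCPU")
    return min_vcpus, max_vcpus


print(vcpu_range("0.25"))  # (1, 2)
print(vcpu_range(1))       # (1, 10)
print(vcpu_range(2))       # (2, 20)
```

Exact fractions are used instead of floats so that entitlements such as 0.1 or 0.25 are not subject to rounding error when multiplied by 10.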
The interesting question here is whether a multithreaded workload benefits from having PowerVM or the operating system manage threads. PowerVM manages a multithreaded workload by scheduling additional vCPUs for a VM. The AIX operating system handles multiple threads using context switching between its available CPU threads.
Capped compared to uncapped
Another feature of PowerVM is that it allows VMs to use more CPU processing time when needed. If one VM is not using its CPU during a given scheduling interval, other VMs can use that CPU time to do more work. To use additional processor cycles, a VM must be in uncapped mode, which means that it can use up to one full CPU per vCPU. For example, an uncapped VM with a 0.25 entitlement and 2 vCPUs could use up to 2 CPUs; with only 1 vCPU, it could use at most 1 CPU. In capped mode, the same VM never uses more than its 0.25 entitlement, regardless of its vCPU count. A separate set of benchmarks was run to measure how VMs with 2 vCPUs scale with different entitlements in uncapped mode.
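The capped and uncapped ceilings described above reduce to a simple rule. The sketch below is a simplified model, not PowerVM's actual scheduler: the names max_cpu and burst_ratio are hypothetical, and the optional spare parameter (idle capacity available in the shared pool) is an assumption used to show that an uncapped VM can exceed its entitlement only when spare cycles exist.

```python
from __future__ import annotations

from fractions import Fraction


def max_cpu(entitlement: float | str, vcpus: int, uncapped: bool,
            spare: float | str | None = None) -> Fraction:
    """Upper bound on the physical CPUs a shared-CPU VM can consume.

    Simplified model (hypothetical helper):
      - capped:   never more than the entitlement
      - uncapped: up to one full CPU per vCPU; if spare (idle pool
                  capacity) is given, also limited to entitlement + spare
    """
    e = Fraction(str(entitlement))
    if not uncapped:
        return e
    ceiling = Fraction(vcpus)
    if spare is not None:
        ceiling = min(ceiling, e + Fraction(str(spare)))
    return ceiling


def burst_ratio(entitlement: float | str, vcpus: int) -> Fraction:
    """Factor by which uncapping can raise the effective entitlement."""
    return Fraction(vcpus) / Fraction(str(entitlement))


print(max_cpu("0.25", 2, uncapped=True))   # 2
print(max_cpu("0.25", 1, uncapped=True))   # 1
print(max_cpu("0.25", 2, uncapped=False))  # 1/4
```

For a VM with 2 vCPUs, burst_ratio is 2 for an entitlement of 1 and 4/3 for an entitlement of 1.5.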
The extra CPU cycles used by uncapped VMs can come either from other shared-CPU VMs or from dedicated-CPU VMs that allow processor sharing. For dedicated-CPU VMs, this is called donating mode.
Configuration in the HMC
All of the above settings are managed in the Hardware Management Console (HMC) under the Manage Profiles item for the VM. After opening the VM's profile, all CPU-related settings are on the Processors tab.
In dedicated mode, the VM cannot have fractional entitlements. The number of vCPUs will always be the same as the entitlement.
In Figure 1, the VM is using dedicated CPUs with no donating. It has an entitlement of 2 and will present 2 CPUs to AIX.
Figure 1. HMC configuration for a dedicated CPU VM
In shared mode, both the CPU entitlement and the number of vCPUs must be specified. If the VM is to run uncapped, the Uncapped option must also be selected.
In Figure 2, the VM is using shared CPUs. It is set with an entitlement of 0.5, has 2 vCPUs and runs in capped mode.
Figure 2. HMC configuration for a shared CPU VM
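The same settings can also be changed from the HMC command line. The sketch below only assembles the arguments for the HMC chsyscfg command and runs nothing. The profile attribute names it uses (proc_mode, desired_proc_units, desired_procs, sharing_mode) are assumptions that should be checked against the chsyscfg documentation for the HMC release in use, the managed system, partition, and profile names in the example are hypothetical, and the profile's minimum and maximum processor values must still bracket the desired values.

```python
from __future__ import annotations

import shlex


def chsyscfg_shared_profile(managed_system: str, lpar: str, profile: str,
                            proc_units: str, vcpus: int,
                            uncapped: bool) -> list[str]:
    """Build argv for an HMC 'chsyscfg' call that sets a shared-CPU profile.

    Sketch only: attribute names are assumed from the HMC CLI and should be
    verified before use. Nothing is executed here.
    """
    attrs = {
        "name": profile,
        "lpar_name": lpar,
        "proc_mode": "shared",             # shared (not dedicated) CPUs
        "desired_proc_units": proc_units,  # CPU entitlement, e.g. "0.5"
        "desired_procs": str(vcpus),       # number of vCPUs
        "sharing_mode": "uncap" if uncapped else "cap",
    }
    spec = ",".join(f"{key}={value}" for key, value in attrs.items())
    return ["chsyscfg", "-r", "prof", "-m", managed_system, "-i", spec]


# The Figure 2 configuration: 0.5 entitlement, 2 vCPUs, capped.
# "p730", "portal" and "default" are hypothetical names.
print(shlex.join(chsyscfg_shared_profile("p730", "portal", "default", "0.5", 2, False)))
```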
WebSphere Portal benchmark
To answer the questions posed above, a set of measurements was run on WebSphere Portal 126.96.36.199.
For this benchmark, a simple set of test portlets was installed. These portlets make no requests to external databases, files, or other systems. Tuning was applied as specified in the Portal 8.0 Tuning Guide to ensure that database access by WebSphere Portal is minimal after warmup. This means that the benchmark is limited only by the CPU resources of the WebSphere Portal server.
To simulate a realistic customer application, the test portlets were installed on a set of several hundred portal pages in a 2-level hierarchy. Each user can see 26 pages. Users are stored in a separate Lightweight Directory Access Protocol (LDAP) server that is not running on the system under test.
The system under test is an IBM Power Systems™ 730 model with 16 CPUs running at 3.55 GHz and 128 GB of memory. This system runs the following VMs:
- WebSphere Portal server
  - Various CPU configurations
  - 16 GB of memory
  - WebSphere Portal 188.8.131.52 and IBM HTTP Server
  - AIX 7.1 TL1 SP5
- WebSphere Portal database server
  - 1 dedicated CPU
  - 8 GB of memory
  - IBM DB2® 9.7
  - AIX 7.1 TL1 SP5
- VIO server
  - 1 dedicated CPU
  - 4 GB of memory
  - VIOS 184.108.40.206
Both the WebSphere Portal server and the DB2 server are connected to the same virtual network, run by the Virtual I/O Server (VIOS). Traffic between Portal and the database never leaves the system. The VIOS also exposes a storage area network (SAN) logical unit (LUN) to each VM as a vSCSI disk.
The WebSphere Portal server also runs IBM HTTP Server. This server is configured to cache static content in a memory cache. At steady state, the majority of the requests being made to WebSphere Portal are for dynamic pages, not static content in portlet WAR files.
For all measurements, the same WebSphere Portal VM was used, and the configuration was changed by altering its profile in the HMC as detailed above. This ensured that the system being tested was otherwise identical across measurements. Note that this system was tuned to achieve maximum throughput on its largest configuration. Some settings might not be ideal for smaller VM configurations, but this was deemed acceptable given the goals of these benchmarks.
The above configuration is used to run a measurement workload with IBM Rational® Performance Tester. This workload consists of five separate paths through the site. The first path is unauthenticated and only visits public pages. In the second and third paths, users log in and access either pages that all users can see or pages specific to the groups the user is a member of. In the fourth and fifth paths, users access pages that contain "application" portlets, which drive more CPU utilization by simulating a shopping cart application and other business processes. These paths are differentiated in the same way as paths two and three: one contains applications that all users can see, and the other contains only applications specific to the user's groups.
In Rational Performance Tester, this workload is ramped up to increasing transaction levels by increasing the number of virtual users. Each level is run for 15 minutes to ensure that sufficient transactions are measured. The measurement is stopped when the average response time of any transaction exceeds 1 second at a given throughput level. Peak throughput is recorded as the level before that point; that is, the last level at which all average response times were below the threshold.
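The stopping rule and peak definition above can be sketched as follows. The Level record and the peak_throughput function are hypothetical names for illustration; throughput is in whatever unit the load-testing tool reports, and the 1-second default threshold is the one stated above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Level:
    """One ramp-up level of a load test (hypothetical record)."""
    throughput: float                 # throughput measured at this level
    avg_response_s: dict[str, float]  # average response time per transaction


def peak_throughput(levels: list[Level],
                    threshold_s: float = 1.0) -> float | None:
    """Return the throughput of the last level before any transaction's
    average response time exceeded the threshold.

    Levels must be in ramp-up order. Returns None if the very first level
    already exceeded the threshold.
    """
    peak = None
    for level in levels:
        if any(t > threshold_s for t in level.avg_response_s.values()):
            break  # the run is stopped at this level
        peak = level.throughput
    return peak
```

A level whose worst average is exactly 1 second still counts as passing in this sketch, because the run is stopped only when a response time exceeds the threshold.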
The Rational Performance Tester agents were all run on IBM System x® servers running Red Hat Enterprise Linux® 6.3. The agent systems are connected to the system under test by a dedicated gigabit network used only for benchmark traffic.
The first set of measurements was done on various capped configurations. This means that the CPU resources given to the VM are static for the duration of each benchmark and each measurement can be compared to the others for scaling purposes.
The configurations tested varied the entitlement from 0.25 to 3. Separately, the number of vCPUs was also varied between 1 and 3, where possible. This generates the following test matrix:
Table 1. Test matrix for capped CPU configurations
Again, it is not possible to give more than a 1-to-1 entitlement-to-vCPU ratio, but vCPU-to-entitlement can be 10 to 1 (rounded down).
Running these tests gives the following results:
Figure 3. Capped CPU results
First, looking across the results as a whole, WebSphere Portal scales nearly linearly with respect to the entitlement. This indicates that there is minimal overhead when PowerVM is scheduling fractional CPU resources on micropartitions.
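One common way to quantify "nearly linear" is scaling efficiency: the observed throughput ratio divided by the entitlement ratio, both relative to a baseline configuration. This metric is not defined in the article; scaling_efficiency is a hypothetical helper shown as a sketch, and it is not applied to the measured data here.

```python
def scaling_efficiency(throughput: float, entitlement: float,
                       base_throughput: float, base_entitlement: float) -> float:
    """Observed speedup divided by the ideal (linear) speedup.

    1.0 means throughput grew exactly in proportion to entitlement relative
    to the baseline; values below 1.0 indicate scaling overhead.
    """
    if min(throughput, entitlement, base_throughput, base_entitlement) <= 0:
        raise ValueError("all inputs must be positive")
    observed_speedup = throughput / base_throughput
    ideal_speedup = entitlement / base_entitlement
    return observed_speedup / ideal_speedup
```

For example, a configuration with four times the baseline entitlement and three times the baseline throughput has an efficiency of 0.75.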
Within each entitlement level, however, there is a penalty for allocating more vCPUs. PowerVM assigns each vCPU to a separate real CPU, so the performance degradation when moving from 1 to 2 vCPUs might be due to L2 cache misses as threads move between multiple real CPUs.
However, it is not necessarily better to run with a limited number of vCPUs because it means that the potential for uncapping is limited.
A second set of measurements was done using VMs with 2 vCPUs. The baseline measurement is a dedicated configuration with an entitlement of 2. This was compared to configurations that varied the entitlement while keeping the number of vCPUs fixed at 2.
Because all the configurations run with 2 vCPUs, they can all potentially use 2 real CPUs' worth of resources. Thus, they should all have the same throughput.
Running these tests gives the following results:
Figure 4. Uncapped CPU results
First, note that a dedicated CPU VM and a shared CPU VM have the same performance. This indicates that the additional PowerVM scheduling overhead of moving to a shared CPU VM is minimal.
Similarly, an uncapped VM with an entitlement of 1.5 performs as expected. Because there is no other significant processing on the system and many CPUs are free, PowerVM can easily give this configuration 2 CPUs' worth of processing at peak load.
On the other hand, the configuration with an entitlement of 1 did not perform as expected. PowerVM does not appear to give this configuration 2 CPUs, even when it could use the resources. This might be because uncapping effectively doubles the entitlement in this configuration. More investigation is needed, but it is recommended that uncapped configurations that greatly increase the effective entitlement be tested before deployment.
WebSphere Portal startup
Another interesting item to measure is the time it takes WebSphere Portal to start. This is not a significant issue on production systems, where WebSphere Portal is expected to be started only occasionally. However, on development or test systems, which frequently have low entitlements, it is useful to know whether WebSphere Portal startup takes significantly longer with fewer CPU resources.
The time to start WebSphere Portal was measured using the UNIX 'time' command. Startup immediately after a VM restart was measured first; this is a "cold" startup, in which nothing is in the processor caches or the operating system's file caches. After the cold startup was measured, WebSphere Portal was stopped and then started again. This second measurement is the "warm" startup time, which should be faster because the operating system's caches are already warmed up.
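A script equivalent to the 'time' measurement above can be sketched in Python by recording wall-clock time around the start command, which corresponds to the "real" figure that 'time' reports. The time_command function is a hypothetical helper, and the start script path in the usage comment is a placeholder: startServer.sh in the WebSphere Portal profile's bin directory is a common location, but the profile path varies by installation.

```python
from __future__ import annotations

import subprocess
import sys
import time


def time_command(argv: list[str]) -> float:
    """Run a command to completion and return elapsed wall-clock seconds
    (the 'real' time that the UNIX time command reports)."""
    start = time.perf_counter()
    subprocess.run(argv, check=True)  # raises CalledProcessError on failure
    return time.perf_counter() - start


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: run once right after a VM restart (cold) and again
    # after stopping the server (warm), for example:
    #   python time_start.py /path/to/wp_profile/bin/startServer.sh WebSphere_Portal
    elapsed = time_command(sys.argv[1:])
    print(f"real {elapsed:.1f}s")
```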
Running these tests gives the following results:
Figure 5. WebSphere Portal startup time
As expected, WebSphere Portal startup time scales linearly with the CPU resources it is given: the more CPU the VM has, the faster WebSphere Portal starts. Warm startup times are faster than cold startup times, also as expected.
These results indicate that systems that need few resources on average would still benefit from more resources while WebSphere Portal is starting up. For development and test machines, uncapped allocations are ideal because many of them can fit in a small processor pool yet still get 1 or more CPUs when needed to start WebSphere Portal quickly.
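The trade-off described above can be illustrated with a rough model of a shared processor pool. This is a simplification with hypothetical names: it assumes that the sum of entitlements must fit in the pool and that idle capacity is split evenly among the VMs that are busy at the same moment, whereas PowerVM actually distributes spare cycles according to each VM's uncapped weight.

```python
from __future__ import annotations

import math
from fractions import Fraction


def vms_per_pool(pool_cpus: int, entitlement: float | str) -> int:
    """How many VMs with this entitlement fit if the total entitlement
    may not exceed the pool size."""
    return math.floor(Fraction(pool_cpus) / Fraction(str(entitlement)))


def cpus_while_busy(pool_cpus: int, entitlement: float | str, vcpus: int,
                    busy_vms: int) -> Fraction:
    """Rough CPU available to each of busy_vms uncapped VMs that are busy
    at once (for example, all starting WebSphere Portal), assuming every
    other VM in the pool is idle and spare capacity is shared evenly."""
    e = Fraction(str(entitlement))
    even_share = Fraction(pool_cpus) / busy_vms
    # Never less than the guaranteed entitlement, never more than 1 CPU per vCPU.
    return min(Fraction(vcpus), max(e, even_share))


print(vms_per_pool(4, "0.25"))            # 16
print(cpus_while_busy(4, "0.25", 2, 1))   # 2
print(cpus_while_busy(4, "0.25", 2, 16))  # 1/4
```

In a hypothetical 4-CPU pool, sixteen VMs with a 0.25 entitlement fit, yet a single VM starting alone can still use both of its 2 vCPUs.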
The capacity of WebSphere Portal running on PowerVM scales linearly with the CPU entitlement. However, adding more vCPUs than the entitlement requires carries a performance penalty, which should be considered if the extra vCPUs are not needed.
One consideration is whether a VM occasionally needs more than its CPU entitlement. In that case, adding vCPUs and setting the configuration to uncapped allows the VM to use more resources when needed. Note that this particular benchmark shows that uncapped modes might scale better when the uncapped maximum is not much larger than the base entitlement.
Uncapped allocations also benefit WebSphere Portal startup times as they allow smaller VMs to access the higher CPU resources needed to start WebSphere Portal as quickly as possible.
Finally, note that the results presented here represent the ideal case, because the tested system had many free CPUs and no competition for resources. The overhead of switching workloads between shared processors will be greater when the whole system is in use. PowerVM overhead might also be higher when many VMs are requesting extra CPU cycles from other idle VMs.