This two-part article discusses virtualization's pros and cons with concrete examples. In Part 1 we discussed virtualization at a high level, especially as it relates to IBM Rational's products. We covered how four dimensions of virtualization (CPU, memory, disk I/O and storage, and network) must be properly managed with affinity (dedicated resources) and without overcommitment. We gave examples of how poorly managed virtualization can drastically affect the performance of IBM® Rational® products. Specifically, we showed two case studies in which IBM® Rational Team Concert™ and IBM® Rational® ClearCase® performance suffered when they were hosted in poorly configured virtualized environments, on virtual machines (VMs) configured without affinity.
In Part 2 we look deeper at the tradeoffs of overcommitment. Drawing upon our experience testing Rational products and advising customers, we offer suggestions and tips, troubleshooting strategies, and vendor-specific examples to help you better manage your virtualized infrastructure. Troubleshooting situations and suggestions appear in the jazz.net Deployment wiki.
Case Study No. 3. Exploring overcommitment with ClearCase
Case Study 2 demonstrated how IBM Rational ClearCase performance suffered when hosted on VMs without dedicated resources. We recommend affinity and dedicated resources for IBM Rational products, and we advise avoiding overcommitment wherever possible. However, we recognize that managed overcommitment can be an essential value proposition of virtualization. Case Study 3 looks at different degrees of overcommitment.
For one of our tests, we took an Intel Westmere-EX server with 4 eight-core CPUs and 64 GB of memory. With hyperthreading enabled, the 32-core server appeared to the hypervisor as 64 logical processors (64 vCPUs). The ClearCase CM server was installed on a VM allocated 4 vCPUs and 8 GB RAM, but without dedicated resources or affinity.
On the same hypervisor we also created 96 VMs (64-bit RHEL 5.5), each with 4 vCPUs and 4 GB RAM, to generate background load. The 96 images were organized into six groups of 16 VMs. Each 16-VM group comprises 64 vCPUs and 64 GB of RAM, which corresponds to the hardware dimensions of the Westmere-EX server itself. Therefore each 16-VM group represents 100% of the Westmere-EX's hardware capacity.
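The allocation arithmetic behind this setup can be sketched in a few lines. This is a minimal illustration using the figures from this case study; the function name and structure are ours, not part of any Rational tool:

```python
# Hypothetical sketch of the Westmere-EX allocation arithmetic from this
# case study. All names here are ours, used purely for illustration.
HOST_VCPUS = 64    # 32 cores with hyperthreading -> 64 logical processors
HOST_RAM_GB = 64

VMS_PER_GROUP = 16
VCPUS_PER_VM = 4
RAM_PER_VM_GB = 4

def group_allocation(groups: int) -> tuple[float, float]:
    """Return (cpu_percent, ram_percent) of host capacity allocated
    when the given number of 16-VM background groups is active."""
    vms = groups * VMS_PER_GROUP
    cpu_pct = 100.0 * vms * VCPUS_PER_VM / HOST_VCPUS
    ram_pct = 100.0 * vms * RAM_PER_VM_GB / HOST_RAM_GB
    return cpu_pct, ram_pct

for g in range(1, 7):
    cpu, ram = group_allocation(g)
    print(f"{g} group(s): {cpu:.0f}% CPU, {ram:.0f}% RAM of host capacity")
```

One group exactly matches the host (100%), and all six groups together allocate 600% of physical capacity, which is the overcommitment range the tests below walk through.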
To capture baseline average response time data (shown in Table 1, column a), we simulated a 100-user UCM load against the ClearCase CM server while all six groups of VMs were idle.
For the next tests (columns b through g in Table 1), we generated background load using the six groups of 16 VMs. Each VM hosted a home-grown program that ran multi-threaded square-root calculations and allocated memory. This "hog" program ensured that each VM would consume 100% of its allocated processor and RAM. Each test step increased the background load by 16 VMs, the equivalent of 100% of the Westmere-EX physical hardware.
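The authors' actual "hog" tool is not published, but a minimal sketch of a load generator in the same spirit might look like the following. All names and parameters here are ours; note that CPython's global interpreter lock serializes pure-Python threads, so a production hog would use one process per core or a compiled language to saturate multiple cores:

```python
# Minimal sketch of a CPU/RAM "hog" in the spirit of the program described
# above. The authors' real tool is not published; everything here is ours.
# Caveat: CPython's GIL serializes these threads, so a real multi-core hog
# would use one process per core (multiprocessing) or native code instead.
import math
import threading
import time

def burn_cpu(stop: threading.Event) -> None:
    """Spin on square-root calculations until told to stop."""
    x = 0.0
    while not stop.is_set():
        for i in range(10_000):
            x += math.sqrt(i)

def hog(n_threads: int, ram_mb: int, seconds: float) -> int:
    """Hold ram_mb of memory and burn CPU on n_threads for the duration.
    Returns the number of bytes that were held, for verification."""
    ballast = bytearray(ram_mb * 1024 * 1024)  # hold RAM for the duration
    stop = threading.Event()
    threads = [threading.Thread(target=burn_cpu, args=(stop,))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    time.sleep(seconds)
    stop.set()
    for t in threads:
        t.join()
    return len(ballast)
```

A background VM in the test would run something like `hog(4, 4096, duration)` to pin its 4 vCPUs and 4 GB of RAM at 100%.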
Column b shows the average response time data of the 100-user CC CM Server test with the equivalent of 100% Westmere-EX load (one group of 16 VMs running the "hog" program). Columns c through g show the average response time data of the 100-user CC CM server test with increasing groups of 16 VMs (increasing the equivalent Westmere-EX load by 100% at each step).
Column g shows the average response time for our 100-user CC CM Server test with all 96 VMs running the "hog" program, equivalent to 600% of the physical Westmere-EX capacity. Response times are terrible. Our overcommitted server can't service the 100-user CC load at reasonable response times.
The only way for our server to guarantee any reasonable performance is to be on a VM with dedicated resources. Column h shows the same test as column g except that the CC CM Server now has affinity and dedicated resources. At 600% load, the CC CM server responds with acceptable performance.
Table 1: Performance tests using ClearCase showing effects of affinity. (The columns are: (a) baseline, background VMs idle; (b) 100% load, no affinity; (c) 200% load, no affinity; (d) 300% load, no affinity; (e) 400% load, no affinity; (f) 500% load, no affinity; (g) 600% load, no affinity; (h) 600% load, with affinity.)
This example shows several things. Without affinity, product performance will degrade drastically to the point of being unusable. Furthermore, a ClearCase administrator with access only to the CC CM server is helpless to understand, or even guess at, what is going on. This background load may be extreme, but it clearly demonstrates the effects of overcommitment.
However, virtualization is not a hopeless proposition. Compare columns c and h in Table 1: 200% load on the hypervisor with no affinity for the CC CM server, versus 600% load on the hypervisor with the CC CM server given affinity. The response times are similar enough to suggest that if dedicated resources are not possible, and hypervisor allocation doesn't exceed 200%, response times can still be acceptable compared to a configuration where resources do have affinity. In other words, overcommitment can be a viable option as long as it is properly managed.
Case Study No. 4. Overcommit or undercommit: Performance vs. Capacity
Case Study 4 compares ClearCase response times between two different VM configurations on the same ESX server, an Intel Sandy Bridge server (E5-2680 @ 2.70 GHz) with 32 vCPUs and 32 GB RAM. Configuration A uses 100% of the VMware ESX server's capacity, and Configuration B uses 150%.
In Configuration A, the ESX server hosts a VOB server, a ClearCase Remote Client (CCRC) server, and two VMs running the "hog" programs described in Case Study 3. Each VM on the ESX server runs RHEL 5.6 and is allocated 8 vCPUs and 8 GB RAM. Configuration A therefore dedicates 100% of the ESX server's hardware resources.
Configuration B uses the same four VMs as Configuration A, with two additional VMs added to the ESX server, each also sized at 8 vCPUs and 8 GB RAM to match the other four. When all six images are in use (48 vCPUs and 48 GB RAM), the ESX server is allocated at 150% of capacity. The two additional VMs form a secondary ClearCase region and perform activities against each other to load the ESX server; the activities include import, mklabel, and build operations, and they run continuously during the test.
The background load in this case study consists of these two additional ClearCase virtual machines: one VOB server and one ClearCase client acting as a view and build server while also performing mklabel and import operations. A dedicated 1 Gb network connects these servers and their images.
This ClearCase test environment replicates our actual ClearCase development VOBs: 100 VOBs spread across two servers. The 10 highest-volume VOBs are hosted on the VOB server VM; the remaining 90 VOBs are hosted on a separate physical server along with the license server and registry server.
This comparison's workload simulates approximately 250 simultaneous users over a twelve-hour period. The workload consists of:
- 200 CCRC users performing 15 transactions per hour
- 50 dynamic-view users performing 15 transactions per hour
- 38 continuous clearmake builds running on 12 additional build hosts (UNIX and Windows)
- 1 independent UNIX client running integration tasks
Table 2. Two ESX server configurations

| ESX server (32 vCPUs, 32 GB RAM) | Configuration A (at 100% capacity) | Configuration B (at 150% capacity) |
|---|---|---|
| Images hosted | 4 | 6 |
Figure 1 compares the two configurations' averaged response times over twelve hours. Compared with Configuration A, Configuration B was 35% slower for base ClearCase operations and 25% slower for UCM operations. Build times were also 22% slower.
Figure 1: Comparing two ClearCase environments
More about affinity and reservations
In Part 1, we defined affinity as the ability to dedicate one or more resources of a virtual machine to the corresponding resources on the hypervisor. Some hypervisor systems also offer reservations, which are similar in spirit to what we mean by affinity; in those systems, affinity signifies something more specific: a VM's CPUs can be precisely assigned to physical cores. If you assign your dedicated VM to specific CPUs, you should also assign the rest of your VMs to different CPUs. Note that VMs pinned to specific CPUs may actually perform worse, because they may be unable to schedule multi-threaded tasks.
We recommend that VMs have access to dedicated resources on the hypervisor. Keep some additional concerns in mind if the hypervisor is hyperthreaded or is designed to perform automatic load balancing of resources.
CPU affinity considerations
If using CPU affinity, consider these issues:
- If your hypervisor is using automatic load balancing, CPU affinity may prevent the hypervisor from working efficiently.
- CPU affinity on one VM can prevent the other VMs on the same hypervisor from working efficiently.
- Be careful when moving a VM with CPU affinity from one hypervisor to another as the hypervisors may have different processor configurations.
- CPU affinity on multicore or hyperthreaded machines may actually prevent a VM from scheduling multi-threaded tasks, because its requests are limited to specific cores.
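The same pinning idea exists one level down, inside a guest OS. As a hedged, Linux-only sketch (this is process-level affinity within a VM, distinct from hypervisor-level pinning of a VM's vCPUs to physical cores; the function name is ours):

```python
# Linux-only sketch: inspect and set process-level CPU affinity from Python.
# This pins a *process inside* a guest OS, analogous in spirit to, but
# distinct from, pinning a whole VM's vCPUs at the hypervisor.
import os

def pin_to_cpus(cpus: set[int]) -> set[int]:
    """Pin the current process to the given CPUs; return the resulting mask."""
    os.sched_setaffinity(0, cpus)        # 0 means "this process"
    return os.sched_getaffinity(0)

if __name__ == "__main__":
    print("current mask:", os.sched_getaffinity(0))
    print("after pinning to CPU 0:", pin_to_cpus({0}))
```

The same caveats from the list above apply here: a process pinned to one core cannot spread its threads, which can hurt rather than help.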
Summary and Conclusions
This two-part article discussed virtualization's pros and cons and used concrete examples specific to IBM Rational products.
In Part 1, we covered four important dimensions whose parameters must be precisely determined when using virtualization: CPU, memory, disk I/O and storage, and network. We emphasized the importance of affinity (dedicated resources) and demonstrated what can happen when resources are overcommitted.
We provided examples of how poorly managed virtualization can drastically affect the performance of IBM Rational products. We showed two case studies in which IBM Rational Team Concert and IBM Rational ClearCase performance suffered when they were hosted in poorly configured virtualized environments whose VMs were configured without affinity.
In Part 2, we looked deeper at the tradeoffs around overcommitment.
Drawing upon our experience testing IBM Rational products and advising our customers, we offer suggestions and tips, troubleshooting strategies, and vendor-specific examples to help manage your virtualized infrastructure. Troubleshooting situations and suggestions appear in the jazz.net Deployment wiki.
Virtualization’s key advantages
- The current hardware offerings in the market lend themselves well to being divided and used as hypervisors hosting multiple VMs. These new machines save space, reduce power consumption, and are generally very resource-efficient.
- Virtualized infrastructure can increase the speed at which fresh VMs (copies of existing VMs or new ready-to-use VMs) can be deployed.
- High-availability (HA) and disaster recovery (DR) solutions can be integrated with virtualization for a more complete and cost-effective enterprise configuration. However, note that a single hypervisor hosting multiple VMs can become a single point of failure. You can work around this specific area of concern by using SAN or NAS storage for the VM images and/or readying standby VMs on an alternate hypervisor.
- VMs and their hypervisors can be managed through consoles from anywhere (not just from within a lab) which can lead to optimization and reduced administration costs.
Given the possible pitfalls of poorly managed virtualization, you may wonder whether it's worth the investment and trouble. The answer is a resounding Yes! Virtualization is a worthwhile investment, but as we have emphasized, it must be properly managed. In some organizations, virtualization is inevitable and permanent. Gone are the days of dedicated physical hardware, with a single server hosting a single application. Given that hardware vendors are trending towards platforms with more processors and more RAM, slicing new hardware into VMs is one of the best ways to ensure resource efficiency.
Key principles discussed in this series
- Whenever possible, utilize dedicated resources for CPU, memory, and network. Make sure there is ample access to co-located storage via dedicated I/O.
- Wherever possible, consider CPU and memory affinity. In some cases this may cause other VMs served by the same host to perform worse, and where the hypervisor is part of a cluster, pinning resources might prevent the entire set of VMs from performing optimally. There are performance tradeoffs for all VMs when CPU and memory resources can't be dedicated or reserved.
- Whenever possible, manage your virtualization resources by monitoring resource consumption. Understand what other products are hosted on the same VM and what other VMs hosted by the same hypervisor are doing.
- Whenever possible, avoid resource overcommitment: the combined resources of all VMs should never exceed the hypervisor's physical resources.
- If you suspect virtualization-related problems, collect specific data about the VM configuration, the hypervisor and the other VMs being hosted by the hypervisor. Avoid anecdotal information and collect specific, perhaps even periodic measurements using scripts.
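One way to collect the specific, periodic measurements suggested above is a small script sampling a guest's counters. The following is a Linux-only sketch (it reads `/proc/stat`, so it will not run on other guests; the function names are ours):

```python
# Sketch of scripted, periodic measurement on a Linux guest: sample CPU
# counters from /proc/stat rather than relying on anecdotal observations.
# Linux-only; all function names here are ours.
import time

def cpu_times() -> tuple[int, int]:
    """Return (busy, total) jiffies from the aggregate 'cpu' line of
    /proc/stat. Note: the 'steal' field (fields[7], when present) counts
    time taken away by the hypervisor -- a telltale sign of CPU
    overcommitment on a virtualized guest."""
    with open("/proc/stat") as f:
        fields = [int(v) for v in f.readline().split()[1:]]
    idle = fields[3] + fields[4]         # idle + iowait
    return sum(fields) - idle, sum(fields)

def sample(interval: float = 1.0) -> float:
    """Return overall CPU utilization (0.0 to 1.0) over one interval."""
    busy0, total0 = cpu_times()
    time.sleep(interval)
    busy1, total1 = cpu_times()
    return (busy1 - busy0) / max(1, total1 - total0)
```

Run on a cron schedule and logged, samples like these give you the non-anecdotal record to bring to the virtualization team when you suspect overcommitment.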
Special IBM Rational product considerations
Different software products behave differently. Virtualization parameters that work well for one product on a VM may not work well for another. In this series of articles, we examined Rational Team Concert and Rational ClearCase. Other Rational products may perform similarly or differently, which is why we emphasize understanding virtualization's key dimensions.
Complex multi-tier applications such as the Rational Collaborative Lifecycle Management products or Rational ClearCase require near-constant access to dedicated resources. We have worked with customers who experienced poor performance when using Rational products in virtualized environments, only to discover that the key principles listed above were not being followed.
Credits and Acknowledgements
The authors would like to thank our colleagues Tim Lee, Chetna Warade, David Schlegel, Paul Weiss, Matthias Lee, Samir Shah, Harry Abadi and Poornima Seetharamaiah, our colleagues in Rational Support and Development, and our business partners at Intel, NetApp, and VMware.
- Read the Virtualization policy for IBM software for more details.
- Check these web pages to learn about IBM cloud options:
- Learn about IBM SmartCloud Enterprise, IBM's enterprise-class public cloud infrastructure-as-a-service (IaaS).
- In the Cloud computing section on developerWorks, delve into how-to articles, tutorials, podcasts, demos, and links to much more info to help both those new to cloud computing and those already experienced.
- Watch the IBM CloudBurst video demo on YouTube (4:46 minutes).
- Develop applications in the IBM SmartCloud Enterprise using Rational software, by Jean-Yves B. Rigolet (IBM® developerWorks®, March 2013).
- Explore the Rational software area on developerWorks for technical resources, best practices, and information about Rational collaborative and integrated solutions for software and systems delivery.
- Stay current with developerWorks technical events and webcasts focused on a variety of IBM products and IT industry topics.
Get products and technologies
- Download a free trial version of Rational software.
- Evaluate IBM software in the way that suits you best: Download it for a trial, try it online, use it in a cloud environment.
- Join the Rational software forums to ask questions and participate in discussions.
- Ask and answer questions and increase your expertise when you get involved in the Rational forums, cafés, and wikis.
- Join the Rational community to share your Rational software expertise and get connected with your peers.