Everybody is talking about virtualization. If we're to believe the hype, virtualization will revolutionize IT as we know it, optimize scarce resources, and save everyone money. Server virtualization promises to be one of the decade's most important developments. However, virtualization has been around for quite some time, and IBM has been a leader in this space on the IBM® System z® and Power Systems™ platforms. In the past few years, virtualization technology on System x® and Intel-based x86 architecture has matured and become more pervasive. Used properly, virtualization is an essential part of the IT toolbox. There's no doubt virtualization is here to stay.
But every technology has its perils. Poorly managed virtualization can cause applications to run more slowly, which can result in upset end users and dissatisfied customers. IBM fully supports its products deployed in virtualized environments. Yet, perhaps because of virtualization's ubiquity and enticing promises, we have seen customers fall prey to poorly managed virtualized environments and fail to realize the promised benefits.
This two-part article discusses virtualization's pros and cons through concrete examples. In Part 1, we explain virtualization at a high level, especially as it relates to IBM Rational software. We discuss the key requirements of a well-managed virtualized environment, and we show examples of how IBM® Rational® ClearCase® and IBM® Rational Team Concert™ behave in poorly configured virtualized environments. We offer suggestions and tips for properly managing your virtualized infrastructure, drawing upon our experience testing Rational software and advising customers.
Part 2 continues our discussion of suggestions, recommendations, and tips, and includes troubleshooting and vendor-specific examples.
Clouds in the forecast
Virtualization is routinely linked with cloud technology. It's important to understand the relationship. In the broadest terms, cloud technology is all about delivering server capability as a service. Virtualization is a key technique for managing the server resources that provide such a capability.
We also need to distinguish between public and private clouds. In simple terms, private clouds are isolated and can be managed and hosted within a company or, in some cases, externally. They are protected and secured with firewalls, authentication, VPNs, and so forth. Public clouds usually have far less security; they are effectively in plain view and can be shared and accessed by anyone. Many popular public services, such as email, file, and photo storage, are delivered "in the cloud." The public cloud model attracts some organizations because, in theory, organizations or individuals pay for only what they need, services are always available from anywhere, and the cloud provider handles most of the IT management tasks.
We find that some IBM Rational customers shy away from the possible instability and security concerns of public cloud environments. They prefer the internally hosted, more closely managed private-cloud approach, where they can control all aspects of server resource allocation, set specific quality-of-service goals, and employ proven high availability and disaster recovery solutions.
However, some prefer IBM cloud-based solutions, such as IBM SmartCloud Enterprise, because they are designed and managed by using best practices for software development and hosting strategies. IBM also offers Rational software in private cloud deployments, through IBM CloudBurst, for example. (See the Resources section for more information about those options.)
Stated simply, virtualization permits carving a larger server (the host or hypervisor) into smaller servers (the guests, clients, or virtual machines) and sharing the combined pool of resources. It's well known that most servers don't operate at full capacity all of the time. Therefore, why not share and combine them? Two servers that average 25% capacity could become virtual machines (VMs) and be hosted on a single hypervisor that would then average approximately 50% capacity. Of course, the host operating system and the hypervisor software contribute measurable overhead, and there are other details involved, as well.
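As a back-of-the-envelope illustration of that consolidation arithmetic, consider this short Python sketch. The function name and the 5% hypervisor-overhead figure are our own illustrative assumptions, not measured values:

```python
def consolidated_utilization(vm_utilizations, hypervisor_overhead=0.05):
    """Estimate the average utilization of a host that consolidates
    several lightly loaded servers of the same size as the host.

    vm_utilizations: average utilization (0.0 to 1.0) of each original
    server. hypervisor_overhead: assumed fraction of host capacity
    consumed by the host OS and hypervisor software (illustrative).
    """
    return sum(vm_utilizations) + hypervisor_overhead

# Two servers averaging 25% capacity, consolidated onto one host, yield
# roughly 50% average utilization plus the hypervisor's own overhead.
print(consolidated_utilization([0.25, 0.25]))
```

The same arithmetic makes the limit obvious: every consolidated server adds its utilization to the host, so lightly loaded servers consolidate well and busy ones do not.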
The host manages the clients' resources through software or emulation. Generally, nothing within the virtual machine indicates that it is, in fact, virtual. In most cases, administrators who are installing software on virtual machines cannot tell that they are using virtual servers. More recent innovations, such as virtualization technology built into the hypervisor's chipset, permit more precise and optimized handling of hardware resources, such as peripheral drivers.
The Rational perspective on virtualization
IBM supports virtualization and, consequently, IBM Rational products are supported on virtualized servers. However, we insist that the virtualized infrastructure be properly managed and monitored. It is crucial to understand how your virtualized infrastructure uses affinity and overcommitment and to be sure that you are using affinity and overcommitment in a way that ensures the best performance of your IBM Rational software.
What is this thing called "affinity"?
Affinity (also called entitlement, pinning, or dedication) is the ability to dedicate one or more of a virtual machine's resources (memory or processor, for example) to the corresponding physical resources on the hypervisor. The host distributes resources as the virtual machines need them. Affinity ensures that the resources dedicated to a virtual machine are always available when that virtual machine requires them.
Remember that virtual machines share system resources with all of the other virtual machines on the same host.
Overcommitment occurs when the total resource allocation across the virtual images exceeds the physical resources of the hardware (be sure to count the hypervisor's own resources, too). To satisfy one virtual machine's peak needs, the hypervisor can take resources from other virtual machines. When the combined needs of all of the virtual machines exceed the actual resources of the hypervisor, overcommitment can cause all of the virtual machines on a host to suffer.
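A simple way to quantify this (the function and names below are ours, purely illustrative) is to divide the total resources promised to the VMs by what the host physically has; a ratio above 1.0 means the host is overcommitted in that dimension:

```python
def overcommit_ratio(vm_allocations, physical_capacity, hypervisor_reserve=0):
    """Return the promised-versus-actual resource ratio for one
    dimension (vCPUs, GB of RAM, and so on). Above 1.0 means the
    host is overcommitted.

    hypervisor_reserve: capacity set aside for the hypervisor itself,
    since its own resource needs must be counted too.
    """
    available = physical_capacity - hypervisor_reserve
    return sum(vm_allocations) / available

# Five VMs, each allocated 4 vCPUs, on a 16-processor host:
print(overcommit_ratio([4, 4, 4, 4, 4], 16))  # 20/16 = 1.25, overcommitted
```

The same calculation applies to every dimension independently; a host can be safely committed on CPU yet badly overcommitted on memory.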
Virtualization's four dimensions
As with any configurable technology, there are tradeoffs to make. From the Rational product perspective, if you are using virtualization, we suggest that you keep an eye on four key dimensions. These dimensions are perhaps the most significant characteristics of any server:
- Processor (CPU)
- Memory
- Disk Input/Output (I/O) and Storage
- Network
Table 1. Four dimensions of virtualization
| Dimension | Worst (unmanaged) virtualization characteristics | Best (managed) virtualization characteristics |
| --- | --- | --- |
| CPU | Hardware lacks virtualization support; VMs steal cycles from one another | Hardware-assisted virtualization; VMs have access to entire physical CPUs |
| Memory | Memory and CPU on different buses, working against NUMA | Memory and CPU on the same bus, aligned with NUMA |
| Disk I/O and storage | A single local drive supports all of the VMs | VMs allocated to dedicated drives, or filer storage accessed over Fibre Channel |
| Network | A single network card or port shared by all of the VMs | A dedicated port per VM, or a 10 Gb or greater connection shared by the VMs |
CPU
Modern CPUs have been designed with virtualization in mind. For example, Intel's VT and AMD's AMD-V technologies ensure that x86 CPUs can optimally handle a virtualized load. Other platforms can also use hardware-assisted virtualization, which promises more efficient virtualization through direct addressing of host CPUs.
In the worst-managed virtualization environment, the actual hardware and CPU do not support virtualization, or they support it only through emulation, relying on a slower software layer. In the worst-managed environment, no effort is made to track what the other VMs are doing; VMs can freely steal CPU cycles from one another at will. Fractional CPU allocations are sometimes possible or even inevitable, but in an ideally managed virtualized environment, VMs have access to entire physical CPUs.
Memory
In the worst-managed virtualization environment, server memory and CPUs are not on the same bus. Modern hardware makes use of nonuniform memory access (NUMA), whereby a processor can access local memory faster than memory located remotely, on another bus. On modern hardware, it makes no sense to work against a NUMA architecture that is designed for speed and scalability.
Be wary of the options and settings that your virtualization software might provide, because it's often easy to counteract the inherently efficient NUMA architecture. It might seem useful to have memory at location A refer to CPU at location B, but this arrangement creates extra work for the server and can result in decreased performance. In the ideally managed VM, memory and CPU are on the same bus.
Disk I/O and storage
In the worst-managed virtualized environment, a single local drive supports all of the VMs. Given that multiple servers share the same disk, I/O activity now occurs across more locations on the disk. A local hard drive might even fail sooner simply because of the increased activity. Ideally, VMs are allocated to specific drives, each with its own I/O channels and mechanisms. In some ideal environments, storage is on filers, which are well suited to virtualization's demands, and is accessed over Fibre Channel.
Network
It's probably easy to imagine a poorly virtualized network configuration. Network cards are measured by their capacity (1 Gb/sec, for example). When shared by VMs, that capacity is subdivided (two VMs sharing a 1 Gb card would each receive a maximum of 500 Mb/sec). The worst-managed virtualized environment has a single network card or port for all of the VMs that it hosts, with the total throughput divided among the VMs. In an ideally managed virtualized environment, each VM has a dedicated network port, or the VMs share a 10 Gb or greater connection. Link aggregation, a technique in which multiple network connections are used together, can also provide redundancy and optimize throughput.
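The division of shared network capacity can be made explicit with a small helper (ours, for illustration; it assumes the hypervisor splits throughput evenly among the VMs):

```python
def per_vm_bandwidth_mbps(link_capacity_gbps, vm_count):
    """Best-case share of a single shared link for each VM, in Mb/sec,
    assuming the hypervisor divides throughput evenly."""
    return link_capacity_gbps * 1000.0 / vm_count

# Two VMs sharing a 1 Gb/sec card get at most 500 Mb/sec each:
print(per_vm_bandwidth_mbps(1, 2))
```

In practice the split is rarely this even, but the best case is still the ceiling: adding VMs to a shared link can only shrink each VM's share.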
Best (managed) virtualization characteristics
For the best, ideally managed virtualization environment, we can summarize our previous points concisely:
- The number of virtual CPUs never exceeds the number of physical CPUs.
- The CPU allocation for each VM corresponds to actual physical CPUs.
- The amount of virtual RAM never exceeds the amount of actual RAM.
- There is ample access to storage and network.
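The two arithmetic rules in that checklist lend themselves to an automated sanity check. As a sketch (the data structures and names here are our own, purely illustrative):

```python
def check_host(host_cpus, host_ram_gb, vms):
    """Check one host's VM allocations against the managed-virtualization
    checklist. vms is a list of (vcpus, ram_gb) tuples, one per VM.
    Returns the list of violated rules; an empty list means the
    allocation looks sane.
    """
    problems = []
    if sum(vcpus for vcpus, _ in vms) > host_cpus:
        problems.append("virtual CPUs exceed physical CPUs")
    if sum(ram for _, ram in vms) > host_ram_gb:
        problems.append("virtual RAM exceeds physical RAM")
    return problems

# Five 4-vCPU, 16 GB VMs on a 16-processor, 64 GB host break both rules:
print(check_host(16, 64, [(4, 16)] * 5))
```

A check like this cannot verify affinity or storage and network headroom, but it catches the most common overcommitment mistakes before they reach production.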
Some may argue that this advice defeats the point of virtualization or that we are being overly cautious. In an ideal world, every VM would have access to unlimited resources on demand. In actuality, however, VMs must share resources. We find that Rational software performs best and behaves most predictably on virtualized servers when resources are closely managed to ensure that dedicated CPU, memory, and other resources are always available.
If properly managed, overcommitted resource allocation can be a viable virtualization strategy, but only when the actual resources used are closely monitored. When overcommitment does occur, an organization must be willing to accept the tradeoff between slower or unpredictable performance and lower administration costs.
It's like musical chairs
In a badly managed virtualized environment, virtual machines can share the host's resources indiscriminately. Any VM can ask for more memory or CPU than it has been allocated, and a poorly managed hypervisor will provide it, time-slicing those requests from the other VMs.
It really is like musical chairs. A single 16-processor, 64 GB RAM server can host five distinct 4-processor, 16 GB RAM servers. When virtual machine A requires processor cycles, it asks the host for them. The host either takes unused processor time from any of the four other VMs or writes the other VMs' data to disk to free up resources.
This model might work perfectly well if the five servers never operate at high capacity at the same time. Back-end processes that end users never see might be able to tolerate running slower and with more interruptions. However, business-critical applications that end users touch will probably show the effects of overcommitment as they stall or slow.
Case Study 1. "The Mystery Menu": Rational Team Concert in three badly managed clouds
The IBM® Rational® Performance Engineering team explored three VM images provided by a cloud provider. Each image was different, as if we were choosing small, medium, and large portion sizes from a menu. Except for knowing the theoretical number of virtual CPUs and amount of memory provided with each VM, we knew very little about the actual specifications of the VMs (storage type, network, processor speed, and so forth).
VM A had 4 virtual CPUs and 8 GB of RAM; VM B had 8 virtual CPUs and 16 GB of RAM; VM C had 16 virtual CPUs and 16 GB of RAM. Our intent was to deliver the same amount of load to each image and see how the different image sizes behaved.
We delivered a simulated multiuser load to VM A (4 vCPU, 8 GB) and quickly hit 100% CPU and memory capacity. Performance was fairly acceptable, but we wanted to improve it. If we had been using physical machines, we would have immediately increased both the number of cores and the amount of RAM, and that is what we did.
We delivered the same load to VM B (8 vCPU, 16 GB), and the results surprised us: Performance degraded dramatically, but proportionally less CPU and memory were used. CPU and memory were no longer the bottleneck. Instead, the bottleneck was the disk I/O.
When we delivered the same load to VM C (16 vCPU, 16 GB), performance got even worse, and the proportion of CPU used decreased again. The bottleneck was still the disk I/O. (For clarity, we have averaged the disk I/O across each test and expressed the average in 25% increments.)
Figure 1. Three VMs in an unmanaged cloud
Our explanation is that these images were hosted on an unmanaged cloud, and the larger VMs, B and C, were in fact stealing cores and memory from other VMs. Without knowing how the hypervisor was managed, we can only speculate that overcommitted resources led to disk swapping. The CPUs were underused because the system was spending most of its time writing other images to disk to get the resources that our VM was asking for. (Presumably, our images were being written to disk, too, when the other VMs were requesting cycles and RAM.)
Figure 2 offers another way to look at the same data, showing results for unused and used memory, unused and used CPU, unused and used disk. This might explain more clearly what was going on.
Figure 2. Another analysis of the same three VMs
For Figure 2, we graphed the three VMs so that the amounts of CPU and memory are proportional to each other, and we used wickets, or unfilled bars, in the graph to indicate unused CPU and memory capacity for the three VMs. VM A is using all of its allocated CPU and memory but not much disk. VM B has access to more CPU and memory but is unable to use it all, because it is bottlenecked by increased disk activity. VM C offers still more CPU, which is only slightly used because, like VM B, VM C is bottlenecked on disk.
Those of us who remember managing physical environments might be surprised by these results. In most cases with physical hardware, increasing CPU and memory also increased performance. However, in a poorly managed virtualized environment where CPU and memory are unbounded, increasing a VM's CPU and memory can sometimes lead to slower performance.
Our first conclusion is a reassurance: in a properly managed VM, Rational software will perform properly. Case Study 1, by contrast, shows an application behaving poorly in an unmanaged cloud.
Second, we reiterate that VMs must be used in a managed environment. In this case study, we had no knowledge of the hypervisor or of other VMs in the environment. We believed that our performance was determined by the configuration and behavior of other VMs in the same environment, but we couldn't be sure.
Third, we warn against assuming that the principles that worked with physical hardware will also work in a VM environment. If there had been excess CPU and memory resources, then increasing the sizes of the VMs might have produced an improvement, but we were actually overcommitting resources.
Finally, we repeat the point that overcommitting resources leads to poor VM and, consequently, poor application performance. In this example, our application was Rational Team Concert, but we have observed other Rational software performing badly in poorly managed environments, regardless of the virtualization technology and operating system.
Case Study 2. "So, just how important is affinity?": ClearCase in an overcommitted cloud with a rogue load
Our next example was demonstrated live during a conference. We illustrated how setting affinity (sometimes called entitlement or dedicated resources) can stabilize IBM Rational applications, whereas not using affinity can allow other VM images to take over resources and slow your application's performance to a crawl.
We used an Intel Sandy Bridge server with 32 virtual CPUs and 32 GB RAM hosting two separate IBM® Rational® ClearCase® deployments. Each ClearCase deployment consisted of identical Red Hat Enterprise Linux (RHEL) 5.5 VMs (4 vCPUs, 8 GB RAM) with a ClearCase VOB server and a ClearCase CM server hosting web views. We used VMware ESX to host the VMs. The VMs hosting ClearCase in Deployment A did not use affinity, whereas the ClearCase VMs in Deployment B had both CPU and memory affinity. Outside of the cloud, on physical hardware, we used two IBM® Rational® Performance Tester workbenches to drive a 100-user load simulation against each deployment.
On the ESX server, we created several VMs that were intended to do nothing but create demand for CPU, memory, and disk. These VMs contained simple programs that executed math calculations and allocated all free memory, which resulted in 100% memory and 100% CPU use. We cranked up these programs to overcommit the ESX server by 300%. (We asked these rogue VMs to use three times the physical CPU and memory available on the hypervisor.)
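We have not reproduced the rogue programs here, but a minimal sketch of that style of CPU burner might look like the following. This is our illustration in Python, not the actual code we ran; it spins up one busy-loop process per core and omits the memory-allocation part:

```python
import multiprocessing

def burn_cpu():
    """Spin forever on throwaway math to keep one core at 100%."""
    x = 0.0001
    while True:
        x = (x * x + 1.0) % 1e9

def start_rogue_load(workers=None):
    """Start one busy-loop process per CPU (or `workers` of them),
    simulating a VM that consumes every cycle the hypervisor grants."""
    count = workers or multiprocessing.cpu_count()
    procs = [multiprocessing.Process(target=burn_cpu, daemon=True)
             for _ in range(count)]
    for p in procs:
        p.start()
    return procs
```

Run enough of these across enough VMs and the total demand exceeds the hypervisor's physical capacity, which is exactly the overcommitment scenario our rogue images created.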
Figures 3 and 4 are taken from Rational Performance Tester. They show average response times (measured in milliseconds) for the ClearCase transactions executed against Deployment A and Deployment B. Both deployments behaved consistently until around the 1,200-second mark, when we activated the programs on the rogue VMs. Deployment A, where the ClearCase VMs ran without affinity, showed a sudden increase in response times. Deployment B, where the ClearCase VMs ran with affinity, showed occasional slowness but held fairly constant except for some spikes. At around the 4,000-second mark, the rogue images were stopped, and Deployment A returned to normal. (Notice that the scale of the y-axis, which shows average transaction response times in milliseconds, is not the same in the two graphs.)
Figure 3. Deployment A without affinity
Figure 4. Deployment B with CPU and memory affinity
Comparing the tests, ClearCase operations on Deployment A, without affinity, took an average of 118 seconds to complete, compared to an average of 18 seconds on Deployment B, with affinity. Deployment B was, on average, six to seven times faster.
Case Study 2 is perhaps extreme, because we created a rogue VM load that might have been unrealistic. However, we were able to clearly show how an application's performance can degrade if you do not know what else your hypervisor is doing or if the other VMs need to request resources.
Setting processor and memory affinity permitted the applications on the VM that we cared about to maintain consistent performance and behavior, even when the rest of the VMs in the environment were executing extreme loads.
Notice that in Deployment A without affinity, performance did return to normal after the rogue VMs were halted. If there are VMs in your environment that are permitted to run overcommitted or uncapped, you might see similar behavior in your VMs.
In this example, our Rational application was ClearCase, but we have observed similar poor performance in other Rational software in similarly poorly managed environments, regardless of the virtualization technology and operating system.
Virtualization is here to stay, so learn to use it wisely
There is no doubt that virtualization is here to stay. More and more IBM Rational customers are using it and relying upon it. However, as we have shown, virtualization can be used poorly with detrimental effects to the software's operation.
It's important to understand virtualization and know how to manage it. We hope that our case studies illustrate some of the side effects of poorly managed virtual environments and help you recognize their symptoms. In Part 2, we explore further symptoms of virtualization gone wrong, offer troubleshooting tips, and show vendor-specific examples.
Resources

Learn
- Read the Virtualization policy for IBM software for more details.
- Check these web pages to learn about IBM cloud options:
- Learn about IBM SmartCloud Enterprise, IBM's enterprise-class public cloud infrastructure-as-a-service (IaaS).
- In the Cloud computing section on developerWorks, delve into how-to articles, tutorials, podcasts, demos, and links to much more information to help both those new to cloud computing and those already experienced.
- Watch the IBM CloudBurst video demo on YouTube (4:46 minutes).
- Develop applications in the IBM SmartCloud Enterprise using Rational software by Jean-Yves B. Rigolet (IBM® developerWorks®, March 2013).
- Explore the Rational software area on developerWorks for technical resources, best practices, and information about Rational collaborative and integrated solutions for software and systems delivery.
- Stay current with developerWorks technical events and webcasts focused on a variety of IBM products and IT industry topics.
Get products and technologies
- Download a free trial version of Rational software.
- Evaluate IBM software in the way that suits you best: Download it for a trial, try it online, use it in a cloud environment.
Discuss
- Join the Rational software forums to ask questions and participate in discussions.
- Ask and answer questions and increase your expertise when you get involved in the Rational forums, cafés, and wikis.
- Join the Rational community to share your Rational software expertise and get connected with your peers.
- Rate or review Rational software. It's quick and easy.