Be smart with virtualization: Part 1. Best practices with IBM Rational software

If you're currently using virtualization methods with IBM Rational software, is everything working as smoothly as you expected? Three IBM experts explain the Rational perspective on virtualization and the key requirements for virtualized environments to get optimal performance from Rational applications. They also share details of two case studies and troubleshooting tips.


Mike Donati (mjdonati@us.ibm.com), ClearCase Performance Team Lead, IBM

Mike Donati lives outside of Boston, where he works on IBM Rational ClearCase performance and customer deployments, including virtualization strategies. When not working, he divides his time among traveling with his family, cooking, photography, and attending his daughters' sporting events.



Ryan Smith (smithr1@us.ibm.com), Software Performance Analyst, IBM

Ryan Smith has been working in performance engineering for the past eight years. He lives in a small town in the farmlands of western Tennessee, where he collaborates remotely with colleagues around the world on the performance and reliability of the Rational solution for Collaborative Application Lifecycle Management (CLM). Professionally, his interests are in agile and lean software development, performance testing, Java and web technologies, data analysis, and data visualization. In his free time, he hunts, fishes, reads about sustainability, leadership, and organizing, and spends time with his wife and on church activities.



Grant Covell (gcovell@us.ibm.com), Senior Development Manager, Rational Performance Engineering, IBM

Grant Chu Covell has been working for IBM Rational software on performance-related things for nearly 10 years. He's now the Senior Performance Obsessor on the Jazz Jumpstart team. Before that, he managed the Rational Performance Engineering team. Years ago, he did software development work on typefaces, music notation software, and automatic language translation. He lives outside of Boston. You can follow his Jumpstart team blog, called Ratl Perf Land.



02 April 2013

Also available in Chinese, Russian, Portuguese, and Spanish

Everybody is talking about virtualization. If we're to believe the hype, virtualization will revolutionize IT as we know it, optimize scarce resources, and save everyone money. Server virtualization promises to be one of the decade's most important developments. However, virtualization has been around for quite some time, and IBM has been a leader in this space on the IBM® System z® and Power Systems™ platforms. In the past few years, virtualization technology on System x® and Intel-based x86 architecture has matured and become more pervasive. Used properly, virtualization is an essential part of the IT toolbox. There's no doubt virtualization is here to stay.

But every technology has its perils. Poorly managed virtualization can cause applications to run more slowly, which can result in upset end users and dissatisfied customers. IBM fully supports its products deployed in virtualized environments. However, perhaps because of virtualization's ubiquity and enticing promises, we have seen customers fall prey to poorly managed virtualized environments and fail to realize any of the promised benefits.

This two-part article discusses virtualization's pros and cons through concrete examples. In Part 1, we explain virtualization at a high level, especially as it relates to IBM Rational software. We discuss the key requirements of a well-managed virtualized environment, and we show examples of how IBM® Rational® ClearCase® and IBM® Rational Team Concert™ behave in poorly configured virtualized environments. We offer suggestions and tips for properly managing your virtualized infrastructure, drawing upon our experience testing Rational software and advising customers.

Part 2 continues our discussion of recommendations and tips and includes troubleshooting advice and vendor-specific examples.

A brief history of virtualization

Despite its emergence as a compelling, necessary technology in the past few years, server virtualization has actually been around for quite some time. In the 1970s, IBM introduced hypervisor technology in the System z and System i® product lines. Logical partitions (LPARs) became possible on System p® in 2000. Virtual machines appeared on System x and Intel-based x86 hardware as early as 1999. In just the last few years, virtualization has become essential and nearly inevitable in Microsoft Windows and Linux environments.

Clouds in the forecast

Virtualization is routinely linked with cloud technology. It's important to understand the relationship. In the broadest terms, cloud technology is all about delivering server capability as a service. Virtualization is a key technique for managing the server resources that provide such a capability.

We also need to distinguish between public and private clouds. In simple terms, private clouds are isolated and can be managed and hosted within a company or, in some cases, externally. Private clouds are protected and secured with firewalls, authentication, VPNs, and so forth. Public clouds usually have far less security; they are effectively in plain view and can be shared and accessed by anyone. Many popular public services are delivered "in the cloud," such as email, file, and photo storage. The public cloud model attracts some organizations because, in theory, organizations or individuals pay for only what they need, services are always available from anywhere, and the cloud provider handles most of the IT management tasks.

We find that some IBM Rational customers shy away from the possible instability and security concerns of public cloud environments. They prefer the internally hosted, more closely managed private-cloud approach, where they can control all aspects of server resource allocation, set specific quality-of-service goals, and employ proven high availability and disaster recovery solutions.

However, some customers prefer IBM cloud-based solutions, such as IBM SmartCloud Enterprise, because they are designed and managed according to best practices for software development and hosting. IBM also offers Rational software in private cloud deployments, through IBM CloudBurst, for example. (See the Resources section for more information about those options.)


Basic concepts

Stated simply, virtualization permits carving a larger server (the host or hypervisor) into smaller servers (the guests, clients, or virtual machines) and sharing the combined pool of resources. It's well known that most servers don't operate at full capacity all of the time. Therefore, why not share and combine them? Two servers that average 25% capacity could become virtual machines (VMs) and be hosted on a single hypervisor that would then average approximately 50% capacity. Of course, the host operating system and the hypervisor software contribute measurable overhead, and there are other details involved, as well.

The host manages the clients' resources through software or emulation. Generally, nothing within the virtual machine indicates that it is, in fact, virtual. In most cases, administrators who are installing software on virtual machines cannot tell that they are using virtual servers. More recent innovations, such as virtualization support built into the host's chipset, permit more precise and optimized handling of hardware resources, such as peripheral drivers.


The Rational perspective on virtualization

IBM supports virtualization and, consequently, IBM Rational products are supported on virtualized servers. However, we insist that the virtualized infrastructure be properly managed and monitored. It is crucial to understand how your virtualized infrastructure uses affinity and overcommitment and to be sure that both are used in a way that ensures the best performance of your IBM Rational software.

What is this thing called "affinity"?

Affinity (also called entitlement, pinning, or dedication) is the ability to dedicate one or more of a virtual machine's resources (memory or processors, for example) to corresponding resources on the hypervisor. The host distributes resources as the virtual machines need them. Affinity ensures that the resources dedicated to a virtual machine are always available when it requires them.

Remember that virtual machines share system resources with all of the other virtual machines on the same host.

Overcommitment occurs when the total resources allocated to the virtual images exceed the physical resources of the hardware (be sure to count the hypervisor's own resource use, too). To satisfy a virtual machine's peak needs, the hypervisor can take resources from other virtual machines. The combined needs of all of the virtual machines might sometimes exceed the actual resources of the hypervisor, and overcommitment can then cause all of the virtual machines on a host to suffer.


Virtualization's four dimensions

As with any configurable technology, there are tradeoffs to make. From the Rational product perspective, if you are using virtualization, we suggest that you keep an eye on four key dimensions. These dimensions are perhaps the most significant characteristics of any server:

  • CPU
  • Memory
  • Disk Input/Output (I/O) and Storage
  • Network
Table 1. Four dimensions of virtualization
CPU

Worst (unmanaged) virtualization characteristics:
  • Chipset has no VT or V-chip support
  • Shared resource pool
  • No entitlement, guaranteed, or prioritized scheduling
  • Capacity of other VMs unknown
  • vCPUs are fractions of physical CPUs
  • Hyperthreading or multithreading emulated (non-Nehalem class)
Best (managed) virtualization characteristics:
  • Chipset has VT support
  • CPU affinity allows VMs dedicated use of vCPUs
  • Allocated vCPUs on par with physical CPUs

Memory

Worst (unmanaged) virtualization characteristics:
  • Memory and CPU not co-located
  • Overcommitment leads to excessive swapping (including across other VMs)
Best (managed) virtualization characteristics:
  • Affinity is set
  • Memory and CPU are co-located

Disk I/O and Storage

Worst (unmanaged) virtualization characteristics:
  • Single local SATA or IDE disk with low IOPS
  • Local RAID but limited drive bays
  • Unknown number of channels access the same storage
Best (managed) virtualization characteristics:
  • Fibre Channel connection to storage
  • File storage

Network

Worst (unmanaged) virtualization characteristics:
  • Single 1 Gb (or less) network port shared by all VMs
Best (managed) virtualization characteristics:
  • Dedicated network port
  • 10 Gb or better network
  • Link aggregation

CPU

Modern CPUs have been designed with virtualization in mind. For example, Intel VT and AMD-V chip technology ensure that x86 CPUs can optimally handle a virtualized load. Other platforms can use hardware-assisted virtualization, which promises more efficient virtualization through direct addressing of host CPUs.

In the worst-managed virtualization environment, the actual hardware and CPU do not support virtualization, or do so only through emulation, relying on a slower software layer. In the worst-managed environment, no effort is made to keep track of the other VMs; in fact, any VM can freely steal CPU cycles from the others at will. Fractional CPUs are sometimes possible or inevitable, but in an ideally managed virtualized environment, VMs have access to entire physical CPUs.
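If you are running on Linux and want to confirm that a processor exposes hardware virtualization support, a quick look at /proc/cpuinfo is often enough: Intel VT-x appears as the vmx flag and AMD-V as the svm flag. The following minimal Python sketch is an illustration rather than part of the original study, and inside a guest the flag may be hidden unless the hypervisor exposes nested virtualization.

Listing 1. Checking for hardware virtualization flags on Linux (illustrative sketch)

# Minimal sketch (not from the article): report whether this Linux host's CPU
# advertises hardware virtualization support in /proc/cpuinfo.

def hardware_virtualization_flags(cpuinfo_path="/proc/cpuinfo"):
    """Return the virtualization-related CPU flags (vmx/svm) found on this host."""
    with open(cpuinfo_path) as f:
        for line in f:
            if line.startswith("flags"):
                flags = set(line.split(":", 1)[1].split())
                return flags & {"vmx", "svm"}
    return set()

if __name__ == "__main__":
    found = hardware_virtualization_flags()
    if found:
        print("Hardware virtualization support detected:", ", ".join(sorted(found)))
    else:
        print("No vmx/svm flag found; virtualization may rely on slower emulation.")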

Memory

In the worst-managed virtualization environment, server memory and CPUs are not on the same bus. Modern hardware makes use of nonuniform memory access (NUMA), whereby a processor can access local memory faster than memory located remotely, on another bus. On modern hardware, it makes no sense to work against the NUMA architecture that is designed for speed and scalability.

Be wary of the options and settings that your virtualization software might provide, because it's often easy to counteract the inherently efficient NUMA architecture. It might seem useful to have memory at location A refer to CPU at location B, but this arrangement creates extra work for the server and can result in decreased performance. In the ideally managed VM, memory and CPU are on the same bus.
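To see what you are working with, you can inspect the NUMA layout that a Linux host or guest exposes. The following sketch is illustrative, not part of the original study; it assumes the standard sysfs layout under /sys/devices/system/node and simply reports which CPUs belong to each node, so that you can check whether a VM's processors and memory stay together on one node.

Listing 2. Inspecting NUMA topology on Linux (illustrative sketch)

# Minimal sketch (an illustration, not from the article): list the NUMA nodes a
# Linux system exposes and the CPUs that belong to each, using the sysfs files
# /sys/devices/system/node/node<N>/cpulist.

import glob
import os

def numa_topology(sysfs_root="/sys/devices/system/node"):
    """Return a mapping of NUMA node name to its CPU list string (e.g. '0-7')."""
    topology = {}
    for node_dir in sorted(glob.glob(os.path.join(sysfs_root, "node[0-9]*"))):
        cpulist_file = os.path.join(node_dir, "cpulist")
        if os.path.exists(cpulist_file):
            with open(cpulist_file) as f:
                topology[os.path.basename(node_dir)] = f.read().strip()
    return topology

if __name__ == "__main__":
    nodes = numa_topology()
    for node, cpus in nodes.items():
        print(f"{node}: CPUs {cpus}")
    if len(nodes) > 1:
        print("Multiple NUMA nodes found: keep each VM's vCPUs and memory on one node.")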

Disk I/O and storage

In the worst-managed virtualized environment, a single local drive supports all of the VMs. Given that multiple servers are sharing the same disk, the actual I/O activity is now spread across more locations on the disk. It's possible that a local hard drive might fail sooner simply because of the increased activity. Ideally, VMs are allocated to specific drives, each with its own I/O and mechanisms. In some ideal environments, storage can be on filers, which are well-suited to virtualization's demands, and accessed through Fibre Channel.

Network

It's probably easy to imagine a poorly virtualized network configuration. Network cards are measured by their capacity (1 Gb/sec, for example). When shared by VMs, that capacity is subdivided (2 VMs sharing 1 Gb would each receive a maximum of 500 Mb/sec). The worst-managed virtualized environment has a single network card or port for all of the VMs that it hosts, and the total throughput is divided among the VMs. In an ideally managed virtualized environment, each VM has a dedicated network port, or a 10 Gb or greater network connection is shared by the VMs. Link aggregation is a network technique in which multiple network connections are used together for redundancy and throughput optimization.
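The arithmetic of a shared port is simple, but it is worth making explicit when you plan VM placement. The following short Python sketch uses illustrative numbers, not measurements from our tests.

Listing 3. Estimating the worst-case bandwidth share per VM (illustrative sketch)

# Back-of-the-envelope sketch of the arithmetic above: how much bandwidth each
# VM can count on when a single network port is shared evenly. The numbers are
# illustrative, not measurements from the article.

def per_vm_bandwidth_mbps(port_capacity_gbps, vm_count):
    """Worst-case even split of one shared port across all VMs, in Mb/sec."""
    return (port_capacity_gbps * 1000) / vm_count

print(per_vm_bandwidth_mbps(1, 2))    # 1 Gb port, 2 VMs  -> 500.0 Mb/sec
print(per_vm_bandwidth_mbps(1, 8))    # 1 Gb port, 8 VMs  -> 125.0 Mb/sec
print(per_vm_bandwidth_mbps(10, 8))   # 10 Gb port, 8 VMs -> 1250.0 Mb/sec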


Best (managed) virtualization characteristics

For the best, ideally managed virtualization environment, we can summarize our previous points concisely:

  • The number of virtual CPUs never exceeds the number of physical CPUs.
  • The CPU allocation for each VM corresponds to actual physical CPUs.
  • The amount of virtual RAM never exceeds the amount of actual RAM.
  • There is ample access to storage and network.

Some might argue that this advice defeats the point of virtualization or that we are being overly cautious. In an ideal world, every VM has access to unlimited resources on demand. In actuality, however, VMs must share resources. We find that Rational software performs best and behaves most consistently on virtualized servers when resources are closely managed to ensure that dedicated CPU, memory, and other resources are always available.
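As a concrete illustration of the checklist above, the following Python sketch flags a host whose combined VM allocations would overcommit physical CPUs or RAM. It ignores hypervisor overhead and the disk and network dimensions for simplicity; the function name and example numbers are ours (the numbers match the "musical chairs" host described in the next section).

Listing 4. A simple overcommitment check (illustrative sketch)

# Minimal planning check (illustrative, not from the article): warn when the
# VMs planned for a host, taken together, would overcommit physical CPUs or RAM.
# Hypervisor overhead is ignored for simplicity.

def check_host(physical_cpus, physical_ram_gb, vms):
    """vms is a list of (name, vcpus, ram_gb) tuples; returns a list of warnings."""
    total_vcpus = sum(vcpus for _, vcpus, _ in vms)
    total_ram = sum(ram for _, _, ram in vms)
    warnings = []
    if total_vcpus > physical_cpus:
        warnings.append(f"vCPUs overcommitted: {total_vcpus} allocated vs {physical_cpus} physical")
    if total_ram > physical_ram_gb:
        warnings.append(f"RAM overcommitted: {total_ram} GB allocated vs {physical_ram_gb} GB physical")
    return warnings

# Example: a 16-processor, 64 GB host carrying five 4-processor, 16 GB guests.
host_warnings = check_host(16, 64, [(f"vm{i}", 4, 16) for i in range(5)])
print(host_warnings or "No overcommitment detected")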

If properly managed, overcommitted resource allocation can be a viable virtualization strategy, but only when the actual resources used are closely monitored. When overcommitment does occur, an organization must be willing to accept the tradeoff between slower or less predictable performance and reduced administration costs.


It's like musical chairs

In a badly managed virtualized environment, virtual machines can share the host's resources indiscriminately. Any VM can ask for more memory or CPU than it has been allocated, and a poorly managed hypervisor will provide it, time-slicing those requests from the other VMs.

It really is like musical chairs. It is possible for a single 16-processor, 64 GB RAM server to host five distinct 4-processor, 16 GB RAM servers. When Virtual Machine A requires processor cycles, it will ask the host for them. The host will either take unused processor time from any of the four other VMs or write the other VMs' data to disk to free up resources.

This model might work perfectly well if the five servers never operate at high capacity at the same time. Back-end processes that end users never see might tolerate running slower and with more interruptions. However, business-critical applications that end users touch will probably show the effects of overcommitment as stalls or slowdowns.


Case Study 1. "The Mystery Menu," Rational Team Concert in three badly managed clouds

The IBM® Rational® Performance Engineering team explored three VM images provided by a cloud provider. Each image was different, as if we were choosing small, medium, and large portion sizes from a menu. Except for knowing the theoretical number of virtual CPUs and the memory provided with each VM, we knew very little about the actual specifications of the VMs (storage type, network, processor speed, and so forth).

VM A had 4 virtual CPUs and 8 GB of RAM; VM B had 8 virtual CPUs and 16 GB of RAM; VM C had 16 virtual CPUs and 16 GB of RAM. Our intent was to deliver the same amount of load to each image and see how the different image sizes behaved.
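During the runs, we tracked CPU, memory, and disk I/O on each image; the article does not describe the tooling behind those measurements. The following Python sketch shows one way such samples could be collected on a guest. It assumes the third-party psutil package, which is our illustrative choice here rather than part of the actual test harness.

Listing 5. Sampling guest CPU, memory, and disk I/O (illustrative sketch)

# Minimal sketch (illustrative, not our actual harness): periodically sample
# CPU, memory, and disk I/O on a guest during a load test, using psutil.

import time
import psutil

def sample(duration_sec=60, interval_sec=5):
    """Print periodic CPU, memory, and disk I/O readings for this guest."""
    last_disk = psutil.disk_io_counters()
    for _ in range(int(duration_sec / interval_sec)):
        cpu = psutil.cpu_percent(interval=interval_sec)   # averaged over the interval
        mem = psutil.virtual_memory().percent
        disk = psutil.disk_io_counters()
        read_mb = (disk.read_bytes - last_disk.read_bytes) / 1e6
        write_mb = (disk.write_bytes - last_disk.write_bytes) / 1e6
        last_disk = disk
        print(f"cpu={cpu:5.1f}%  mem={mem:5.1f}%  "
              f"disk_read={read_mb:7.1f} MB  disk_write={write_mb:7.1f} MB")

if __name__ == "__main__":
    sample()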

We delivered a simulated multiuser load to VM A (4 vCPU, 8 GB) and quickly hit 100% CPU and memory capacity. Performance was fairly acceptable, but we wanted to improve it. If we had been using physical machines, we would have immediately increased both the number of cores and amount of RAM, and that is what we did.

We delivered the same load to VM B (8 vCPU, 16 GB), and the results surprised us: Performance degraded dramatically, but proportionally less CPU and memory were used. CPU and memory were no longer the bottleneck. Instead, the bottleneck was the disk I/O.

When we delivered the same load to VM C (16 vCPU, 16 GB), performance got even worse, and the proportion of CPU used decreased again. The bottleneck was still disk I/O. (For clarity, we have averaged the disk I/O across each test and expressed the average in 25% increments.)

Figure 1. Three VMs in an unmanaged cloud
Bar graph shows CPU, memory, disk results by VM

Our explanation is that these images were hosted on an unmanaged cloud, and the larger VMs, B and C, were in fact stealing cores and memory from other VMs. Without knowing how the hypervisor was managed, we can only speculate that overcommitted resources led to disk swapping. The CPUs were underused because the system was spending most of its time writing other images to disk to get the resources that our VM was asking for. (Presumably, our images were being written to disk, too, when the other VMs requested cycles and RAM.)

Figure 2 offers another way to look at the same data, showing results for unused and used memory, unused and used CPU, unused and used disk. This might explain more clearly what was going on.

Figure 2. Another analysis of the same three VMs
Bar graph shows results

For Figure 2, we graphed the three VMs so that the amounts of CPU and memory are proportional to each other, and we used wickets, or unfilled bars, to indicate unused CPU and memory capacity for the three VMs. VM A is using all of its allocated CPU and memory but not much disk. VM B has access to more CPU and memory but is unable to use it all, because it is bottlenecked by increased disk activity. VM C offers even more CPU, which is only slightly used because, like VM B, VM C is bottlenecked on disk.

Those of us who remember managing physical environments might be surprised by these results. In most cases with physical hardware, increasing CPU and memory also increased performance. However, in a poorly managed virtualized environment where CPU and memory are unbounded, increasing a VM's CPU and memory can sometimes lead to slower performance.

Our conclusions

Our first conclusion is reassuring: in a properly managed VM, Rational software performs properly. Case Study 1 provides an example of a poorly behaving application in an unmanaged cloud.

Secondly, we reiterate that VMs must be used in a managed environment. In this case study, we had no knowledge of the hypervisor or other VMs in the environment. We believed our performance was determined by the configuration and behavior of other VMs in the same environment, but we couldn't be sure.

Thirdly, we warn against assuming that the same principles that worked with physical hardware will also work within the VM environment. If there had been excess CPU and memory resources, then increasing the sizes of the VMs might have produced an improvement, but we were actually overcommitting resources.

Finally, we repeat the point that overcommitting resources leads to poor VM performance and, consequently, poor application performance. In this example, our application was Rational Team Concert, but we have observed other Rational software performing badly in poorly managed environments, regardless of the virtualization technology and operating system.


Case Study 2. "So, just how important is affinity?" ClearCase in an overcommitted cloud with a rogue load

Our next example was demonstrated live during a conference. We illustrated how setting affinity (sometimes called entitlement or dedicated resources) can stabilize IBM Rational applications, whereas not using affinity can allow other VM images to take over resources and slow your application's performance to a crawl.

We used an Intel Sandy Bridge server with 32 virtual CPUs and 32 GB of RAM to host two separate IBM® Rational® ClearCase® deployments. Each ClearCase deployment consisted of identical Red Hat Enterprise Linux (RHEL) 5.5 VMs (4 vCPUs, 8 GB RAM) with a ClearCase VOB server and a ClearCase CM server hosting web views. We used VMware ESX to host the VMs. The VMs hosting ClearCase in Deployment A did not use affinity, whereas the ClearCase VMs in Deployment B had both CPU and memory affinity. Outside of the cloud, on physical hardware, we used two IBM® Rational® Performance Tester workbenches to drive a 100-user load simulation against each deployment.

On the ESX server, we also created several VMs that were intended to do nothing but create demand for CPU, memory, and disk. These VMs contained simple programs that executed math calculations and allocated all free memory, which resulted in 100% memory and 100% CPU use. We cranked up these programs to overcommit the ESX server to 300%. (We asked these rogue VMs to use three times the physical CPU and memory of the hypervisor.)
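For readers curious about what such a rogue load might look like, the following Python sketch spins every core with floating-point math and steadily allocates memory. It is an illustration, not the actual program we used; the function names and safety cap are ours, and anything like it should be run only on a disposable VM, because it deliberately starves the guest of CPU and memory.

Listing 6. A rogue CPU and memory load generator (illustrative sketch)

# Sketch of the kind of rogue load described above: busy-loop math on every
# core plus steady memory allocation. Not the actual program from the demo.

import math
import multiprocessing
import time

def burn_cpu():
    """Spin forever doing floating-point math to keep one core at 100%."""
    x = 0.0001
    while True:
        x = math.sqrt(x) * math.sin(x) + 1.0

def eat_memory(chunk_mb=100, pause_sec=1, max_chunks=100):
    """Allocate memory in chunks, up to a cap of chunk_mb * max_chunks MB, and hold it."""
    hoard = []
    for _ in range(max_chunks):
        hoard.append(bytearray(chunk_mb * 1024 * 1024))  # keep the allocation referenced
        time.sleep(pause_sec)
    while True:
        time.sleep(60)  # keep the memory resident

if __name__ == "__main__":
    for _ in range(multiprocessing.cpu_count()):
        multiprocessing.Process(target=burn_cpu, daemon=True).start()
    eat_memory()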

Figures 3 and 4 are taken from Rational Performance Tester. They show average response times (measured in milliseconds) for the ClearCase transactions executed against Deployment A and Deployment B. Both deployments behaved consistently until around the 1,200-second mark, when we activated the programs on the rogue VMs. Deployment A, where the ClearCase VMs ran without affinity, showed a sudden increase in response times. Deployment B, where the ClearCase VMs ran with affinity, showed occasional slowness but held fairly constant except for some spikes. At around the 4,000-second mark, the rogue images were stopped, and Deployment A returned to normal. (Notice that the scale of the y axis, which shows the average transaction response times in milliseconds, is not the same in the two graphs.)

Figure 3. Deployment A without affinity
Graph shows poor ClearCase response times
Figure 4. Deployment B with CPU and memory affinity
Graph shows acceptable ClearCase response times

Comparing the tests, ClearCase operations on Deployment A without affinity took an average of 118 seconds to complete compared to Deployment B with affinity, where they took an average of 18 seconds. Deployment B with affinity was, on average, six to seven times faster.

Our conclusions

Case Study 2 is perhaps extreme, because we created a rogue VM load that might be unrealistic. However, we were able to show clearly how an application's performance can degrade if you do not know what else your hypervisor is doing or when the other VMs request resources.

Setting processor and memory affinity permitted the applications on the VM that we cared about to maintain consistent performance and behavior, even when the rest of the VMs in the environment were executing extreme loads.

Notice that in Deployment A without affinity, performance did return to normal after the rogue VMs were halted. If there are VMs in your environment that are permitted to run overcommitted or uncapped, you might see similar behavior in your VMs.

In this example, our Rational application was ClearCase, but we have observed similar poor performance in other Rational software in similarly poorly managed environments, regardless of the virtualization technology and operating system.


Virtualization is here to stay, so learn to use it wisely

There is no doubt that virtualization is here to stay. More and more IBM Rational customers are using it and relying upon it. However, as we have shown, virtualization can be used poorly, with detrimental effects on the software's operation.

It's important to understand virtualization and to know how to manage it. We hope that our case studies show some of the possible side effects of poorly managed virtual environments and help you recognize their symptoms. In Part 2, we explore further symptoms of virtualization gone wrong, offer troubleshooting tips, and show vendor-specific examples.

Resources


