In each column, The WebSphere Contrarian answers questions, provides guidance, and otherwise discusses fundamental topics related to the use of WebSphere products, often dispensing field-proven advice that contradicts prevailing wisdom.
It was good to see so many customers at the recent IMPACT 2008 conference, and I enjoyed meeting with many of you during the course of the event. As I write this, I have recovered from “the impact before IMPACT,” when I tripped and fell while traveling to the conference, breaking my fall with the right side of my face, which resulted in some cuts and scrapes and an impressive black eye. As I mentioned when so many of you asked “What happened to you?”, it looked worse than it was, and aside from my injured pride and a torn pair of trousers all is well once more.
I had originally planned to call this article “Avoiding virtualization misadventures with WebSphere Application Server,” and while that would have been consistent with my general outlook on such things, I also knew that the editor would probably change the title [Ed: He’s right.], so instead I decided to go with the title you see above. In any event, I wanted to take this opportunity to briefly touch on several aspects of virtualization and how best to integrate a virtualization technology with WebSphere Application Server (and the products running on WebSphere Application Server) to make this process both productive and painless.
While many of you are likely familiar with the concept of virtualization, it’s prudent to first take a step back and define it before discussing how best to use it. Essentially, virtualization is an abstraction or masking of underlying physical resources (such as a server) from the operating system images or instances running on that physical resource. By abstracting the operating system from the underlying hardware, you can create multiple independent or isolated OS environments on a given set of hardware and, depending on the virtualization technology in use, those OS environments can be either homogeneous or heterogeneous. This capability enables the consolidation of multiple environments onto a single server, with each environment dedicated and isolated from the others. Some possible examples are:
- Multiple WebSphere Application Server versions on the same physical system. (Server virtualization is not a requirement for running multiple WebSphere Application Server versions on the same machine. WebSphere Application Server provides the ability to run multiple WebSphere Application Server instances on the same OS under a coexistence configuration, but the use of virtualization provides additional isolation at the OS level.)
- Multiple operating system and application server versions on the same system.
- Multiple test environments sharing the same physical system.
There are a number of server virtualization implementations. The best known are, in no particular order: AIX® LPARs, Solaris™ Containers (Zones), VMware, HP-UX nPars, and Xen.
While the focus of this article is server virtualization as described above, I would be remiss if I didn’t mention another emerging virtualization technology: application virtualization, which addresses application level workload, response time, and application isolation within a shared environment. A prominent example of an application virtualization technology is WebSphere Virtual Enterprise.
The primary reason most organizations implement a virtualization strategy is to improve resource utilization. It’s not uncommon for servers to be using less than 25% of CPU even at peak, which means there’s a great deal of excess capacity. Server virtualization enables the consolidation of multiple physical servers into virtual servers all running on a single physical server, improving resource utilization while still not exceeding capacity. Additional benefits of server virtualization include savings in power, cooling, and floor space, and probably lower administrative costs as well.
The lure of improved resource utilization, however, is also what leads to pitfalls in server virtualization. More specifically, over-committing the available physical resources -- CPU and memory -- in an attempt to maximize server utilization is what leads to ineffective virtualization! To use server virtualization effectively, it’s paramount to recognize that underlying the virtual machines is a set of finite physical resources, and once the limits of those resources are reached, performance can quickly degrade. Two resources in particular are key to effective virtualization, and it is essential to avoid over-committing them: CPU and physical memory (RAM). This is actually no different than in a “non-virtualized” environment or, stated another way: virtualization doesn’t provide additional resources.
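To make the finite-resources point concrete, here’s a minimal sketch of the kind of arithmetic worth doing before consolidating servers: sum the virtual CPUs you plan to allocate and compare the total against the physical cores in the box. The VM sizings are hypothetical, not a recommendation.

```java
import java.util.List;

/** Sketch of a vCPU over-commit check; the VM sizings below are hypothetical. */
public class CpuCommitCheck {

    /** Returns the ratio of allocated virtual CPUs to physical CPUs. */
    static double commitRatio(List<Integer> vcpusPerVm, int physicalCpus) {
        int allocated = vcpusPerVm.stream().mapToInt(Integer::intValue).sum();
        return (double) allocated / physicalCpus;
    }

    public static void main(String[] args) {
        // Four VMs with 2 vCPUs each on an 8-core host: exactly 1:1, no over-commit.
        System.out.println(commitRatio(List.of(2, 2, 2, 2), 8));    // 1.0
        // A fifth VM pushes the ratio past 1:1 -- expect degradation under load.
        System.out.println(commitRatio(List.of(2, 2, 2, 2, 2), 8)); // 1.25
    }
}
```

As the testing discussed below shows, keeping this ratio at or under 1:1 is the safe starting point for production.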
I’ve actually written about this subject before, most recently in 2005, and somewhat coincidently (perhaps even prophetically) at the time I stated:
As a starting point, I plan on having at least one CPU per application server JVM; that way I have likely minimized the number of times that a context switch will occur -- at least as far as using up a time slice is concerned (although, as mentioned, there are other factors that can result in a context switch).
In testing by the WebSphere Performance Lab and VMware, it turns out that when there was a single application server JVM per virtual machine, performance degraded once the number of virtual machines exceeded the number of CPUs; in other words, performance degraded once the number of application servers exceeded the number of CPUs. While the degradation was gradual (at least initially), once the ratio of virtual machines to CPUs exceeded 1:1, performance started to degrade more rapidly. I’ll anticipate the obvious question -- how much does it degrade? -- by stating it depends. The amount of degradation depends on the client workload: the lighter the workload (meaning the longer the think time between client requests), the smaller the degradation.
If you’re contemplating a CPU over-commit scenario using virtualization, then you’ll need to test and carefully monitor response time to make sure you don’t over-commit to the point that performance degrades significantly. When testing, you’ll need to test all the virtual machines simultaneously and the workload should represent your peak workload, not your average workload. Otherwise, if several applications peak at the same time, you could encounter some very dissatisfied customers as the result of unacceptable response times. It’s likely best to limit any CPU over-commit configurations to development environments where response time is less critical and load is light.
Related to this, if you’re using VMware ESX Server, CPU utilization needs to be measured using ESX itself, not via the OS tools inside the virtual machine. This is because VMware abstracts the hardware from the virtual machine, and even with VMware Tools installed in the guest OS, the only way to monitor overall system CPU utilization is through ESX.
Avoiding over-commit of the underlying physical memory between the virtual images is likely even more important than avoiding CPU over-commit. While CPU over-commit typically results, at least initially, in a gradual degradation, the performance degradation associated with memory over-commit is much more pronounced (mentally picture someone falling down!!). There are a couple of reasons this is the case. When you over-commit on memory while running Java™, in either a virtualized or a non-virtualized environment, the OS pages or “swaps” portions of the memory associated with running processes out to disk in order to keep the most recently used data close to the CPU, improving “locality of reference” and thus performance. Unfortunately, garbage collection in Java violates locality of reference, since the purpose of garbage collection is to find and reclaim memory that is no longer in use. In order to do so, *all* memory within the JVM heap must be examined, and as a result the entire JVM heap must be in physical memory (RAM). Therefore, when garbage collection runs, any portion of the heap that isn’t in physical memory must be paged in, while memory associated with other processes must be paged out, all of which places additional CPU and I/O load on the system on top of the regular application workload and the garbage collection itself. It’s no wonder that memory over-commit has devastating performance impacts when using virtualization, though avoiding memory over-commit in a non-virtualized environment is equally important.
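The arithmetic that prevents this situation is simple enough to automate. Here’s a minimal sketch that flags a configuration whose combined JVM footprints won’t fit in the RAM available to them; the 256 MB of native (non-heap) overhead per JVM is an illustrative assumption, since the real figure varies by platform and workload.

```java
import java.util.List;

/** Sketch of a memory over-commit check. The 256 MB native overhead per JVM
 *  is an assumed, illustrative figure -- measure your own footprints. */
public class MemoryCommitCheck {
    static final long NATIVE_OVERHEAD_MB = 256; // assumed non-heap footprint per JVM

    /** True if the combined JVM footprints would exceed the available RAM. */
    static boolean overCommitted(List<Long> maxHeapsMb, long availableRamMb) {
        long totalMb = maxHeapsMb.stream()
                .mapToLong(heap -> heap + NATIVE_OVERHEAD_MB)
                .sum();
        return totalMb > availableRamMb;
    }

    public static void main(String[] args) {
        // Three 1 GB heaps in roughly 4 GB of available RAM: fits.
        System.out.println(overCommitted(List.of(1024L, 1024L, 1024L), 4096));        // false
        // A fourth server pushes the total past available RAM: paging, then thrashing.
        System.out.println(overCommitted(List.of(1024L, 1024L, 1024L, 1024L), 4096)); // true
    }
}
```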
I recently assisted a client who was experiencing “poor WebSphere Application Server performance,” which after some discussion turned out to be caused by memory over-commit. In this case, they were able to run three application servers with no ill effects, but when they tried to start a fourth application server, performance degraded considerably. Since this was on Windows®, we used Task Manager to look at actual memory use and determined that when running three application servers, there was only about 100MB of free RAM left. As a result, starting an additional application server resulted in paging out memory associated with the other application servers, and once garbage collection tried to run in one of the application server JVMs, further paging occurred; “thrashing” might be a more accurate description.
Again, memory over-commit can adversely impact performance in both virtualized and non-virtualized environments. This often occurs because many don’t realize that the process footprint of a JVM is larger than the maximum heap size. This is because, aside from the JVM heap where application objects reside, there’s an interpreter associated with each JVM that maps the Java bytecode to the underlying OS implementation for I/O, graphics, and so on. Therefore, it’s important to guard against memory over-commit by monitoring the actual process footprint of your application servers using the tools appropriate for your OS, such as Windows Task Manager or vmstat on UNIX®.
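You can see the footprint-versus-heap distinction for yourself using the standard java.lang.management API, which reports a running JVM’s non-heap memory usage. This probe is just an illustration; the figures it prints are for the probe JVM itself, not for an application server, and non-heap usage still understates the full process footprint (thread stacks and other native allocations aren’t included).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

/** Sketch showing that a JVM uses memory beyond the Java heap. */
public class FootprintProbe {
    public static void main(String[] args) {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = mem.getHeapMemoryUsage();
        MemoryUsage nonHeap = mem.getNonHeapMemoryUsage();
        // Non-heap usage (class metadata, JIT code cache, and so on) is always
        // non-zero, so the process footprint exceeds the heap even before native
        // allocations such as thread stacks and I/O buffers are counted.
        System.out.println("heap used (MB):     " + heap.getUsed() / (1024 * 1024));
        System.out.println("non-heap used (MB): " + nonHeap.getUsed() / (1024 * 1024));
    }
}
```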
While over-commit of memory and CPU are likely the most prevalent problems that can occur when using server virtualization, they aren’t the only anti-patterns associated with this technology.
If you’re using server virtualization in your production environment and are concerned with high availability, then you need to make sure not just that each application is distributed across multiple virtual machines, but that the virtual machines associated with a specific application are also distributed across multiple physical machines. Failure to do so results in a configuration where the physical machine is still a Single Point of Failure (SPOF). While modern hardware is incredibly reliable and fault tolerant, that doesn’t preclude a hardware failure resulting in the loss of all partitions on a machine and, in turn, a total loss of application availability.
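A placement check like this is easy to script against your own inventory. The sketch below (host, VM, and application names are all hypothetical) flags any application whose virtual machines all land on a single physical host.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Sketch: verify that each application's VMs span at least two physical hosts.
 *  All host, VM, and application names here are hypothetical. */
public class SpofCheck {

    /** appToVms maps application -> its VMs; vmToHost maps VM -> physical host.
     *  Returns the applications whose VMs all share a single host. */
    static Set<String> singlePointsOfFailure(Map<String, List<String>> appToVms,
                                             Map<String, String> vmToHost) {
        Set<String> atRisk = new TreeSet<>();
        for (Map.Entry<String, List<String>> e : appToVms.entrySet()) {
            Set<String> hosts = new HashSet<>();
            for (String vm : e.getValue()) {
                hosts.add(vmToHost.get(vm));
            }
            if (hosts.size() < 2) {
                atRisk.add(e.getKey()); // all of this app's VMs share one host
            }
        }
        return atRisk;
    }

    public static void main(String[] args) {
        Map<String, String> vmToHost = Map.of(
                "vm1", "hostA", "vm2", "hostA", "vm3", "hostB");
        Map<String, List<String>> appToVms = Map.of(
                "orders", List.of("vm1", "vm3"),   // spans hostA and hostB: safe
                "billing", List.of("vm1", "vm2")); // both VMs on hostA: SPOF
        System.out.println(singlePointsOfFailure(appToVms, vmToHost)); // [billing]
    }
}
```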
Another potential friction point occurs when server virtualization and application virtualization are used together. Both technologies provide autonomic adjustment of various resources: CPU and memory in the case of server virtualization; application server instances, workload placement, and so on in the case of application virtualization. In order to avoid conflicting decisions between the two, the response cycles associated with each should be configured so they don’t collide, which most often requires lengthening cycle times or disabling some of the functions of one technology or the other.
And that concludes this chapter of the WebSphere Contrarian. Based on the feedback I received at IMPACT, many of you found the initial installment of value and hopefully that will be the case this time as well.
Tom Alcott is consulting IT specialist for IBM in the United States. He has been a member of the Worldwide WebSphere Technical Sales Support team since its inception in 1998. In this role, he spends most of his time trying to stay one page ahead of customers in the manual. Before he started working with WebSphere, he was a systems engineer for IBM's Transarc Lab supporting TXSeries. His background includes over 20 years of application design and development on both mainframe-based and distributed systems. He has written and presented extensively on a number of WebSphere run time issues.