This document outlines extensive testing completed by IBM Cognos Planning R&D and Product Management using a virtual environment. It is intended to provide guidance when preparing an IBM Cognos 8 Planning environment for use in virtual environments.
All tests were performed against IBM Cognos 8 Planning version 8.4 FP2.
Exclusions and Exceptions
As with all Planning deployments, model characteristics can affect performance. In the tests performed below, a large, highly complex model currently in production was used to simulate real-life conditions.
Test Conditions and Facts
All tests were carried out on the latest released versions of the products:
- VMWare vSphere 4 Update 1
- IBM Cognos 8 Planning 8.4 FP 2
- 5 HP ProLiant BL460c G1 servers with Intel E5450 3 GHz processors
- 2 HP ProLiant BL460c G6 servers with Intel Nehalem X5560 2.8 GHz processors
Where data is compared across machine sizes, the following RAM was assigned depending on the number of processors:
- 8 procs: 13 GB RAM
- 4 procs: 6.5 GB RAM
- 2 procs: 4 GB RAM
All infrastructure (DB Server, C8 Server, Admin and Web Server) is virtualized on a single host (G1 server) and has remained static.
- Web Server is IBM HTTP Server 6.1
- Database Server is Microsoft SQL Server 2005
- Operating System is Microsoft Windows 2003 Enterprise
- Boot.ini has been used to control the available resources on the physical machines and hosts
- Storage was on EMC FC SAN. 14 disks - RAID 10
- The focus of this testing centered on the Job Servers, which are typically where the greatest performance concerns are directed.
- A process was created that takes approximately 1.5 hours to run. This process was used to capture data and compare statistics.
- The process includes synchronizing the Analyst model, running Go To Production (GTP) on the application, executing Administration Links, and running Reconciles and Publishes.
- VMWare host and Virtual Machine performance statistics were also captured.
- The focus started with E5450 CPUs then moved to more modern Nehalem based X5560 CPUs.
- The controlled variables were the number of processors in the Virtual Machine and the Maximum Concurrent Jobs setting maintained in the Contributor Administration Console.
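As noted above, boot.ini was used to constrain the resources available on the physical machines. A minimal sketch of such an entry using the standard Windows Server 2003 `/numproc` and `/maxmem` switches (the values shown here are illustrative, chosen to match the 4-processor / 6.5 GB configuration, not copied from the test systems):

```
[operating systems]
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Windows Server 2003, Enterprise" /fastdetect /numproc=4 /maxmem=6656
```

`/numproc` limits the number of processors the operating system uses, and `/maxmem` caps physical memory in megabytes, allowing the same hardware to stand in for smaller machine sizes during comparison runs.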
Finding the Optimum VM Size For “Classic” Chipsets
When hosting Virtual Machines on non-Nehalem-based chipsets, the following was noted:
- The VMWare Hypervisor needs some CPU headroom in order to allocate resources to VMs, as virtualization carries some overhead of its own.
- If the CPU capacity of the host machine is overloaded, performance deteriorates rapidly. A tuning point in the range of 85 – 95% maximum CPU usage is therefore recommended.
- An administrator can use Max Concurrent Jobs in the Contributor Administration Console to control utilization of the processors tasked to complete jobs.
- If Planning Job Servers are hosted alongside other systems, ensure that CPU and memory use is monitored. It may be necessary to limit CPU use on the Virtual Machines in order to allocate the resources needed to complete the Planning jobs. Again, the 85 – 95% range for maximum CPU usage on the host machine during peak Planning workload serves as a good guide.
- Different jobs place differing demands on hosts, for example:
- Administration Links are memory intensive
- Reconciles are CPU intensive
- This information provided a good basis for defining clusters for different container types. The optimum will vary from implementation to implementation depending on workload characteristics, so thorough testing is required to determine the best configuration.
- For example, different configurations may suit Planning Store, Application, and Publish containers.
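The headroom guidance above can be sketched as a simple calculation. The helper below is hypothetical (it is not part of any IBM Cognos or VMWare tooling); the 85 – 95% utilization target and the practice of leaving roughly one core's worth of capacity for the Hypervisor come from the findings in this document:

```python
def suggest_max_concurrent_jobs(vm_cpus, host_cpus, target_utilization=0.90):
    """Suggest a Max Concurrent Jobs setting that leaves CPU headroom
    for the Hypervisor (target utilization in the 0.85-0.95 range)."""
    # Cap the job count at the host capacity scaled by the target
    # utilization, so a VM spanning the whole host still leaves
    # roughly one core free for the Hypervisor.
    usable = min(vm_cpus, int(host_cpus * target_utilization))
    return max(1, usable)

# An 8-CPU VM on an 8-core host at a 90% target leaves one core for
# the Hypervisor, matching the 7-job setting found in testing.
print(suggest_max_concurrent_jobs(8, 8))  # 7
print(suggest_max_concurrent_jobs(4, 8))  # 4
```

This is only a starting point; the actual Max Concurrent Jobs value should be confirmed by monitoring host CPU during peak Planning workload.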
E5450 (“Classic”) Processors – Planning Job Performance
Testing began by trying to discover the most beneficial server sizing for a Planning Job server. A benchmark physical server was used as the baseline, and the relative performance of each virtual configuration was compared to an 8-processor physical server across the entire process.
Table 1. Comparison of performance on virtual servers
| Virtual Machine CPU count | Max Concurrent Jobs setting | Performance degradation of VM on E5450 CPU compared to physical server | Notes |
| --- | --- | --- | --- |
| 8 | 7 | -74% | Monitoring showed that when running 8 job threads on an 8-core VM there was not enough CPU on the host to service the Hypervisor, and performance degraded. |
| 4 | 4 | -41% | The results for 4-CPU VMs represent an average of varied configurations with other-sized VMs on the host. |
| 2 | 2 | -23% | The results for 2-CPU VMs represent an average of varied configurations with other-sized VMs on the host. |
- The best performance is achieved with smaller Virtual Machines.
- The overall best compromise is to run 4-processor Virtual Machines, which provide the best balance between performance and maintenance costs. The remainder of the testing exercise therefore focused on 4-processor Virtual Machines.
- On an 8-processor host, the best performance was observed with two 2-processor VMs when all VM processors were used for jobs.
- When running an 8-processor VM on a host with Max Concurrent Jobs set to 8, the host CPU was overloaded. It was necessary to limit job processing to 7 concurrent jobs in order to leave enough CPU for the Hypervisor.
- The conclusion is that the best performance was seen with smaller VMs running as Job Servers. Running smaller VMs leaves the Hypervisor CPU cycles, whereas running one large 8-CPU Job Server consumed all the host CPU, leaving no room for the Hypervisor.
Performance by Job Type
- Administration Links and Reconciles benefit from more headroom inside the VM.
- Cut Down Model performance benefits from smaller Virtual Machines.
- A Publish job benefits from having all resources available, because it places less stress on the host systems.
Graph highlighting the time to process different jobs on virtual vs physical CPUs
Testing the Intel Xeon X5560 Nehalem based CPU
- Intel Xeon 5500-series processors incorporate a memory management unit (MMU) virtualization technology called Extended Page Tables (EPT).
- VMWare vSphere 4 adopts these technologies
- The processors also support Simultaneous Multi-Threading (Hyper-threading)
- Planning guidance normally recommends that Hyper-Threading be turned off on physical machines; however, switching it on for VM hosts allows larger workloads to run, since the Hypervisor can utilize the additional threads.
- A VMWare White Paper can be found on this technology - http://www.vmware.com/pdf/Perf_ESX_Intel-EPT-eval.pdf
- AMD has a similar technology called Rapid Virtualization Indexing (RVI)
Testing results of the X5560 Nehalem CPU
- Once again, the best performance was achieved using smaller VMs
- The overall best compromise is to run 4 processor VMs
- This helps provide the best balance between performance and maintenance costs
- It is recommended to ensure Hyper-Threading is enabled on the host
- There is no need to leave headroom for the VM when using Nehalem based processors
Graph comparing the combined performance of virtual vs physical CPUs for 4 cores
- This discovery indicates that the performance of two 4-CPU VMs running on an X5560 host is faster than an 8-job-thread physical X5560 Job Server, and easily surpasses an 8-thread physical E5450 (“Classic”) Job Server.
Expanded testing with the Intel Xeon 5560 Nehalem Chipset
- Attempts were made to overload the physical cores on the host to see whether IBM Cognos Planning could make full use of the Hyper-Threading available on the host, for example by running four 4-core VMs on a host with only 8 physical cores.
- With four 4-processor VMs on a single X5560-based host server, the results can be compared against a physical server of the same specification with Hyper-Threading disabled.
Workload comparison of an 8-core Xeon virtual CPU vs a physical CPU
- This was considered a fair test, since the workload cannot scale to utilize all 16 logical processors if Hyper-Threading were enabled on the physical machine. The recommendation for the same physical machine would therefore be to disable Hyper-Threading for deployment.
- Over the whole job, the virtualized Intel Xeon X5560 did 59.8% of the work.
- This represents a gain of approximately 50% over the physical machine.
- For example, a user running X5560 (Nehalem) chips who wishes to virtualize their environment would need only 66% of their server capacity to achieve the same level of performance.
- Given that X5560 chips are approximately 20% more scalable than E5450 chips in a physical test, users moving from a “Classic” physical environment to an X5560-based virtual environment would need just 56% of their original server capacity (assuming the E5450 is an average server chip).
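The capacity figures above follow from simple arithmetic on the measured workload split; a short check of the reasoning (the 20% “Classic” scalability factor is the one quoted in the text):

```python
# Of the combined workload, the virtualized X5560 completed 59.8%
# and the physical machine the remaining 40.2%.
virtual_share = 0.598
physical_share = 1 - virtual_share

# Relative gain of virtual over physical: ~0.49, i.e. roughly 50%.
gain = virtual_share / physical_share - 1

# Virtual capacity needed for equal performance: ~0.67, in line
# with the ~66% figure quoted above.
capacity_vs_physical_x5560 = physical_share / virtual_share

# X5560 is ~20% more scalable than E5450, so moving from a "Classic"
# physical environment to X5560 virtual needs ~0.56 of the capacity.
capacity_vs_classic = capacity_vs_physical_x5560 / 1.2

print(round(gain, 2), round(capacity_vs_physical_x5560, 2),
      round(capacity_vs_classic, 2))  # 0.49 0.67 0.56
```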
The benefit of EPT for MMU Virtualization
- When EPT was disabled performance dropped by approximately 33% on the same tests.
- This left a 20% benefit over the “Classic” chips under virtualization, approximately the same performance differential observed when the “Classic” and Nehalem-based boxes were compared in a physical configuration.
Virtual Machine Properties showing CPU/MMU Virtualization settings: use Intel VT-x/AMD-V for instruction set virtualization and software for MMU virtualization.
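The same CPU/MMU choice can also be expressed directly in a VM's .vmx configuration file rather than through the Virtual Machine Properties dialog. A sketch, assuming the standard ESX monitor options; verify the exact keys and values against the VMWare documentation for your vSphere release:

```
# Use hardware (Intel VT-x / AMD-V) instruction set virtualization
monitor.virtual_exec = "hardware"
# Force software MMU virtualization, i.e. disable EPT/RVI for this VM
monitor.virtual_mmu = "software"
```

Setting `monitor.virtual_mmu` to `"software"` is how the EPT-disabled comparison above could be reproduced on a per-VM basis.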
Planning Administration Server Results
- The Planning Administration Server was run within a Virtual Machine hosted on an E5450 (“Classic”) server throughout the Job Server testing, and no problems were observed.
- Again utilizing the X5560 Nehalem processors provided the best results when virtualized.
Comparison of Admin Task time using virtual vs physical CPUs
Planning Web Server Results
- A “Bull Rush” scalability test of 100 users was performed to simulate web client access under extreme simultaneous load.
- Once again, it was discovered that the Intel Xeon X5560 Nehalem processors perform much better under virtualization.
Bull Rush test response times using virtual vs physical CPUs
- Virtualization is a good option for utilizing larger machines.
- Virtualization with the X5500 Nehalem-series chips can significantly reduce the amount of hardware required, with strong ROI and energy savings.
- All the other advantages of virtualization, such as lower cost of ownership and reduced maintenance costs, are also gained.
- The same recommendations can safely be made for Administration Servers and Web Servers.
Appendix - VMWare Feature Test
- Distributed Resource Scheduling
- Planning Job Servers were tested and worked well.
- When utilization limits were reached, host migration took place.
- In-Flight planning jobs continued with only a negligible performance impact.
- High Availability
- Host switching in case of hardware failure.
- The environment continues to work as expected.
- In-flight Planning jobs fail if a host is disabled, which is expected.