Introduction to Virtualization on Power
- The Advanced Power Virtualization feature can be ordered with the system initially or activated later.
- The HMC (Hardware Management Console) is used to create logical partitions on the system or to change their configuration. It can (but should not) be disconnected while the system is up and running. Virtualization data is kept in the box itself, so the HMC can be replaced.
Most of the virtualization work is done in the hypervisor. The hypervisor is a thin software layer, very close to the hardware, that sits above the actual hardware and below the installed operating systems. In some sense it can be seen as an extended BIOS. The hypervisor divides the hardware into pieces and makes them visible to the logical partitions, which see only the hardware (I/O slots, amount of memory and CPU power) assigned to them.
On Power4-based systems the granularity was 1 CPU, 256 MB RAM and 1 I/O slot. The main improvement on Power5 is that we can assign as little as 0.1 of a CPU to a partition, with a granularity of 0.01. It is still possible to dedicate a whole CPU to a partition (dedicated mode), but in most cases it is better to assign CPU power from the shared CPU pool. The shared CPU pool is built from all CPUs in the box which are not assigned as dedicated. It is also possible to mix dedicated and shared mode on one box, for example to create one LPAR with 2 dedicated CPUs and 5 others in shared mode using the 2 CPUs left.
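The rules above can be sketched in a few lines of Python. This is only an illustration, not an HMC interface: the function names are made up, and the check that the summed entitlements must fit into the shared pool is an assumption added for the example.

```python
from decimal import Decimal

MIN_ENTITLEMENT = Decimal("0.10")  # smallest share of a CPU an LPAR can get
STEP = Decimal("0.01")             # granularity of the assignment


def is_valid_entitlement(value: str) -> bool:
    """Return True if value is at least 0.1 CPU and a multiple of 0.01."""
    ent = Decimal(value)
    return ent >= MIN_ENTITLEMENT and ent % STEP == 0


def shared_pool_size(total_cpus: int, dedicated_cpus: int) -> int:
    """All CPUs that are not assigned as dedicated form the shared pool."""
    if dedicated_cpus > total_cpus:
        raise ValueError("more dedicated CPUs than installed CPUs")
    return total_cpus - dedicated_cpus


def fits_in_pool(entitlements: list[str], pool_cpus: int) -> bool:
    """Assumed rule: the summed entitlements must not exceed the pool."""
    return sum(Decimal(e) for e in entitlements) <= pool_cpus
```

For the mixed example above, shared_pool_size(4, 2) leaves a pool of 2 CPUs, and five shared LPARs with an entitlement of "0.4" each fit into it exactly.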
For example, a shared-pool LPAR with a capacity entitlement of 1.0 gets the same CPU power as an LPAR with one dedicated CPU, but it can also use more CPU power when it needs it and other LPARs (logical partitions) are not using it at that moment. The LPAR thus takes advantage of the integrated load balancing, and the overall system utilization increases. As for the virtualization overhead, it is worth pointing out that all published Power5 benchmarks were run with the hypervisor. Even running in full system partition mode on Power5 means running one big partition, with all hardware access going through the hypervisor.
Micropartitioning is implemented by dividing CPU power into time slices and giving LPARs access to them. Unused cycles are given back to the pool and can be used by other LPARs in the pool.
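A toy model may help to picture how unused cycles return to the pool. The sketch below is a strong simplification and not the hypervisor's real dispatching algorithm: capacity is counted in hundredths of a CPU per interval, the function name and data layout are hypothetical, and spare cycles are simply handed out in dictionary order.

```python
def dispatch(pool_capacity: int, lpars: dict[str, tuple[int, int]]) -> dict[str, int]:
    """Distribute one interval of CPU time among shared-pool LPARs.

    pool_capacity: size of the shared pool in hundredths of a CPU.
    lpars: name -> (entitlement, demand), both in hundredths of a CPU.
    Returns name -> capacity actually used in this interval.
    """
    if sum(ent for ent, _ in lpars.values()) > pool_capacity:
        raise ValueError("entitlements exceed the pool capacity")

    used = {}
    for name, (entitlement, demand) in lpars.items():
        # Each LPAR first consumes at most its own entitlement.
        used[name] = min(entitlement, demand)

    # Cycles that nobody consumed go back to the pool.
    spare = pool_capacity - sum(used.values())

    # Simplification: give spare cycles to LPARs with remaining demand,
    # in dictionary order.
    for name, (_, demand) in lpars.items():
        extra = min(demand - used[name], spare)
        used[name] += extra
        spare -= extra
    return used
```

With a pool of 200 and the LPARs web (100, 150), db (50, 20) and batch (50, 10), web ends up using 150 because db and batch leave 70 unused.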
I/O virtualization in the hypervisor is implemented by assigning an I/O slot to a partition. The hypervisor does not care what kind of adapter is plugged in; it only assigns the slot together with whatever adapter it holds. This has a great advantage: the hypervisor itself does not need any hardware drivers, because accessing the assigned adapter is the operating system's task. However, it also means that if we assign a SCSI or RAID adapter to an LPAR, all disks connected to this adapter belong to that LPAR. It also means that on Power5 systems, using micropartitions and running up to 40 partitions on a 4-way system, a huge number of PCI adapters would be required.
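The arithmetic behind this can be written down as a small sketch. The 0.1 CPU minimum comes from the text; the figure of two physical adapters per partition (one for storage, one for network) is an assumption made only for this example, and the function names are hypothetical.

```python
def max_micropartitions(cpus: int, min_share_hundredths: int = 10) -> int:
    """Number of partitions possible if each one gets the minimum CPU share."""
    return cpus * 100 // min_share_hundredths


def physical_adapters_needed(partitions: int, adapters_per_partition: int = 2) -> int:
    """Adapters needed when every partition owns its slots exclusively.

    adapters_per_partition=2 is an assumption: one storage adapter
    plus one network adapter per partition.
    """
    return partitions * adapters_per_partition
```

For a 4-way system this gives 40 partitions and, under the stated assumption, 80 physical adapters.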
In order to reduce the number of physical adapters needed, a VIO server can be used. The VIO server is an LPAR which has access to the disks and can share parts of them with the clients. The VIO server gets a virtual SCSI server adapter assigned, and the client a virtual SCSI client adapter. On the VIO server we map part of a disk (for example a logical volume) to the virtual SCSI server slot, and the client can then use it as a simple SCSI disk.
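The relationship between server slot, backing device and client adapter can be modelled roughly as follows. This is a conceptual sketch only and does not use any real VIO server command; all class names, slot numbers and device names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class VioServer:
    # virtual SCSI server slot -> backing devices (e.g. logical volumes)
    mappings: dict[int, list[str]] = field(default_factory=dict)

    def map_device(self, server_slot: int, backing_device: str) -> None:
        """Export a backing device through a virtual SCSI server slot."""
        self.mappings.setdefault(server_slot, []).append(backing_device)


@dataclass
class ClientLpar:
    name: str
    # virtual SCSI client slot -> server slot it is paired with
    client_adapters: dict[int, int] = field(default_factory=dict)

    def visible_disks(self, server: VioServer) -> list[str]:
        """Devices the client sees as plain SCSI disks."""
        disks = []
        for server_slot in self.client_adapters.values():
            disks.extend(server.mappings.get(server_slot, []))
        return disks
```

A client whose adapter is paired with server slot 10 sees only what the VIO server has mapped to slot 10; devices mapped to other server slots stay invisible to it.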
The VIO server CD is delivered with the machine if the Advanced Power Virtualization feature is ordered. It is an appliance system which can be installed on one or two LPARs on the machine and is used only for providing virtualized disk and network resources to the clients.
SUSE SLES9 or Debian can also be used to provide virtual I/O server functionality.
A mix of virtual SCSI disks (for example for root) and real adapters (for example for database access) is possible and common.
The hypervisor also provides virtual network switch functionality. Any partition can get a virtual network adapter assigned and will then be able to communicate with other partitions. The hypervisor's virtual switch is also VLAN capable, so LPARs with virtual network adapters can be separated from each other in different VLANs.
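The VLAN separation can be pictured with a minimal model. It is not how the hypervisor implements its switch; the class and the LPAR names are hypothetical and only show which partitions can reach each other.

```python
class VirtualSwitch:
    """Toy model of a VLAN-capable virtual switch."""

    def __init__(self) -> None:
        # VLAN id -> names of LPARs with a virtual adapter in that VLAN
        self._members: dict[int, set[str]] = {}

    def attach(self, lpar: str, vlan_id: int) -> None:
        """Give an LPAR a virtual network adapter in the given VLAN."""
        self._members.setdefault(vlan_id, set()).add(lpar)

    def can_communicate(self, a: str, b: str) -> bool:
        """Two LPARs can talk directly only if they share a VLAN."""
        return any(a in m and b in m for m in self._members.values())
```

If lpar1 and lpar2 are attached to VLAN 1 and lpar3 to VLAN 2, then lpar1 reaches lpar2 but not lpar3.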
To connect this network communication to the world outside the box, a bridge or router is needed. Any partition can have one real physical network card and one virtual network card assigned and act as a router (for example using masquerading with iptables).
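As a rough sketch of the masquerading variant, the following Python helper only builds the commands a routing partition would typically run; it does not execute anything. The interface names (eth0 for the physical card, eth1 for the virtual one) and the exact rule set are assumptions for illustration and need to be adapted to the real setup.

```python
import shlex


def masquerade_commands(outside_if: str, inside_if: str) -> list[list[str]]:
    """Build the commands for a simple masquerading router.

    outside_if: physical interface towards the outside network (assumed name).
    inside_if: virtual interface towards the other partitions (assumed name).
    """
    return [
        # Let the kernel forward packets between the two interfaces.
        ["sysctl", "-w", "net.ipv4.ip_forward=1"],
        # Rewrite the source address of outgoing traffic.
        ["iptables", "-t", "nat", "-A", "POSTROUTING",
         "-o", outside_if, "-j", "MASQUERADE"],
        # Allow traffic from the partitions to the outside.
        ["iptables", "-A", "FORWARD",
         "-i", inside_if, "-o", outside_if, "-j", "ACCEPT"],
        # Allow replies to connections that are already established.
        ["iptables", "-A", "FORWARD",
         "-i", outside_if, "-o", inside_if,
         "-m", "state", "--state", "ESTABLISHED,RELATED", "-j", "ACCEPT"],
    ]


def as_shell(commands: list[list[str]]) -> str:
    """Render the commands as shell lines for review before running them."""
    return "\n".join(shlex.join(cmd) for cmd in commands)
```

Calling as_shell(masquerade_commands("eth0", "eth1")) gives four lines that can be reviewed and then run as root on the routing partition.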
A VIO server, or a Linux partition acting as a VIO server, can act as a layer 2 network bridge. The virtual network adapter assigned to this partition should be marked as a trunk adapter.
--Tomas Baublys, 12 Jul 2005 (CEST)