Local, Nar & Far Memory part 6 - Too High a Virtual Processor number has a Bad Side Effect
nagger
The title should read "Local, Near & Far ..." - I will not correct it or links might fail.
In this entry we carry on from part 5, but this time we look at setting the Virtual Processor (VP) number for the virtual machine. There is a side effect that is not obvious. After 6 years of using virtual processors it had never occurred to me, so perhaps it is news to others too. The problem with virtual processors is that they are ephemeral, i.e. they don't actually exist and cost nothing, so I find most systems administrators feel they can be generous and allocate lots of them.
In this AIXpert blog I have pointed out, many times, that it would be clearer if the Virtual Processor number was called the "Spreading Factor". In Uncapped virtual machines, the VP number tells the Hypervisor the number of physical processors the VM can "see" (i.e. the number of CPUs AIX knows about) and can spread its processes across. In previous entries we also looked at AIX CPU Folding, where AIX decides that running more CPU cycles on fewer physical CPUs is more efficient and drops a hint to the Hypervisor. This makes systems administrators feel there are no bad effects of high VP numbers.
So is there any harm in having a large Virtual Processor number? The answer is YES.
Here is a little reminder from part 5 of an example virtual machine (LPAR) setup:
For illustration purposes let us take a VM which averages in busy periods 16 physical CPUs, peaks regularly for a few minutes to 18 physical CPUs and we have a shared processor pool of say 48 physical CPUs. Four scenarios:
All of these involve some compromise, but there are two hidden side effects that you should know about. The side effect of too low an Entitlement was covered in part 5, and the side effect of the Virtual Processor number is covered below.
Ask yourself the question: what is the difference between virtual machines with VP=48, 36 or 20?
Well, whether you are using them or not (i.e. AIX has folded them), the Virtual Processors have to be allocated to physical CPU-cores in the machine, and it is the Hypervisor that allocates them. At last in this series of blogs, we are back to considering virtual machine placement and the memory implications - sorry for the long detour into processor scheduling etc.
Let's have a look at how it could be laid out.
If we take the example of a 64 CPU-core Power 770, in diagram terms it would look like the picture below: eight POWER7 chips and 64 CPU-cores in four CEC drawers (brown squares with two POWER7 chips each), plus a reminder about "Local, Near and Far" memory access:
Small size Virtual Machines
If we have a virtual machine with a VP number of 8 (or fewer) then, if we are lucky, the entire virtual machine could be placed on a single POWER7 chip with its 8 CPU-cores. This would mean every data access would be from the local memory DIMMs.
Medium size Virtual Machines
If we have a virtual machine with a VP number of 9 to 16 then (again, if lucky) it would get placed on the two POWER7 chips within a single Power 770 CEC drawer. This would mean data access would be Local or Near, with AIX taking every opportunity to keep a process and its data on the same CPU-core and to keep Far data access to a minimum.
Why do I keep saying "lucky" ?
Well, if you already have virtual machines (LPARs) running, the Hypervisor may have already allocated a few CPU-cores on each of the POWER7 chips and it will have to work around them. This would mean the Hypervisor can't go for a perfect placement. Below we show four different levels of "luckiness" for virtual machines using 8 virtual processors allocated to physical CPU-cores.
"Do you feel Lucky, punk!":
Although I have painted this "Very Unlucky" layout as bad, do not lose sight of the fact that the POWER7 range has VERY HIGH-SPEED memory sub-systems and can work well across CEC drawers. Those rPerf performance numbers are based on large 64 CPU-core virtual machines, where you have to deal with Local, Near and Far memory access ALL THE TIME. A VM will work very well with Far memory accesses, but if a little thinking and planning can get our smaller VMs going even faster by avoiding Far memory, we might as well optimise them.
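To make the levels of "luckiness" concrete, here is a minimal Python sketch that classifies a given placement of CPU-cores. It assumes the 64 CPU-core Power 770 layout used in this entry (four CEC drawers, two POWER7 chips per drawer) and, as a simplification, that the VM's memory sits on the same chips as its CPU-cores. The function name, the chip numbering (chips 2*d and 2*d+1 in drawer d) and the four sample placements are hypothetical illustrations. This is not how the Hypervisor reports or decides placement.

```python
# Assumed topology from the Power 770 example: 4 drawers x 2 chips x 8 cores.
CHIPS_PER_DRAWER = 2


def classify_placement(cores_per_chip):
    """Classify a VM placement.

    cores_per_chip: list of 8 ints, the CPU-cores given to the VM on
    chips 0..7. Chips 2*d and 2*d+1 are assumed to sit in drawer d.
    """
    used = [chip for chip, cores in enumerate(cores_per_chip) if cores > 0]

    # Group the used chips by the drawer they sit in.
    drawers = {}
    for chip in used:
        drawers.setdefault(chip // CHIPS_PER_DRAWER, []).append(chip)

    access = ["Local"] if used else []
    # Near: both chips of at least one drawer are in use.
    if any(len(chips) > 1 for chips in drawers.values()):
        access.append("Near")
    # Far: the VM spans more than one drawer.
    if len(drawers) > 1:
        access.append("Far")

    return {"chips": len(used), "drawers": len(drawers), "access": access}


# Four hypothetical placements of an 8 VP virtual machine.
samples = {
    "one chip": [8, 0, 0, 0, 0, 0, 0, 0],
    "two chips, one drawer": [4, 4, 0, 0, 0, 0, 0, 0],
    "two chips, two drawers": [4, 0, 4, 0, 0, 0, 0, 0],
    "one core on every chip": [1, 1, 1, 1, 1, 1, 1, 1],
}
for name, placement in samples.items():
    print(name, classify_placement(placement))
```

The first placement needs only Local access, while the last touches all eight chips and all four drawers, so it mixes Local, Near and Far access even though the VM has only 8 virtual processors.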
Back to the larger LPAR example
In the worst case, where
In the best case, where
we may get the maximum Local memory and minimum use of Near and Far memory.
A bad idea
If we unnecessarily set VP=64 then we get the worst case every time: the VM is spread over the whole machine :-(
In the worst case for VP=20, 36 or 48, the VM ends up spread across all the Power 770 CEC drawers and all the POWER7 chips.
But for the best cases, the lower VP numbers do much better. See below for details:
With the lower VP numbers the VM does not have to be spread across so many Power 770 CEC drawers and thus gains efficiency - this only took a little monitoring and planning.
But wait, there is more - there is the placement of memory, in addition to the CPU-cores, to consider. That will be in part 7.
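The arithmetic behind the best and worst cases can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions: the 64 CPU-core Power 770 from this entry (8 CPU-cores per POWER7 chip, two chips per CEC drawer, four drawers), one CPU-core per virtual processor, and an otherwise empty machine so the best case can pack the VM as tightly as possible. The function names are hypothetical and the Hypervisor's real placement logic is not shown here.

```python
import math

# Assumed topology from the Power 770 example in this entry.
CORES_PER_CHIP = 8
CHIPS_PER_DRAWER = 2
DRAWERS = 4
TOTAL_CHIPS = CHIPS_PER_DRAWER * DRAWERS


def best_case_footprint(vp):
    """Fewest chips and drawers that can hold vp CPU-cores (tight packing)."""
    chips = math.ceil(vp / CORES_PER_CHIP)
    drawers = math.ceil(chips / CHIPS_PER_DRAWER)
    return chips, drawers


def worst_case_footprint(vp):
    """Most chips and drawers the VM can touch (cores scattered everywhere)."""
    return min(vp, TOTAL_CHIPS), min(vp, DRAWERS)


for vp in (20, 36, 48, 64):
    best = best_case_footprint(vp)
    worst = worst_case_footprint(vp)
    print(f"VP={vp}: best case {best[0]} chips in {best[1]} drawers, "
          f"worst case {worst[0]} chips in {worst[1]} drawers")
```

Under these assumptions VP=20 can fit in 3 chips across 2 drawers, VP=36 in 5 chips across 3 drawers and VP=48 in 6 chips across 3 drawers, while VP=64 always needs all 8 chips and all 4 drawers.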