Time to correct a few assumptions and statements that I made a while back.
Previously I said the POWER Hypervisor decides where to put a Virtual Machine (VM/LPAR) based on the Virtual Processor number (spreading factor). Well, apart from it nearly being right ... I was actually wrong! I got talking to one of the very impressive Hypervisor developers in Germany and he put me right. On most of my larger machines, I do what I think is fairly normal when creating a VM - working with a CPU core to RAM ratio that my customers use, like 1 core to 8 GB or higher. If you do that, then it looks like the layout of the VM is VP based. My Hypervisor guru Pete H (name removed to protect his identity :-) is very knowledgeable, patient and always right.
But if you try something strange like Entitlement=2, Virtual Processors=10 CPU cores to 2 GB of RAM then you can get the lssrad output to look like this, which I previously thought was a bug in the lssrad output!
# lssrad -av
REF1   SRAD        MEM      CPU
0
          0    1712.38      0-39
#
So we get 1.7 GB of memory on REF1 0 (in my Power 770 this is a CEC drawer) and POWER7 chip zero, on which I am allocated 40 logical CPUs (0 to 39). With SMT=4 this is 10 POWER7 cores, as the Virtual Processor number is 10. Of course there are only 8 CPU cores on a POWER7 chip and sometimes fewer (on the 4 core and 6 core POWER7 models). So this VM layout looks all wrong.
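The arithmetic here (40 logical CPUs at SMT=4 is 10 cores' worth) is easy to check with a small script. This is only a minimal Python sketch, assuming the `lssrad -av` text has been captured into a string and that MEM is in MB; the function names `expand_cpu_ranges` and `parse_lssrad` are made up for illustration and are not part of AIX.

```python
SMT = 4  # assumption: SMT=4, as in the example above

def expand_cpu_ranges(text):
    """Expand a CPU field such as '0-31 36-39' into a list of logical CPU ids."""
    cpus = []
    for token in text.split():
        if "-" in token:
            low, high = token.split("-")
            cpus.extend(range(int(low), int(high) + 1))
        else:
            cpus.append(int(token))
    return cpus

def parse_lssrad(output):
    """Return {(ref1, srad): (mem_mb, [logical cpu ids])} from `lssrad -av` text."""
    layout = {}
    ref1 = None
    for line in output.splitlines():
        fields = line.split()
        # Skip blank lines, shell prompt lines and the column header.
        if not fields or fields[0] in ("#", "REF1"):
            continue
        # A line with a single number starts a new REF1 group.
        if len(fields) == 1:
            ref1 = int(fields[0])
            continue
        # Otherwise: SRAD number, memory, then zero or more CPU ranges.
        srad, mem = int(fields[0]), float(fields[1])
        layout[(ref1, srad)] = (mem, expand_cpu_ranges(" ".join(fields[2:])))
    return layout

SAMPLE = """# lssrad -av
REF1 SRAD MEM CPU
0
0 1712.38 0-39
#"""

for (ref1, srad), (mem, cpus) in parse_lssrad(SAMPLE).items():
    print(f"REF1 {ref1} SRAD {srad}: {mem} MB, "
          f"{len(cpus)} logical CPUs, {len(cpus) // SMT} cores at SMT={SMT}")
```

For the output above it reports 40 logical CPUs on SRAD 0, which is 10 cores' worth on a chip that has at most 8.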
So here is the Hypervisor thinking:
The Hypervisor starts with the 2 GB of memory and says wow that is small, I can easily find that within the memory attached to one POWER7 chip, and then I will make all the CPU cores of the VM have a home on that one POWER7 chip with the memory. When it comes to running workload it has to see what else is running on the CPU cores of that POWER7 chip with the memory. The memory can't be moved (not without running advanced functions like Active System Optimiser (now called Dynamic Systems Optimiser) or the new Dynamic Platform Optimiser). If the Entitlement is low there will be other VMs allocated core time and running on the same 8 POWER7 cores - so obviously as the number of processes running goes up the VM will run out of cores on this POWER7 chip. The Hypervisor will then look at the System Resource Affinity Domains to determine the best fit, keeping the process, core and memory as close to each other as possible and avoiding Near and Far memory access.
It is this packing of small Entitlement VMs into POWER7 chips that can cause some problems when you know full well that the VM is normally going to use much more CPU core time. In the case where you are supporting dozens to hundreds of VMs on a machine you may be forced to use low Entitlement, as you know many of the VMs will not be heavily used concurrently. But in the case where you allocate an Entitlement of 0.3, set Virtual Processors to 3 and know the VM will be running on 2 CPU cores most of the time, or at least during the Dolly Parton peaks, then the VM will have to compete inefficiently for CPU core time above its Entitlement and will very likely have to execute off its home CPU cores and so away from its memory.
If we change the VM to Entitlement=9, Virtual Processors=10 CPU cores to 2 GB of RAM then we get this:
# lssrad -av
REF1   SRAD        MEM      CPU
0
          0    1480.81      0-31 36-39
          1     231.56      32-35
#
So here is the Hypervisor thinking:
While it could easily get the memory local to one POWER7 chip, it has a further complication: the Entitlement suggests that the VM is very often going to need more than one POWER7 chip (the Entitlement does not fit on one 8 core POWER7 chip). So it places the bulk of the memory on one chip with the first 8 cores' worth of Entitlement, and then some memory and one core on a second POWER7 chip which is right next to it (smart move, as that will then be Near memory). Then it needs to place the extra Virtual Processor somewhere. It does not really know if this will get used much, nor where it will need to run that Virtual Processor until the VM gets busy, so it just gets added to the first POWER7 chip. A sort of "park it here" until, at run time, the Hypervisor selects the actual CPU core that is best.
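The two cases described in this post can be written down as a toy model. To be clear, this is not the Hypervisor's real algorithm, just a sketch of the reasoning as explained above. It assumes the memory is small enough to fit on one chip, 8 cores per POWER7 chip, and Virtual Processors at least equal to the Entitlement rounded up; the function name `toy_home_placement` is hypothetical.

```python
import math

def toy_home_placement(entitlement, virtual_processors, cores_per_chip=8):
    """Toy model only: return the number of VPs homed on each chip, first chip first.

    Assumes small memory (fits on one chip) and
    virtual_processors >= ceil(entitlement).
    """
    entitled_cores = math.ceil(entitlement)

    # Case 1: the Entitlement fits on one chip, so every VP is homed
    # on the chip that holds the memory.
    if entitled_cores <= cores_per_chip:
        return [virtual_processors]

    # Case 2: spread the entitled cores over neighbouring chips,
    # filling each chip in turn.
    homes = []
    remaining = entitled_cores
    while remaining > 0:
        take = min(remaining, cores_per_chip)
        homes.append(take)
        remaining -= take

    # VPs above the Entitlement are "parked" on the first chip until run time.
    homes[0] += virtual_processors - entitled_cores
    return homes

print(toy_home_placement(2, 10))  # Entitlement=2, VP=10
print(toy_home_placement(9, 10))  # Entitlement=9, VP=10
```

With SMT=4, the first call gives 10 VPs (40 logical CPUs) on one chip and the second gives 9 VPs (36 logical CPUs) on the first chip plus 1 VP (4 logical CPUs) on the second, which lines up with the two lssrad outputs in this post.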
I will get my new friendly Hypervisor guru to check this blog - if I got it wrong (again) I will update this blog ... in red.
- Update: Pete says I got this right :-)
This blog entry is called "Local, Near, Far part 12" as it follows on from an 11 part series from August to October 2011 - they are still available.
Tags:
power7
affinity
cores
chips
lssrad
core
cpu
aix
srad