AIX Virtual Processor Folding is Misunderstood
This mysterious AIX CPU Folding area is often misunderstood, so below is what I have picked up by osmosis from talking to various guru level developers over the last 10 years. Shared Processor virtual machines (LPARs for the old fashioned) have a setting called Virtual Processors (or VP for short). This is the number of physical CPUs that the virtual machine can spread out across - in fact, I prefer to call it the "spreading factor" as it is much more obvious what it means. This can be the upper threshold for the number of CPUs that can be used by your virtual machine:
But AIX does not automatically spread out over all the available virtual processors if it does not have to, as that is not efficient. If an Uncapped virtual machine has, for example, an Entitlement of 8 CPUs and a Virtual Processor count of 10 but at the moment needs only 2.5 CPUs to comfortably provide CPU time to all the running processes/programs, then it "folds" away the 7 unneeded virtual processors and runs on just 3.
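The arithmetic in that example can be written down as a tiny sketch. This is only an illustration of the numbers above, not the AIX kernel's logic: the function name `unfolded_vps` is made up, and the rule of always keeping at least one virtual processor running is my assumption.

```python
import math


def unfolded_vps(cpu_demand: float, virtual_processors: int) -> int:
    """Rough count of virtual processors left running for a given CPU demand.

    Assumption: demand is rounded up to whole processors, at least one
    processor always stays unfolded, and the VP count is the upper limit.
    """
    needed = max(1, math.ceil(cpu_demand))
    return min(needed, virtual_processors)


vp_count = 10
running = unfolded_vps(2.5, vp_count)
folded = vp_count - running
print(running, folded)  # 3 7
```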
Why fold? I am not an AIX kernel developer (well, not now anyway) nor a spokesperson for the development team, but there is not a lot of information about folding. I think this is because it is an AIX kernel internal optimisation, so there was no need to make it public or document it. I first noticed folding taking place while developing nmon. On the Power5, Power6 and Power7 machines, we could clearly see that AIX would stop scheduling work to some virtual processors when they were not needed. I thought "these developers are very clever" and was impressed. There are multiple reasons to fold:
Detecting folding: We also noted that these folded virtual processors do actually get some CPU cycles. nmon uses the libperfstat AIX library to extract the performance numbers, and we found that instead of billions of clock ticks on the processor, these folded ones have only a few - in the 50 to 100 ticks sort of range. We think this is a tiny amount of housekeeping, like collecting the processor hardware stats or perhaps a clock interrupt. I also think that if a processor starts a device driver to get an adapter to perform an operation, the later interrupt is returned to the same processor (which might now be folded). This is a little guesswork on my part so don't quote me - please! Back to nmon: there are no official AIX folding statistics from any interface that I have come across. nmon deduces the folding number by monitoring the physical clock ticks and heuristically (I like that word) determines whether a processor is folded or not. In C programming terms: is the number of ticks below a threshold? We worked that threshold out empirically (another good word) by sitting there and watching the numbers while tweaking the number of processes running.
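To make the threshold idea concrete, here is a minimal sketch in Python rather than nmon's C. It is not the nmon source: the names `FOLD_TICK_THRESHOLD`, `is_folded` and `count_folded` are invented for illustration, and the threshold value is an assumption picked to sit between the "50 to 100 ticks" of a folded processor and the billions of a busy one.

```python
from typing import Sequence

# Hypothetical cut-off, not the value nmon uses.
FOLD_TICK_THRESHOLD = 1000


def is_folded(prev_ticks: int, curr_ticks: int,
              threshold: int = FOLD_TICK_THRESHOLD) -> bool:
    """Treat a CPU as folded if it used almost no ticks during the interval."""
    return (curr_ticks - prev_ticks) < threshold


def count_folded(prev: Sequence[int], curr: Sequence[int]) -> int:
    """Count CPUs whose tick counters barely moved between two snapshots."""
    return sum(1 for p, c in zip(prev, curr) if is_folded(p, c))


# Two snapshots of per-CPU tick counters (made-up numbers).
before = [0, 0, 0, 0]
after = [2_000_000_000, 1_500_000_000, 80, 60]
print(count_folded(before, after))  # 2
```

The counters are cumulative, so the comparison has to be made on the difference between two snapshots, not on the raw values.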
Now for the complicated bit - SMT! When a virtual processor is folded, all of its Simultaneous MultiThreading (SMT) threads are switched off together - the different threads are always running in the same virtual machine. This made the heuristics in the nmon code tricky, as it has to check that all the threads on the same CPU are doing practically nothing. This was written when Power6 was the current processor and SMT=2 was the maximum. Of course, when Power7 came along with SMT=4, the code had to be reworked. In fact, there is an nmon release that runs on Power7 where the code had not been updated and gets the folding count hopelessly wrong, but it's fairly obvious when you see it (you can't fold away more CPUs than you actually have!) - install a service pack to fix this.
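The SMT check can be sketched like this. Again, this is an illustration and not the nmon code: the function name and threshold are hypothetical, and it assumes the logical CPUs (SMT threads) of one virtual processor are numbered consecutively. The second call shows roughly how a hard-coded SMT=2 goes wrong on an SMT=4 machine.

```python
from typing import Sequence


def folded_virtual_processors(tick_deltas: Sequence[int], smt: int,
                              threshold: int = 1000) -> int:
    """Count virtual processors whose SMT threads are ALL nearly idle.

    tick_deltas holds ticks used per logical CPU in one interval, grouped
    so that each run of `smt` entries belongs to one virtual processor
    (an assumption made for this sketch).
    """
    if smt < 1 or len(tick_deltas) % smt != 0:
        raise ValueError("logical CPU count must be a multiple of the SMT level")
    folded = 0
    for start in range(0, len(tick_deltas), smt):
        threads = tick_deltas[start:start + smt]
        if all(ticks < threshold for ticks in threads):
            folded += 1
    return folded


# Two virtual processors at SMT=4: the first has one busy thread,
# the second is folded (made-up numbers).
deltas = [5_000_000_000, 70, 60, 50, 80, 60, 70, 50]
print(folded_virtual_processors(deltas, smt=4))  # 1
print(folded_virtual_processors(deltas, smt=2))  # 3, more than the 2 VPs that exist
```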
What does AIX actually do when it folds? We have seen that the virtual processors are still there and seen by AIX and occasionally running a few clock cycles. So they are not stopped or released. AIX is simply no longer scheduling processes to run on them. This gives the hypervisor a hint that at the moment it does not have to schedule the virtual machine on them. The AIX kernel to Hypervisor interface and mechanisms are definitely a secret, so I have no clues at this level.
When to fold? AIX does this slowly. It seems to monitor the CPU use for a period of time (from my observations a few seconds, which is a long, long time in CPU terms) and, if it determines that the CPU use is steady or dropping and that it is safe and efficient to do so, folds a CPU away. It then waits to see what happens next and may then fold away a further CPU. It is simple maths that larger virtual machines with many hundreds or thousands of active processes and threads of execution do not have sudden jumps in CPU requirements; instead the workload sort of "flows in and out" and CPU use is smoothed out.
When to unfold? AIX again monitors (it is probably the same algorithm) the CPU use. When it notices a consistently high use of the currently unfolded CPUs, it decides that unfolding could help the throughput of the processes and unfolds a CPU. If there is a sudden peak in runnable processes, it does not immediately unfold. This is because it can already deal successfully with short term transitory peaks as normal via the run queues. In fact, over-aggressive unfolding could slow the applications down. When a CPU is unfolded its caches will be empty, so a happy process on a running CPU with a hot cache is moved to the newly unfolded one with cold caches - it will spend the next few milliseconds loading the cache with program, data, stack and heap memory lines before it can get back to full speed. If we then find the temporary peak has passed, AIX will fold the CPU again - it was all rather a waste of time, and the process that moved twice actually got slower. This is why AIX holds off a little while before unfolding and makes sure there is a genuine growth in the demand for CPU time. It gives me the impression that, for slowly growing workloads, AIX unfolds at something like one virtual processor per second, but that is an observation rather than a fact.
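The "wait and see" behaviour described above is a form of hysteresis, and a toy model makes the idea easier to see. To be clear, this is not the AIX algorithm, which is not public: the class name, the window of three samples and the 90% / 50% thresholds are all assumptions invented for this sketch.

```python
from collections import deque


class FoldingSketch:
    """Toy fold/unfold decision that only reacts to sustained CPU use."""

    def __init__(self, total_vps: int, unfolded: int, window: int = 3,
                 high: float = 0.9, low: float = 0.5) -> None:
        self.total_vps = total_vps
        self.unfolded = unfolded
        self.window = window
        self.high = high  # hypothetical "busy" utilisation threshold
        self.low = low    # hypothetical "quiet" utilisation threshold
        self.history = deque(maxlen=window)

    def sample(self, cpus_used: float) -> int:
        """Feed one sample of CPUs consumed; return the unfolded VP count."""
        self.history.append(cpus_used / self.unfolded)
        if len(self.history) < self.window:
            return self.unfolded
        if all(u > self.high for u in self.history) and self.unfolded < self.total_vps:
            self.unfolded += 1      # sustained pressure: unfold one VP
            self.history.clear()    # then wait and see what happens next
        elif all(u < self.low for u in self.history) and self.unfolded > 1:
            self.unfolded -= 1      # sustained slack: fold one VP away
            self.history.clear()
        return self.unfolded


vm = FoldingSketch(total_vps=4, unfolded=2)
print([vm.sample(x) for x in [1.2, 2.0, 1.2]])  # [2, 2, 2] a short spike is ignored
print([vm.sample(x) for x in [1.9, 1.9, 1.9]])  # [2, 2, 3] sustained load unfolds one VP
```

Changing one virtual processor at a time and then clearing the history mirrors the "fold one, wait, maybe fold another" pattern, and it avoids moving a process to a cold cache for a peak that has already passed.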
Why are we monitoring Folding? Well, I was keen to have this in nmon because it gives us good clues about whether we have the right Entitlement for our virtual machine. If (particularly when monitoring long term, like over a month) we always find that we have Folded virtual processors, even during our peaks in workload, then perhaps we have the Entitlement set too high. It could be dropped to let other workloads be added to the same machine. On the other hand, if we have a critical production virtual machine that shows Folding = zero for long peak periods, then perhaps we should consider raising the Entitlement to make sure this virtual machine always has sufficient CPU cycles.
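As a rough sketch of that reasoning, the rule of thumb could be applied to a long run of folded-VP counts (for example, one value per nmon snapshot). The function name, the returned strings and the 25% limit are assumptions made up for illustration; they are not an nmon feature or an IBM recommendation.

```python
from typing import Sequence


def entitlement_hint(folded_counts: Sequence[int],
                     zero_fraction_limit: float = 0.25) -> str:
    """Suggest a direction for Entitlement from long-term folding samples."""
    if not folded_counts:
        raise ValueError("need at least one sample")
    # Always some folded VPs, even at peak: Entitlement may be too high.
    if min(folded_counts) > 0:
        return "consider lowering Entitlement"
    # Nothing folded for a large share of the period: it may be too low.
    zero_fraction = sum(1 for f in folded_counts if f == 0) / len(folded_counts)
    if zero_fraction >= zero_fraction_limit:
        return "consider raising Entitlement"
    return "no change suggested"


print(entitlement_hint([3, 2, 1, 2]))  # consider lowering Entitlement
print(entitlement_hint([0, 0, 1, 0]))  # consider raising Entitlement
```

A hint like this is only a starting point; how critical the virtual machine is, and what else shares the machine, still has to be weighed by a person.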
Folding is Leaking out! This advanced optimisation technique inside AIX was an internal secret at one time but has become known. I suspect my nmon monitoring might have accelerated that a little :-) But there are a few comments in the manual now and there are a few Folding tuning parameters available.
The advanced AIX scheduling tuning command, "schedo" has these options:
In the manual pages:
I hope this helps, thanks, Nigel Griffiths.