Getting Nosy With Coupling Facility Engines
MartinPacker 11000094DH Visits (3021)
In my “Parallel Sysplex Performance Topics” presentation I have some slides on Coupling Facility Processor Busy. I’ve worried about including them, considering them borderline boring.1
In my head I justified them because they:
For both of those topics it’s been enough to take a “whole Coupling Facility” view, aggregating over all the Coupling Facility’s processors.
But, and not many people know this, RMF documents the picture for individual processors in SMF 74 Subtype 4. The reason this isn’t widely known is mainly the reports don’t show this level of detail.
One can speculate why this level of detail exists. My take is that it was relevant long ago when we had “Dynamic ICF Expansion”.2 This feature allowed an ICF LPAR to expand beyond the ICF pool into the GCP Pool. Performance Impacts of Using Shared ICF CPs describes this feature. (The document is from 2006 but it does describe this one feature quite well.)
There, you’d want a better picture of Coupling Facility processor busy than just summing it all up. In particular you’d want to know if the GCP engines had been used.
What Is Coupling Facility Processor Busy?
This seems like a silly question to ask, but it isn’t.
If you were to look at RMF’s Partition Data Report for the ICF pool you’d find dedicated ICF LPARs always 100% busy. And it’s not just because they’re dedicated. It’s because the Coupling Facility spins looking for work. So that’s not a useful measure - for dedicated ICF LPARs3.
So a better definition is required, and thankfully RMF provides one. There are two relevant SMF 74 Subtype 4 fields:
Using these two the definition of busy isn’t hard to fathom:
Coupling Facility Busy % = 100 * R744PBSY / (R744PBSY + R744PWAI)4
This, as I say, is normally calculated across all processors in the Coupling Facility.
By the way, you might find Coupling Facility Structure CPU Time - Initial Investigations an interesting read. It’s only 9 years old.
Processor Busy Considerations
But let’s come almost up to date.
I recently looked at a customer’s parallel sysplex and got curious about engine-level Coupling Facility Busy, so I prototyped some code to calculate it at the engine level. I not only summarised across all the RMF intervals but plotted by the individual 15-minute interval.
Here is the summarised view for one of their two 10-way z13 Coupling Facilities:
The y axis is Coupling Facility Busy for the engine, the x axis being the engine number.
So clearly there is some skew here, which I honestly didn’t expect. By the way, at the individual interval level the skew stays about the same. Indeed the same processors dominate, to the same degree.
A couple of points:
Now let’s come really up to date.
z14 introduced CFLEVEL 22. One area of change is in the way work is managed by the Coupling Facility Control Code (CFCC). In particular, processors have become more specialised. This is to improve efficiency with larger numbers of processors in a Coupling Facility.
CFLEVEL 22 introduced “Functionally specialized” ICF processors for CF images with dedicated processors defined under certain conditions:
This avoids lots of inter-processor contention previously associated with CF engine dispatching.
If there are going to be specialised engines I’d expect more skew than before. At this stage I’ve no idea whether the two specialised processors are going to be busier than the rest or less busy6. Further, I don’t know how you would manage down the CPU for either the specialised processors, nor the rest. Maybe the state of the art will evolve in this area.
Note: There’s no way of detecting a processor as belonging to one of these three categories.
So, this makes it even more interesting to examine Coupling Facility Busy at the individual engine level. I’ve not yet seen CFLEVEL 22 RMF data, but at least I have a prototype to work from, whether the customer’s data is at the CFLEVEL 22 or not.