System z10 CPU Instrumentation
MartinPacker
Since I got back off vacation in L'Hérault in late August I've been working on adding z10 support to our CPU analysis code. It's quite a substantial set of changes - and I don't think I'm finished yet. But I'd like to share with you what I've learned so far.
But first let's briefly review what's changed with z10. (This is a very brief review and not a tutorial on the subjects mentioned.)
So far I've had data from only one customer who is using Hiperdispatch for real. But already I'm seeing "behaviours".
I would assume, by the way, that MXG already has support for the new fields and has adjusted any calculations that needed adjusting. While I follow the MXG-L Listserver I don't take more than a passing interest in MXG itself. And, also by the way, I'm talking exclusively about Type 70 Subtype 1 in this post.
We now have four different models in Type 70:
There are also three capacity ratings:
These are all interesting in an environment where your machine configuration changes - whether through "On-Off Capacity On Demand", "Capacity Backup", "Capacity For Planned Events" or whatever. You can now do your usual performance and capacity work even when the configuration changes.
At this point I'm just listing the numbers in my reporting. I suspect I'll do more when I get performance data from customers who actually do e.g. time-of-the-month upgrades/downgrades (and I know one or two who already do).
When looking at Hiperdispatch you have to understand there are two major parts to it:
Internally I still sometimes hear it described using the terms DA and VCM. The point is it's got two parts to it. So there is information in sections of the record related to z/OS and other information in sections related to PR/SM. You have to put the two together.
And here's the most important bit...
You need to collect Type 70s from ALL z/OS images of any significance on the machine to get the full picture.
A good example of this is understanding how many logical engines are really in play when some of them are parked (in most LPARs).
z/OS - Related Information
SMF70HHF has flags for whether Hiperdispatch is supported or is active. These are, fairly obviously, for the reporting z/OS image.
SMF70PPT is the amount of time this engine was "parked" in the interval. (That is, when work is deliberately not dispatched to it.) Parked engines are some or all of the "Low Polarization" engines. More on that a little later. But parked engines are important because the new calculation for CPU Busy counts parked engines as not part of the z/OS image's capacity.
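To make the effect of parking on CPU Busy concrete, here is a minimal Python sketch. It is not our actual analysis code: the `EngineInterval` class and its field names are hypothetical, and it assumes online, parked and busy times have already been decoded from the record into seconds. The only idea taken from the record is that parked time (SMF70PPT) is subtracted from the capacity in the denominator.

```python
from dataclasses import dataclass


@dataclass
class EngineInterval:
    """One logical engine in one interval (hypothetical, pre-decoded values)."""
    online_s: float  # time the engine was online, in seconds
    parked_s: float  # time the engine was parked (derived from SMF70PPT), in seconds
    busy_s: float    # time the engine was doing work, in seconds


def cpu_busy_pct(engines):
    """CPU Busy % for the z/OS image, treating parked time as not part of capacity."""
    capacity = sum(e.online_s - e.parked_s for e in engines)
    busy = sum(e.busy_s for e in engines)
    return 100.0 * busy / capacity if capacity > 0 else 0.0


# Two engines in a 900-second interval: one half busy, one parked throughout.
engines = [EngineInterval(900, 0, 450), EngineInterval(900, 900, 0)]
result = cpu_busy_pct(engines)
```

If the parked engine were counted as capacity, the same interval would show half this utilisation, which is why the parked time matters.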
PR/SM - Related Information
SMF70POW is used to calculate the Polarization Weight for a logical engine. Logical engines are classified as High, Medium or Low. An LPAR's weights are spread across its logical engines to ensure the High engines each have a weight corresponding to one physical engine. Each Low engine has a zero weight. Any weight left over from assigning the High weights is assigned to either 1 or 2 Medium engines. (1 if the remainder is more than half an engine, 2 if the remainder would have been less than half an engine.)
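The rule above can be sketched in a few lines of Python. This is only my reading of it, not PR/SM's actual algorithm. The function `polarize` is hypothetical, and it assumes the LPAR's share has already been expressed in units of physical engines. Three cases go beyond what is stated above and are guesses: a remainder of exactly zero gives no Medium engines, a remainder of exactly half is treated like "less than half", and the two-Medium case gives up one High engine so that the two Mediums share one engine plus the remainder.

```python
import math


def polarize(share, logical_engines):
    """Return (high, medium, low) engine counts for an LPAR.

    share           -- LPAR's weight share in physical engines (hypothetical input)
    logical_engines -- number of logical engines defined to the LPAR
    """
    whole = math.floor(share)
    remainder = share - whole
    if remainder == 0:
        high, medium = whole, 0          # assumption: nothing left over, no Mediums
    elif remainder > 0.5:
        high, medium = whole, 1          # more than half an engine left: 1 Medium
    elif whole >= 1:
        high, medium = whole - 1, 2      # assumption: one High is given up for 2 Mediums
    else:
        high, medium = 0, 1              # assumption: share too small for any High
    low = logical_engines - high - medium
    if low < 0:
        raise ValueError("not enough logical engines for this share")
    return high, medium, low


one_medium = polarize(2.7, 5)   # remainder above half an engine
two_mediums = polarize(3.3, 6)  # remainder below half an engine
```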
You can observe this Polarization Weight distribution using SMF70POW...
The highest value of SMF70POW for an LPAR is a High logical engine, that is 1 whole physical engine. Any values of SMF70POW smaller than that but greater than zero are for Medium logical engines. I've seen cases of both 1 Medium and of 2 Mediums for different LPARs on the same machine.
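As a rough illustration of that classification, the following Python sketch labels each logical engine of one LPAR from its SMF70POW value. The function name and the sample numbers are made up, and the values are not in real SMF70POW units. It also leans on the rule of thumb above, so it assumes the LPAR has at least one High engine: an LPAR whose largest value actually belongs to a Medium engine would be mislabelled.

```python
def classify_engines(pow_values):
    """Label each logical engine of one LPAR as High, Medium or Low.

    pow_values -- SMF70POW values for the LPAR's logical engines
                  (hypothetical, already extracted from the record)
    """
    top = max(pow_values, default=0)
    labels = []
    for value in pow_values:
        if value <= 0:
            labels.append("Low")      # zero weight
        elif value == top:
            labels.append("High")     # highest value: one whole physical engine
        else:
            labels.append("Medium")   # smaller than the highest but above zero
    return labels


labels = classify_engines([100, 100, 60, 0, 0])
```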
Bringing It All Together
So, to understand Hiperdispatch you need both LPAR and z/OS image information.
Actually, since IRD was introduced, you've had to marry up both perspectives, because Online Time (in the case of Logical CP Management) became a part of the calculation. And now Parked Time is too.
(After a number of years of owning our CPU Analysis code I've recast it - for Hiperdispatch - in a way that makes it much easier to morph our CPU calculations in case anything else happens. I'm not foretelling anything - just knowing that CPU Utilisation is one of those things whose definition will never settle for long.) :-)