Engine-ering - Part Zero
Martin Packer
I’m writing this on a plane, heading to Copenhagen. Planes, like weekends, give me time to think. Or something.
Ardent followers will probably have wondered why there have been few “original content” posts to this blog1 recently.
Well, I’ve been working on an exciting project with my friend and colleague Anna Shugol. Now is the time to begin to reveal what we’ve been working on. We call this project “Engine-ering”2.
The idea is simple: There is real merit in examining CPU at the individual processor level, for example an individual zIIP. As one colloquial term for a processor is “engine”, it’s easy to end up with a title such as “Engine-ering”, and the hashtag #EngineeringWorks is way too tempting not to deploy.
The project has three parts:
These three are intertwined, of course. As we go on we will:
You’d expect nothing less from us.
Traditional CPU Analysis
Traditionally, CPU has been looked at from a number of perspectives:
All of these have tremendous merit - and I’ve worked with them extensively over the years.
z/OS Engine Level
Our idea is that there is merit in diving below the LPAR level, even below the processor pool level. So we would want to, for example, examine the zIIP picture for an LPAR. But we wouldn’t want to just look at it in aggregate. We want to see individual processors. There are at least a couple of reasons:
RMF (SMF 70-1) reports individual engines at two levels:
The trick is marrying these two perspectives together. Fortunately, a few years ago, I realised I could use the partition number of the reporting system and match it to the partition number of one of the LPARs. That does the trick.
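To make the “marrying” step concrete, here is a minimal sketch in Python. It assumes the SMF 70-1 data has already been decoded into two simplified lists: a z/OS view (one entry per logical processor of the reporting system) and a PR/SM view (one entry per logical processor of every LPAR on the machine). The classes ZosCpu and PrsmLogicalCpu, their field names, and the choice to pair entries on a logical CPU id are all hypothetical illustrations, not the real SMF field layout or my actual code.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class ZosCpu:
    """z/OS view: one entry per logical processor of the reporting system.

    Hypothetical, simplified record; not the real SMF 70-1 field layout.
    """
    cpu_id: int
    busy_pct: float


@dataclass
class PrsmLogicalCpu:
    """PR/SM view: one entry per logical processor of every LPAR.

    Hypothetical, simplified record; not the real SMF 70-1 field layout.
    """
    partition_number: int
    lpar_name: str
    cpu_id: int
    dispatch_pct: float


def marry_views(
    reporting_partition: int,
    zos_cpus: List[ZosCpu],
    prsm_cpus: List[PrsmLogicalCpu],
) -> Dict[int, Tuple[ZosCpu, Optional[PrsmLogicalCpu]]]:
    """Pair each z/OS logical processor with its PR/SM counterpart.

    Step 1: keep only the PR/SM entries whose partition number equals the
    reporting system's partition number (this finds "our" LPAR).
    Step 2: pair them with the z/OS entries by logical CPU id (an
    assumption about how the two views line up). Unmatched -> None.
    """
    own = {
        p.cpu_id: p
        for p in prsm_cpus
        if p.partition_number == reporting_partition
    }
    return {z.cpu_id: (z, own.get(z.cpu_id)) for z in zos_cpus}


if __name__ == "__main__":
    zos = [ZosCpu(0, 40.0), ZosCpu(1, 60.0)]
    prsm = [
        PrsmLogicalCpu(3, "PRODA", 0, 42.0),
        PrsmLogicalCpu(3, "PRODA", 1, 61.0),
        PrsmLogicalCpu(5, "TESTB", 0, 10.0),
    ]
    # The reporting system is assumed to be partition 3 in this example.
    for cpu_id, (z, p) in marry_views(3, zos, prsm).items():
        print(cpu_id, p.lpar_name, z.busy_pct, p.dispatch_pct)
```

The partition number filter does the real work: without it, engine 0 of every LPAR on the machine would be a candidate match.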
In the past week I’ve written some code to pump out engine-level statistics for the reporting LPAR:
The first two are from the PR/SM view. The third is from the z/OS view. Which makes sense.
In any case I have some pretty graphs. And I got to swear at Excel a lot.3
SMF 113 Hardware Counters
This one is more Anna’s province than mine. But, by processing SMF 113-1 records at the individual engine level, we can now see individual engine behaviours in the following areas:
Those of you who know SMF 113 know there are many more counters. We intend to extend our code to look at those soon.
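As an illustration of what “individual engine level” means here, the following minimal sketch computes one commonly derived metric, cycles per instruction (CPI), separately for each engine. It uses the cycle count and instruction count from the SMF 113 basic counter set; the EngineCounters class and its fields are hypothetical, and the values are assumed to be already converted into interval deltas rather than cumulative totals. This isn’t necessarily one of the areas we look at; it just shows the per-engine approach.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class EngineCounters:
    """Hypothetical per-engine extract from SMF 113-1 basic counters.

    Values are assumed to be interval deltas, not cumulative totals.
    """
    cpu_id: int
    cycles: int        # basic counter set: cycle count
    instructions: int  # basic counter set: instruction count


def cpi_by_engine(samples: List[EngineCounters]) -> Dict[int, Optional[float]]:
    """Cycles per instruction (CPI) for each engine.

    Engines that completed no instructions in the interval get None rather
    than raising a division-by-zero error.
    """
    return {
        s.cpu_id: (s.cycles / s.instructions if s.instructions else None)
        for s in samples
    }


if __name__ == "__main__":
    interval = [
        EngineCounters(cpu_id=0, cycles=3_000_000, instructions=1_000_000),
        EngineCounters(cpu_id=1, cycles=5_000_000, instructions=1_000_000),
        EngineCounters(cpu_id=2, cycles=0, instructions=0),
    ]
    for cpu_id, cpi in cpi_by_engine(interval).items():
        print(cpu_id, cpi)
```

Averaged across all three engines, the difference between engine 0 and engine 1 would disappear; per engine, it stands out.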
SMF 99-12 And -14
Another area we intend to extend our code to analyse is SMF 99 subtypes 12 and 14. This data will tell us how logical engines relate to physical engines, right down to which drawer they’re in, which cluster (or node for z13), even which chip. All of this can help with understanding the “why” of what SMF 113 is telling us.
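A minimal sketch of the kind of grouping SMF 99 topology data makes possible follows. It assumes each logical engine’s location has already been extracted into a hypothetical Placement record (drawer, node or cluster, chip); the real SMF 99 subtype 12 and 14 field names and layouts are not shown here. Grouping engines by chip makes it easier to line the topology up against per-engine counters.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Placement:
    """Hypothetical simplified location of one logical engine."""
    cpu_id: int
    drawer: int
    node: int  # "cluster" on some machines, "node" on z13
    chip: int


def engines_per_chip(
    placements: List[Placement],
) -> Dict[Tuple[int, int, int], List[int]]:
    """Map (drawer, node, chip) to the sorted logical engine ids on it."""
    groups: Dict[Tuple[int, int, int], List[int]] = defaultdict(list)
    for p in placements:
        groups[(p.drawer, p.node, p.chip)].append(p.cpu_id)
    return {key: sorted(ids) for key, ids in groups.items()}


if __name__ == "__main__":
    placements = [
        Placement(cpu_id=0, drawer=1, node=1, chip=2),
        Placement(cpu_id=1, drawer=1, node=1, chip=2),
        Placement(cpu_id=2, drawer=1, node=2, chip=0),
    ]
    for location, ids in engines_per_chip(placements).items():
        print(location, ids)
```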
You can play a similar RMF-level game for coupling facilities. Normally, you wouldn’t expect much skew between CF engines. But in “Getting Nosy With Coupling Facility Engines” I showed this wasn’t always the case.
I would say that, while the “don’t run your coupling facility CPU more than 50% busy” rule is sensible, you might want to adjust it for any skew your coupling facilities exhibit.
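One possible way to adjust the rule for skew is sketched below. This is my own hypothetical illustration, not an established formula: compare the busiest CF engine with the average, and if the busiest runs at skew_ratio times the average, hold the average at threshold / skew_ratio so the busiest engine stays at about the threshold. It assumes per-engine busy percentages are already available and that the skew ratio stays roughly constant as load grows.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SkewReport:
    average: float                       # mean busy % across the CF engines
    busiest: float                       # busy % of the busiest engine
    skew_ratio: Optional[float]          # busiest / average (None if average is 0)
    adjusted_threshold: Optional[float]  # average busy that keeps busiest at threshold


def skew_report(engine_busy_pct: List[float], threshold: float = 50.0) -> SkewReport:
    """Summarise engine skew for one coupling facility.

    adjusted_threshold = threshold / skew_ratio: a hypothetical adjustment
    that keeps the busiest engine at about the threshold, assuming the skew
    ratio stays constant as load changes.
    """
    if not engine_busy_pct:
        raise ValueError("need at least one engine")
    average = sum(engine_busy_pct) / len(engine_busy_pct)
    busiest = max(engine_busy_pct)
    if average == 0:
        return SkewReport(average, busiest, None, None)
    ratio = busiest / average
    return SkewReport(average, busiest, ratio, threshold / ratio)


if __name__ == "__main__":
    print(skew_report([35.0, 40.0, 45.0, 60.0]))
```

With engine busy values of 35, 40, 45 and 60 per cent, the average of 45% passes a 50% rule, but the busiest engine is already at 60%. Keeping that engine at 50% would mean holding the average at about 37.5%.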
We presented this material the other day to the zCMPA working group of GSE UK. This was to a small number of sophisticated customers, most of whom I’ve known for many years. It’s become a bit of a tradition to present an “alpha” version of a presentation there.4
This post roughly follows the structure of the presentation, which has some very pretty graphs.
Anna coined the term “research project”. I like it a lot.5 In any case, the code is a permanent part of our kitbag. If you send me data, expect me to ask for this new stuff and to use it in conversations with you. I think you’ll enjoy it.
We think the presentation went very well, with some nice discussion from the participants. Partly because of that, but not really, we intend to keep capturing hills with the code, gaining experience with customers, and evolving the presentation. Every so often I’ll highlight bits of it here. Stay tuned!