Once Upon A Restart
MartinPacker
If you have a large mainframe estate it can be difficult to keep track of when the various moving parts start and stop. For example, if you’re a Performance person it’s quite likely nobody bothered to tell you when the systems were IPL’ed. You might well know what the regime for starting and stopping CICS is but I wouldn’t.
As you know I’m curious as to how customers run their installations and starting (and stopping) pieces of infrastructure interests me. I’m also impressed when a piece of infrastructure has been up for years - as sometimes happens. Up until now it’s been a matter of folklore such as “the installation that didn’t take an application down for 10 years”.
But I’ve turned my attention to when z/OS is IPL’ed and when key address spaces start and stop. I’m sharing the technique in case it’s something you want to do.
I’m also interested in the sequence and timing between a z/OS system’s IPL and when important subsystems are up.
I’m not going to pretend to be an expert in how systems are restarted or recovered but I am going to take an interest. Knowing what’s “normal” is, I think, useful.
You probably know that SMF 30 subtypes 4 and 5 describe steps and jobs, respectively. You probably also know SMF 30 subtypes 2 and 3 are interval records.
If you’re already collecting these you’re in good shape as Reader Start Time is in all of these. It’s all you need to figure out when stuff starts.
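As a minimal sketch of that idea: every record written for one instance of an address space carries the same Reader Start Time, so collapsing the records to distinct (system, job name, Reader Start Time) combinations gives one "start" event per instance. The `Smf30Row` layout and its field names are my own assumptions for illustration; they stand in for whatever your SMF reporting tool produces once the records are decoded.

```python
from datetime import datetime
from typing import NamedTuple


class Smf30Row(NamedTuple):
    """Hypothetical, already-decoded subset of an SMF 30 record."""
    system: str             # SMF system ID
    subtype: int            # 2 or 3 = interval, 4 = step end, 5 = job end
    jobname: str
    reader_start: datetime  # Reader Start Time


def distinct_starts(rows):
    """Collapse many records per address space instance into one start event."""
    seen = set()
    for row in rows:
        seen.add((row.system, row.jobname, row.reader_start))
    # Order by Reader Start Time, then system and job name for stability
    return sorted(seen, key=lambda k: (k[2], k[0], k[1]))


# Illustrative data only: two interval records for one CICS region, one for DB2
rows = [
    Smf30Row("SYSA", 2, "CICSA1", datetime(2014, 3, 2, 6, 20, 11)),
    Smf30Row("SYSA", 2, "CICSA1", datetime(2014, 3, 2, 6, 20, 11)),
    Smf30Row("SYSA", 2, "DB2PMSTR", datetime(2014, 3, 2, 6, 12, 25)),
]
starts = distinct_starts(rows)
```

Here `starts` holds two events, DB2PMSTR first and then CICSA1, however many interval records each address space went on to write.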
I prefer the interval records as
Summarisation And Reporting
For some address space types I report each job name separately. CICS regions are a good example of this. For others I pick the first address space for each subsystem. DB2 and MQ subsystems are good examples of this.
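The rule above could be sketched roughly as follows. This is not my actual reporting code: the `kind` and `subsystem` tags are assumed to have been worked out already, the job names are made up, and "first" is taken to mean the earliest Reader Start Time among a subsystem's address spaces.

```python
from datetime import datetime

# Assumed classification: which types get one line per subsystem
PER_SUBSYSTEM_TYPES = {"DB2", "MQ"}


def summarise(starts):
    """starts: iterable of (kind, subsystem, jobname, reader_start) tuples."""
    events = []
    first_seen = {}
    for kind, subsystem, jobname, when in starts:
        if kind in PER_SUBSYSTEM_TYPES:
            # Keep only the earliest-starting address space per subsystem
            key = (kind, subsystem)
            if key not in first_seen or when < first_seen[key][0]:
                first_seen[key] = (when, jobname)
        else:
            # Everything else (for example CICS) is reported per job name
            events.append((when, f"{kind} {jobname}"))
    for (kind, subsystem), (when, jobname) in first_seen.items():
        events.append((when, f"{kind} {subsystem} ({jobname})"))
    return sorted(events)


# Illustrative data only
starts = [
    ("DB2", "DB2P", "DB2PMSTR", datetime(2014, 3, 2, 6, 12)),
    ("DB2", "DB2P", "DB2PDBM1", datetime(2014, 3, 2, 6, 13)),
    ("MQ", "MQP1", "MQP1MSTR", datetime(2014, 3, 2, 6, 15)),
    ("CICS", "", "CICSA1", datetime(2014, 3, 2, 6, 20)),
    ("CICS", "", "CICSA2", datetime(2014, 3, 2, 6, 21)),
]
events = summarise(starts)
```

One limitation of this sketch: it keeps a single entry per subsystem, so data spanning several restarts of the same subsystem would need the key extended, for example with the IPL it followed.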
To detect an IPL I choose the address space whose program is IEEMB860. I key on the program name rather than the job name because, in principle, the job name could vary. And yes, I know that “pressing the button” on IPL invokes NIP etc. before this (the Master Scheduler) address space starts up.
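In code the IPL test is just a filter on program name. A minimal sketch, assuming the records have already been reduced to (system, job name, program name, Reader Start Time) tuples; that tuple layout and the sample values are illustrative, not a real tool's output.

```python
from datetime import datetime

MASTER_SCHEDULER_PROGRAM = "IEEMB860"


def ipl_times(rows):
    """Return distinct (system, reader_start) pairs for the Master Scheduler."""
    found = {
        (system, when)
        for system, _jobname, program, when in rows
        if program == MASTER_SCHEDULER_PROGRAM  # match on program, not job name
    }
    return sorted(found)


# Illustrative data only: the Master Scheduler appears in two interval records
rows = [
    ("SYSA", "*MASTER*", "IEEMB860", datetime(2014, 3, 2, 6, 5)),
    ("SYSA", "*MASTER*", "IEEMB860", datetime(2014, 3, 2, 6, 5)),
    ("SYSA", "CICSA1", "DFHSIP", datetime(2014, 3, 2, 6, 20)),
]
ipls = ipl_times(rows)
```

Using a set means repeated interval records for the same Master Scheduler instance still count as a single IPL.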
I only print date, hour and minute for Reader Start Time. It goes to hundredths of seconds but I’m not interested in that level of detail.
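For illustration, here is one way to get from a raw timestamp to that printed form. It assumes the date part has already been decoded into a `datetime.date` and that the time part arrives as a count of hundredths of a second since midnight; both function names are hypothetical.

```python
from datetime import date, datetime, timedelta


def reader_start(day: date, hundredths: int) -> datetime:
    """Combine a decoded date with hundredths of a second since midnight."""
    midnight = datetime.combine(day, datetime.min.time())
    return midnight + timedelta(milliseconds=hundredths * 10)


def to_minute(when: datetime) -> str:
    """Print only date, hour and minute; seconds and hundredths are dropped."""
    return when.strftime("%Y-%m-%d %H:%M")


# 2,234,567 hundredths = 22,345.67 seconds = 06:12:25.67
example = reader_start(date(2014, 3, 2), 2234567)
printed = to_minute(example)  # "2014-03-02 06:12"
```

Note this truncates rather than rounds, so 06:12:59 still prints as 06:12.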
In my report I sequence by timestamp. That makes it easier to see when an IPL is followed by, say, a DB2 start and then some CICS regions. I could probably create a useful Gantt chart from this, but today I don’t. The technology’s there to make this easy to do.
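A rough sketch of that sequencing, with the gap since the most recent IPL added to each line. It assumes a single system and events that have already been labelled and flagged as IPL or not; the labels and times are invented for the example.

```python
from datetime import datetime


def timeline(events):
    """events: iterable of (when, label, is_ipl). Returns report lines in time order."""
    lines = []
    last_ipl = None
    for when, label, is_ipl in sorted(events, key=lambda e: e[0]):
        line = when.strftime("%Y-%m-%d %H:%M") + "  " + label
        if is_ipl:
            last_ipl = when
        elif last_ipl is not None:
            # Whole minutes between the most recent IPL and this start
            minutes = int((when - last_ipl).total_seconds() // 60)
            line += f"  (+{minutes} min after IPL)"
        lines.append(line)
    return lines


# Illustrative data only, deliberately out of order
events = [
    (datetime(2014, 3, 2, 6, 20), "CICS CICSA1", False),
    (datetime(2014, 3, 2, 6, 5), "IPL", True),
    (datetime(2014, 3, 2, 6, 12), "DB2 DB2P", False),
]
report = timeline(events)
for line in report:
    print(line)
```

The same sorted list of (time, label) pairs would be a reasonable starting point for a Gantt-style chart.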
Looking at this data gives me a much better idea how installations manage the lifecycles of their address spaces. If I talk to you about this topic it’ll probably be from this data and I might well refer you to this blog post. This is also one of the topics in the 2014 revision of my “Life And Times Of An Address Space” presentation.
Two final points: