Batch Architecture, Part Two
Martin Packer
I concluded Batch Architecture - Part One with a brief mention of inter-relationships and data. I'd like to expand on that in this part.
Often the inter-relationships between applications are data-driven, which is why I'm linking the two in this post (and in my thinking). But let's think about the inter-relationships that matter. There are four levels:
(And a minor note on terminology: Yes I KNOW that OPEN and CLOSE are macros. I don't intend to use the capitalisation here - because the act of opening and closing a data set is meaningful, too (and less grating to read). Forgive me if this "sloppiness" offends.)
Life Of A Data Set (LOADS)
I won't claim to have invented this technique. (As I said in "Memories of Hiperbatch", I declined an offer to write up a patent application because I knew I hadn't originated it.) But I do advocate its use quite a bit. Here's an (oft-used) example:
If you have a single-step job that writes a sequential data set and another job that reads it (both from start to finish), there's a characteristic data set "signature": Two opens, one after the other, the first for output and the second for input. If you discern this pattern you might think "BatchPipes/MVS". (Depending on other factors you might think of other things, such as VIO.)
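To make the signature concrete, here is a minimal Python sketch that flags data sets whose whole life is one writer job followed, after the writer closes, by one reader job in a different job. The `Open` record and its fields are hypothetical simplifications, not the real layout of the SMF type 14 (input) and type 15 (output) records; real data needs more care (multiple opens per job, concatenations, generation data sets and so on).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Open:
    """One open/close of a data set (hypothetical, simplified record)."""
    dsname: str
    job: str
    mode: str          # "OUTPUT" for a writer, "INPUT" for a reader
    open_time: float   # seconds on any common clock
    close_time: float


def pipe_candidates(opens: List[Open]) -> List[str]:
    """Data sets written by one job, then read by another after the writer closed."""
    life: Dict[str, List[Open]] = {}
    for o in opens:
        life.setdefault(o.dsname, []).append(o)
    candidates = []
    for ds in sorted(life):
        events = sorted(life[ds], key=lambda o: o.open_time)
        if len(events) != 2:   # the signature is exactly two opens
            continue
        writer, reader = events
        if (writer.mode == "OUTPUT" and reader.mode == "INPUT"
                and writer.job != reader.job
                and writer.close_time <= reader.open_time):
            candidates.append(ds)
    return candidates
```

A data set that matches is only a candidate: whether BatchPipes/MVS (or VIO, or nothing) is appropriate still depends on the other factors mentioned above.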
So this is a powerful technique.
LOADS Of Dependencies
In 1993 we wrote code to list the life of each data set that jobs opened and closed. Not long after that we got tired of figuring out dependencies by hand from LOADS. So we automated it:
At its simplest, a writer followed by another writer indicates a write-write ("WW") dependency, and a writer followed by a reader indicates a write-read ("WR") dependency. And so on.
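A minimal sketch of that rule in Python, assuming each event is a hypothetical `(dsname, job, mode, open_time)` tuple. It compares only consecutive opens of each data set and treats any open other than `INPUT` as a writer. It also assumes (an assumption of this sketch, not a rule stated above) that a reader followed by a reader imposes no ordering.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# (dsname, job, mode, open_time): hypothetical simplified open record
Event = Tuple[str, str, str, float]


def rw(mode: str) -> str:
    """Classify an open as a reader ("R") or a writer ("W")."""
    return "R" if mode == "INPUT" else "W"


def dependencies(events: List[Event]) -> Set[Tuple[str, str, str, str]]:
    """Derive (earlier_job, later_job, kind, dsname) from consecutive opens."""
    life: Dict[str, List[Event]] = defaultdict(list)
    for ev in events:
        life[ev[0]].append(ev)
    deps = set()
    for ds, evs in life.items():
        evs.sort(key=lambda ev: ev[3])  # order each data set's life by open time
        for (_, job1, mode1, _), (_, job2, mode2, _) in zip(evs, evs[1:]):
            kind = rw(mode1) + rw(mode2)
            # Assumption: two reads in a row don't order the jobs.
            if job1 != job2 and kind != "RR":
                deps.add((job1, job2, kind, ds))
    return deps
```

Because only adjacent opens are compared, a life of write, read, read, write gives only the second reader an "RW" dependency on the final writer. That is one reason the simple rule needs the pragmatic adjustments that follow.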
Pragmatically, some of these dependencies aren't real, or at least it isn't as simple as this sounds. For example:
As I said before, application-level, job-level (in some ways the same thing) and step-level dependencies are things we've known about for a long time. We've also known about DFSORT (and other sort) phases for a long time: Input, Intermediate Merge and Output. These should be familiar, although people tend to forget about the possibility of an intermediate merge phase, because it should only apply to large sorts.
So, if sorts have phases, what about other steps? Last year I enhanced the code to create Gantt charts of data set opens and closes within a step. In many cases this revealed nothing new about a job. But in a number of cases fine structure appeared: Non-sort steps demonstrably had phases. In one example a step that read a pair of data sets in parallel wrote to a succession of output data sets, which I could see from the open and close timestamps of the output data sets. (Without looking at the source code I couldn't be sure, but maybe there's some mileage in dissecting this step.)
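A text-mode Gantt chart is enough to see this kind of fine structure. The sketch below is a minimal illustration, assuming hypothetical `(dsname, open_time, close_time)` tuples for the data sets one step opened; it draws one row per open, ordered by open time, scaled to a fixed width.

```python
from typing import List, Tuple

# (dsname, open_time, close_time) for one step: hypothetical simplified record
Interval = Tuple[str, float, float]


def gantt(intervals: List[Interval], width: int = 40) -> List[str]:
    """Render a text Gantt chart, one row per data set open, ordered by open time."""
    start = min(o for _, o, _ in intervals)
    end = max(c for _, _, c in intervals)
    span = (end - start) or 1.0  # avoid dividing by zero for instantaneous steps
    name_w = max(len(ds) for ds, _, _ in intervals)
    rows = []
    for ds, o, c in sorted(intervals, key=lambda iv: (iv[1], iv[0])):
        first = min(int((o - start) * width / span), width - 1)
        last = max(first + 1, int((c - start) * width / span))  # at least one cell
        bar = " " * first + "#" * (last - first)
        rows.append(f"{ds.ljust(name_w)} |{bar.ljust(width)}|")
    return rows
```

For a step shaped like the example above (two inputs open throughout, three outputs opened one after another), the input rows are solid bars while the output bars form a staircase, and each step of that staircase suggests a phase.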
It's in my code: If it applies to your jobs I'll be sure to tell you about it.
An Application And Its Data
Apart from the small matter of scale, figuring out which data an application uses is the same problem as figuring out which data a job uses.
I think I'll talk about DB2 in a later post, as this one has already become lengthy.
As you probably know, there is a lot of instrumentation on data sets in SMF. Without going into a lot of repetitive description: