Lost For Words With DDF
Martin Packer
I'm lost for words with DDF, I really am.
“What's up with him?” my one reader asks.
So let me explain…
I debuted a presentation last year called “More Fun With DDF”. But I've made progress since then.
So what do I add to the front of this title? “Still”? “Yet”? “Even”?
I don't even think there's a hierarchy to these, so tacking one of them on the front for 2017 is a one-shot deal. “Even More Fun With DDF” is my favourite of these.
So let's get to the meat of it: Something you might actually want to know…
DB2 Calling DB2
Recently I've been involved in a couple of situations where one DB2 on z/OS calls another using DDF.
The Client DB2 might call the Server DB2 on behalf of anything, such as CICS transactions, Batch jobs, or even its own DDF clients.[1]
For the rest of this post refer to this diagram, summarising key aspects of the SMF 101 (DB2 Accounting Trace) records.
Detecting Client And Server DB2 Subsystems
So how do we detect Client and Server situations?
Firstly, the presence of a QLAC section in an SMF 101 (DB2 Accounting Trace) record tells you the 101 represents something participating in DDF, whichever role the DB2 is playing.
Secondly, field QLACSQLS tells you this unit of work sent SQL requests somewhere, so it's acting as a Client. Similarly, field QLACSQLR tells you it received SQL statements, so it's acting as a Server.[2]
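As a rough illustration of those two tests, here is a minimal sketch. The `Smf101` class is hypothetical and heavily simplified: it assumes the record has already been parsed and that the QLACSQLS and QLACSQLR counts have been totalled across every QLAC section in the record. It reports both roles when a unit of work both sent and received SQL, which can happen when a DB2 calls another on behalf of its own DDF clients.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Smf101:
    """Hypothetical, simplified view of one SMF 101 record.

    Only the fields discussed above are modelled. The QLAC counts are
    assumed to be totalled across all QLAC sections in the record.
    """
    has_qlac: bool      # True if the record contains a QLAC (DDF) section
    qlacsqls: int = 0   # QLACSQLS: SQL requests sent to a remote location
    qlacsqlr: int = 0   # QLACSQLR: SQL statements received from a remote location


def ddf_roles(rec: Smf101) -> set[str]:
    """Return the DDF role(s) this unit of work played: 'client', 'server', both or neither."""
    roles: set[str] = set()
    if not rec.has_qlac:
        return roles  # no QLAC section: not a DDF participant at all
    if rec.qlacsqls > 0:
        roles.add("client")  # it sent SQL somewhere
    if rec.qlacsqlr > 0:
        roles.add("server")  # it received SQL from somewhere
    return roles
```

For example, `ddf_roles(Smf101(True, qlacsqls=12))` classifies the record as a Client only.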
Matching DDF 101 Records
So, if I know that one DB2 is calling another I want SMF 101 (DB2 Accounting Trace) records from both DB2 subsystems. That should help me understand the conversation more fully. I will call these the Client 101 and Server 101 records, respectively.
But how do you match them up?
Timestamps turn out to be useless for this. But Logical Unit Of Work IDs (LUWIDs) are ideal, or rather the first 22 of their 24 bytes. These are fields QWHSNID, QWHSLUNM, and QWHSLUUV concatenated.
Match these up and you're in business.[3]
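A minimal sketch of that matching, under some stated assumptions: each record is a hypothetical dict of raw field bytes, and the component lengths (8 bytes for QWHSNID, 8 for QWHSLUNM, 6 for QWHSLUUV) are assumed from the usual LUWID layout, which leaves out the final 2 bytes. Because both sides hold the same raw bytes, the comparison works without translating from EBCDIC. Server records are indexed by key, so each Client record is a single lookup rather than a scan.

```python
from __future__ import annotations

from collections import defaultdict

# Assumed LUWID component lengths: 8 + 8 + 6 = 22 bytes (the last 2 of 24 are ignored).
LUWID_PARTS = (("QWHSNID", 8), ("QWHSLUNM", 8), ("QWHSLUUV", 6))


def luwid_key(rec: dict) -> bytes:
    """Concatenate the first three LUWID components into a 22-byte matching key."""
    parts = []
    for name, length in LUWID_PARTS:
        value = rec[name]
        if len(value) != length:
            raise ValueError(f"{name} should be {length} bytes, got {len(value)}")
        parts.append(value)
    return b"".join(parts)


def match_client_server(client_recs: list[dict], server_recs: list[dict]) -> list[tuple[dict, dict]]:
    """Pair each Client 101 with any Server 101 sharing its 22-byte LUWID key."""
    by_key: dict[bytes, list[dict]] = defaultdict(list)
    for srv in server_recs:
        by_key[luwid_key(srv)].append(srv)  # a list, in case several share a key
    pairs = []
    for cli in client_recs:
        for srv in by_key.get(luwid_key(cli), []):  # .get avoids inserting empty keys
            pairs.append((cli, srv))
    return pairs
```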
Doing The Matching
I have code that reformats DDF 101s into records with important fields in fixed positions. With this code:
Actually I separate joined Batch, CICS, and Other DDF records into their own data sets. For Batch a “blow by blow” approach is appropriate; for CICS a statistical approach is better. So I have two CSV files, ripe for importing into a spreadsheet, for each of these.
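The split into per-workload CSV files could look something like the sketch below. It is not the code described above. It assumes each joined Client/Server row is a dict with a hypothetical `workload` key already set to "Batch", "CICS" or "Other", and it returns CSV text per group instead of writing data sets. It covers only the split itself, not the statistical summary used for CICS.

```python
from __future__ import annotations

import csv
import io


def split_to_csv(joined_rows: list[dict], fieldnames: list[str]) -> dict[str, str]:
    """Group joined Client/Server rows by workload and render one CSV text per group."""
    buffers: dict[str, io.StringIO] = {}
    writers: dict[str, csv.DictWriter] = {}
    for row in joined_rows:
        group = row["workload"]  # hypothetical classification: Batch, CICS or Other
        if group not in writers:
            buf = io.StringIO()
            # extrasaction="ignore" drops keys (like "workload") not chosen as columns
            writer = csv.DictWriter(buf, fieldnames=fieldnames, extrasaction="ignore")
            writer.writeheader()
            buffers[group] = buf
            writers[group] = writer
        writers[group].writerow(row)
    return {group: buf.getvalue() for group, buf in buffers.items()}
```

Writing each returned string to its own file gives one spreadsheet-ready CSV per workload type.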
Timings (and perhaps names) are the payoff for matching up these records.
The first thing to note is that normal (non-DDF) timings apply - in the QWAC and QWAX sections.
That's almost all you need to look at for the Server 101 record. Similarly, for the Client 101 record, the standard time buckets apply.
But there is a field, QWAXOTSE, that documents time spent waiting for the other DB2.[4] It works both ways. When its value is not explained by the 101's time buckets, it can indicate communication problems.
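One possible way to put a number on "not explained" is sketched below. This is an assumption, not a documented rule: it compares the Client's QWAXOTSE with the matched Server record's elapsed time, treats any excess as time spent outside either DB2, and flags the pair when that excess is more than a hypothetical 20% of the wait. Both times are assumed to be already converted to seconds.

```python
from __future__ import annotations


def unexplained_wait(client_qwaxotse: float, server_elapsed: float) -> float:
    """Seconds of the Client's wait for the Server not covered by the Server's own elapsed time."""
    return max(0.0, client_qwaxotse - server_elapsed)


def flag_communication(client_qwaxotse: float, server_elapsed: float,
                       threshold: float = 0.2) -> bool:
    """Flag a matched pair when more than `threshold` (a hypothetical fraction) of the wait is unexplained."""
    if client_qwaxotse <= 0:
        return False  # no wait recorded, so nothing to explain
    return unexplained_wait(client_qwaxotse, server_elapsed) / client_qwaxotse > threshold
```

A 10-second wait against 7 seconds of Server elapsed time leaves 3 seconds (30%) unexplained, so it would be flagged; against 9.5 seconds it would not.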
Another piece of timing information is the end timestamps, that is, the SMF record cutting times. What I've observed for Batch DDF is that the Server cuts its record a few minutes after the Client. My guess is this is because the Server realises the Client isn't coming back; some sort of idle timeout. I further suppose the QWACRINV field (the reason for invoking accounting) might provide the explanation, but I really need more experience with this. I haven't seen the same effect with CICS DDF transactions, but then the overall numbers are much smaller.
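To test that guess against real data, one could measure the lag between the two cutting times and group it by the Server record's QWACRINV value. The sketch below assumes the matched pairs have already been reduced to hypothetical `(client_end, server_end, server_qwacrinv)` tuples, with the reason code kept as an opaque integer; no meaning is assigned to any particular value.

```python
from __future__ import annotations

from collections import defaultdict
from datetime import datetime
from statistics import median


def lag_by_reason(pairs: list[tuple[datetime, datetime, int]]) -> dict[int, float]:
    """Median seconds by which the Server 101 was cut after its matching Client 101.

    Each item is (client_end, server_end, server_qwacrinv). Results are grouped
    by the Server record's QWACRINV value, so any link to the reason code shows up.
    """
    lags: dict[int, list[float]] = defaultdict(list)
    for client_end, server_end, reason in pairs:
        lags[reason].append((server_end - client_end).total_seconds())
    return {reason: median(values) for reason, values in lags.items()}
```

If one reason code consistently shows lags of several minutes while others show near-zero lags, that would support the idle-timeout theory.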
It is perfectly possible to match up Client and Server DDF 101 records. The value of doing so lies in getting a more complete view of such a DB2-to-DB2 conversation, along with some extra diagnostic capability.
For example, knowing that a Batch DDF step's time is dominated by Synchronous Read I/O Wait in a specific different DB2 subsystem is useful. Or that QWAXOTSE dominates, unaccountably.
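Spotting which component dominates a matched pair is simple arithmetic once the time buckets from both records sit side by side. In this minimal sketch the bucket labels and values are hypothetical; deciding which Client and Server buckets to put into the dict (and how to avoid double-counting time that appears on both sides) is a judgement call left to the caller.

```python
from __future__ import annotations


def dominant_component(buckets: dict[str, float]) -> tuple[str, float]:
    """Return the largest time bucket and its share of the total (0.0 to 1.0)."""
    if not buckets:
        raise ValueError("no time buckets supplied")
    total = sum(buckets.values())
    name = max(buckets, key=buckets.__getitem__)
    share = buckets[name] / total if total > 0 else 0.0
    return name, share
```

For instance, buckets of 6 seconds of Server synchronous read I/O wait, 1 second of Server CPU and 3 seconds of unexplained Client QWAXOTSE would point at the Server's synchronous read I/O, at 60% of the total.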
So this code is in Production and working fine.
As always, I expect my understanding to grow and the code to get refined. Both things tend to happen with more customer situations and data. You can be sure I'll relate any significant learning points here.