MartinPacker
I don’t think I’ve ever written very much about DDF. Now seems like a good time to start.
I say this because I’ve been working pretty intensively over the last couple of weeks on upgrading our DDF Analysis code. Hence the recent DFSORT post (DFSORT Tables).
I’m actually not the DB2 specialist in the team but, I’d claim, I know more about DB2 than many people who are. At least from a Performance perspective.
Actually this post isn’t about how to tune DDF. It’s about how to categorise and account for DDF usage. As usual I’m being curious about how DDF is used.
The Story So Far
And the DDF 101 record has additional sections of real value.
In addition a field in the standard QWAC Section (mapped by DSNDQWAC) documents the WLM Service Class the work executed in. This field (QWACWLME) is only filled in for DDF.
So I wrote a DFSORT E15 exit to reformat the record so that all the useful DDF information is in fixed positions in the record. This makes it easy to write DFSORT and ICETOOL applications against the reformatted data. (In our code this data is stored reformatted on disk, one output record per input record.)
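The reformatting idea can be sketched in Python (the real thing is a DFSORT E15 exit). Everything below - the triplet positions, section names and fixed output layout - is invented purely for illustration; real SMF 101 records locate their sections via self-defining offset/length/count triplets, and this sketch ignores the count:

```python
import struct

# Hypothetical positions of the (offset, length) part of each section's
# self-defining triplet within the input record - illustrative only.
TRIPLETS = {"QWAC": 20, "QMDA": 28}

# Hypothetical fixed output layout: (section name, output offset, length).
FIXED_LAYOUT = [("QWAC", 0, 200), ("QMDA", 200, 100)]

def reformat(record: bytes) -> bytes:
    """Copy each variably-positioned section to a fixed output position."""
    out = bytearray(300)
    for name, out_off, out_len in FIXED_LAYOUT:
        # Read the 4-byte offset and 2-byte length from the triplet.
        sec_off, sec_len = struct.unpack_from(">IH", record, TRIPLETS[name])
        data = record[sec_off:sec_off + sec_len][:out_len]
        out[out_off:out_off + len(data)] = data
    return bytes(out)
```

With sections at fixed positions, downstream DFSORT and ICETOOL steps can address fields by simple position/length pairs.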
These reports concentrated on refining our view of what applications accessed DB2 via DDF. So, for example, noticing that the vast majority of the DDF CPU was used by a JDBC application - and establishing which one.
I also experimented with writing a DDF trace - using the time stamps from individual 101 records. Because installations can now consolidate DDF SMF 101 records (typically to 10 commits per record) the individual commit time stamps get blurred, so this code has issues.
This code was only good for “tourist information” but it showed a lot of promise.
2015 Showed The Need For Change
There were some defects. The most notable was that DB2 Version 10 widened a lot of the QWAC section fields. My original design of converting STCK values to millisecond values at the record level was also unhelpful, particularly when summing.
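The summing pitfall can be sketched as follows (a minimal illustration, not the actual code). In the z/Architecture TOD (STCK) format bit 51 represents 1 microsecond, so a 12-bit right shift yields microseconds; converting each record’s value to milliseconds before summing throws away the sub-millisecond parts, whereas summing the raw values and converting once does not:

```python
def stck_to_microseconds(stck: int) -> int:
    # Bit 51 of the 64-bit TOD value is 1 microsecond, so shift right 12.
    return stck >> 12

# Three hypothetical records of 0.3ms, 0.4ms and 0.5ms of CPU each.
values = [300 << 12, 400 << 12, 500 << 12]

# Converting each record to (integer) milliseconds first loses everything:
per_record_ms = sum(stck_to_microseconds(v) // 1000 for v in values)  # 0

# Summing raw STCK values and converting once preserves the total:
total_ms = stck_to_microseconds(sum(values)) // 1000  # 1200us -> 1ms
```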
But these are minor problems compared to two big themes, which I describe below.
These two themes occurred in several different customer engagements in 2015; at the time I addressed them with custom queries.
Time Of Day Analysis
My current prototype graphs with 1-minute granularity and (less usefully) with 1-hour granularity. And there are two main kinds of graph: Class 1 CPU and Commits.
With my test data (actually to be fed back to the customer) the Commits show an interesting phenomenon: Certain DDF applications have regular bursts of work, for example on 5 minute and 15 minute cycles. Normally I don’t see RMF data with 1-minute granularity so don’t see such patterns. Bursts of work are more problematic, including for WLM, than smooth arrivals. Now at least I see it.
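The 1-minute bucketing can be sketched like this - assuming each reformatted 101 record has already been reduced to a (timestamp, Correlation ID, commit count) tuple. The Correlation ID values below are invented:

```python
from collections import defaultdict
from datetime import datetime

def bucket_commits(records):
    """Sum commits per (Correlation ID, minute) bucket."""
    buckets = defaultdict(int)
    for ts, corr_id, commits in records:
        # Truncate the timestamp to the start of its minute.
        minute = ts.replace(second=0, microsecond=0)
        buckets[(corr_id, minute)] += commits
    return buckets

# Hypothetical reduced records.
records = [
    (datetime(2016, 1, 8, 9, 0, 12), "JDBCAPP", 5),
    (datetime(2016, 1, 8, 9, 0, 47), "JDBCAPP", 3),
    (datetime(2016, 1, 8, 9, 1, 2), "BATCHX", 1),
]
by_minute = bucket_commits(records)
```

Plotting the bucket totals per Correlation ID is what surfaces the 5- and 15-minute cycles; at 1-hour granularity they simply disappear into the average.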
What follows is an hour or so’s worth of data for a single DB2 subsystem, listing the top two Correlation IDs.
Here’s the Commits picture:
As you can see there are two main application styles, each with its own rhythm. These patterns are themselves composite and could be broken out further with the QMDA Section information, right down to originating machine and application.
And here’s the corresponding CPU picture:
It shows the two applications behave quite differently, with the cycles largely absent.
This suggests the spikes are very light in CPU terms and that other applications - JDBC ones, for example - are far more CPU-intensive.
Notice how the CPU usage peaks at 2 GCPs’ worth (120 seconds of CPU in 60 seconds).
The underlying JDBC CPU usage is about 1 GCP’s worth.
My code now reports three zIIP-related numbers. zIIP CPU is exactly what it says it is: CPU time spent executing on a zIIP.
zIIP-eligible CPU has had a chequered history but now it’s OK. It’s CPU time that was eligible to be on a zIIP but ran on a General-Purpose Processor (GCP).
The third number warrants a little explanation: With the original implementation of DDF zIIP exploitation every thread was partially zIIP-eligible and partially GCP-only. More recently DB2 was changed so a thread is either entirely zIIP-eligible or entirely GCP-only. By looking at individual 101 records you can usually see this in action.
So I added a field that indicates whether the 101 record had any zIIP CPU or not - and I count these.
Rolling up 101 records complicates this but my test data suggests the all-or-nothing behaviour is still visible at the individual record level.
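The record-level check can be sketched as follows - assuming each record yields two fields, hypothetically named here, for zIIP CPU and zIIP-eligible CPU that ran on a GCP:

```python
# A sketch only: 'records' is a list of (ziip_cpu, ziip_eligible_on_gcp_cpu)
# pairs pulled from reformatted 101 records; the field names are invented.
# Under the newer all-or-nothing behaviour each record tends to show either
# some zIIP(-eligible) time or none at all.
def count_ziip_records(records):
    with_ziip = sum(1 for ziip, eligible in records if ziip + eligible > 0)
    return with_ziip, len(records) - with_ziip
```

Counting records this way, rather than just summing CPU, is what lets you see the all-or-nothing split.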
Nowadays I get the WLM policy (in ISPF TLIB form) for every engagement. So I can see how you’ve set up WLM for DDF. What’s interesting is how it actually plays out in practice.
And this is where the 101 data comes in:
It would be wonderful if SMF 101 had the ending Service Class Period but it doesn’t. But at least you can do statistical analysis, particularly to see if the work is homogeneous or not.
Similar statistical analysis can help you set realistic Response Time goals.
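The bucketing can be sketched like this, with illustrative bucket limits (as noted below, the real code lets Response Time and Class 1 TCB limits be tuned separately):

```python
import bisect

# Illustrative upper limits in seconds; the last bucket is open-ended.
LIMITS = [0.1, 0.25, 0.5, 1.0, 2.0]

def bucket(times, limits=LIMITS):
    """Count transactions per bucket."""
    counts = [0] * (len(limits) + 1)
    for t in times:
        counts[bisect.bisect_left(limits, t)] += 1
    return counts

def pct_within(times, limit):
    """Percentage completing within 'limit' seconds - the shape of a WLM
    percentile Response Time goal (e.g. '95% in 0.5s')."""
    return 100.0 * sum(1 for t in times if t <= limit) / len(times)
```

A long tail in the bucket counts, or a low `pct_within` at the candidate limit, is the statistical hint that the work isn’t homogeneous enough for a single-period goal.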
Here are a couple of graphs I made by teaching my code how to bucket response times and Class 1 TCB times, restricting the analysis to JDBC work coming into a single DB2 subsystem: 
Here’s the response time distribution graph:
If this were a single-period Service Class one might suggest a goal of 95% in 0.5s - though 0.5s would be an unusually high value for such a goal.
You can also see attainment wasn’t that variable through this (75-minute) period.
And here’s the Class 1 TCB time distribution graph:
Notice how the bucket limits are lower than in the Response Time case. In my code I can fiddle with them separately.
Over 95% of the transactions complete using less than 15ms of CPU.
These two graphs aren’t wildly interesting in this case but they illustrate the sort of thing we’ll be able to see as analysts with the new code.
I think it’s a nice advance.
It’s interesting to note that the approach - and, largely, the code - is extensible to any transaction-like access to DB2, for example CICS/DB2 transactions. I say “transaction-like” because much of the analysis requires frequent cutting of 101s; most batch doesn’t look much like this.
I’d also encourage customers who have significant DDF work and collect 101 records to consider doing similar things to those in this post. This is indeed a rich seam to mine.
And I think I just might have enough material here for a conference presentation. Certainly a little too much for a blog post.
As always, I expect to learn from early experiences with the code. And to tweak and extend the code - probably as a result of this.
So expect more posts on DDF.