Memories of DFSORT OUTFIL
Martin Packer
In September 1997 DFSORT Release 13 was shipped (to coincide with the release of OS/390 Release 4). It took a nice idea from Syncsort and extended it.
In case you didn't know, OUTFIL allows you to read an input data set (and perhaps sort it) and write multiple output files from the resulting records - perhaps selecting a subset of the records for each output file and reformatting them differently for each one. All in a single pass over the input data.
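To make the "single pass, many outputs" idea concrete, here is a minimal Python sketch. It is not DFSORT syntax and not how DFSORT is implemented; the names `Output`, `split_once` and `read_input`, and the record layout, are all made up for illustration. Each output has its own selection test and its own reformatting rule, and the input is a generator, so it really can only be read once.

```python
from typing import Callable, Dict, Iterable, Iterator, List, NamedTuple


class Output(NamedTuple):
    """One output file: which records it wants and how it lays them out (hypothetical)."""
    name: str
    include: Callable[[dict], bool]   # selection test for this output
    build: Callable[[dict], str]      # reformatting rule for this output


def read_input() -> Iterator[dict]:
    """Stand-in for the input data set; a generator can be consumed only once."""
    yield {"type": 30, "job": "PAYROLL", "cpu": 12.5}
    yield {"type": 16, "job": "SORTA", "cpu": 3.0}
    yield {"type": 30, "job": "BILLING", "cpu": 7.25}


def split_once(records: Iterable[dict], outputs: List[Output]) -> Dict[str, List[str]]:
    """Read every record exactly once and offer it to every output."""
    results: Dict[str, List[str]] = {out.name: [] for out in outputs}
    for rec in records:                  # the single pass over the input
        for out in outputs:
            if out.include(rec):         # per-output subset selection
                results[out.name].append(out.build(rec))  # per-output layout
    return results


outputs = [
    Output("type30", lambda r: r["type"] == 30, lambda r: f'{r["job"]},{r["cpu"]}'),
    Output("type16", lambda r: r["type"] == 16, lambda r: r["job"]),
    Output("all", lambda r: True, lambda r: f'{r["type"]:03d} {r["job"]}'),
]

result = split_once(read_input(), outputs)
for name, lines in result.items():
    print(name, lines)
```

The point of the sketch is the loop nesting: the outer loop is over the input, the inner loop is over the outputs. Reading the input once per output would swap those two loops, which is exactly the repeated I/O OUTFIL lets you avoid.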
Perhaps people really still don't know about OUTFIL as, while I get many searches that hit my blog for DFSORT topics, OUTFIL is rarely one of the search terms.
There are three features that were then unique to DFSORT:
So those are the bare-bones additions to OUTFIL for Release 13. (DFSORT has since added a lot more function to OUTFIL and I would have to assume Syncsort had done the same.)
I had the chance to run a residency that Summer in Poughkeepsie - and it was a lot of fun. A small team of highly skilled people I'm pleased to call friends and I took a version of our batch-driven SMF analysis code and made a mini batch window out of it. (We do have our own mini scheduler and a batch window topology to play with.) So we got to play with DFSORT Release 13 and the (also new) DFSMS System-Managed Buffering and WLM-Managed JES2 initiators and BatchPipes/MVS and had a fine old time of it - running and measuring and tweaking and doing it all again. We got to do a few presentations based on the results of our playing but never got to turn the foils into a Redbook. (I think we had too much fun "playing sysprog" and the like.) :-)
One feature that proved really useful was DFSORT's ability to detect when BSAM was needed instead of EXCP. The most common cases are BatchPipes/MVS pipes, Extended Format Sequential (Striped and/or Compressed) and SYSOUT data sets.
Note: I didn't say "QSAM". So that ruled Hiperbatch out. To use Hiperbatch you have to write your own E15, E32 or E35 exit to close the data set and to reopen it for QSAM. (There was a sample piece of Assembler code to do this in the HBAID manual.)
From my (largely Performance) perspective, though, OUTFIL is all about avoiding repeated reads of the input data set. Our batch window did a fair amount of that because it read SMF data repeatedly. So OUTFIL fitted nicely into our window. (And if we pretended the output data was VB and not VBS we could get away with piping it as well.)
One thing to be clear about - which I soon realised - was that OUTFIL does not replace multiple sorts with a single one - unless the sort keys etc are identical. But you could feed the same records from a DFSORT OUTFIL job through multiple pipes into the appropriate number of DFSORT SORT jobs. n sorts become n+1 jobs. More balls to keep in the air, of course. And if those sorts are big enough they'll compete for memory. Which is where another Release 13 feature came in handy: Dynamic Hipersorting. Previously Hipersorting asked MVS (via STGTEST SYSEVENT) just once, at the beginning of the sort, how much storage it could have for sort work. Dynamic Hipersorting asks the question several times over (as the data is read in). Because of this change Hipersorting was much less likely to cause overcommitment of memory - in the multiple concurrent sorts case.
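The "n sorts become n+1 jobs" shape can be sketched in Python, with threads standing in for jobs and small bounded queues standing in for pipes. This is only an analogy under those assumptions: the names `splitter`, `sorter` and `run_window` are invented for the illustration, and nothing here reflects how DFSORT or BatchPipes/MVS actually work inside.

```python
import queue
import threading
from typing import Callable, Dict, Iterable, List, Tuple

_END = object()  # end-of-data marker, playing the role of the writer closing the pipe


def splitter(records: Iterable[tuple], pipes: List[queue.Queue]) -> None:
    """The one extra job: read the input once and copy each record to every pipe."""
    for rec in records:
        for pipe in pipes:
            pipe.put(rec)          # blocks while the pipe is full, as a pipe would
    for pipe in pipes:
        pipe.put(_END)


def sorter(pipe: queue.Queue, key: Callable, results: dict, name: str) -> None:
    """One downstream sort job: drain its own pipe, then sort on its own key."""
    received = []
    while True:
        rec = pipe.get()
        if rec is _END:
            break
        received.append(rec)
    results[name] = sorted(received, key=key)


def run_window(records: Iterable[tuple],
               sort_keys: Dict[str, Callable]) -> Tuple[dict, int]:
    """Start n sorters plus one splitter; return the sorted outputs and the job count."""
    pipes = {name: queue.Queue(maxsize=2) for name in sort_keys}
    results: dict = {}
    jobs = [threading.Thread(target=sorter, args=(pipes[name], key, results, name))
            for name, key in sort_keys.items()]
    jobs.append(threading.Thread(target=splitter, args=(records, list(pipes.values()))))
    for job in jobs:
        job.start()
    for job in jobs:
        job.join()
    return results, len(jobs)


records = [("B", 2), ("A", 3), ("C", 1)]
sort_keys = {"by_name": lambda r: r[0], "by_count": lambda r: r[1]}
results, job_count = run_window(records, sort_keys)
print(job_count, results)
```

Two different sort keys mean two sorters, plus the splitter that reads the input: three "jobs" where there used to be two, but only one read of the input. All of them have to be running at the same time, which is why the memory competition matters.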
So, to me, DFSORT OUTFIL is yet another of those techniques that you have to "engineer in". Unless you plan the implementation with the usual diligence nothing will happen. We had a lot of fun finding cases where it would be ideal in customer workloads - when conducting PMIO Batch Window studies. And, like so many other techniques, it's as valid today as it was 10 years ago.
One final thing: I've just remembered that I had another idea that got put into DFSORT Release 13...
The PMIO team worked with DFSORT Development to enhance the SMF 16 record (one per DFSORT invocation) to add a few minor fields - in Release 12. (I even forget what they were.) In Release 13 the record was enhanced to contain input and output data set sections - one per data set. These were very detailed - including the number of records read or written, the access method (including Pipes) and so on. Very nice. To get this additional data you need to run with SMF=FULL. (I'd recommend this anyway.)