Some Lessons On DFSORT Join
MartinPacker 11000094DH Visits (3580)
This post is just a few notes on things that might make life easier when developing a JOIN application. Specifically the one I alluded to in Happy Days Are Here Again? when I talked about processing SMF 101 (DB2 Accounting Trace) records.
And I wrote it having scratched my head for a few hours developing a JOIN application that will soon be part of our Production code.
Lesson One: Massage The Input Files In Separate Steps
Here is what “Single Step” looks like:
And this is what “Multiple Step” looks like:
The clear advantages of “Single Step” are:
But sometimes you really want to know what the intermediate records look like. In particular what positions fields end up in, what lengths they have, and what formats they appear in.
And you can always move the logic to the JOIN step as you approach Production; In fact you should. SYSIN becomes JNF1CNTL for file F1 and JNF2CNTL for file F2.
Viewing Intermediate Files While Running JOIN
and route TESTOUT DD to SYSOUT (or wherever). The SORTOUT data set can then be fed - as you originally intended - into the JOIN step.
In my case the two data sets fed into the JOIN are temporary; When the job completes they’re gone.
Lesson Two: Debug Failed Joins One Field At A Time
Zero Records Out
In my application I’m joining on multiple key fields - 8 in my case.
By the way it’s probably best not to direct the output to the SPOOL; While I was debugging this way I was sending several million lines there before I caught and purged the job.
Far More Records Than Expected Out
In my case I was using the first 22 bytes of the 24-byte Logical Unit Of Work ID (LUWID). And I was getting orders of magnitude more records out than I expected.
The final two bytes are a commit number. For some reason I thought it shouldn’t be part of the join key. I was wrong.
Extending the key to 24 bytes made the JOIN (demonstrably) behave.
Lesson Three: Careful With The Name Spaces
So for the F1 file you might prefix the symbols with “F1_” and, similarly, the symbols for the F2 file might begin with “F2_”.
Conventionally, I use “_” before the symbols that map a record after INREC. You could adapt that so the results of REFORMAT could be mapped using symbols prefixed with “_”.
In any case some sort of symbol scheme is needed.
While we’re talking about symbols, I wouldn’t attempt JOIN without them.
If you’re developing with the “Multiple Step” approach you can reuse the symbols between the reformatting and JOIN steps - because you can concatenate SYMNAMES data sets. But note this reusing the output symbols from the reformatting steps for the input to the join.
One thing you can’t do is specify different SYMNAMES DDs for the pre-processing stages in the “Single Step” case. So you have to be careful with names.
In case the above is clear as mud let me try a little example.
In F1 Step you might code:
And for the F2 Step you might code:
In the JOIN Step you might code:
Of course, in the above you’d probably put the F1_ and F2_ fields in their own symbols files - to enable reuse.
One minor annoyance with symbols files is they push you towards another ISPF session, which you could probably do without. But it is only a minor annoyance.
Lesson Four: REFORMAT Isn’t The Final Reformatting
It doesn’t. So you have to add them in an OUTREC or OUTFIL statement. A cumbersome alternative is to pass the fixed strings in as fields from the F1 or F2 streams.
One thing that is available in REFORMAT (and only from REFORMAT) Is a single-character indicator of how the record was matched. It has three potential values:
This might prove useful In debugging. You indicate you want this flag using the “?” character.
The one overarching thing in my mind is to build any DFSORT application up in simple stages, and perform optimisations later. A good example, which I’ve already shown you, is the “Multi Step” approach to building up JOIN.