Unusual Sort Fields
MartinPacker 11000094DH Visits (2572)
While working through a scenario in our residency it became (briefly) important to be able to preserve sort order on a field. But this field wasn’t sorted in any recognisable way. So the records couldn’t be sorted alphabetically or numerically. In fact they had to be sorted so that this field was preserved in the following sequence:
This post talks about two methods of maintaining this sequence.
The trick is to create an additional numerical field on which to sort - and then to throw it away. Use CHANGE to create it with coding like…
This can be used in INREC, OUTREC or OUTFIL. For one-pass sorting purposes INREC would be the place to do it. (But the syntax is OK elsewhere.) And you’d probably want to throw away the field in OUTREC or OUTFIL OUTREC.
Obviously you specify this temporary (4-byte numeric) field on the SORT statement.
The disadvantage of this approach is the table is hardcoded into the DFSORT invocation. You might not like that.
Notice the NOMATCH value of 0. This makes unmatched records collate to the front. You might use OUTFIL INCLUDE to move them to a side file you check for emptiness. (Use OUTFIL SAVE for the rest to send them to the normal output file.)
Notice also the “NOTSEEN” value which collates last. Actually it doesn’t matter where it collates and no input record has that value in the field. The purpose of the “NOTSEEN” line is to make sure the closing bracket isn’t on any real lookup lines. So you could code lines up to the CHANGE inline and the lines from NOTSEEN onwards inline. The lines between are the real lookup table and could be in a data set. Something like
This, I think, is reasonably maintainable.
(You might be able to think of another way to keep the closing bracket on a separate line. If so please let me know.)
Using ICETOOL JOINKEYS
If you don’t want to maintain the collation table in the DFSORT invocation you can keep it in a file and use ICETOOL JOINKEYS.
To do the sort would require an additional pass over the data - with the SORT statement on the looked up field in. For smallish amounts of data that’s probably fine. But for larger amounts of data you’ll probably want to use the CHANGE method and live with the maintenance of the table in the DFSORT invocation.
Maintaining sort order on a non-standard collating key like this looks important for when you are splitting jobs up to run against subsets of the data and want to bring things back together.
Our case creates a report sequenced in part on this nonstandardly sorted field. The first thing we do - to prepare for cloning - is separate the reporting from the data analysis and update. We use a transient file. When we clone we have multiple transient files and we need to merge them somehow. So maintaining sequence on this (actually the third) key is important:
Without forcing ourselves to define the clones as processing ranges of this field’s value we can’t just concatenate these transient files: We have to keep the order of this field preserved.
Extending The Idea
Though this isn’t relevant to the residency’s purpose - teaching people how to clone batch jobs - there is a nice extension to the idea of sorting using a lookup table.
With DFSORT’s arithmetic operators and other capabilities it’s possible to compute a temporary result and sort on that field. Exploring that idea I’ll leave as an exercise to the reader.
If you want to try it create a file with count and total fields in each record. Use these two fields to calculate an average and sort on it, optionally discarding the resulting average field.
This is the sort of practical issue we’re thinking through right now. It’s proving challenging but fun!