XML, XSLT and DFSORT, Part One - Creating A Flat File With XSLT
MartinPacker
This is the second part of a (currently) three-part series on processing XML data with DFSORT, given a little help from standard XML processing tools. The first part - which you should read before reading on - is here.
To recap, getting XML data into DFSORT is a two-stage process: first transform the XML into a flat file with XSLT, then process that flat file with DFSORT.
This post covers the first stage. You'll see how you can transform the XML file below into a Comma-Separated Variable (CSV) file.
Here's the source XML, complete with a few quirks:
Here's the resulting flat file:
I'm assuming you can read XML reasonably well. In this example we have three "item" elements as children of a "stuff" element. The "stuff" element is a child of the "mydoc" element. The "mydoc" element also contains a "greeting" element. Each "item" element has a single "row" child element and an "a" attribute.
To produce the output we need to find the "item" elements and pick up the "row" child element and the "a" attribute value. We write one record for each "item" element. (We ignore the "greeting" element entirely.)
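For anyone who wants something concrete to experiment with, here is a small stand-in document with the shape just described. The element and attribute names ("mydoc", "greeting", "stuff", "item", "row" and "a") come from the description above. The values and the exact layout are invented for illustration and are not the original listing.

```xml
<?xml version="1.0"?>
<!-- Hypothetical sample: names follow the description in the text,
     values and layout are made up -->
<mydoc>
  <greeting>Hello</greeting>
  <stuff>
    <!-- Everything on one line -->
    <item a="1"><row>One</row></item>
    <!-- Child element on its own line -->
    <item a="2">
      <row>Two</row>
    </item>
    <!-- Attribute on its own line, and the row text split across lines -->
    <item
      a="3">
      <row>
        Three is split
        across lines
      </row>
    </item>
  </stuff>
</mydoc>
```

The three "item" elements carry the same kind of information but are laid out differently, which is exactly the sort of variation a real parser handles for you.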
You may notice some white space around the output: a leading blank line and a trailing one, as well as four spaces at the beginning of each output record. I've not found a way of getting rid of those, so the DFSORT program (described in the next part of this series) will have to strip them off.
I've deliberately formatted each "item" element slightly differently:
The point is that XML is so flexible in its layout that you're better off relying on a supplied parser than writing your own. It's true that there are good parsers that don't do XSLT transformations. And obviously the z/OS XML System Services one is very nice, particularly with its ability to use specialty engines. As I said in my previous post, XML parsing is computationally expensive.
Why not write your own code that calls the z/OS XML System Services parser? That's certainly an option - and indeed you might find the transformations you want to do can't (or shouldn't) be done with XSLT. Here the similarity to DFSORT is quite strong: both provide ways to use built-in functions to transform data, and neither requires a formal programming language (in XML's case perhaps PHP, Java or C++, and in DFSORT's case perhaps Assembler, COBOL or PL/I).
In this example you scarcely need to write your own program. (Handling item 3, as I'll describe later, is the one case where a program might be better.)
Here's the XSLT stylesheet that produces the required output:
This is a fairly simple stylesheet. Here's how it works (the numbered lines above correspond to the numbering below):
Because item 3's "row" value was split across several lines, the normalize-space() function is used to take out leading and trailing white space. It has the unfortunate side effect of also replacing every run of white space characters inside the text with a single space, so it's not brilliant. You could write a fairly simple but recursive piece of XSLT to do the job properly - but it's beyond the scope of this post. In fact this might be the thing that makes you abandon XSLT and call the XML parser from a program.
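As a rough illustration of the approach, here is a minimal sketch of a stylesheet of this kind. It is not the original stylesheet. It assumes the document structure described earlier ("mydoc" containing "stuff" containing "item" elements), and it assumes the "row" text comes first in each record, followed by a comma and the "a" attribute value. That field order is a guess.

```xml
<?xml version="1.0"?>
<!-- Sketch only: structure and field order are assumptions, not the original -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Write plain text rather than XML markup -->
  <xsl:output method="text"/>

  <!-- Start at the document root and visit only the item elements,
       so the greeting element is never processed -->
  <xsl:template match="/">
    <xsl:apply-templates select="mydoc/stuff/item"/>
  </xsl:template>

  <!-- One record per item: row text, a comma, the a attribute, a newline -->
  <xsl:template match="item">
    <xsl:value-of select="normalize-space(row)"/>
    <xsl:text>,</xsl:text>
    <xsl:value-of select="@a"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

Two design choices are worth noting. Selecting the "item" elements explicitly means the white space text nodes between elements in the source are never handed to the built-in template rules, which would otherwise copy them to the output. Wrapping the comma and the newline in xsl:text keeps the stylesheet's own indentation out of the records. Whether either of these explains the stray white space mentioned earlier depends on how the original stylesheet was written. The sketch uses nothing beyond XSLT 1.0 features, so it should run on a 1.0 or a 2.0 processor.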
If you want to get into XSLT I can recommend Doug Tidwell's book "XSLT, Second Edition: Mastering XML Transformations". It's what I've used - with some additional research on the web (which didn't yield much additional insight).
I used the Saxon B (free) processor as it's the only one I can get my hands on that does XSLT 2.0. It's a Java jar. You could use others, of course.
Invoking it from OMVS, I found a 64MB heap specification was enough (running in a 128MB region). For more complex transformations I can see a larger heap might be needed. (In fact I didn't check how much garbage collection, if any, the JVM did. It just ran.)
(If you specify version="1.0" for the stylesheet, Saxon will issue a message informing you that you're running a 1.0 stylesheet through a 2.0 processor. This has caused no problems whatsoever for me.)
Originally I downloaded Saxon to my Linux laptop and used it with an ASCII stylesheet and XML data. Transferring to z/OS was straightforward. This approach may work for you, if you're setting out to learn XSLT.
Learning and working with XSLT continues to be a journey of discovery. If you spot some tricks I'm missing, feel free to let me know. The next post in this series will be about the DFSORT counterpart.