Sometimes, your XML has all the data you want but not in the form you need. For example, what do you do if you have a database of books sorted by title and you need it sorted by author? You can use an external program to twist the data. You can use two different stylesheets. Or, with XSLT 2.0, you can create a new XML tree in memory that has the information in the sequence you want. This tip shows you how to do the last of these.
Listing 1 provides an example of a list of books.
Listing 1. The input XML file
<?xml version="1.0" encoding="UTF-8"?>
<books>
  <book name="Programming Ruby"
        author="Dave Thomas"/>
  <book name="Code Generation in Action"
        author="Jack Herrington"/>
  <book name="Pragmatic Programmer"
        author="Dave Thomas"/>
</books>
Here I have three of my favorite books: the excellent Programming Ruby and Pragmatic Programmer by Dave Thomas, and the sleeper hit, Code Generation in Action by yours truly. I start with this data, but what I really want are just the author names -- not repeated, either, but one entry for each individual author. In this example data set, I want to list Dave Thomas only once.
I begin the process with the creation of an in-memory tree for the new author table (see Listing 2).
Listing 2. Code to create an in-memory author tree
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0">
  <xsl:template match="/">
    <!-- Build an in-memory tree with one author element per book -->
    <xsl:variable name="allauthors">
      <authors>
        <xsl:for-each select="/books/book">
          <author id="{@author}"/>
        </xsl:for-each>
      </authors>
    </xsl:variable>
    <!-- Copy the tree to the output so its contents are visible -->
    <xsl:copy-of select="$allauthors"/>
  </xsl:template>
</xsl:stylesheet>
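Note that the stylesheet is marked as version 2.0, which means that it is meant for an XSLT 2.0 processor. In XSLT 1.0, a variable built this way was a result tree fragment, and navigating it as a tree required a processor extension such as a node-set() function. In XSLT 2.0, in-memory trees are part of the standard, but you need to set the version number on the stylesheet to 2.0 to use them.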
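For comparison, here is a minimal sketch of what the same idea looks like under XSLT 1.0. It assumes a processor that implements the EXSLT common module (the exsl:node-set() function); not every 1.0 processor does, and some offer the same feature under a vendor-specific namespace instead. The author-count element is a made-up name used only to show the tree being navigated.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- XSLT 1.0 sketch: assumes the processor supports EXSLT exsl:node-set() -->
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    exclude-result-prefixes="exsl"
    version="1.0">
  <xsl:template match="/">
    <!-- In 1.0 this variable holds a result tree fragment, not a node set -->
    <xsl:variable name="allauthors">
      <authors>
        <xsl:for-each select="/books/book">
          <author id="{@author}"/>
        </xsl:for-each>
      </authors>
    </xsl:variable>
    <!-- A path into $allauthors is only allowed after the conversion -->
    <author-count>
      <xsl:value-of
          select="count(exsl:node-set($allauthors)/authors/author)"/>
    </author-count>
  </xsl:template>
</xsl:stylesheet>
```

With the input from Listing 1, the count is 3, one author element per book. Under XSLT 2.0 the exsl:node-set() call is unnecessary: a path such as $allauthors/authors/author works directly.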
The first step is to create the $allauthors variable using the xsl:variable directive. Anything within that directive will be a new tree attached to the variable. In this case, I specify a root tag, authors. Then, within that tag, I iterate through each book and create a new tag with that author id. Listing 3 shows the output of this stylesheet.
Listing 3. The output of the stylesheet
<?xml version="1.0" encoding="UTF-8"?>
<authors>
  <author id="Dave Thomas"/>
  <author id="Jack Herrington"/>
  <author id="Dave Thomas"/>
</authors>
I see the contents of the $allauthors variable because I used the xsl:copy-of directive at the bottom of the template (see Listing 2) -- a useful technique when you're using in-memory trees. Whenever you want to know the contents of a variable, just use the xsl:copy-of directive. If you don't want the directive to go to the main output, bracket xsl:copy-of in an xsl:message directive. Doing so outputs the debug information to the warnings or standard error (stderr) output.
Another way to debug in-memory trees is to use an XSLT debugger of the type that's built into a sophisticated XSLT/XML editor.
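The xsl:message technique can be sketched as follows. This is an illustration rather than part of the original listings: it reuses the $allauthors variable from Listing 2, and the author-count element is a made-up stand-in for whatever the stylesheet's real output would be. Where the message text ends up depends on the processor and how it is invoked.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0">
  <xsl:template match="/">
    <xsl:variable name="allauthors">
      <authors>
        <xsl:for-each select="/books/book">
          <author id="{@author}"/>
        </xsl:for-each>
      </authors>
    </xsl:variable>
    <!-- Debug only: send the tree to the message stream, not the result -->
    <xsl:message>
      <xsl:copy-of select="$allauthors"/>
    </xsl:message>
    <!-- The main output does not contain the debug copy -->
    <author-count>
      <xsl:value-of select="count($allauthors/authors/author)"/>
    </author-count>
  </xsl:template>
</xsl:stylesheet>
```

Removing the xsl:message element later leaves the main output unchanged, which makes this a low-risk way to inspect intermediate trees.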
The next step in getting a distinct list of authors is to sort the author list. Do this by using the xsl:sort directive (see Listing 4).
Listing 4. Code to sort the author names
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0">
  <xsl:template match="/">
    <xsl:variable name="allauthors">
      <authors>
        <xsl:for-each select="/books/book">
          <!-- Process the books in order of author name -->
          <xsl:sort select="@author"/>
          <author id="{@author}"/>
        </xsl:for-each>
      </authors>
    </xsl:variable>
    <xsl:copy-of select="$allauthors"/>
  </xsl:template>
</xsl:stylesheet>
The only addition in this code over the code in Listing 2 is the xsl:sort directive within the xsl:for-each loop. When you specify a sort, you get the tags in the sorted order of the select expression you provide. Here, I use the author attribute as the sorting key. Listing 5 shows the output of this stylesheet.
Listing 5. The sorted list
<?xml version="1.0" encoding="UTF-8"?>
<authors>
  <author id="Dave Thomas"/>
  <author id="Dave Thomas"/>
  <author id="Jack Herrington"/>
</authors>
This looks promising. The only thing left is to reduce the list once again by removing redundant entries.
You find unique entries in a list using one of the lesser-known parts of XPath: axes. In this case, I iterate through the sorted list and create a new list -- but I specify that if the current item matches an item that came before it, it's ignored. To refer to the items that come before the current one in XPath, use the preceding-sibling axis. Listing 6 shows an example of this axis.
Listing 6. Code to build the list of distinct authors
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0">
  <xsl:template match="/">
    <!-- Step 1: sorted list of authors, duplicates included -->
    <xsl:variable name="allauthors">
      <authors>
        <xsl:for-each select="/books/book">
          <xsl:sort select="@author"/>
          <author id="{@author}"/>
        </xsl:for-each>
      </authors>
    </xsl:variable>
    <!-- Step 2: copy an author only if no earlier sibling has the same id -->
    <xsl:variable name="authors">
      <authors>
        <xsl:for-each select="$allauthors/authors/author">
          <xsl:if test="not(preceding-sibling::author/@id = ./@id)">
            <xsl:copy-of select="."/>
          </xsl:if>
        </xsl:for-each>
      </authors>
    </xsl:variable>
    <xsl:copy-of select="$authors"/>
  </xsl:template>
</xsl:stylesheet>
Believe me, if you get this on the first pass, you're smarter than I am. The creation of the $allauthors variable is the same as it was before. But now, I'm creating a new variable called $authors using xsl:for-each on the $allauthors variable. The magic happens in the test attribute of xsl:if, where I compare the current id attribute with the id attributes of the preceding siblings, so that only the first occurrence of each name is copied.
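Because the list is already sorted, duplicates sit next to each other, so the test could look only at the nearest preceding sibling instead of all of them. The following is a minimal sketch of that variant, not the listing used in this tip; it assumes the same input as Listing 1 and writes the result directly instead of storing it in a second variable.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0">
  <xsl:template match="/">
    <xsl:variable name="allauthors">
      <authors>
        <xsl:for-each select="/books/book">
          <xsl:sort select="@author"/>
          <author id="{@author}"/>
        </xsl:for-each>
      </authors>
    </xsl:variable>
    <authors>
      <xsl:for-each select="$allauthors/authors/author">
        <!-- [1] on a reverse axis selects the nearest preceding sibling.
             For the first author there is none, so the comparison is
             false and the author is copied. -->
        <xsl:if test="not(preceding-sibling::author[1]/@id = @id)">
          <xsl:copy-of select="."/>
        </xsl:if>
      </xsl:for-each>
    </authors>
  </xsl:template>
</xsl:stylesheet>
```

This version depends on the sort: on an unsorted list it would remove only adjacent duplicates. The test in Listing 6 checks every preceding sibling, so it does not have that restriction.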
XPath axes can be confusing at first, but they're worth learning. When you understand them, it becomes clear just how powerful they are. Listing 7 shows the final sorted distinct list of authors.
Listing 7. The final sorted distinct list
<?xml version="1.0" encoding="UTF-8"?>
<authors>
  <author id="Dave Thomas"/>
  <author id="Jack Herrington"/>
</authors>
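XSLT 2.0 also has a grouping instruction, xsl:for-each-group, that can produce the same sorted distinct list in one pass. The sketch below is offered as a point of comparison, not as a replacement for the in-memory tree technique this tip demonstrates; it assumes the input from Listing 1.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0">
  <xsl:template match="/">
    <authors>
      <!-- One group per distinct value of the author attribute -->
      <xsl:for-each-group select="/books/book" group-by="@author">
        <!-- Sort the groups by the author of each group's first book -->
        <xsl:sort select="@author"/>
        <author id="{current-grouping-key()}"/>
      </xsl:for-each-group>
    </authors>
  </xsl:template>
</xsl:stylesheet>
```

The in-memory tree approach remains the more general tool: once the intermediate tree exists, you can run any further XPath or templates over it, not just grouping.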
If you don't control the source of your XML, you might need to do some manipulation to get the data twisted around in the way you want it. XSLT 2.0 makes this twisting easy by allowing for in-memory trees that can provide different views and indices on the original XML data.
- Visit the XSL standards site at the W3C, a handy reference to XSL technologies and standards.
- Take a look at Jack Herrington's other developerWorks tips on XSLT 2.0, "Batch processing XML with XSLT 2.0" (March 2005) and "Create multiple files in XSLT 2.0" (March 2005).
- Check out the XPath page at the W3C, which provides version and standard information.
- Download Saxon, the popular XSL processor that was used in the creation of this article.
- Read Michael Kay's XSLT 2.0 Programmer's Reference, the bible of XSLT. It's a fantastic introduction and a valuable reference work.
- While you're at it, pick up XPath 2.0 Programmer's Reference by Michael Kay -- the ultimate reference by the man who wrote the W3C specification.
- Read Code Generation in Action by Jack D. Herrington, which covers generating code for a wide variety of targets not limited to database access.
- Read Programming Ruby: The Pragmatic Programmer's Guide by Dave Thomas, Chad Fowler, and Andy Hunt, and The Pragmatic Programmer: From Journeyman to Master by Andrew Hunt and Dave Thomas, the other two books referenced in this tip.
- Find more XML resources on the developerWorks XML zone. For a complete list of XML tips to date, check out the tips summary page.
- Learn how you can become an IBM Certified Developer in XML and related technologies.
An engineer with more than 20 years of experience, Jack Herrington is currently Editor-in-Chief of the Code Generation Network. He is the author of Code Generation in Action. You can contact him at jack_d_herrington@codegeneration.net.