
Tip: Twisting XML with XSLT 2.0

Create in-memory XML indices and data views

Jack Herrington (jherr@pobox.com), Editor-in-Chief, Code Generation Network
An engineer with more than 20 years of experience, Jack Herrington is currently Editor-in-Chief of the Code Generation Network. He is the author of Code Generation in Action. You can contact him at jack_d_herrington@codegeneration.net.

Summary:  The XML story has two sides: data creators and data consumers. XSL typically falls on the consumer side of the equation, and all too often the format of the data is fixed well before a template gets to it. Take a list of books, for example. You might have an XML file with a list sorted by title, but what if you want the list to be sorted by author, or you just want to display the distinct author names? Can XSL do that?


Date:  31 Mar 2005
Level:  Intermediate

Sometimes, your XML has all the data you want but not in the form you need. For example, what do you do if you have a database of books sorted by title and you need it sorted by author? You can use an external program to twist the data. You can use two different stylesheets. Or, with XSLT 2.0, you can create a new XML tree in memory that has the information in the sequence you want. This tip shows you how to do the latter.

Listing 1 provides an example of a list of books.


Listing 1. The input XML file
                
<?xml version="1.0" encoding="UTF-8"?>
<books>
    <book name="Programming Ruby"
      author="Dave Thomas"/>
    <book name="Code Generation in Action"
      author="Jack Herrington"/>
    <book name="Pragmatic Programmer"
      author="Dave Thomas"/>
</books>

Here I have three of my favorite books: the excellent Programming Ruby and Pragmatic Programmer by Dave Thomas, and the sleeper hit, Code Generation in Action by yours truly. I start with this data, but what I really want are just the author names -- not repeated, either, but one entry for each individual author. In this data set example, I want to list Dave Thomas only once.

Create another in-memory tree

I begin the process with the creation of an in-memory tree for the new author table (see Listing 2).


Listing 2. Code to create an in-memory author tree
                
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
version="2.0">
<xsl:template match="/">
<xsl:variable name="allauthors">
    <authors>
        <xsl:for-each select="/books/book">
            <author id="{@author}"/>
        </xsl:for-each>
    </authors>
</xsl:variable>
<xsl:copy-of select="$allauthors"/>
</xsl:template>
</xsl:stylesheet>

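Note that the stylesheet declares version="2.0", so you need an XSLT 2.0 processor to run it. In XSLT 1.0, a variable built this way holds a result tree fragment, which you can copy to the output but cannot navigate with path expressions unless your processor provides an extension function such as exsl:node-set(). XSLT 2.0 makes these temporary trees part of the standard, but you need to raise the version number on the stylesheet to use them.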

The first step is to create the $allauthors variable with the xsl:variable directive. Everything inside that directive becomes a new tree bound to the variable. In this case, I specify a root tag, authors. Within that tag, I iterate through each book and create an author tag whose id attribute holds the book's author. Listing 3 shows the output of this stylesheet.


Listing 3. The output of the stylesheet
                
<?xml version="1.0" encoding="UTF-8"?>
<authors>
   <author id="Dave Thomas"/>
   <author id="Jack Herrington"/>
   <author id="Dave Thomas"/>
</authors>

I can see the contents of the $allauthors variable because I used the xsl:copy-of directive at the bottom of the template (see Listing 2). This is a useful technique when you're working with in-memory trees: whenever you want to know the contents of a variable, copy it to the output with xsl:copy-of. If you don't want the copy to go to the main output, wrap xsl:copy-of in an xsl:message directive. The processor then writes the debug information to its message stream, typically the warnings or standard error (stderr) output.

Another way to debug in-memory trees is to use an XSLT debugger, such as the ones built into full-featured XML and XSLT editors.
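If you want to check the shape of this intermediate tree outside an XSLT processor, the same logic can be modeled in Python with the standard xml.etree.ElementTree module. This is a sketch, not XSLT: the names BOOKS_XML, build_all_authors, and debug_dump are hypothetical, and writing the serialized tree to stderr merely stands in for the xsl:message technique described above.

```python
import sys
import xml.etree.ElementTree as ET

# Listing 1, inlined so the sketch is self-contained.
BOOKS_XML = """<books>
    <book name="Programming Ruby" author="Dave Thomas"/>
    <book name="Code Generation in Action" author="Jack Herrington"/>
    <book name="Pragmatic Programmer" author="Dave Thomas"/>
</books>"""


def build_all_authors(books: ET.Element) -> ET.Element:
    """Model of the $allauthors variable in Listing 2: one <author> per <book>."""
    authors = ET.Element("authors")
    for book in books.findall("book"):  # like select="/books/book"
        ET.SubElement(authors, "author", id=book.get("author"))
    return authors


def debug_dump(tree: ET.Element) -> str:
    """Rough analog of xsl:message around xsl:copy-of: serialize the tree to stderr."""
    text = ET.tostring(tree, encoding="unicode")
    print(text, file=sys.stderr)
    return text


books = ET.fromstring(BOOKS_XML)
all_authors = build_all_authors(books)
debug_dump(all_authors)
print([a.get("id") for a in all_authors])
```

The printed id list contains Dave Thomas twice, matching Listing 3, while the stderr copy shows the whole intermediate tree without touching the main output.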


Sort the authors

The next step in getting a distinct list of authors is to sort the author list. Do this by using the xsl:sort directive (see Listing 4).


Listing 4. Code to sort the author names
                
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
version="2.0">
<xsl:template match="/">
<xsl:variable name="allauthors">
    <authors>
        <xsl:for-each select="/books/book">
            <xsl:sort select="@author"/>
            <author id="{@author}"/>
        </xsl:for-each>
    </authors>
</xsl:variable>
<xsl:copy-of select="$allauthors"/>
</xsl:template>
</xsl:stylesheet>

The only addition to the code in Listing 2 is the xsl:sort directive inside the xsl:for-each loop. With a sort in place, xsl:for-each processes the book elements in the order given by the sort key in the select attribute rather than in document order. Here, I use the author attribute as the sorting key. Listing 5 shows the output of this template.


Listing 5. The sorted list
                
<?xml version="1.0" encoding="UTF-8"?>
<authors>
   <author id="Dave Thomas"/>
   <author id="Dave Thomas"/>
   <author id="Jack Herrington"/>
</authors>

This looks promising. Because the sort places identical author names next to each other, the only thing left is to reduce the list once again by removing redundant entries.
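Continuing the same hypothetical Python model, the sort step might look like the sketch below. Python's sorted() is stable, so entries with equal keys keep their original order. It compares strings by code point, which may not match the collation your XSLT processor uses for xsl:sort, so treat the ordering of mixed-case or accented names as an assumption. The function name build_sorted_authors is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Listing 1, inlined so the sketch is self-contained.
BOOKS_XML = """<books>
    <book name="Programming Ruby" author="Dave Thomas"/>
    <book name="Code Generation in Action" author="Jack Herrington"/>
    <book name="Pragmatic Programmer" author="Dave Thomas"/>
</books>"""


def build_sorted_authors(books: ET.Element) -> ET.Element:
    """Model of Listing 4: like Listing 2, but books are visited in author order."""
    authors = ET.Element("authors")
    # The key function plays the role of <xsl:sort select="@author"/>.
    for book in sorted(books.findall("book"), key=lambda b: b.get("author")):
        ET.SubElement(authors, "author", id=book.get("author"))
    return authors


sorted_authors = build_sorted_authors(ET.fromstring(BOOKS_XML))
ids = [a.get("id") for a in sorted_authors]
print(ids)
```

The two Dave Thomas entries now sit next to each other, as in Listing 5, so each author only needs to be compared with the one immediately before it.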


Pare down the list of authors

You can find unique entries in a list by using one of the lesser-known parts of XPath: axes. In this case, I iterate through the sorted author list and create a new list, but I skip the current item if it is the same as the previous one. To refer to the previous item in XPath, use the preceding-sibling axis. Listing 6 shows an example of this axis.


Listing 6. Code to build the list of distinct authors
                
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
version="2.0">
<xsl:template match="/">
<xsl:variable name="allauthors">
  <authors>
    <xsl:for-each select="/books/book">
      <xsl:sort select="@author"/>
      <author id="{@author}"/>
    </xsl:for-each>
  </authors>
</xsl:variable>
<xsl:variable name="authors">
  <authors>
    <xsl:for-each select="$allauthors/authors/author">
      <!-- Copy this author only if its id differs from the
           immediately preceding author. The sort above makes
           duplicates adjacent, so this removes them. -->
      <xsl:if test="not(preceding-sibling::author[1]/@id = @id)">
        <xsl:copy-of select="."/>
      </xsl:if>
    </xsl:for-each>
  </authors>
</xsl:variable>
<xsl:copy-of select="$authors"/>
</xsl:template>
</xsl:stylesheet>

Believe me, if you get this on the first pass, you're smarter than I am. The creation of the $allauthors variable is the same as before. But now I create a second variable, $authors, by running xsl:for-each over the author elements in $allauthors. The magic happens in the test attribute of xsl:if, where I compare the current id attribute with the id attribute of the previous sibling.
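One subtlety in this test: without a predicate, preceding-sibling::author/@id = @id is a general comparison, which is true if any earlier sibling has the same id. Adding [1] restricts it to the nearest earlier sibling, because preceding-sibling is a reverse axis and position 1 is the closest node. On a sorted list both forms keep the same entries; on an unsorted list only the unrestricted form removes every duplicate. The Python sketch below models the two readings over a plain list of ids. The function names are hypothetical, and this illustrates the XPath semantics rather than implementing an XPath engine.

```python
def preceding_siblings(ids, i):
    """Model of preceding-sibling::author for item i: earlier siblings,
    nearest first, the way a reverse axis orders them."""
    return ids[:i][::-1]


def keep_if_no_earlier_match(ids):
    """not(preceding-sibling::author/@id = @id): drop item i if ANY earlier id matches."""
    return [x for i, x in enumerate(ids) if x not in preceding_siblings(ids, i)]


def keep_if_differs_from_previous(ids):
    """not(preceding-sibling::author[1]/@id = @id): compare only with the nearest
    earlier id. Slicing with [:1] mimics [1]: at most one node, none for the first item."""
    return [x for i, x in enumerate(ids) if x not in preceding_siblings(ids, i)[:1]]


sorted_ids = ["Dave Thomas", "Dave Thomas", "Jack Herrington"]
unsorted_ids = ["Dave Thomas", "Jack Herrington", "Dave Thomas"]

print(keep_if_no_earlier_match(sorted_ids))
print(keep_if_differs_from_previous(sorted_ids))
print(keep_if_no_earlier_match(unsorted_ids))
print(keep_if_differs_from_previous(unsorted_ids))
```

The previous-only check is cheaper because it looks at one sibling instead of all of them, but it relies on the sort from Listing 4 to bring duplicates together.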

XPath axes can be confusing at first, but they're worth learning. When you understand them, it becomes clear just how powerful they are. Listing 7 shows the final sorted distinct list of authors.


Listing 7. The final sorted distinct list
                
<?xml version="1.0" encoding="UTF-8"?>
<authors>
   <author id="Dave Thomas"/>
   <author id="Jack Herrington"/>
</authors>
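For comparison, here is the whole three-step pipeline from Listings 2, 4, and 6 in the same hypothetical Python model: build the author list, sort it, then keep each author only if its id differs from the previous one. copy.deepcopy stands in for xsl:copy-of, and the function name distinct_authors is made up for illustration. It is a sketch under those assumptions, not a replacement for the stylesheet.

```python
import copy
import xml.etree.ElementTree as ET

# Listing 1, inlined so the sketch is self-contained.
BOOKS_XML = """<books>
    <book name="Programming Ruby" author="Dave Thomas"/>
    <book name="Code Generation in Action" author="Jack Herrington"/>
    <book name="Pragmatic Programmer" author="Dave Thomas"/>
</books>"""


def distinct_authors(books: ET.Element) -> ET.Element:
    # Listings 2 and 4: the sorted $allauthors tree.
    all_authors = ET.Element("authors")
    for book in sorted(books.findall("book"), key=lambda b: b.get("author")):
        ET.SubElement(all_authors, "author", id=book.get("author"))

    # Listing 6: the $authors tree, skipping an author whose id
    # matches the id of the previous sibling.
    authors = ET.Element("authors")
    previous_id = None  # the first author has no preceding sibling
    for author in all_authors:
        if author.get("id") != previous_id:
            authors.append(copy.deepcopy(author))  # like xsl:copy-of select="."
        previous_id = author.get("id")
    return authors


result = distinct_authors(ET.fromstring(BOOKS_XML))
print(ET.tostring(result, encoding="unicode"))
```

The printed tree holds the same two author elements as Listing 7, just without the indentation.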


Summary

If you don't control the source of your XML, you might need to do some manipulation to get the data twisted around in the way you want it. XSLT 2.0 makes this twisting easy by allowing for in-memory trees that can provide different views and indices on the original XML data.

