
Tip: Multi-pass XSLT

Use the node-set extension to break down the XSLT operation into two or more passes

Uche Ogbuji (uche@ogbuji.net), Principal Consultant, Fourthought, Inc.
Uche Ogbuji is a consultant and co-founder of Fourthought Inc., a software vendor and consultancy specializing in XML solutions for enterprise knowledge management. Fourthought develops 4Suite, an open source platform for XML, RDF, and knowledge-management applications. Mr. Ogbuji is a computer engineer and writer born in Nigeria, living and working in Boulder, Colorado, USA. You can contact Mr. Ogbuji at uche@ogbuji.net.

Summary:  Transforms can often be made cleaner and clearer if executed in phases or passes. First some intermediate output is produced, and then this is further transformed into a final output form. There can even be more than one intermediate form. In this tip, Uche Ogbuji discusses ways of breaking down an XSLT operation into two or more clear passes of transformation using the common node-set extension.


Date:  01 Sep 2002
Level:  Introductory

UNIX users are very familiar with the idea of pipes -- mechanisms that direct the output of one program so that it becomes the input for another. Pipes are perhaps the first major example of modularizing loosely coupled code. Each UNIX command is very simple and targeted; complex actions are produced by stringing commands together. The processing of XML using XSLT has much to gain from this same sort of modularization.

You can improve code simplicity and reuse by breaking the transform into a set of separate phases or passes. Unfortunately, pure XSLT 1.0 treats output captured in a variable as a result tree fragment, and most of the operations for handling the input of a transform (path expressions, xsl:apply-templates, xsl:for-each) are forbidden on it. This restriction has been removed in XSLT 2.0, and even in XSLT 1.0 (which has many more years of life) you can work around it using an extension function that most XSLT processors provide.
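
As a minimal sketch of the restriction and its workaround, the following style sheet captures some output in a variable and then reads from it. The greeting and word element names are invented for illustration, and the sketch assumes a processor that supports the EXSLT common module.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:transform
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:exslt="http://exslt.org/common"
  version="1.0"
>

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Output captured in a variable. In XSLT 1.0 this is a
         result tree fragment (element names are hypothetical). -->
    <xsl:variable name="greeting">
      <greeting><word>hello</word><word>world</word></greeting>
    </xsl:variable>

    <!-- Not allowed in pure XSLT 1.0: a path step applied
         directly to a result tree fragment.
    <xsl:value-of select="$greeting/greeting/word[2]"/>
    -->

    <!-- Allowed once the fragment is converted to a node-set. -->
    <xsl:value-of select="exslt:node-set($greeting)/greeting/word[2]"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:transform>
```

Run against any well-formed source document, this should print the second word, world, followed by a line feed.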

To follow this tip, you should be familiar with XSLT.

Dissecting tables

I have a little XSLT template for taking a document table and displaying only the first item in each row. It is designed to work with the sort of tables used in DocBook (which are based on a model of tables popular in SGML). A sample table is shown in Listing 1 (db-table.xml).


Listing 1. Simple table in DocBook form (db-table.xml)
                

<table frame="all">
<title>Numbers and tongues</title>
<tgroup cols="3" align="left" colsep="1" rowsep="1">
  <thead>
    <row>
      <entry>1</entry>
      <entry>2</entry>
      <entry>3</entry>
    </row>
  </thead>
  <tfoot>
    <row>
      <entry>I</entry>
      <entry>II</entry>
      <entry>III</entry>
    </row>
  </tfoot>
  <tbody>
  <row>
    <entry>one</entry>
    <entry>two</entry>
    <entry>three</entry>
  </row>
  <row>
    <entry>uno</entry>
    <entry>dos</entry>
    <entry>tres</entry>
  </row>
  <row>
    <entry>otu</entry>
    <entry>abuo</entry>
    <entry>ato</entry>
  </row>
  </tbody>
</tgroup>
</table>

Listing 2 (db-onecol.xslt) is a transform that renders only the first column of the table.


Listing 2. XSLT transform for rendering the first column of a DocBook-style table (db-onecol.xslt)
                

<?xml version="1.0" encoding="utf-8"?>
<xsl:transform
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:exslt="http://exslt.org/common"
  version="1.0"
>

  <xsl:output method="text"/>

  <xsl:template match="table">
    <xsl:value-of select="title"/><xsl:text>&#10;</xsl:text>
    <xsl:for-each select="tgroup/thead/row">
      <xsl:value-of select="entry[1]"/><xsl:text>&#10;</xsl:text>
    </xsl:for-each>
    <xsl:for-each select="tgroup/tbody/row">
      <xsl:value-of select="entry[1]"/><xsl:text>&#10;</xsl:text>
    </xsl:for-each>
    <xsl:for-each select="tgroup/tfoot/row">
      <xsl:value-of select="entry[1]"/><xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:transform>

This outputs simple text. The line feeds after each value are placed in xsl:text so that they are not stripped from the style sheet as white space. The rest is simple. When a table element is encountered, the title is output, followed by the first entry in each row of the table head, body, and foot. I did not simplify to one xsl:for-each loop using tgroup/*/row, or the like, because such a loop processes rows in document order, whatever order the thead, tbody, and tfoot elements happen to appear in, and I wanted them processed in a specific order. The following session demonstrates how this transform is run:

$ 4xslt db-table.xml db-onecol.xslt
Numbers and tongues
1
one
uno
otu
I
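
To see why Listing 2 uses three loops, here is a sketch of the single-loop alternative. It is not part of the original module; it only illustrates the ordering issue. Because xsl:for-each visits nodes in document order, and tfoot comes before tbody in Listing 1, the foot row would be printed ahead of the body rows.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:transform
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="1.0"
>

  <xsl:output method="text"/>

  <xsl:template match="table">
    <xsl:value-of select="title"/><xsl:text>&#10;</xsl:text>
    <!-- One loop over every row: rows are visited in document
         order, not in head, body, foot order. -->
    <xsl:for-each select="tgroup/*/row">
      <xsl:value-of select="entry[1]"/><xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:transform>
```

With db-table.xml as input, the lines after the title should come out as 1, I, one, uno, otu rather than in the head, body, foot order that Listing 2 produces.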


Table model mismatch

Now I have an XHTML-style table in Listing 3 (xhtml-table.xml) that I'd like to process in the same way.


Listing 3. An XHTML-style table (xhtml-table.xml)
                

<table border="1" frame="box">
<caption>Numbers and tongues</caption>
  <thead>
    <tr>
      <th>1</th>
      <th>2</th>
      <th>3</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>one</td>
      <td>two</td>
      <td>three</td>
    </tr>
    <tr>
      <td>uno</td>
      <td>dos</td>
      <td>tres</td>
    </tr>
    <tr>
      <td>otu</td>
      <td>abuo</td>
      <td>ato</td>
    </tr>
  </tbody>
  <tfoot>
    <tr>
      <td>I</td>
      <td>II</td>
      <td>III</td>
    </tr>
  </tfoot>
</table>

Because this table has different element names and a slightly different organization, I cannot simply reuse the DocBook table template. I could copy this template over with some modifications to create a special version for XHTML elements, but this is a less modular approach. Another approach is to convert the XHTML to DocBook form and then pass that through the DocBook template; the advantage here is that I can also re-use other facilities for DocBook tables once the conversion has been made.

Listing 4 (xhtml-onecol.xslt) is a transform that uses the DocBook table module to operate on XHTML tables.


Listing 4. XSLT transform for rendering the first column of an XHTML-style table (xhtml-onecol.xslt)
                

<?xml version="1.0" encoding="utf-8"?>
<xsl:transform
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:exslt="http://exslt.org/common"
  version="1.0"
>

  <xsl:import href="db-onecol.xslt"/>

  <xsl:template match="/">
    <xsl:apply-templates mode="xhtml"/>
  </xsl:template>

  <xsl:template match="table" mode="xhtml">
    <xsl:variable name="db-table">
      <xsl:call-template name="xhtml-table-to-db"/>
    </xsl:variable>
    <xsl:apply-templates
      select="exslt:node-set($db-table)/table"/>
  </xsl:template>

  <xsl:template name="xhtml-table-to-db">
    <xsl:copy>
      <title><xsl:value-of select="caption"/></title>
      <tgroup cols="{count(thead/tr/th)}">
        <thead>
          <row>
            <xsl:for-each select="thead/tr/th">
              <entry><xsl:apply-templates/></entry>
            </xsl:for-each>
          </row>
        </thead>
        <tfoot>
          <row>
            <xsl:for-each select="tfoot/tr/td">
              <entry><xsl:apply-templates/></entry>
            </xsl:for-each>
          </row>
        </tfoot>
        <tbody>
          <xsl:for-each select="tbody/tr">
          <row>
            <xsl:for-each select="td">
              <entry><xsl:apply-templates/></entry>
            </xsl:for-each>
          </row>
          </xsl:for-each>
        </tbody>
      </tgroup>
    </xsl:copy>
  </xsl:template>

</xsl:transform>

One important point: I have intentionally simplified these examples to focus on the main point. The style sheets use the pull style of XSLT (which means frequent use of xsl:for-each and xsl:value-of) rather than the push style (which uses a lot of templates and modes). I have done this because the pull style is more widely familiar, although the push style is superior in many ways. For example, in a real project I would write the template for converting XHTML tables to DocBook as a variation of the identity transform. Also, the templates would need much more logic to handle general cases of XHTML and DocBook tables.
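
For comparison, here is a rough sketch of what a push-style version of the XHTML-to-DocBook conversion might look like, written as a variation of the identity transform. The mode name to-db is made up for this sketch, and it assumes the table elements are in no namespace, as in Listing 3. It is still far from handling general XHTML tables.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:transform
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="1.0"
>

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <xsl:apply-templates select="table" mode="to-db"/>
  </xsl:template>

  <!-- Identity rule: copy anything that has no more specific rule. -->
  <xsl:template match="@*|node()" mode="to-db">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" mode="to-db"/>
    </xsl:copy>
  </xsl:template>

  <!-- Rebuild the table skeleton in the order thead, tfoot, tbody. -->
  <xsl:template match="table" mode="to-db">
    <table>
      <xsl:apply-templates select="caption" mode="to-db"/>
      <tgroup cols="{count(thead/tr[1]/th)}">
        <xsl:apply-templates select="thead" mode="to-db"/>
        <xsl:apply-templates select="tfoot" mode="to-db"/>
        <xsl:apply-templates select="tbody" mode="to-db"/>
      </tgroup>
    </table>
  </xsl:template>

  <!-- Rename the elements that differ between the two table models. -->
  <xsl:template match="caption" mode="to-db">
    <title><xsl:apply-templates mode="to-db"/></title>
  </xsl:template>

  <xsl:template match="tr" mode="to-db">
    <row><xsl:apply-templates select="th|td" mode="to-db"/></row>
  </xsl:template>

  <xsl:template match="th|td" mode="to-db">
    <entry><xsl:apply-templates mode="to-db"/></entry>
  </xsl:template>

</xsl:transform>
```

Each rule handles one element, so supporting another XHTML construct means adding a template rather than editing a long nested loop.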

The crux of the multi-pass technique occurs in the line:

<xsl:apply-templates select="exslt:node-set($db-table)/table"/>

This is the hand-off from one phase to the next. In the first pass, the XHTML table is converted to DocBook form within the variable db-table. This creates a result tree fragment of output very similar to that in Listing 1. To treat this as input on the second pass, I have to convert this from a result tree fragment to a node-set, which is what the exslt:node-set function does. This extension function is supported by several processors, and even processors that do not support the EXSLT extensions almost invariably provide their own proprietary node-set extension which works the same way.

I select the table element from this new node-set to kick off the second pass, in which the table template from the imported db-onecol.xslt module does its work. I use a mode (xhtml) on the template that selects the XHTML table so that it does not interfere with the operation of the DocBook template, which has the same match but lower import precedence. Without the mode, the XHTML template would also match the converted table on the second pass, and the DocBook template would never run.

The output of this transform is the same as that of the transform on a pure DocBook table. I was able to reuse the DocBook code just as I intended.
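
For processors that lack exslt:node-set, the fallback to a proprietary function mentioned above might be sketched as follows. The sketch uses the standard function-available test to choose between the EXSLT function and msxsl:node-set (the name and namespace MSXML uses, to my knowledge). Other processors have their own equivalents that could be added as further branches. The hand-written table in the first pass is illustrative only.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:transform
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:exslt="http://exslt.org/common"
  xmlns:msxsl="urn:schemas-microsoft-com:xslt"
  version="1.0"
>

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- First pass: a tiny intermediate table (illustrative only). -->
    <xsl:variable name="db-table">
      <table>
        <title>Numbers and tongues</title>
      </table>
    </xsl:variable>

    <!-- Hand-off: use whichever node-set function is available. -->
    <xsl:choose>
      <xsl:when test="function-available('exslt:node-set')">
        <xsl:apply-templates select="exslt:node-set($db-table)/table"/>
      </xsl:when>
      <xsl:when test="function-available('msxsl:node-set')">
        <xsl:apply-templates select="msxsl:node-set($db-table)/table"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:message terminate="yes">No node-set extension function found.</xsl:message>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Second pass: stands in for the DocBook table template. -->
  <xsl:template match="table">
    <xsl:value-of select="title"/><xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:transform>
```

A processor should only report an error for an unavailable extension function if the call is actually evaluated, which is why guarding each call with function-available keeps the style sheet portable.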


Summary

This example is an extreme simplification of a situation I encountered in a real project. I needed to reuse many DocBook processing templates on an XHTML source. By transforming XHTML content to DocBook in the first pass, and then re-using standard DocBook templates in subsequent passes, I saved a huge amount of work and debugging. The idea of multi-pass XSLT is even more general than this. In addition to promoting code reuse, it can also break complex transforms into chunks that are easy to understand. The next time you are faced with a complex problem in XSLT, determine whether it could be simplified or modularized as a series of piped operations.
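
As a sketch of such a series, the following style sheet chains three passes through two intermediate variables, using Listing 3 as the assumed input. The variable names pass1 and pass2 and the first element are invented for illustration.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:transform
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:exslt="http://exslt.org/common"
  version="1.0"
>

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Pass 1: normalize th and td cells into row/entry form. -->
    <xsl:variable name="pass1">
      <xsl:for-each select="//tr">
        <row>
          <xsl:for-each select="th|td">
            <entry><xsl:value-of select="."/></entry>
          </xsl:for-each>
        </row>
      </xsl:for-each>
    </xsl:variable>

    <!-- Pass 2: keep only the first entry of each row. -->
    <xsl:variable name="pass2">
      <xsl:for-each select="exslt:node-set($pass1)/row">
        <first><xsl:value-of select="entry[1]"/></first>
      </xsl:for-each>
    </xsl:variable>

    <!-- Pass 3: render the remaining values as lines of text. -->
    <xsl:for-each select="exslt:node-set($pass2)/first">
      <xsl:value-of select="."/><xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:transform>
```

Each pass reads only the output of the one before it, so a stage can be replaced or tested on its own, much like a command in a UNIX pipeline.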

