======================================================================
IBM developerWorks XML Tip
October 1, 2002
Vol. 2, Issue 40
IBM's resource for developers
http://www-106.ibm.com/developerworks/?nx-1012
======================================================================
TIP: MULTI-PASS XSLT
Use the node-set extension to break down the XSLT operation
into two or more passes
Uche Ogbuji (uche.ogbuji@fourthought.com)
Principal Consultant, Fourthought, Inc.
Hello, XML Tip readers,
Transforms can often be made cleaner and clearer if executed in phases
or passes. First some intermediate output is produced, and then this
is further transformed into a final output form. There can even be
more than one intermediate form. In this tip, Uche Ogbuji discusses
ways of breaking an XSLT operation down into two or more clear passes
of transformation using the common node-set extension.
For the rest of the tip, read on below.
Until next week,
XML Tip team at IBM developerWorks
dWnews@us.ibm.com
UNIX users are very familiar with the idea of pipes -- mechanisms that
direct the output of one program so that it becomes the input for
another. Pipes are behind perhaps the first major examples of
modular, loosely coupled code. Each UNIX command is very simple
and targeted; complex actions are produced by stringing them together.
The processing of XML using XSLT has much to gain from this same sort
of modularization.
You can improve code simplicity and reuse by breaking the transform
into a set of separate phases or passes. Unfortunately, in pure XSLT
1.0, most of the instructions for handling the input of a transform
cannot be used on its output, which is held as a result tree fragment.
This restriction has been removed in XSLT 2.0, and even in XSLT 1.0
(which has many more years of life) you can get around it with an
extension function that most XSLT vendors provide.
To follow this tip, you should be familiar with XSLT.
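To make the restriction concrete, here is a minimal sketch (the
variable name and the item elements are invented for illustration).
In XSLT 1.0, a variable with template content holds a result tree
fragment. It can be copied or used as a string, but the specification
does not permit the /, //, or [] operators on it, so a strictly
conforming processor should reject the commented-out loop.
----------------------------------------------------------------------
Sketch. What pure XSLT 1.0 allows on a result tree fragment
----------------------------------------------------------------------
<?xml version="1.0" encoding="utf-8"?>
<xsl:transform
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="1.0">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- Hypothetical variable; its value is a result tree fragment -->
    <xsl:variable name="items">
      <item>one</item>
      <item>two</item>
    </xsl:variable>
    <!-- Allowed: use the fragment as a string -->
    <xsl:value-of select="$items"/>
    <!-- Not allowed in pure XSLT 1.0: a path step into the fragment.
         <xsl:for-each select="$items/item"> ... </xsl:for-each> -->
  </xsl:template>
</xsl:transform>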
______________________________________________________________________
DISSECTING TABLES
I have a little XSLT template for taking a document table and
displaying only the first item in each row. It is designed to work
with the sort of tables used in DocBook (which are based on a model of
tables popular in SGML). A sample table is shown in Listing 1
(db-table.xml).
----------------------------------------------------------------------
Listing 1. Simple table in DocBook form (db-table.xml)
----------------------------------------------------------------------
<table frame="all">
  <title>Numbers and tongues</title>
  <tgroup cols="3" align="left" colsep="1" rowsep="1">
    <thead>
      <row>
        <entry>1</entry>
        <entry>2</entry>
        <entry>3</entry>
      </row>
    </thead>
    <tfoot>
      <row>
        <entry>I</entry>
        <entry>II</entry>
        <entry>III</entry>
      </row>
    </tfoot>
    <tbody>
      <row>
        <entry>one</entry>
        <entry>two</entry>
        <entry>three</entry>
      </row>
      <row>
        <entry>uno</entry>
        <entry>dos</entry>
        <entry>tres</entry>
      </row>
      <row>
        <entry>otu</entry>
        <entry>abuo</entry>
        <entry>ato</entry>
      </row>
    </tbody>
  </tgroup>
</table>
Listing 2 (db-onecol.xslt) is a transform that renders only the first
column of the table.
----------------------------------------------------------------------
Listing 2. XSLT transform for rendering the first column of a
DocBook-style table (db-onecol.xslt)
----------------------------------------------------------------------
<?xml version="1.0" encoding="utf-8"?>
<xsl:transform
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:exslt="http://exslt.org/common"
  version="1.0">
  <xsl:output method="text"/>
  <xsl:template match="table">
    <xsl:value-of select="title"/><xsl:text>&#10;</xsl:text>
    <!-- Head, body, and foot are handled separately to fix the order -->
    <xsl:for-each select="tgroup/thead/row">
      <xsl:value-of select="entry[1]"/><xsl:text>&#10;</xsl:text>
    </xsl:for-each>
    <xsl:for-each select="tgroup/tbody/row">
      <xsl:value-of select="entry[1]"/><xsl:text>&#10;</xsl:text>
    </xsl:for-each>
    <xsl:for-each select="tgroup/tfoot/row">
      <xsl:value-of select="entry[1]"/><xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:transform>
This outputs simple text. The line feeds (&#10; character references)
are placed in xsl:text so that they are not stripped from the style
sheet as white space. The rest is simple. When a table element is
encountered, the title is output, followed by the first entry in the
rows of the table head, body, and foot. I did not simplify to one
xsl:for-each loop using tgroup/*/row, or the like, because the thead,
tbody, and tfoot elements can come in any order in the document, and I
wanted them processed in a specific order. The following session
demonstrates how this transform is run:
$ 4xslt db-table.xml db-onecol.xslt
Numbers and tongues
1
one
uno
otu
I
______________________________________________________________________
TABLE MODEL MISMATCH
Now I have an XHTML-style table in Listing 3 (xhtml-table.xml) that
I'd like to process in the same way.
----------------------------------------------------------------------
Listing 3. An XHTML-style table (xhtml-table.xml)
----------------------------------------------------------------------
<table border="1" frame="box">
  <caption>Numbers and tongues</caption>
  <thead>
    <tr>
      <th>1</th>
      <th>2</th>
      <th>3</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>one</td>
      <td>two</td>
      <td>three</td>
    </tr>
    <tr>
      <td>uno</td>
      <td>dos</td>
      <td>tres</td>
    </tr>
    <tr>
      <td>otu</td>
      <td>abuo</td>
      <td>ato</td>
    </tr>
  </tbody>
  <tfoot>
    <tr>
      <td>I</td>
      <td>II</td>
      <td>III</td>
    </tr>
  </tfoot>
</table>
Because this table has different element names and a slightly
different organization, I cannot simply reuse the DocBook table
template. I could copy this template over with some modifications to
create a special version for XHTML elements, but this is a less
modular approach. Another approach is to convert the XHTML to DocBook
form and then pass that through the DocBook template; the advantage
here is that I can also reuse other facilities for DocBook tables
once the conversion has been made.
Listing 4 (xhtml-onecol.xslt) is a transform that uses the DocBook
table module to operate on XHTML tables.
----------------------------------------------------------------------
Listing 4. XSLT transform for rendering the first column of an
XHTML-style table (xhtml-onecol.xslt)
----------------------------------------------------------------------
<?xml version="1.0" encoding="utf-8"?>
<xsl:transform
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:exslt="http://exslt.org/common"
  version="1.0">
  <xsl:import href="db-onecol.xslt"/>
  <xsl:template match="/">
    <xsl:apply-templates mode="xhtml"/>
  </xsl:template>
  <xsl:template match="table" mode="xhtml">
    <!-- First pass: build the DocBook form as a result tree fragment -->
    <xsl:variable name="db-table">
      <xsl:call-template name="xhtml-table-to-db"/>
    </xsl:variable>
    <!-- Second pass: hand the converted table to the imported template -->
    <xsl:apply-templates
      select="exslt:node-set($db-table)/table"/>
  </xsl:template>
  <xsl:template name="xhtml-table-to-db">
    <xsl:copy>
      <title><xsl:value-of select="caption"/></title>
      <tgroup cols="{count(thead/tr/th)}">
        <thead>
          <row>
            <xsl:for-each select="thead/tr/th">
              <entry><xsl:apply-templates/></entry>
            </xsl:for-each>
          </row>
        </thead>
        <tfoot>
          <row>
            <xsl:for-each select="tfoot/tr/td">
              <entry><xsl:apply-templates/></entry>
            </xsl:for-each>
          </row>
        </tfoot>
        <tbody>
          <xsl:for-each select="tbody/tr">
            <row>
              <xsl:for-each select="td">
                <entry><xsl:apply-templates/></entry>
              </xsl:for-each>
            </row>
          </xsl:for-each>
        </tbody>
      </tgroup>
    </xsl:copy>
  </xsl:template>
</xsl:transform>
One important point: I have intentionally simplified these examples to
focus on the main point. The style sheets use the pull style of XSLT
(which means frequent use of xsl:for-each and xsl:value-of) rather
than the push style (which uses a lot of templates and modes). I have
done this because the pull style is more widely familiar, although the
push style is superior in many ways. For example, in a real project I
would write the template for converting XHTML tables to DocBook as a
variation of the identity transform. Also, the templates would need
much more logic to handle general cases of XHTML and DocBook tables.
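For comparison, here is a minimal sketch of what a push-style
conversion might look like, written as a variation of the identity
transform. The mode name to-db is an assumption made for this sketch,
and the templates cover only the elements that appear in Listing 3.
Unlike Listing 4, each tr becomes its own row, even in the head and
foot.
----------------------------------------------------------------------
Sketch. Push-style conversion of an XHTML table to DocBook form
----------------------------------------------------------------------
<?xml version="1.0" encoding="utf-8"?>
<xsl:transform
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="1.0">
  <!-- Identity rule: copy whatever has no more specific rule,
       which here includes thead, tbody, tfoot, and text -->
  <xsl:template match="@*|node()" mode="to-db">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" mode="to-db"/>
    </xsl:copy>
  </xsl:template>
  <!-- Wrap the sections in a tgroup, emitted in a fixed order -->
  <xsl:template match="table" mode="to-db">
    <table>
      <xsl:apply-templates select="caption" mode="to-db"/>
      <tgroup cols="{count(thead/tr/th)}">
        <xsl:apply-templates select="thead" mode="to-db"/>
        <xsl:apply-templates select="tfoot" mode="to-db"/>
        <xsl:apply-templates select="tbody" mode="to-db"/>
      </tgroup>
    </table>
  </xsl:template>
  <!-- Rename the XHTML elements to their DocBook counterparts -->
  <xsl:template match="caption" mode="to-db">
    <title><xsl:apply-templates mode="to-db"/></title>
  </xsl:template>
  <xsl:template match="tr" mode="to-db">
    <row><xsl:apply-templates mode="to-db"/></row>
  </xsl:template>
  <xsl:template match="th|td" mode="to-db">
    <entry><xsl:apply-templates mode="to-db"/></entry>
  </xsl:template>
</xsl:transform>
With templates like these in place, the first pass would fill its
variable with xsl:apply-templates select="." mode="to-db" rather than
a call to a named template.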
The crux of the multi-pass technique occurs in the line:
<xsl:apply-templates select="exslt:node-set($db-table)/table"/>
This is the hand-off from one phase to the next. In the first pass,
the XHTML table is converted to DocBook form within the variable
db-table. This creates a result tree fragment of output very similar
to that in Listing 1. To treat this as input on the second pass, I
have to convert this from a result tree fragment to a node-set, which
is what the exslt:node-set function does. This extension function is
supported by several processors, and even processors that do not
support the EXSLT extensions almost invariably provide their own
proprietary node-set extension that works the same way.
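One way to cope with processors that lack the EXSLT function is to
test for it with function-available and fall back to a vendor
function. The sketch below assumes the style sheet from Listing 4 is
saved as xhtml-onecol.xslt and overrides its hand-off template. It
tries exslt:node-set first and then msxsl:node-set, the equivalent
function in Microsoft's MSXML. Other vendors use their own namespaces
and function names, so check your processor's documentation before
extending the list.
----------------------------------------------------------------------
Sketch. Choosing a node-set function at run time
----------------------------------------------------------------------
<?xml version="1.0" encoding="utf-8"?>
<xsl:transform
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:exslt="http://exslt.org/common"
  xmlns:msxsl="urn:schemas-microsoft-com:xslt"
  version="1.0">
  <!-- Reuses the root template and the xhtml-table-to-db template -->
  <xsl:import href="xhtml-onecol.xslt"/>
  <xsl:template match="table" mode="xhtml">
    <xsl:variable name="db-table">
      <xsl:call-template name="xhtml-table-to-db"/>
    </xsl:variable>
    <xsl:choose>
      <xsl:when test="function-available('exslt:node-set')">
        <xsl:apply-templates
          select="exslt:node-set($db-table)/table"/>
      </xsl:when>
      <xsl:when test="function-available('msxsl:node-set')">
        <xsl:apply-templates
          select="msxsl:node-set($db-table)/table"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:message terminate="yes">No node-set extension function is available.</xsl:message>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:transform>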
I select the table element from this new node-set to kick off the
second pass, in which the table template from the imported
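db-onecol.xslt module does its work. I use a mode (xhtml) to select
the XHTML table so that this template does not interfere with the
operation of the DocBook template, which has the same match but lower
import precedence. Without the mode, the XHTML template would also win
the match in the second pass and call itself again instead of handing
off to the DocBook template.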
The output of this transform is the same as that of the transform on a
pure DocBook table. I was able to reuse the DocBook code just as I
intended.
______________________________________________________________________
SUMMARY
This example is an extreme simplification of a situation I encountered
in a real project. I needed to reuse many DocBook processing templates
on an XHTML source. By transforming XHTML content to DocBook in the
first pass, and then reusing standard DocBook templates in subsequent
passes, I saved a huge amount of work and debugging. The idea of
multi-pass XSLT is even more general than this. In addition to
promoting code reuse, it can also break complex transforms into chunks
that are easy to understand. The next time you are faced with a
complex problem in XSLT, determine whether it could be simplified or
modularized as a series of piped operations.
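As a rough sketch of how such a pipeline might grow beyond two passes,
the style sheet below chains three. It assumes Listing 4 is saved as
xhtml-onecol.xslt and that the processor supports exslt:node-set. The
middle pass, in a mode named no-foot, is invented for this sketch; it
copies the DocBook table but drops its tfoot.
----------------------------------------------------------------------
Sketch. Three passes chained through variables
----------------------------------------------------------------------
<?xml version="1.0" encoding="utf-8"?>
<xsl:transform
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:exslt="http://exslt.org/common"
  version="1.0">
  <xsl:import href="xhtml-onecol.xslt"/>
  <xsl:template match="table" mode="xhtml">
    <!-- Pass 1: XHTML to DocBook, using the template from Listing 4 -->
    <xsl:variable name="pass1">
      <xsl:call-template name="xhtml-table-to-db"/>
    </xsl:variable>
    <!-- Pass 2: filter the DocBook table -->
    <xsl:variable name="pass2">
      <xsl:apply-templates
        select="exslt:node-set($pass1)/table" mode="no-foot"/>
    </xsl:variable>
    <!-- Pass 3: render the first column with the Listing 2 template -->
    <xsl:apply-templates select="exslt:node-set($pass2)/table"/>
  </xsl:template>
  <!-- Identity rule for the filtering pass -->
  <xsl:template match="@*|node()" mode="no-foot">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" mode="no-foot"/>
    </xsl:copy>
  </xsl:template>
  <!-- An empty template removes the table foot -->
  <xsl:template match="tfoot" mode="no-foot"/>
</xsl:transform>
Each pass reads only the output of the pass before it, so a stage can
be added, removed, or tested on its own, much like a command in a UNIX
pipeline.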
======================================================================
LINKS TO OTHER GOOD STUFF
::: IBM developerWorks XML Zone :::
http://www-106.ibm.com/developerworks/xml/?nx-1012
::: Resources related to this tip :::
http://www.ibm.com/developer/library/x-tipxsltmp.html#resources
::: Full text of this tip on the Web :::
http://www-106.ibm.com/developerworks/library/x-tipxsltmp.html/?nx-1012
::: Index of other XML tips :::
http://www-106.ibm.com/developerworks/library/x-tips.html?nx-1012
::: Most recent issue of the IBM developerWorks newsletter :::
http://www.ibm.com/developerworks/newsletter/dwte092602.html?nx-1012
======================================================================
ABOUT THIS NEWSLETTER
Created by IBM developerWorks (http://www.ibm.com/developerworks/)
Delivered by Topica (http://www.topica.com/tep/index.html)
======================================================================
THIS NEWSLETTER IS FOR INFORMATION ONLY. This newsletter should not
be interpreted to be a commitment on the part of IBM, and, after the
publication date, IBM cannot guarantee the accuracy of any information
presented. You may copy and distribute this newsletter, as long as:
1. All text is copied without modification and all pages are included.
2. All copies contain IBM's copyright notice and any other notices
provided therein.
3. This document is not distributed for profit.