I like to think of XSLT (XSL Transformations) as a simple and effective scripting language to manipulate XML documents. I have used XSLT in a broad range of applications encompassing publishing and application integration. I have come to enjoy XSLT, but I have also learned that it can be disconcerting to experienced developers who are learning XSLT because it has a distinct functional/recursive flavour (as opposed to procedural programming languages, such as Java). As this article illustrates, understanding the XSLT working model makes it possible to develop algorithms that work well with the language.
Students of XSLT regularly ask me how to split an HTML (or XML) tag across two XSLT templates. The question arises when a developer is trying to add a hierarchical level to an XML document. I think it's worth studying this problem in some detail for two reasons:
- It's a frequently asked question, and many developers will benefit from an answer.
- Even more importantly, it's the wrong question to ask.
In this article I'll suggest which question makes more sense, and I'll tell you the answer to that question.
Consider products.xml in Listing 1. This document contains a list of products marked up in XML. There are tags for the name (ps:name), a short description (ps:description), and the price (ps:price). All tags are in the http://www.psol.com/2001/07/dw namespace. Remember that the namespace URI is used solely as an identifier; it does not point to a Web site.
Unfortunately products.xml has a flat structure. More specifically, it lacks tags to group all the data pertaining to a given product. One can infer that a new name marks the beginning of a new product, but it's not explicit in the markup. Although this document is well-formed, the lack of hierarchical information makes processing it more difficult. Yet such a flat structure is fairly common. Another example of flat structure is XHTML that uses the <h1> tag to mark the beginning of new sections rather than any explicit section grouping.
<?xml version="1.0"?>
<ps:products xmlns:ps="http://www.psol.com/2001/07/dw">
  <ps:name>WizzBang Ultra Word Processor</ps:name>
  <ps:description>More words per minute than the competition.</ps:description>
  <ps:price>$799.99</ps:price>
  <ps:name>Super WizzBang Calculator</ps:name>
  <ps:description>Cheap and reliable with power saving.</ps:description>
  <ps:price>$5.99</ps:price>
  <ps:name>WizzBang Safest Safe</ps:name>
  <ps:description>Choose the authentic WizzBang Safest Safe.</ps:description>
  <ps:price>$1,999.00</ps:price>
</ps:products>
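For comparison, a grouped version of the same data might look like the sketch below. The ps:product wrapper element is an assumption made purely for illustration; it does not appear in products.xml, and that missing level is exactly what makes the flat document harder to process.

```xml
<?xml version="1.0"?>
<!-- Hypothetical hierarchical variant: ps:product is NOT part of the original vocabulary -->
<ps:products xmlns:ps="http://www.psol.com/2001/07/dw">
  <ps:product>
    <ps:name>WizzBang Ultra Word Processor</ps:name>
    <ps:description>More words per minute than the competition.</ps:description>
    <ps:price>$799.99</ps:price>
  </ps:product>
  <ps:product>
    <ps:name>Super WizzBang Calculator</ps:name>
    <ps:description>Cheap and reliable with power saving.</ps:description>
    <ps:price>$5.99</ps:price>
  </ps:product>
  <ps:product>
    <ps:name>WizzBang Safest Safe</ps:name>
    <ps:description>Choose the authentic WizzBang Safest Safe.</ps:description>
    <ps:price>$1,999.00</ps:price>
  </ps:product>
</ps:products>
```

With such a wrapper, a single template matching ps:product could emit a complete table row, and the problem discussed in this article would not arise.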
Suppose you are tasked with creating an XSLT style sheet to render the product list in an HTML table as shown in Figure 1. Most of the style sheet is easy, but what about the <tr> element, the HTML code for a new table row?
Figure 1. Rendering the product list as a table
| Name | Description | Price |
| WizzBang Ultra Word Processor | More words per minute than the competition. | $799.99 |
| Super WizzBang Calculator | Cheap and reliable with power saving. | $5.99 |
| WizzBang Safest Safe | Choose the authentic WizzBang Safest Safe. | $1,999.00 |
At first sight it appears the solution is to issue the opening tag (<tr>) in the template for <ps:name> and the corresponding closing tag (</tr>) when you hit <ps:price>. However, for reasons that will become clear in a moment, this approach does not work.
The style sheet table-bad.xsl in Listing 2 shows why: it is not a well-formed XML document, because the <tr> element opens in one template and closes in another. The processor reports an error similar to: "This file is not well-formed: tr closing element name expected." An XSLT style sheet is itself an XML document, so it must follow the XML rules.
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:ps="http://www.psol.com/2001/07/dw">
  <xsl:output method="html"/>
  <xsl:template match="ps:products">
    <html>
      <head><title>Product List</title></head>
      <body>
        <table>
          <tr><td>Name</td><td>Description</td><td>Price</td></tr>
          <xsl:apply-templates/>
        </table>
      </body>
    </html>
  </xsl:template>
  <xsl:template match="ps:name">
    <tr>
      <td><xsl:apply-templates/></td>
  </xsl:template>
  <xsl:template match="ps:description">
    <td><xsl:apply-templates/></td>
  </xsl:template>
  <xsl:template match="ps:price">
      <td><xsl:apply-templates/></td>
    </tr>
  </xsl:template>
</xsl:stylesheet>
At this point, XSLT novices usually ask how to split HTML elements across two or more templates. Unfortunately there's no clean solution. table-bad.xsl breaks a fundamental XSLT rule, and the only proper solution is to shift gears and adopt another strategy which I'll introduce in the next section, The right question.
For the sake of discussion, let's consider what would happen if you were to persist in that inappropriate approach. Take a moment to review table-worse.xsl in Listing 3. This style sheet uses a horrible hack to split the <tr> across two templates. I stress that this technique is intrinsically wrong. Take table-worse.xsl as a warning against improper coding, not as a solution.
The hack uses a combination of the <xsl:text> element and a CDATA section to force the XSLT processor to accept the split:
<xsl:text disable-output-escaping="yes"><![CDATA[<tr>]]></xsl:text>
Note the disable-output-escaping attribute; it essentially tells the processor to copy the content blindly into the result document. The CDATA section escapes the offending < character, so the style sheet is well-formed again and the processor accepts it. However, this hack is a sure way to get into trouble because the processor can no longer ensure that the result document is well-formed. You lose a very important safeguard, and you could end up with hard-to-track bugs. In addition, XSLT 1.0 does not require a processor to support disable-output-escaping, so the hack is not even portable.
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:ps="http://www.psol.com/2001/07/dw">
  <xsl:output method="html"/>
  <xsl:template match="ps:products">
    <html>
      <head><title>Product List</title></head>
      <body>
        <table>
          <tr><td>Name</td><td>Description</td><td>Price</td></tr>
          <xsl:apply-templates/>
        </table>
      </body>
    </html>
  </xsl:template>
  <xsl:template match="ps:name">
    <xsl:text disable-output-escaping="yes"><![CDATA[<tr>]]></xsl:text>
    <td><xsl:apply-templates/></td>
  </xsl:template>
  <xsl:template match="ps:description">
    <td><xsl:apply-templates/></td>
  </xsl:template>
  <xsl:template match="ps:price">
    <td><xsl:apply-templates/></td>
    <xsl:text disable-output-escaping="yes"><![CDATA[</tr>]]></xsl:text>
  </xsl:template>
</xsl:stylesheet>
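To make the lost safeguard concrete, consider the following input. It is a hypothetical variant of products.xml, invented for this illustration, in which the second product has no ps:price element.

```xml
<?xml version="1.0"?>
<!-- Hypothetical input: the second product is missing its ps:price -->
<ps:products xmlns:ps="http://www.psol.com/2001/07/dw">
  <ps:name>WizzBang Ultra Word Processor</ps:name>
  <ps:description>More words per minute than the competition.</ps:description>
  <ps:price>$799.99</ps:price>
  <ps:name>Super WizzBang Calculator</ps:name>
  <ps:description>Cheap and reliable with power saving.</ps:description>
  <ps:name>WizzBang Safest Safe</ps:name>
  <ps:description>Choose the authentic WizzBang Safest Safe.</ps:description>
  <ps:price>$1,999.00</ps:price>
</ps:products>
```

Assuming the processor honors disable-output-escaping, table-worse.xsl writes a <tr> for the calculator but never writes the matching </tr>, because the closing tag is only emitted by the ps:price template. The processor reports nothing, since as far as it is concerned it only copied some text.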
A much better solution exists, and all it takes is a slight change of attitude. So far, the examples have taken a procedural approach: The algorithm is akin to a loop over the entire document.
Yet that's not how the processor works. The processor implements a recursive algorithm. By default, it does what's known as a depth-first walk in a tree. I know that sounds like a title for an algorithm course, and that's exactly what it is. Depth-first walking means that the processor visits all the children of a given node recursively until it has processed the entire document.
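The default walk comes from the built-in template rules that XSLT 1.0 defines for nodes no explicit template matches. As a sketch, a style sheet that spells them out explicitly would look like this:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Root and elements: recurse into the children (the depth-first step) -->
  <xsl:template match="*|/">
    <xsl:apply-templates/>
  </xsl:template>
  <!-- Text and attribute nodes: copy the value to the result -->
  <xsl:template match="text()|@*">
    <xsl:value-of select="."/>
  </xsl:template>
  <!-- Processing instructions and comments: produce nothing -->
  <xsl:template match="processing-instruction()|comment()"/>
</xsl:stylesheet>
```

Every <xsl:apply-templates/> without a select attribute takes one step down the tree, which is why a style sheet made of simple templates ends up visiting the whole document.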
All you have to do is alter this algorithm slightly so that the processor visits ps:description and ps:price (the elements following ps:name) while it visits a ps:name. As table-good.xsl in Listing 4 illustrates, it's not difficult.
The ps:products template selects only the ps:name elements. Without this select, the processor would go through every description and price twice: once from within its ps:name and once directly as a child of ps:products, and the second visit would write stray cells outside any row.
The interesting bit starts in the ps:name template, which contains two <xsl:apply-templates> instructions. The first one is a regular call to process the element's content. The second one branches off into a special walk of the tree (see the sidebar The following-sibling Axis for details). It selects the first element after ps:name. In other words, instead of going depth-first (to the children), it moves sideways to the adjacent element. It also uses a special "within" mode (see the sidebar The mode attribute for details).
Note that both <xsl:apply-templates> instructions are enclosed in the now famous <tr> element -- no more splitting across two templates!
There are two more templates in "within" mode. The first one catches all the elements (the * pattern) and, again, uses two <xsl:apply-templates> instructions: one processes the current element with its regular template, the other continues the special walk with the next sibling. The last template in "within" mode matches ps:name and stops the special walk. The idea is that when you hit a ps:name while processing another ps:name, you have reached the end of the current product. This is the stop condition for the recursion.
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:ps="http://www.psol.com/2001/07/dw">
  <xsl:output method="html"/>
  <xsl:template match="ps:products">
    <html>
      <head><title>Product List</title></head>
      <body>
        <table>
          <tr><td>Name</td><td>Description</td><td>Price</td></tr>
          <!-- visit the names only; each name pulls in the rest of its row -->
          <xsl:apply-templates select="ps:name"/>
        </table>
      </body>
    </html>
  </xsl:template>
  <!-- one complete row per product: the tr opens and closes in the same template -->
  <xsl:template match="ps:name">
    <tr>
      <td><xsl:apply-templates/></td>
      <xsl:apply-templates select="following-sibling::*[1]" mode="within"/>
    </tr>
  </xsl:template>
  <!-- special walk: render this element, then move on to the next sibling -->
  <xsl:template match="*" mode="within">
    <xsl:apply-templates select="."/>
    <xsl:apply-templates select="following-sibling::*[1]" mode="within"/>
  </xsl:template>
  <!-- stop condition: the next ps:name starts a new product -->
  <xsl:template match="ps:name" mode="within"/>
  <xsl:template match="ps:description">
    <td><xsl:apply-templates/></td>
  </xsl:template>
  <xsl:template match="ps:price">
    <td><xsl:apply-templates/></td>
  </xsl:template>
</xsl:stylesheet>
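To see what the following-sibling::*[1] expression selects, a small probe style sheet can help. The sketch below is not part of the original listings. For each ps:name it prints the local name of the next sibling element and, for contrast, the number of elements the axis returns without the [1] predicate.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:ps="http://www.psol.com/2001/07/dw">
  <xsl:output method="text"/>
  <xsl:template match="ps:products">
    <xsl:apply-templates select="ps:name"/>
  </xsl:template>
  <xsl:template match="ps:name">
    <xsl:value-of select="."/>
    <!-- [1] keeps only the nearest following sibling element -->
    <xsl:text>: next=</xsl:text>
    <xsl:value-of select="local-name(following-sibling::*[1])"/>
    <!-- without the predicate, the axis returns every later sibling -->
    <xsl:text>, all following=</xsl:text>
    <xsl:value-of select="count(following-sibling::*)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Applied to products.xml, the first name should report description as its next sibling and eight following siblings in total. This is why the "within" templates step one sibling at a time and rely on an explicit stop condition rather than selecting the whole axis.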
When writing XSLT style sheets, it helps to remember that the XSLT processor follows a recursive algorithm. You have lots of control over the recursion and, in many cases, it pays to find a recursive algorithm -- even if your natural inclination (as a Java developer, for example) is not towards recursion.
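The same recursion also answers the question that started this discussion: how to add a hierarchical level to an XML document. The following sketch adapts the "within" walk to wrap each product in a ps:product element. That element name is an assumption introduced here for illustration; the rest follows the pattern of table-good.xsl.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:ps="http://www.psol.com/2001/07/dw">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="ps:products">
    <ps:products>
      <xsl:apply-templates select="ps:name"/>
    </ps:products>
  </xsl:template>
  <!-- ps:product is a hypothetical wrapper; it opens and closes in one template -->
  <xsl:template match="ps:name">
    <ps:product>
      <xsl:copy-of select="."/>
      <xsl:apply-templates select="following-sibling::*[1]" mode="within"/>
    </ps:product>
  </xsl:template>
  <!-- copy the current element, then continue with the next sibling -->
  <xsl:template match="*" mode="within">
    <xsl:copy-of select="."/>
    <xsl:apply-templates select="following-sibling::*[1]" mode="within"/>
  </xsl:template>
  <!-- stop condition: a new ps:name belongs to the next product -->
  <xsl:template match="ps:name" mode="within"/>
</xsl:stylesheet>
```

Because the wrapper is a literal result element, the processor guarantees that the output is well-formed, even when a product lacks a description or a price.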
- Try out these open-source XSLT processors:
- Download the MSXML 3.0 upgrade if you want to use standard-compliant XSLT style sheets with Internet Explorer.
- Managing e-zines with JavaMail and XSLT illustrates a practical application of XSLT.
- Rendering XML Documents with IBM WebSphere explains how to apply XSLT style sheets from WebSphere.
- IBM Certified Developer will certify your knowledge of XML and Related Technologies.

Benoît Marchal is a consultant and writer based in Namur, Belgium. He is the author of XML by Example, Applied XML Solutions, and XML and the Enterprise. He is a columnist for Gamelan. Details on his latest projects are at www.marchal.com.





