Recurse, not divide, to conquer

Why not to divide an HTML element between XSLT templates, and what to do instead

Benoît Marchal (bmarchal@pineapplesoft.com), Consultant, Pineapplesoft
Benoît Marchal is a consultant and writer based in Namur, Belgium. He is the author of XML by Example, Applied XML Solutions, and XML and the Enterprise. He is a columnist for Gamelan. Details on his latest projects are at www.marchal.com.

Summary:  Software consultant and author Benoît Marchal answers an XSLT student's frequently asked question: How do you divide an HTML element between two XSLT templates? The trick is to ask the right question. This article demonstrates how to shift your thinking into the XSLT recursive approach, which is especially helpful if you have a background in a procedural language (Java and the like). Sample code demonstrates the right way (and the wrong way) to work with a flat XML or XHTML file that you want to process hierarchically.

Date:  01 Jul 2001
Level:  Introductory

I like to think of XSLT (XSL Transformations) as a simple and effective scripting language to manipulate XML documents. I have used XSLT in a broad range of applications encompassing publishing and application integration. I have come to enjoy XSLT, but I have also learned that it can be disconcerting to experienced developers who are learning XSLT because it has a distinct functional/recursive flavour (as opposed to procedural programming languages, such as Java). As this article illustrates, understanding the XSLT working model makes it possible to develop algorithms that work well with the language.

A common question

Students of XSLT regularly ask me how to split an HTML (or XML) tag across two XSLT templates. The question arises when a developer is trying to add a hierarchical level to an XML document. I think it's worth studying this problem in some detail for two reasons:

  • It's a frequently asked question, and many developers will benefit from an answer.
  • Even more importantly, it's the wrong question to ask.

In this article I'll suggest which question makes more sense, and I'll tell you the answer to that question.

Consider products.xml in Listing 1. This document contains a list of products marked up in XML. There are tags for the name (ps:name), a short description (ps:description), and the price (ps:price). All tags are in the http://www.psol.com/2001/07/dw namespace. Remember that the namespace URI is used solely as an identifier; it does not point to a Web site.

Unfortunately products.xml has a flat structure. More specifically, it lacks tags to group all the data pertaining to a given product. One can infer that a new name marks the beginning of a new product, but it's not explicit in the markup. Although this document is well-formed, the lack of hierarchical information makes processing it more difficult. Yet such a flat structure is fairly common. Another example of flat structure is XHTML that uses the <h1> tag to mark the beginning of new sections rather than any explicit section grouping.

<?xml version="1.0"?>
<ps:products xmlns:ps="http://www.psol.com/2001/07/dw">
   <ps:name>WizzBang Ultra Word Processor</ps:name>
   <ps:description>More words per minute than the competition.</ps:description>
   <ps:price>$799.99</ps:price>
   <ps:name>Super WizzBang Calculator</ps:name>
   <ps:description>Cheap and reliable with power saving.</ps:description>
   <ps:price>$5.99</ps:price>
   <ps:name>WizzBang Safest Safe</ps:name>
   <ps:description>Choose the authentic WizzBang Safest Safe.</ps:description>
   <ps:price>$1,999.00</ps:price>
</ps:products>
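
For comparison, a hierarchical version of the same data might wrap each product in its own element. The ps:product element below is hypothetical: it does not appear in products.xml and is shown only to make the missing grouping explicit.

```xml
<?xml version="1.0"?>
<ps:products xmlns:ps="http://www.psol.com/2001/07/dw">
   <!-- ps:product is an assumed grouping element, not part of products.xml -->
   <ps:product>
      <ps:name>WizzBang Ultra Word Processor</ps:name>
      <ps:description>More words per minute than the competition.</ps:description>
      <ps:price>$799.99</ps:price>
   </ps:product>
   <ps:product>
      <ps:name>Super WizzBang Calculator</ps:name>
      <ps:description>Cheap and reliable with power saving.</ps:description>
      <ps:price>$5.99</ps:price>
   </ps:product>
</ps:products>
```

With markup like this, a single template matching the grouping element could emit a complete table row, and the question of splitting an element would never come up. The rest of this article deals with the flat version.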

Suppose you are tasked with creating an XSLT style sheet to render the product list in an HTML table as shown in Figure 1. Most of the style sheet is easy, but what about the <tr> element, the HTML code for a new table row?


Figure 1. Rendering the product list as a table
Name | Description | Price
WizzBang Ultra Word Processor | More words per minute than the competition. | $799.99
Super WizzBang Calculator | Cheap and reliable with power saving. | $5.99
WizzBang Safest Safe | Choose the authentic WizzBang Safest Safe. | $1,999.00

XSLT Processor

The code in this article is standard XSLT, so you need a standards-compliant XSLT processor such as Xalan or XT.

Note: If you plan to use Internet Explorer, you first have to upgrade to MSXML 3.0 (see Resources).

At first sight it appears the solution is to issue the opening tag (<tr>) in the template for <ps:name> and the corresponding closing tag (</tr>) when you hit <ps:price>. However, for reasons that will become clear in a moment, this approach does not work.

The following style sheet, table-bad.xsl in Listing 2, illustrates why: it is not a well-formed XML document. The <tr> opened in the ps:name template is never closed inside that template, and the </tr> in the ps:price template has no matching opening tag. The processor reports an error similar to: "This file is not well-formed: tr closing element name expected." Indeed, an XSLT style sheet is itself an XML document, so it must follow the XML rules.

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns:ps="http://www.psol.com/2001/07/dw">

<xsl:output method="html"/>

<xsl:template match="ps:products">
   <html>
      <head><title>Product List</title></head>
      <body>
         <table>
            <tr><td>Name</td><td>Description</td><td>Price</td></tr>
            <xsl:apply-templates/>
         </table>
      </body>
   </html>
</xsl:template>

<xsl:template match="ps:name">
   <tr>
   <td><xsl:apply-templates/></td>
</xsl:template>

<xsl:template match="ps:description">
   <td><xsl:apply-templates/></td>
</xsl:template>

<xsl:template match="ps:price">
   <td><xsl:apply-templates/></td>
   </tr>
</xsl:template>

</xsl:stylesheet>


From bad to worse

The following-sibling Axis

By default, XPaths follow the parent/child relationship. For example, ps:products/ps:name selects ps:name elements which are children of ps:products.

An axis lets you select another relationship and, as the name implies, following-sibling selects the elements that follow the current node, provided they are siblings; that is, it ignores descendants.

For example ps:name/following-sibling::ps:price[1] selects the first ps:price element after each name. In other words, it selects the products' price.
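
As a minimal sketch of the axis at work, the style sheet below pairs each name with its price using the XPath from the sidebar. It is an illustration only, not one of the article's listings, and it assumes the flat products.xml from Listing 1.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns:ps="http://www.psol.com/2001/07/dw">

<xsl:output method="text"/>

<!-- Visit only the names; the prices are reached through the axis -->
<xsl:template match="ps:products">
   <xsl:apply-templates select="ps:name"/>
</xsl:template>

<!-- For each name, print the name and the first price that follows it -->
<xsl:template match="ps:name">
   <xsl:value-of select="."/>
   <xsl:text>: </xsl:text>
   <xsl:value-of select="following-sibling::ps:price[1]"/>
   <xsl:text>&#10;</xsl:text>
</xsl:template>

</xsl:stylesheet>
```

Run against products.xml, this should print one line per product, starting with "WizzBang Ultra Word Processor: $799.99". The [1] predicate matters: without it, the axis would reach every later price, not just the one belonging to the current product.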

At this point, XSLT novices usually ask how to split HTML elements across two or more templates. Unfortunately there's no clean solution. table-bad.xsl breaks a fundamental XSLT rule, and the only proper solution is to shift gears and adopt another strategy which I'll introduce in the next section, The right question.

For the sake of discussion, let's consider what would happen if you were to persist in that inappropriate approach. Take a moment to review table-worse.xsl in Listing 3. This style sheet uses a horrible hack to split the <tr> across two templates. I stress that this technique is intrinsically wrong. Take table-worse.xsl as a warning against improper coding, not as a solution.

The hack uses a combination of the <xsl:text> element and a CDATA section to force the XSLT processor to accept the split:

<xsl:text disable-output-escaping="yes"><![CDATA[<tr>]]></xsl:text>

Note the disable-output-escaping attribute; it essentially tells the processor to copy the content of <xsl:text> into the result document verbatim, without escaping it. The CDATA section hides the offending < character from the XML parser, so the style sheet is accepted. However, this hack is a sure way to get into trouble because the processor can no longer ensure that the result document is well-formed. You lose a very important safeguard, and you could end up with hard-to-track bugs. Support for disable-output-escaping is also optional in XSLT 1.0, so the hack may not even work with every processor.

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns:ps="http://www.psol.com/2001/07/dw">
<xsl:output method="html"/>
<xsl:template match="ps:products">
   <html>
      <head><title>Product List</title></head>
      <body>
         <table>
            <tr><td>Name</td><td>Description</td><td>Price</td></tr>
            <xsl:apply-templates/>
         </table>
      </body>
   </html>
</xsl:template>
<xsl:template match="ps:name">
   <xsl:text disable-output-escaping="yes"><![CDATA[<tr>]]></xsl:text>
   <td><xsl:apply-templates/></td>
</xsl:template>
<xsl:template match="ps:description">
   <td><xsl:apply-templates/></td>
</xsl:template>
<xsl:template match="ps:price">
   <td><xsl:apply-templates/></td>
   <xsl:text disable-output-escaping="yes"><![CDATA[</tr>]]></xsl:text>
</xsl:template>
</xsl:stylesheet>


The right question

A much better solution exists, and all it takes is a slight change of attitude. So far, the examples have taken a procedural approach: The algorithm is akin to a loop over the entire document.

Yet that's not how the processor works. The processor implements a recursive algorithm. By default, it does what's known as a depth-first walk in a tree. I know that sounds like a title for an algorithm course, and that's exactly what it is. Depth-first walking means that the processor visits all the children of a given node recursively until it has processed the entire document.
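
To watch the depth-first walk happen, a small tracing style sheet can help. The sketch below is not part of the article's listings; it prints the name of every element it visits, indented by depth, and then descends into that element's children.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="text"/>

<!-- One template for every element: print it, then recurse into its children -->
<xsl:template match="*">
   <!-- Indent by one step per ancestor element -->
   <xsl:for-each select="ancestor::*">
      <xsl:text>   </xsl:text>
   </xsl:for-each>
   <xsl:value-of select="name()"/>
   <xsl:text>&#10;</xsl:text>
   <!-- select="*" visits child elements only and skips text nodes -->
   <xsl:apply-templates select="*"/>
</xsl:template>

</xsl:stylesheet>
```

Applied to products.xml, the trace should show ps:products first, followed by its nine child elements in document order, all at the same indentation. That uniform indentation is the flat structure made visible: nothing in the walk tells the processor where one product ends and the next begins.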

All you have to do is alter this algorithm slightly so that the processor visits ps:description and ps:price (the elements following ps:name) while it visits a ps:name. As table-good.xsl in Listing 4 illustrates, it's not difficult.

For efficiency, the ps:products template selects the ps:name. If not for this small optimization, we would go through every element twice.

The interesting bit starts in the ps:name template, which contains two <xsl:apply-templates> instructions. The first one is a regular call to process the element's content. The second one branches off into a special walk of the tree (see the sidebar The following-sibling Axis for details). It selects the first element after ps:name. In other words, instead of going depth-first (to the children), it moves sideways to the adjacent element. It also uses a special "within" mode (see the sidebar The mode attribute for details).

The mode attribute

The mode attribute lets you associate more than one set of templates with each element and selectively apply one or the other. The XSLT processor applies only those templates whose mode attribute matches the mode given in the corresponding <xsl:apply-templates> instruction.
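
A minimal sketch of modes in isolation: the style sheet below processes the same ps:name elements twice, once as list items and once as headings. The "summary" mode name and the page layout are assumptions made for this illustration; they are not taken from the article's listings.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns:ps="http://www.psol.com/2001/07/dw">

<xsl:output method="html"/>

<xsl:template match="ps:products">
   <html>
      <head><title>Product List</title></head>
      <body>
         <!-- First pass over the names, in the assumed "summary" mode -->
         <ul>
            <xsl:apply-templates select="ps:name" mode="summary"/>
         </ul>
         <!-- Second pass over the same names, in the default (unnamed) mode -->
         <xsl:apply-templates select="ps:name"/>
      </body>
   </html>
</xsl:template>

<!-- Used only when apply-templates asks for mode="summary" -->
<xsl:template match="ps:name" mode="summary">
   <li><xsl:value-of select="."/></li>
</xsl:template>

<!-- Used only when apply-templates specifies no mode -->
<xsl:template match="ps:name">
   <h2><xsl:value-of select="."/></h2>
   <p><xsl:value-of select="following-sibling::ps:description[1]"/></p>
</xsl:template>

</xsl:stylesheet>
```

Both templates match ps:name, yet they never compete, because each <xsl:apply-templates> instruction picks exactly one mode.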

Note that both <xsl:apply-templates> instructions are enclosed in the now famous <tr> element -- no more splitting across two templates!

There are two more templates in "within" mode. The first one matches any element (the * pattern) and, again, uses two <xsl:apply-templates> instructions: one to process the current element with its regular template, and one to continue the special walk with the next sibling. The last template in "within" mode matches ps:name and is empty, which stops the special walk. The idea is that when you hit a ps:name while processing another ps:name, you have reached the end of the current product. This is the stop condition for the recursion. The walk also ends naturally after the last element, where following-sibling::*[1] selects nothing.

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns:ps="http://www.psol.com/2001/07/dw">

<xsl:output method="html"/>

<xsl:template match="ps:products">
   <html>
      <head><title>Product List</title></head>
      <body>
         <table>
            <tr><td>Name</td><td>Description</td><td>Price</td></tr>
            <xsl:apply-templates select="ps:name"/>
         </table>
      </body>
   </html>
</xsl:template>

<xsl:template match="ps:name">
   <tr>
      <td><xsl:apply-templates/></td>
      <xsl:apply-templates select="following-sibling::*[1]" mode="within"/>
   </tr>
</xsl:template>

<xsl:template match="*" mode="within">
   <xsl:apply-templates select="."/>
   <xsl:apply-templates select="following-sibling::*[1]" mode="within"/>
</xsl:template>

<xsl:template match="ps:name" mode="within"/>

<xsl:template match="ps:description">
   <td><xsl:apply-templates/></td>
</xsl:template>

<xsl:template match="ps:price">
   <td><xsl:apply-templates/></td>
</xsl:template>

</xsl:stylesheet>


Recurse and conquer

When writing XSLT style sheets, it helps to remember that the XSLT processor follows a recursive algorithm. You have lots of control over the recursion and, in many cases, it pays to find a recursive algorithm -- even if your natural inclination (as a Java developer, for example) is not towards recursion.
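
The same walk should carry over to the XHTML case mentioned earlier, where <h1> marks the start of each section. The style sheet below is a sketch under several assumptions: every <h1> is a direct child of <body>, each section is wrapped in a <div> whose class name "section" is invented for the example, and any content before the first <h1>, as well as text placed directly in <body>, is simply dropped.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns:h="http://www.w3.org/1999/xhtml"
   xmlns="http://www.w3.org/1999/xhtml"
   exclude-result-prefixes="h">

<xsl:output method="xml"/>

<!-- Copy the head unchanged and restructure only the body -->
<xsl:template match="h:html">
   <html>
      <xsl:copy-of select="h:head"/>
      <xsl:apply-templates select="h:body"/>
   </html>
</xsl:template>

<!-- Start one walk per heading; content before the first h1 is not handled -->
<xsl:template match="h:body">
   <body>
      <xsl:apply-templates select="h:h1"/>
   </body>
</xsl:template>

<!-- Each h1 opens a section and pulls in the siblings that follow it -->
<xsl:template match="h:h1">
   <div class="section">
      <xsl:copy-of select="."/>
      <xsl:apply-templates select="following-sibling::*[1]" mode="within"/>
   </div>
</xsl:template>

<!-- Inside a section: copy the element, then move on to the next sibling -->
<xsl:template match="*" mode="within">
   <xsl:copy-of select="."/>
   <xsl:apply-templates select="following-sibling::*[1]" mode="within"/>
</xsl:template>

<!-- Stop condition: the next h1 belongs to the next section -->
<xsl:template match="h:h1" mode="within"/>

</xsl:stylesheet>
```

The structure mirrors table-good.xsl: a starting template that owns the enclosing element, a "within" template that steps from sibling to sibling, and an empty template that ends the walk.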


Resources
