Teaching is an incredible learning experience. I teach XML and XSLT to developers in corporate training sessions and at conferences, and I have found repeatedly that the effort I make to clarify a complex issue for my students helps to further my own understanding. I'm not only teaching the students; I'm teaching myself as well.
Furthermore, students bring their own unique perspectives, which often force me to rethink aspects of certain issues and draw new conclusions. This article was born out of one such experience. I realized that students who have been exposed to JSP, PHP, ASP, or ColdFusion often make incorrect assumptions about XSLT; such assumptions can lead to coding mistakes. While researching how to clarify this topic, I began thinking about how XSLT processors (such as Xalan, Saxon, or MSXML) really work. This new perspective has helped me, and I'm sure it will help you too.
At first glance, you can see many similarities between Web development languages, such as JSP or PHP, and XSLT. Most significantly, they all let the developer mix code in the middle of HTML tags: With JSP, the code is written in the Java language; with PHP, the code is an ad-hoc scripting language; and with XSLT, the code is XML tags in the XSLT namespace.
This similarity hides a fundamental difference. For JSP (as well as PHP, ASP, and ColdFusion), the HTML tags are treated as text. Indeed, when the JSP page is compiled into a servlet, all the HTML tags are turned into write statements. In essence, mixing tags and code is just a convenience for the programmer: you don't have to type a lot of write statements.
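To make the point concrete, here is a rough Java sketch of the kind of code a page compiler generates. It is an illustration under simplifying assumptions, not what any real JSP container emits: the class and method names are hypothetical, and a StringWriter stands in for the servlet's output stream. Because the markup is only a string, nothing checks that the tags are balanced.

```java
import java.io.StringWriter;

// Hypothetical stand-in for a servlet generated from a JSP page.
// Real containers generate different (and longer) code; this only
// shows that the HTML markup ends up as literal strings.
final class CompiledPageSketch {

    static String render(String title, boolean closeBold) {
        StringWriter out = new StringWriter();
        out.write("<html><body><h1>");
        out.write(title);                   // the "code" part of the page
        out.write("</h1><p><b>Bold text");
        if (closeBold) {
            out.write("</b>");              // forgetting this goes unnoticed
        }
        out.write("</p></body></html>");
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(render("XSLT, JSP and PHP", true));
        // The writer happily emits unbalanced markup: <b> is never closed.
        System.out.println(render("XSLT, JSP and PHP", false));
    }
}
```

Calling render with closeBold set to false yields a paragraph whose b element is never closed, and the writer does not object. A literal result element in an XSLT stylesheet cannot be left open like this, because the stylesheet itself must be well-formed XML.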
Not so with XSLT. An XSLT processor treats tags as first-class citizens. The "T" in XSLT stands for Transformation. Transformation of what? Transformation of XML documents into other XML documents (HTML is treated as a variation of XML) or, to be more precise, transformations of trees into other trees. Trees? Think W3C DOM (the org.w3c.dom package in Java technology). Although, for performance reasons, modern XSLT processors don't use DOM internally (an optimized tree library is more efficient), it helps to think of XSLT as a language that converts a DOM tree into another DOM tree.
Unlike JSP or PHP, an XSLT processor does not blindly write the tags in the output. Instead the XSLT processor does the following:
- Loads the input document as a DOM tree (internally the processor optimizes DOM, but it does not matter for this discussion)
- Performs a depth-first walk of the input tree; this is the same depth-first algorithm you learned in Programming 101
- As it walks through the document, selects a template in the stylesheet for the current node
- Applies the template, which describes how to create zero, one, or more nodes in the output tree
- By the end of the walk, has created a new tree (the output tree) from the input tree and the rules in the templates
- Writes the output tree according to the HTML or XML syntax
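The first two steps can be modeled in Java with the standard DOM API (javax.xml.parsers and org.w3c.dom). This is only a sketch, under the simplifying assumption that the processor works on a DOM (real processors use optimized trees, as noted above); the class name and sample document are made up. The recursive walk method visits each node before its children, which is the depth-first order in which the processor selects templates.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Illustrative model of the processor's walk, not how Xalan, Saxon,
// or MSXML are actually implemented.
final class DepthFirstWalk {

    // Step 1: load the input document as a DOM tree.
    static Document load(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Step 2: visit a node, then recurse into its children in document order.
    static void walk(Node node, List<String> visited) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            visited.add(node.getLocalName());  // a processor would select a template here
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            walk(child, visited);
        }
    }

    // Returns element names in the order the walk reaches them.
    static List<String> elementOrder(String xml) {
        List<String> visited = new ArrayList<>();
        walk(load(xml), visited);
        return visited;
    }

    public static void main(String[] args) {
        String xml = "<db:article xmlns:db='http://ananas.org/2002/docbook/subset'>"
                + "<db:title>XSLT, JSP and PHP</db:title>"
                + "<db:section><db:title>Is there a difference?</db:title>"
                + "<db:para>Yes there is!</db:para></db:section>"
                + "</db:article>";
        // Prints [article, title, section, title, para]
        System.out.println(elementOrder(xml));
    }
}
```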
Note that it is possible to select other algorithms besides a depth-first walk. Still, the point I'm trying to make is that the XSLT processor treats its input and its output as trees. That treatment has three important consequences:
- The processor may change the syntax. Depending on the value of the xsl:output statement, the processor may write the result according to XML or HTML syntax. Web development languages cannot do so because they treat HTML tags as text and blindly copy them into the output.
- Although it may fail occasionally, the processor works hard to guarantee that the output is a well-formed XML document.
- You, the developer, have to express your problem in terms of tree manipulation.
I'll show you what that means in the next section.
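The first consequence, the change of syntax, is easy to observe from Java through JAXP (javax.xml.transform), which runs whatever XSLT processor is configured (in a stock JDK, a built-in processor derived from Xalan). In the sketch below, the class name and the tiny stylesheet are made up for illustration: the template always builds the same output tree, a paragraph containing an empty br element, and only the method attribute of xsl:output changes. The exact serialization may differ slightly between processors.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Illustrative only: one output tree, two serializations.
final class OutputMethodDemo {

    // The template always builds <p>line one<br/>line two</p> in the output
    // tree; %s is replaced by the value of the method attribute.
    private static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='%s' omit-xml-declaration='yes' indent='no'/>"
      + "<xsl:template match='/'><p>line one<br/>line two</p></xsl:template>"
      + "</xsl:stylesheet>";

    static String serialize(String method) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(String.format(XSL, method))))
                // The input document is irrelevant here; any well-formed XML will do.
                .transform(new StreamSource(new StringReader("<any/>")), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize("xml"));   // XML syntax: the empty element is written <br/>
        System.out.println(serialize("html"));  // HTML syntax: the same element is written <br>
    }
}
```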
In this section, I will compare two stylesheets. The first one is a typical XSLT stylesheet; the second is a rewriting that exposes the depth-first walk. While you would not want to adopt this coding style, it helps explain how the processor works.
Listing 1 is a sample XML document while Figure 1 is the corresponding DOM tree. Listing 2 is a simple stylesheet that converts Listing 1 to HTML.
Listing 1. An XML document
<?xml version="1.0"?>
<db:article xmlns:db="http://ananas.org/2002/docbook/subset">
  <db:title>XSLT, JSP and PHP</db:title>
  <db:section>
    <db:title>Is there a difference?</db:title>
    <db:para>Yes there is! XSLT is a pure XML technology that
    traces its roots to <db:emphasis>tree manipulation
    algorithms</db:emphasis>. JSP and PHP offer an ingenious
    solution to combine scripting languages with HTML/XML
    tagging.</db:para>
    <db:para>The difference may not be obvious when you're first
    learning XSLT (after all, it offers tags and instructions),
    but understanding the difference will make you a
    <db:emphasis role="bold">stronger and better</db:emphasis>
    developer.</db:para>
  </db:section>
  <db:section>
    <db:title>How do I learn the difference?</db:title>
    <db:para>Interestingly enough, you can code the XSLT algorithm
    in XSLT... one cool way to experiment with the
    difference.</db:para>
  </db:section>
</db:article>
Listing 2. A simple stylesheet for HTML publishing
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:db="http://ananas.org/2002/docbook/subset"
    exclude-result-prefixes="db">
  <xsl:output method="html"/>
  <xsl:template match="db:article">
    <html>
      <head><title>
        <!-- In Listing 1, the title is a direct child of db:article -->
        <xsl:value-of select="db:title"/>
      </title></head>
      <body>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>
  <xsl:template match="db:para">
    <p><xsl:apply-templates/></p>
  </xsl:template>
  <xsl:template match="db:ulink">
    <a href="{@url}"><xsl:apply-templates/></a>
  </xsl:template>
  <xsl:template match="db:article/db:title">
    <h1><xsl:apply-templates/></h1>
  </xsl:template>
  <xsl:template match="db:title">
    <h2><xsl:apply-templates/></h2>
  </xsl:template>
  <xsl:template match="db:emphasis[@role='bold']">
    <b><xsl:apply-templates/></b>
  </xsl:template>
  <xsl:template match="db:emphasis">
    <i><xsl:apply-templates/></i>
  </xsl:template>
</xsl:stylesheet>
Figure 1. The XML document as the processor sees it
The goal in this section is to rewrite Listing 2 to make the depth-first walk more visible. You'll need named templates for that. If you are not familiar with named templates, they are the equivalent of method calls in XSLT: A named template is a template with a name attribute. It accepts parameters through the xsl:param instruction, as follows:
<xsl:template name="print">
  <xsl:param name="message"/>
  <!-- template content goes here -->
</xsl:template>
The xsl:call-template instruction is used (instead of xsl:apply-templates) to call the template, as follows:
<xsl:call-template name="print">
  <xsl:with-param name="message"
      select="'See if it prints this message.'"/>
</xsl:call-template>
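To try the two fragments together, the following Java sketch embeds them in a minimal stylesheet and runs it through JAXP. The body of print (writing the message as plain text after a "[print]" label) is an assumption, since the fragment above leaves the template content open; the class name is likewise made up.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class NamedTemplateDemo {

    private static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      // Named template; its body is a hypothetical choice for this demo.
      + "<xsl:template name='print'>"
      + "<xsl:param name='message'/>"
      + "<xsl:text>[print] </xsl:text><xsl:value-of select='$message'/>"
      + "</xsl:template>"
      // Entry point: call the named template like a method, passing one parameter.
      + "<xsl:template match='/'>"
      + "<xsl:call-template name='print'>"
      + "<xsl:with-param name='message' select=\"'See if it prints this message.'\"/>"
      + "</xsl:call-template>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String run() {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)))
                .transform(new StreamSource(new StringReader("<any/>")), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run());  // [print] See if it prints this message.
    }
}
```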
Listing 3 is a rewrite of Listing 2 that makes the tree walking more explicit. Instead of relying on the processor to perform the walk, this stylesheet has a named template, main, that implements the tree walking. main is a recursive function: it accepts a node-set in its nodes parameter and loops over that node-set. The bulk of the template is a choose instruction that attempts to find the most appropriate rule for a given node. When processing a node, the template recursively calls itself to process the children of the node.
Listing 3. A stylesheet to expose the walk
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:db="http://ananas.org/2002/docbook/subset"
    exclude-result-prefixes="db">
  <xsl:output method="html"/>
  <!-- Start the walk at the root node -->
  <xsl:template match="/">
    <xsl:call-template name="main">
      <xsl:with-param name="nodes" select="."/>
    </xsl:call-template>
  </xsl:template>
  <!-- Explicit depth-first walk: loop over the nodes, pick a rule, recurse -->
  <xsl:template name="main">
    <xsl:param name="nodes"/>
    <xsl:for-each select="$nodes">
      <xsl:choose>
        <xsl:when test="self::db:article">
          <html>
            <head><title>
              <xsl:value-of select="db:title"/>
            </title></head>
            <body>
              <xsl:call-template name="main">
                <xsl:with-param name="nodes" select="child::node()"/>
              </xsl:call-template>
            </body>
          </html>
        </xsl:when>
        <xsl:when test="self::db:para">
          <p>
            <xsl:call-template name="main">
              <xsl:with-param name="nodes" select="child::node()"/>
            </xsl:call-template>
          </p>
        </xsl:when>
        <xsl:when test="self::db:ulink">
          <a href="{@url}">
            <xsl:call-template name="main">
              <xsl:with-param name="nodes" select="child::node()"/>
            </xsl:call-template>
          </a>
        </xsl:when>
        <xsl:when test="self::db:title[parent::db:article]">
          <h1>
            <xsl:call-template name="main">
              <xsl:with-param name="nodes" select="child::node()"/>
            </xsl:call-template>
          </h1>
        </xsl:when>
        <xsl:when test="self::db:title">
          <h2>
            <xsl:call-template name="main">
              <xsl:with-param name="nodes" select="child::node()"/>
            </xsl:call-template>
          </h2>
        </xsl:when>
        <xsl:when test="self::db:emphasis[@role='bold']">
          <b>
            <xsl:call-template name="main">
              <xsl:with-param name="nodes" select="child::node()"/>
            </xsl:call-template>
          </b>
        </xsl:when>
        <xsl:when test="self::db:emphasis">
          <i>
            <xsl:call-template name="main">
              <xsl:with-param name="nodes" select="child::node()"/>
            </xsl:call-template>
          </i>
        </xsl:when>
        <xsl:when test="self::text()">
          <xsl:value-of select="."/>
        </xsl:when>
        <!-- Catch-all: process the children of any other node -->
        <xsl:otherwise>
          <xsl:call-template name="main">
            <xsl:with-param name="nodes" select="child::node()"/>
          </xsl:call-template>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
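To check the equivalence on a small scale, the following Java sketch runs trimmed-down versions of Listing 2 and Listing 3 through JAXP and compares the results. The trimming (only the db:para and db:emphasis rules, plus the text and catch-all branches of main), the sample paragraph, and the class name are assumptions made for this demo; exclude-result-prefixes keeps the db namespace declaration out of the HTML.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class WalkComparison {

    private static final String HEADER =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:db='http://ananas.org/2002/docbook/subset' exclude-result-prefixes='db'>"
      + "<xsl:output method='html' indent='no'/>";

    // Trimmed Listing 2: the processor selects templates and walks the tree.
    static final String TEMPLATES = HEADER
      + "<xsl:template match='db:para'><p><xsl:apply-templates/></p></xsl:template>"
      + "<xsl:template match=\"db:emphasis[@role='bold']\"><b><xsl:apply-templates/></b></xsl:template>"
      + "<xsl:template match='db:emphasis'><i><xsl:apply-templates/></i></xsl:template>"
      + "</xsl:stylesheet>";

    // The recursive call shared by every branch of the explicit walk.
    private static final String RECURSE =
        "<xsl:call-template name='main'><xsl:with-param name='nodes' select='child::node()'/></xsl:call-template>";

    // Trimmed Listing 3: the loop, the tests, and the recursion are explicit.
    static final String EXPLICIT = HEADER
      + "<xsl:template match='/'><xsl:call-template name='main'>"
      + "<xsl:with-param name='nodes' select='.'/></xsl:call-template></xsl:template>"
      + "<xsl:template name='main'><xsl:param name='nodes'/>"
      + "<xsl:for-each select='$nodes'><xsl:choose>"
      + "<xsl:when test='self::db:para'><p>" + RECURSE + "</p></xsl:when>"
      + "<xsl:when test=\"self::db:emphasis[@role='bold']\"><b>" + RECURSE + "</b></xsl:when>"
      + "<xsl:when test='self::db:emphasis'><i>" + RECURSE + "</i></xsl:when>"
      + "<xsl:when test='self::text()'><xsl:value-of select='.'/></xsl:when>"
      + "<xsl:otherwise>" + RECURSE + "</xsl:otherwise>"
      + "</xsl:choose></xsl:for-each></xsl:template>"
      + "</xsl:stylesheet>";

    static final String INPUT =
        "<db:para xmlns:db='http://ananas.org/2002/docbook/subset'>Be a "
      + "<db:emphasis role='bold'>stronger and better</db:emphasis> and "
      + "<db:emphasis>wiser</db:emphasis> developer.</db:para>";

    static String transform(String xsl, String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)))
                .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String implicitWalk = transform(TEMPLATES, INPUT);
        String explicitWalk = transform(EXPLICIT, INPUT);
        System.out.println(implicitWalk);
        System.out.println(implicitWalk.equals(explicitWalk));  // both stylesheets agree
    }
}
```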
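If you compare Listing 2 and Listing 3, you will find that they are structurally identical. In Listing 2, the giant choose instruction is implemented behind the scenes through templates. You don't write the choose explicitly, but conceptually this is how the processor works: for each node, it picks the best-matching template. Compare the templates in Listing 2 with the test cases in Listing 3 and you will find a one-to-one match. Where Listing 3 relies on the order of the xsl:when tests (the more specific test comes first), Listing 2 relies on template priorities: a pattern such as db:article/db:title or db:emphasis[@role='bold'] has a higher default priority (0.5) than a plain db:title or db:emphasis (0).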
Even the last two tests in Listing 3 have a direct match in Listing 2 through the default templates. Because they are default (built-in) templates, they need not be written in Listing 2, but the processor still has a template for text content and another "catch-all" template.
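The default templates are easy to observe. XSLT 1.0 defines built-in rules that process the children of the root node and of elements, and that copy the value of text nodes (and attributes), so a stylesheet that declares no template at all still walks the whole input and writes out its text. The Java/JAXP harness below is an illustrative sketch; the input document and class name are made up.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class BuiltInTemplatesDemo {

    // No xsl:template at all: only the built-in rules apply.
    private static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "</xsl:stylesheet>";

    static String run(String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)))
                .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The catch-all rule walks a and b; the text rule copies their text.
        System.out.println(run("<a>Hello <b>world</b></a>"));  // Hello world
    }
}
```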
In Listing 2, the xsl:apply-templates instruction replaces the recursive call. In many respects, you can think of xsl:apply-templates as a recursive call into the stylesheet itself! It tells the processor to move to the children of the current node and try to find another template to apply, automatically changing the current node as it goes. The loop and the test are very explicit in Listing 3, whereas in Listing 2 the processor performs them implicitly.
Last but not least is the template parameter. In Listing 3, the main template takes a parameter, nodes, with the nodes to process. In Listing 2, the templates don't need a parameter because the processor manages the current node. The current node always points to the node to which the template applies. The current node works like an implicit parameter.
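The implicit parameter shows up clearly in a stylesheet that does nothing but report the current node. In the hypothetical stylesheet below, run through a small Java/JAXP harness, the single template prints local-name() of whatever node is current and then calls xsl:apply-templates. No parameter is ever passed, yet each invocation sees a different node, in the same depth-first order as the walk described at the start of the article. The class name and input are made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class CurrentNodeDemo {

    private static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      // For every element: print the name of the current node, then let
      // apply-templates make each child element current in turn.
      + "<xsl:template match='*'>"
      + "<xsl:value-of select='local-name()'/><xsl:text> </xsl:text>"
      + "<xsl:apply-templates select='*'/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String run(String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)))
                .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run("<a><b><c/></b><d/></a>"));  // a b c d
    }
}
```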
In practice, nobody would write a stylesheet like the one in Listing 3. It is intended for pedagogical purposes only, but it helps to illustrate how the processor works behind the scenes. As you can see if you compare Listings 2 and 3, the processor takes care of much of the basic coding (such as looping and passing parameters) needed to implement a depth-first walk. Keep this in mind when you're working on your next stylesheet and you may find that it changes how you code.
For example, instead of writing:
<xsl:template match="db:emphasis">
  <xsl:choose>
    <xsl:when test="@role='bold'">
      <b><xsl:apply-templates/></b>
    </xsl:when>
    <xsl:otherwise><i><xsl:apply-templates/></i></xsl:otherwise>
  </xsl:choose>
</xsl:template>
as XSLT newcomers frequently do, you can write the following code, which is strictly equivalent if you factor in the work of the processor:
<xsl:template match="db:emphasis[@role='bold']">
  <b><xsl:apply-templates/></b>
</xsl:template>
<xsl:template match="db:emphasis">
  <i><xsl:apply-templates/></i>
</xsl:template>
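To confirm that the two versions really are interchangeable, the Java sketch below runs each of them through JAXP on the same input and compares the output. The sample paragraph and class name are assumptions; db:para has no template in either version, so the built-in rules handle it and its text.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class EmphasisEquivalence {

    private static final String HEADER =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:db='http://ananas.org/2002/docbook/subset' exclude-result-prefixes='db'>"
      + "<xsl:output method='html' indent='no'/>";

    // Newcomer style: one template, the decision made with xsl:choose.
    static final String CHOOSE = HEADER
      + "<xsl:template match='db:emphasis'><xsl:choose>"
      + "<xsl:when test=\"@role='bold'\"><b><xsl:apply-templates/></b></xsl:when>"
      + "<xsl:otherwise><i><xsl:apply-templates/></i></xsl:otherwise>"
      + "</xsl:choose></xsl:template>"
      + "</xsl:stylesheet>";

    // Template style: the processor picks the rule (priority 0.5 beats 0).
    static final String TWO_TEMPLATES = HEADER
      + "<xsl:template match=\"db:emphasis[@role='bold']\"><b><xsl:apply-templates/></b></xsl:template>"
      + "<xsl:template match='db:emphasis'><i><xsl:apply-templates/></i></xsl:template>"
      + "</xsl:stylesheet>";

    static final String INPUT =
        "<db:para xmlns:db='http://ananas.org/2002/docbook/subset'>"
      + "<db:emphasis role='bold'>stronger and better</db:emphasis> and "
      + "<db:emphasis>wiser</db:emphasis></db:para>";

    static String transform(String xsl, String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)))
                .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String viaChoose = transform(CHOOSE, INPUT);
        String viaTemplates = transform(TWO_TEMPLATES, INPUT);
        System.out.println(viaChoose);
        System.out.println(viaChoose.equals(viaTemplates));  // same output either way
    }
}
```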
I hope I have clarified the inner workings of the XSLT processor. A good understanding of this is essential to improving your stylesheet coding.
- Participate in the discussion forum for Benoît Marchal's "Working XML" column.
- Find out more about XSLT and how it compares to other languages in "What kind of language is XSLT?" by Michael Kay (developerWorks, February 2001).
- Read "Recurse, not divide, to conquer" (developerWorks, July 2001) for Benoît Marchal's discussion on how to adapt XSLT recursion to address special needs.
- Explore "Mapping files into SOAP requests, Part 2" (developerWorks, January 2004) by Benoît Marchal for another example of using recursion to transform special data with XSLT.
- Learn debugging techniques that offer insight into how a stylesheet works with Uche Ogbuji's article "Debug XSLT on the fly" (developerWorks, November 2002).
- Find hundreds more XML resources on the developerWorks XML zone, including Benoît Marchal's "Working XML" column at the column summary page.
- Get IBM WebSphere Studio Application Developer, an application development product that supports the building of a large spectrum of applications using different technologies such as XML, JSP, servlets, HTML, Web services, databases, and EJBs.
- Browse for books on these and other technical topics.
- Learn how you can become an IBM Certified Developer in XML and related technologies.

Benoît Marchal is a Belgian consultant. He is the author of XML by Example, Second Edition and other XML books. You can contact him at bmarchal@pineapplesoft.com or through his personal site at marchal.com.




