Write safer XSLT stylesheets

Add automated error trapping in XML transformations

It's simple to introduce mistakes in XSLT stylesheets that go unnoticed. Neither static nor dynamic error checking helps: Only thorough functional testing will reveal them. XSLT 2.0 introduced several new options and possibilities, and you can use some of them to make your stylesheets safer and your testing easier. Discover these XSLT 2.0 features, such as the type system, to add otherwise-impossible error checking to your stylesheets.

Erik Siegel (erik@xatapult.nl), XML Specialist, Xatapult

Erik Siegel works as a self-employed XML specialist based in The Netherlands. His career has involved many jobs: researcher, programmer, systems analyst, project manager, systems architect, consultant. During the past five years, XML crept in. His main customers are in publishing, and his XML activities include consulting, training, schema development, and XSLT programming. You can find more information about Erik and his company at www.xatapult.com.



25 October 2011

Also available in Chinese and Japanese.

Frequently used acronyms

  • DTD: Document type definition
  • IDE: Integrated development environment
  • XSLT: Extensible Stylesheet Language Transformations

If you have written any XSLT stylesheets, you probably know that making them safe in a real-world environment isn't always easy. For instance, tiny typing mistakes can cause major headaches. Simple errors in the names of elements and attributes in XPath expressions go undetected by the error-checking mechanisms of your XSLT processor. Something as simple as writing /Filename where it should be /FileName shows up only if your own test and debug efforts find it: It won't be trapped by the XSLT engine.

Here's an example. Assume that you have an XML document that looks like Listing 1.

Listing 1. Example XML document
<Things>
    <Thing thingid="12345FFD3">...</Thing>
    <Thing thingid="86779EAD0">...</Thing>
    ...
</Things>

In your XSLT stylesheet, you have a named template that does something with a thing (Listing 2).

Listing 2. Named template for processing a thing
<xsl:template name="ProcessThing">
    <xsl:param name="ThingId"/>
    <xsl:for-each select="/Things/Thing[@id eq $ThingId]">
        <!-- Do something with the thing -->
    </xsl:for-each>
</xsl:template>

Can you spot the mistake? The author of this template forgot (or didn't know) that the identifier attribute on a thing is spelled thingid, not id. If you run this code, no error message appears: the for-each loop simply doesn't execute. Maybe you notice, maybe you don't; after all, the error is buried somewhere in a complicated transformation. Suppose you don't notice: your code goes into production, and the spelling mistake silently prevents an important part of some calculation from taking place.

And there's another, related problem. Despite all intentions to the contrary, not all of your input documents will be validated against a schema or DTD. What if the document's author wrote <Filename> instead of the expected <FileName>?

It would be reassuring to trap these and other errors automatically, at least the most blatant ones, and have the XSLT processor tell you when something goes wrong. This article explores how to make your XSLT stylesheets safer and have the XSLT processor trap errors that it would otherwise silently ignore. It looks at stylesheets from a software engineering point of view and shows how to make them more robust and secure. This article assumes that you have at least a working knowledge of XSLT.

The XSLT v2 type system

The working environment

Most examples in this article rely on the features of XSLT version 2 and will not work for XSLT version 1. I tested all of it using Saxon Home Edition version 9.x, but everything stays within the XSLT v2 standard and should also work on other processors.

To understand most of the methods described in this article, you must understand the XSLT v2 type system. (This is a cursory introduction only: For more information, see Resources.) XSLT v2 introduced many new features, including data types. Advanced options include referencing schemas and using the types defined there. However, the methods described use only the basic type system.

One new feature of XSLT v2 is the ability to declare an explicit data type for your variables and parameters. XSLT v2 borrows the type system from XML Schema and augments it with a bit of extra syntax and semantics. You declare a data type by using the as attribute, for instance:

<xsl:variable name="TestVariable" as="xs:integer" select="123"/>

Other basic data types include xs:string, xs:boolean, xs:double, xs:date, and xs:dateTime. And remember to define the xs namespace prefix somewhere (preferably on the root element of your stylesheet):

xmlns:xs="http://www.w3.org/2001/XMLSchema"

In XSLT v2, you can also store tree fragments (documents, nodes, attributes, and so on) in variables. With a bit of extra syntax, you can declare types for these variables as well. Listing 3 provides some examples.

Listing 3. Examples of variables holding tree fragments
<xsl:variable name="TheCompleteDocument" as="document-node()" select="/"/>
<xsl:variable name="AnyElement" as="element()" select="*[1]"/>
<xsl:variable name="FirstThingElement" as="element(Thing)" select="/Things/Thing[1]"/>
<xsl:variable name="FirstAttribute" as="attribute()" select="@*[1]"/>
<xsl:variable name="IdAttribute" as="attribute()" select="@id"/>

This is great, but it gets even better: all variables in XSLT v2 hold sequences, that is, ordered lists of items that can contain zero, one, or more items (duplicates included). So, a "normal" variable (as in the examples above) is just a special case: a sequence with exactly one item in it. To use the power of sequences, add a plus sign (+) for one or more items, an asterisk (*) for zero or more items, or a question mark (?) for zero or one item to the end of the type specification. Yes, just like in good old DTDs. Listing 4 offers a few examples.

Listing 4. Examples of sequences
<xsl:variable name="MultipleStrings" as="xs:string+" select="('This', 'That')"/>
<xsl:variable name="EmptySetOfIntegers" as="xs:integer?" select="()"/>
<xsl:variable name="SetOfAllThings" as="element(Thing)*" select="//Thing"/>

There is a lot of power in this: All the usual XPath constructions are possible on your variables. Listing 5 provides an example.

Listing 5. XPath constructions using XSLT v2 variables
<xsl:for-each select="$SetOfAllThings">
    <!-- Do something with the things -->
</xsl:for-each>
<xsl:variable name="LastString" as="xs:string" select="$MultipleStrings[last()]"/>
<xsl:variable name="ThingCount" as="xs:integer" select="count($SetOfAllThings)"/>
<xsl:variable name="AllThingIdAttributes" as="attribute(thingid)*" 
              select="$SetOfAllThings/@thingid"/>

So, how does this help you make your stylesheets more robust? The nice thing about a type system is that the XSLT processor checks the value of a variable against its declared type, including its multiplicity (how many items it holds), and raises an error if the value doesn't fit. For values taken from the input document, this check happens at run time. So, if you put important values from your input document in variables with a proper type, you get an error as soon as something is wrong.
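To see this check in action, here is a minimal sketch of a complete XSLT 2.0 stylesheet, assuming input shaped like Listing 1. It stores all Thing elements in a variable typed element(Thing)+. If the input contains no Thing elements at all (for instance, because the author misspelled the element name), the processor should raise a type error instead of quietly reporting a count of zero. The variable names and the text output are illustrative choices, not part of the original listings.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- The "+" demands at least one Thing: an empty result is a type error -->
    <xsl:variable name="AllThings" as="element(Thing)+" select="/Things/Thing"/>
    <!-- count() always returns exactly one integer, so this type always fits -->
    <xsl:variable name="ThingCount" as="xs:integer" select="count($AllThings)"/>
    <xsl:value-of select="'Number of things:', $ThingCount"/>
  </xsl:template>

</xsl:stylesheet>
```

Run against Listing 1, this prints the number of Thing elements. Run against a document whose things are spelled <thing>, it stops with a type error.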


Trapping mistyped and unexpected elements and attributes

What can you do to guard against typos in XPath element and attribute names, both in your stylesheet and in your input documents? Double-check, reread, debug ferociously, keep your fingers crossed, pray: certainly, do it all. But what you really want is for the bug to announce itself with a loud error message.

Using variables

If you rewrite the template from Listing 2 as shown in Listing 6, the error pops up immediately.

Listing 6. Named template from Listing 2 with a typed variable
<xsl:template name="ProcessThing">
    <xsl:param name="ThingId"/>
    <xsl:variable name="ThingToProcess" as="element(Thing)"
        select="/Things/Thing[@id eq $ThingId]"/>
    <xsl:for-each select="$ThingToProcess">
        <!-- Do something with the thing -->
    </xsl:for-each>
</xsl:template>

At run time, the XSLT processor notices that /Things/Thing[@id eq $ThingId] does not return the expected <Thing> element but rather an empty sequence. The declared type of the ThingToProcess variable, however, is element(Thing), which means exactly one Thing element. An empty sequence doesn't fit, so an error message appears and the transformation stops.

There is even a bonus here. If your input document contains an error and has two things with the same ID, you'll get an error for that, as well. The $ThingToProcess variable can contain only a single thing.

To make this template even safer, I also declare a type for the parameter:

<xsl:param name="ThingId" as="xs:string" required="yes"/>

Doing so traps a forgotten parameter (because of the required="yes" attribute) as well as an empty or multi-valued one (because of as="xs:string").
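Putting Listing 6 and the typed parameter together, and correcting the attribute name to thingid, gives something like the following minimal sketch. It assumes input shaped like Listing 1; the hard-coded ID and the ProcessedThing output element are hypothetical placeholders for whatever your real stylesheet does.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:template match="/">
    <!-- Hypothetical caller: the ID is taken from Listing 1 -->
    <xsl:call-template name="ProcessThing">
      <xsl:with-param name="ThingId" select="'12345FFD3'"/>
    </xsl:call-template>
  </xsl:template>

  <xsl:template name="ProcessThing">
    <!-- A missing, empty, or multi-valued parameter raises an error -->
    <xsl:param name="ThingId" as="xs:string" required="yes"/>
    <!-- Exactly one Thing must match; zero or several raise a type error -->
    <xsl:variable name="ThingToProcess" as="element(Thing)"
        select="/Things/Thing[@thingid eq $ThingId]"/>
    <xsl:for-each select="$ThingToProcess">
      <ProcessedThing id="{@thingid}"/>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

With the ID 12345FFD3 from Listing 1, the output is a single ProcessedThing element. Pass an ID that doesn't exist, or duplicate a Thing in the input, and the transformation stops with a type error. The exclude-result-prefixes attribute only keeps the xs namespace declaration out of the output.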

You can use this technique in many ways. For instance, I often put important global values in attributes on my root element. To make these values globally available in my code, I store them in top-level variables before using them:

<xsl:variable name="GlobalId" as="xs:string" select="/*/@id"/>
<xsl:variable name="GlobalName" as="xs:string" select="/*/@name"/>

Giving variables a data type, even one as generic as xs:string, traps the unexpected absence of the attribute, something that has saved my day more often than I would like to admit. So, put important values from your input document in variables first, and declare both the right item type and the right multiplicity for those variables.
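A minimal sketch of this pattern, assuming a hypothetical input whose root element carries id and name attributes (for example, <Things id="T1" name="Inventory">). The attribute values are atomized and converted to xs:string; if either attribute is missing, the empty sequence doesn't match xs:string and the processor reports an error.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xsl:output method="text"/>

  <!-- Read the important values once; a missing attribute raises an error -->
  <xsl:variable name="GlobalId" as="xs:string" select="/*/@id"/>
  <xsl:variable name="GlobalName" as="xs:string" select="/*/@name"/>

  <xsl:template match="/">
    <!-- Referencing both variables makes sure they are actually evaluated -->
    <xsl:value-of select="concat($GlobalName, ' (', $GlobalId, ')')"/>
  </xsl:template>

</xsl:stylesheet>
```

One caveat: an XSLT processor is allowed to skip evaluating a global variable that is never referenced, so this check is only dependable for variables your code actually uses.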

Use a catch-all

If you have a stylesheet that consists mainly of match templates, consider adding a catch-all template, even when you don't seem to need it.

Assume the following scenario: you wrote match templates for all the elements in the input and use <xsl:apply-templates> to propagate control. What happens when you mistype a name or someone changes your input document? The built-in template rule for elements kicks in and silently applies templates to the element's children, which usually ends up copying their text to the output. Maybe that's what you want, maybe not. It is better to get an error so you can investigate what happened. A simple catch-all template such as the one in Listing 7 accomplishes this.

Listing 7. Simple catch-all template
<xsl:template match="*">
    <xsl:message terminate="yes">
        Unexpected element: <xsl:value-of select="name()"/>
    </xsl:message>
</xsl:template>
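As a sketch of how the catch-all fits into a complete stylesheet, assuming input shaped like Listing 1, the following handles Things and Thing explicitly and terminates on any other element. The ThingList and Item output elements are hypothetical.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/Things">
    <ThingList>
      <!-- Only child elements are processed; each must be expected -->
      <xsl:apply-templates select="*"/>
    </ThingList>
  </xsl:template>

  <xsl:template match="Thing">
    <Item ref="{@thingid}"/>
  </xsl:template>

  <!-- Catch-all: fires only for elements that no other template matches -->
  <xsl:template match="*">
    <xsl:message terminate="yes">
      <xsl:text>Unexpected element: </xsl:text>
      <xsl:value-of select="name()"/>
    </xsl:message>
  </xsl:template>

</xsl:stylesheet>
```

The catch-all doesn't override the other rules: a match="*" pattern has a default priority of -0.5, lower than that of Thing (0) and /Things (0.5), so it wins only when nothing more specific applies. The xsl:text wrapper keeps stray white space out of the message.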

Safer named templates

Another source of problems is named templates and their parameters. Giving parameters a data type with the as attribute already traps many errors. There's more, though.

Avoid parameter errors

In XSLT v2, you can't pass a parameter to a named template that is not declared in its parameter list. This restriction is actually one of the few incompatibilities with XSLT v1, but it's a welcome one because it catches typos in parameter names.

Another built-in XSLT v2 feature that you might not know about makes your named templates even safer: you can flag parameters as required:

<xsl:template name="DoSomething">
    <xsl:param name="Subject" as="xs:string" required="yes"/>
    <!-- ... -->
</xsl:template>

Now you have to supply a Subject parameter when calling DoSomething. This trick is especially useful with long parameter lists, where it's easy to forget one.

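So, in XSLT v2, given the DoSomething named template defined earlier, both of the following calls are illegal and will be trapped. The first passes a parameter named subject, which is not declared (parameter names are case-sensitive), and leaves the required Subject unset; the second omits the parameter altogether: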

<xsl:call-template name="DoSomething">
    <xsl:with-param name="subject" select="'Safer stylesheets'"/>
</xsl:call-template>
<xsl:call-template name="DoSomething"/>
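For comparison, here is a minimal stylesheet in which the call satisfies the declaration: the parameter name matches Subject exactly, including case, and the value is exactly one string. The text output is only there to make the result visible.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Legal: declared name, correct case, exactly one string value -->
    <xsl:call-template name="DoSomething">
      <xsl:with-param name="Subject" select="'Safer stylesheets'"/>
    </xsl:call-template>
  </xsl:template>

  <xsl:template name="DoSomething">
    <xsl:param name="Subject" as="xs:string" required="yes"/>
    <xsl:value-of select="concat('Subject: ', $Subject)"/>
  </xsl:template>

</xsl:stylesheet>
```

This prints Subject: Safer stylesheets for any input document.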

Check the context

Named templates are particularly useful for dividing your code into smaller chunks and avoiding code duplication. For instance, if you have more than one location in your code that handles a Thing element (in the same way), you might want to write a named template such as the one in Listing 8.

Listing 8. Handling a Thing as a context element
<xsl:template name="HandleThing">
    <!-- Current element must be a <Thing>! -->
    <!-- ... -->
</xsl:template>

Again, it would be nice to raise an error if the current element is not a Thing. You can do this as shown in Listing 9.

Listing 9. Handling a Thing and checking the context
<xsl:template name="HandleThing">
    <xsl:param name="ThingToHandle" as="element(Thing)" select="."/>
    <xsl:for-each select="$ThingToHandle">
        <!-- Now the current element is a <Thing> or we get an error! -->
        <!-- ... -->
    </xsl:for-each>
</xsl:template>

You might declare ThingToHandle as a variable instead of a parameter. Using a parameter, however, gives you a bonus: You can now use HandleThing when the current element is not a Thing, as well. Just pass it the element it should work with in the ThingToHandle parameter (Listing 10).

Listing 10. Handling a Thing that is not the current context
<xsl:template match="/">
    <!-- Only handle the first thing: -->
    <xsl:call-template name="HandleThing"> 
        <xsl:with-param name="ThingToHandle" select="/*/Thing[1]"/>
    </xsl:call-template>
</xsl:template>
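A minimal sketch combining both ways of calling HandleThing, assuming input shaped like Listing 1. The Result and Handled output elements are hypothetical.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <Result>
      <!-- The context is the document node here, so pass the Thing explicitly -->
      <xsl:call-template name="HandleThing">
        <xsl:with-param name="ThingToHandle" select="/*/Thing[1]"/>
      </xsl:call-template>
      <!-- Inside this loop the context is a Thing, so the default select="." fits -->
      <xsl:for-each select="/*/Thing">
        <xsl:call-template name="HandleThing"/>
      </xsl:for-each>
    </Result>
  </xsl:template>

  <xsl:template name="HandleThing">
    <xsl:param name="ThingToHandle" as="element(Thing)" select="."/>
    <xsl:for-each select="$ThingToHandle">
      <Handled id="{@thingid}"/>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Calling HandleThing without a parameter directly from the match="/" template would raise a type error, because the context item there is the document node, not a Thing. That is exactly the mistake the typed parameter is meant to catch.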

Tips and tricks

Here are two final tips for creating safer XSLT stylesheets: asserts and the normalize-space() function.

Performing asserts

Most programming languages have something called an assert: a statement that stops processing when a condition that should always hold turns out to be false, such as an unexpected value in an important variable. XSLT v2 lacks asserts but does have two ways to stop the XSLT processor from continuing: <xsl:message terminate="yes"> and the XPath error() function. Combine one of these with an <xsl:if> and you have an excellent assert. For instance:

<xsl:if test="empty(/*/Thing)">
    <xsl:message terminate="yes">No things found in input document</xsl:message>
</xsl:if>

Or, using the error() function:

<xsl:if test="empty(/*/Thing)">
    <xsl:value-of select="error((), 'No things found in input document')"/>
</xsl:if>

Before deciding which one to use, test how they behave in your IDE and your runtime environment. With my system setup, the error() function provides the most useful result: the error message. An <xsl:message> tells me that an error has occurred on a particular line but does not show me the error message itself.
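If you choose error(), you can also pass an explicit error code instead of the empty sequence, which may make your own assertion failures easier to tell apart from the processor's errors in a log. In this sketch, the namespace URI http://example.com/errors and the code NoThings are hypothetical placeholders; QName() is the standard XPath 2.0 function for constructing the code.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <!-- Assert: the input must contain at least one Thing -->
    <xsl:if test="empty(/*/Thing)">
      <!-- The error code lives in a hypothetical, project-specific namespace -->
      <xsl:sequence select="error(QName('http://example.com/errors', 'err:NoThings'),
                                  'No things found in input document')"/>
    </xsl:if>
    <ThingCount>
      <xsl:value-of select="count(/*/Thing)"/>
    </ThingCount>
  </xsl:template>

</xsl:stylesheet>
```

How the code and message are displayed depends on your processor and IDE, so check the result in your own environment, as suggested above.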

normalize-space() against pretty-printing

One last tip for making your stylesheets safer is to use the normalize-space() function lavishly. Why? Because the pretty-printing feature of XML editors can introduce unwanted line feeds and (more importantly) white space. For instance, assume that somewhere in the XML input document is an element like this, deeply nested:

<DeeplyNestedElement>This is an example of a pretty print error</DeeplyNestedElement>

Now, an unsuspecting author clicks the pretty-print button and suddenly your element looks like this:

<DeeplyNestedElement>This is an example of a pretty
     print error</DeeplyNestedElement>

Now there is a line feed and a whole bunch of spaces between pretty and print. That's no problem if you produce HTML, where browsers collapse white space anyway, but it's a real problem if your code relies on the exact content of the element. Using normalize-space() sets this right: it removes leading and trailing white space and replaces every other sequence of white-space characters with a single space.
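A minimal sketch that makes the difference visible, assuming an input document that contains the pretty-printed DeeplyNestedElement shown above. It compares the element's text with the expected sentence, once as-is and once after normalize-space().

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:variable name="Text" as="xs:string"
        select="string((//DeeplyNestedElement)[1])"/>
    <!-- Exact comparison fails once pretty-printing has added white space -->
    <xsl:value-of select="$Text eq 'This is an example of a pretty print error'"/>
    <xsl:text>&#10;</xsl:text>
    <!-- After normalize-space(), the text compares equal again -->
    <xsl:value-of
        select="normalize-space($Text) eq 'This is an example of a pretty print error'"/>
  </xsl:template>

</xsl:stylesheet>
```

For the pretty-printed input, the first line of output is false and the second is true.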

You must be careful: normalize-space() can also remove necessary white space and line feeds. Input that relies on the exact white space in text, however, is, at least in my world, rare.


Conclusion

The world is not perfect, and—especially when programming—you do not expect it to be. This article showed a number of ways to trap or prevent errors in XSLT processing that can go unnoticed otherwise. The type system in particular provides you with probably unintended but quite useful error-trapping possibilities.

Resources
