Write safer XSLT stylesheets
Add automated error trapping in XML transformations
If you have written any XSLT stylesheets, you probably know that making them safe in a real-world environment isn't always easy. For instance, tiny typing mistakes can cause major headaches. Simple errors in the names of elements and attributes in XPath expressions go undetected by the error-checking mechanisms of your XSLT processor. Something as simple as a single misspelled attribute name shows up only if your own test and debug efforts find it: It won't be trapped by the XSLT engine.
Here's an example. Assume that you have an XML document that looks like Listing 1.
Listing 1. Example XML document
<Things>
  <Thing thingid="12345FFD3">...</Thing>
  <Thing thingid="86779EAD0">...</Thing>
  ...
</Things>
In your XSLT stylesheet, you have a named template that does something with a thing (Listing 2).
Listing 2. Named template for processing a thing
<xsl:template name="ProcessThing">
  <xsl:param name="ThingId"/>
  <xsl:for-each select="/Things/Thing[@id eq $ThingId]">
    <!-- Do something with the thing -->
  </xsl:for-each>
</xsl:template>
Can you spot the mistake? The author of this template forgot (or didn't know) that the identifier attribute on a thing was spelled thingid and not id. If you run this markup, no error message will appear: The xsl:for-each loop simply won't execute. Maybe you notice, maybe you don't. After all, the error is embedded somewhere in a complicated transformation. So, you don't notice, your code goes into production, and the spelling mistake prevents an important part of some calculation from occurring.
And there's another, related problem. Despite all intentions to the contrary, not all of your input documents will be validated against a schema or DTD. What if the document's author misspelled the name of an element or attribute that your stylesheet expects?
It would be reassuring to trap these and other errors automatically, at least the most blatant ones, and have the XSLT processor notify you if something bad happens. This article explores how to make your XSLT stylesheets safer and have the XSLT processor trap errors that it typically simply ignores. It looks at stylesheets from a software engineering point of view and shows how to make them more robust and secure. This article assumes that you have at least a working knowledge of XSLT.
The XSLT v2 type system
To understand most of the methods described in this article, you must understand the XSLT v2 type system. (This is a cursory introduction only: For more information, see Related topics.) XSLT v2 introduced many new features, including data types. Advanced options include referencing schemas and using the types defined there. However, the methods described use only the basic type system.
One new feature of XSLT v2 is the ability to provide an explicit data type for your variables and parameters. For this, the type system from XML Schemas was used and augmented with a bit of extra syntax and semantics. You can provide a data type by using the as attribute:
<xsl:variable name="TestVariable" as="xs:integer" select="123"/>
Other basic data types include xs:string, xs:boolean, and xs:dateTime. And remember to define the xs namespace prefix somewhere (preferably on the root element of your stylesheet):
xmlns:xs="http://www.w3.org/2001/XMLSchema"
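For orientation, a complete stylesheet with the namespace declared on the root element might look like the following minimal sketch. The Result element and the exclude-result-prefixes setting are illustrative choices, not something this article prescribes.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- The xs prefix declared above can now be used in any as attribute. -->
  <xsl:variable name="TestVariable" as="xs:integer" select="123"/>

  <!-- Hypothetical output element, only here to use the variable. -->
  <xsl:template match="/">
    <Result>
      <xsl:value-of select="$TestVariable + 1"/>
    </Result>
  </xsl:template>

</xsl:stylesheet>
```

Listing xs in exclude-result-prefixes keeps the schema namespace declaration from being copied onto literal result elements such as Result.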
In XSLT v2, you can also store tree fragments (documents, nodes, attributes, and so on) in variables. With a bit of extra syntax, you can data type your variables for this. Listing 3 provides some examples.
Listing 3. Examples of variables holding tree fragments
<xsl:variable name="TheCompleteDocument" as="document-node()" select="/"/>
<xsl:variable name="AnyElement" as="element()" select="*"/>
<!-- [1] limits the selection to a single node, as the declared type requires -->
<xsl:variable name="FirstThingElement" as="element(Thing)" select="/Things/Thing[1]"/>
<xsl:variable name="FirstAttribute" as="attribute()" select="@*[1]"/>
<xsl:variable name="IdAttribute" as="attribute()" select="@id"/>
This is great, but it gets even better: All variables in XSLT v2 are sequences (that is, ordered lists of values) that can have zero, one, or more values. So, a "normal" variable (as in the examples above) is just a special case: a sequence with only one value in it. To use the power of sequences, add a plus sign (+) for one or more values, an asterisk (*) for zero or more values, or a question mark (?) for zero or one value to the end of the type specification. Yes, just like in good old DTDs. Listing 4 offers a few examples.
Listing 4. Examples of sequences
<xsl:variable name="MultipleStrings" as="xs:string+" select="('This', 'That')"/>
<xsl:variable name="EmptySetOfIntegers" as="xs:integer?" select="()"/>
<xsl:variable name="SetOfAllThings" as="element(Thing)*" select="//Thing"/>
There is a lot of power in this: All the usual XPath constructions are possible on your variables. Listing 5 provides an example.
Listing 5. XPath constructions using XSLT v2 variables
<xsl:for-each select="$SetOfAllThings">
  <!-- Do something with the things -->
</xsl:for-each>
<xsl:variable name="LastString" as="xs:string" select="$MultipleStrings[last()]"/>
<xsl:variable name="ThingCount" as="xs:integer" select="count($SetOfAllThings)"/>
<xsl:variable name="AllThingIdAttributes" as="attribute(thingid)*" select="$SetOfAllThings/@thingid"/>
So, how does this help you make your stylesheets more robust? The nice thing with a type system is that at run time, the XSLT processor actually checks the value of a variable against its type and complains if it doesn't fit, including its multiplicity. So, if you put important values from your input document in variables with a proper type, you will get errors if something is wrong.
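As a rough illustration of that run-time check, the sketch below reads two values from an input document shaped like Listing 1. The count attribute on the root element and the Summary output element are hypothetical and exist only for this example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Hypothetical attribute, as in <Things count="2">. An error is raised
       if it is absent (an empty sequence is not an xs:integer) or if its
       value cannot be converted to an integer (for example, "two"). -->
  <xsl:variable name="ExpectedCount" as="xs:integer" select="/Things/@count"/>

  <!-- The plus sign demands at least one Thing, so an empty <Things/>
       document is rejected as well. -->
  <xsl:variable name="AllThings" as="element(Thing)+" select="/Things/Thing"/>

  <xsl:template match="/">
    <!-- Both variables are used here, which forces their evaluation. -->
    <Summary expected="{$ExpectedCount}" found="{count($AllThings)}"/>
  </xsl:template>

</xsl:stylesheet>
```

One caveat: a processor is not obliged to evaluate a variable that is never referenced, so a typed variable might only trap errors if the stylesheet actually uses it.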
Trapping mistyped and unexpected elements and attributes
What can you do to guard against typos in XPath element and attribute names (both in your stylesheet and in your input documents)? Double-check, reread, practice ferocious debugging, keep your fingers crossed, pray...certainly, do it all. But what you really want is the bug to show up in a booming error message.
Listing 6 reworks the template from Listing 2 so that the selected thing is first stored in a typed variable.
Listing 6. Named template from Listing 2 with a typed variable
<xsl:template name="ProcessThing">
  <xsl:param name="ThingId"/>
  <xsl:variable name="ThingToProcess" as="element(Thing)" select="/Things/Thing[@id eq $ThingId]"/>
  <xsl:for-each select="$ThingToProcess">
    <!-- Do something with the thing -->
  </xsl:for-each>
</xsl:template>
At run time, the XSLT processor notices that /Things/Thing[@id eq $ThingId] does not return the expected Thing element but rather an empty sequence. The type definition for the ThingToProcess variable, however, is element(Thing), which means exactly one Thing element. An empty sequence doesn't fit, so an error message pops up and the transformation process stops.
There is even a bonus here. If your input document contains an error and has two things with the same ID, you'll get an error for that, as well. The ThingToProcess variable can contain only a single Thing element.
To make this template even safer, I give the parameter a data type as follows:
<xsl:param name="ThingId" as="xs:string" required="yes"/>
Doing so traps a forgotten parameter (because of the required attribute) as well as empty or multiple values.
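Putting these pieces together, a complete stylesheet might look like the sketch below. It assumes an input document shaped like Listing 1 and uses the correct thingid spelling. The Processed output element and the hard-coded ID in the call are illustrative assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:template match="/">
    <xsl:call-template name="ProcessThing">
      <!-- Example ID taken from Listing 1 -->
      <xsl:with-param name="ThingId" select="'12345FFD3'"/>
    </xsl:call-template>
  </xsl:template>

  <xsl:template name="ProcessThing">
    <!-- Required and typed: omitting it, or passing zero or several
         values, is an error. -->
    <xsl:param name="ThingId" as="xs:string" required="yes"/>
    <!-- Exactly one Thing must match: no match or a duplicate ID is an error. -->
    <xsl:variable name="ThingToProcess" as="element(Thing)"
        select="/Things/Thing[@thingid eq $ThingId]"/>
    <xsl:for-each select="$ThingToProcess">
      <!-- Hypothetical processing step -->
      <Processed id="{@thingid}"/>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Changing @thingid back to @id in the select expression should now stop the transformation with a type error instead of silently producing no output.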
You can use this technique in many ways. For instance, I often put important global values in attributes on my root element. To make these values globally available in my code, I store them in top-level variables before using them:
<xsl:variable name="GlobalId" as="xs:string" select="/*/@id"/>
<xsl:variable name="GlobalName" as="xs:string" select="/*/@name"/>
Giving variables a data type, even with something as generic as xs:string, traps the unexpected absence of the attribute, something that has saved my day more often than I would like to admit. So, put important values from your input document in variables first. Provide a data type for these variables, both with the right type and with the right multiplicity.
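Choosing the right multiplicity matters most when some values are legitimately optional. The following sketch shows one way to express that. The description and version attributes, the fallback value, and the Header output element are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Required: exactly one value, so a missing id attribute is an error. -->
  <xsl:variable name="GlobalId" as="xs:string" select="/*/@id"/>

  <!-- Hypothetical optional attribute: zero or one value is acceptable. -->
  <xsl:variable name="GlobalDescription" as="xs:string?" select="/*/@description"/>

  <!-- Hypothetical optional attribute with a fallback of 1.0, so the rest
       of the code always sees exactly one decimal value. -->
  <xsl:variable name="GlobalVersion" as="xs:decimal" select="(/*/@version, 1.0)[1]"/>

  <xsl:template match="/">
    <Header id="{$GlobalId}" version="{$GlobalVersion}">
      <xsl:value-of select="$GlobalDescription"/>
    </Header>
  </xsl:template>

</xsl:stylesheet>
```

The question mark documents that an absent description is expected, while a version value that is present but not numeric still raises an error.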
Use a catch-all
If you have a stylesheet that consists mainly of match templates, consider adding a catch-all template, even when you don't seem to need it.
Assume the following scenario: You wrote match templates for all the elements in the input and use <xsl:apply-templates> to propagate control. What happens when you mistype a name or someone changes your input document? The default template for elements kicks in and performs a silent <xsl:apply-templates>. Maybe that's what you want, maybe not. It is better to get an error so you can investigate what happened. A simple catch-all template such as the one in Listing 7 gives you that error.
Listing 7. Simple catch-all template
<xsl:template match="*">
  <xsl:message terminate="yes">
    Unexpected element: <xsl:value-of select="name()"/>
  </xsl:message>
</xsl:template>
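In a larger document, the element name alone may not tell you where the problem is. As a possible refinement, the sketch below reports the full path of the offending element. The Report and Item output elements are assumptions, and the input is assumed to look like Listing 1.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Templates for the elements you expect (names from Listing 1). -->
  <xsl:template match="Things">
    <Report>
      <xsl:apply-templates select="*"/>
    </Report>
  </xsl:template>

  <xsl:template match="Thing">
    <Item id="{@thingid}"/>
  </xsl:template>

  <!-- Catch-all: match="*" has a lower default priority than the named
       matches above, so it fires only for elements nothing else handles. -->
  <xsl:template match="*">
    <xsl:message terminate="yes">
      <xsl:text>Unexpected element: /</xsl:text>
      <xsl:value-of select="string-join(ancestor-or-self::*/name(), '/')"/>
    </xsl:message>
  </xsl:template>

</xsl:stylesheet>
```

Because the catch-all relies on default priorities, it needs no explicit priority attribute as long as the other templates match on element names.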
Safer named templates
Another source of problems is named templates and their parameters. Giving a parameter a data type with the as attribute already traps many errors. There's more, though.
Avoid parameter errors
In XSLT v2, you can't pass a parameter to a named template that is not defined in its parameter list. This limitation is actually one of the few incompatibilities with XSLT v1, but it's a good one because it stops typing mistakes in parameter names.
A lesser-known way to make your named templates even safer is a built-in XSLT v2 feature: You can flag parameters as required:
<xsl:template name="DoSomething">
  <xsl:param name="Subject" as="xs:string" required="yes"/>
  <!-- ... -->
</xsl:template>
Now, the XSLT processor raises an error if you omit the Subject parameter when calling DoSomething. This trick is especially useful with long parameter lists, where it's easy to forget a parameter.
So, in XSLT v2, given the DoSomething named template defined earlier, both of the following calls are illegal and will be trapped:
<xsl:call-template name="DoSomething">
  <xsl:with-param name="subject" select="'Safer stylesheets'"/>
</xsl:call-template>
<xsl:call-template name="DoSomething"/>
Check the context
Named templates are particularly useful for dividing your code into smaller chunks and avoiding code duplication. For instance, if you have more than one location in your code that handles a Thing element (in the same way), you might want to write a named template such as the one in Listing 8.
Listing 8. Handling a Thing as a context element
<xsl:template name="HandleThing">
  <!-- Current element must be a <Thing>! -->
  <!-- ... -->
</xsl:template>
Again, it might be nice to raise an error if the current element is not a Thing. You can do this as in Listing 9.
Listing 9. Handling a Thing and checking the context
<xsl:template name="HandleThing">
  <xsl:param name="ThingToHandle" as="element(Thing)" select="."/>
  <xsl:for-each select="$ThingToHandle">
    <!-- Now the current element is a <Thing> or we get an error! -->
    <!-- ... -->
  </xsl:for-each>
</xsl:template>
You might declare ThingToHandle as a variable instead of a parameter. Using a parameter, however, gives you a bonus: You can now use HandleThing when the current element is not a Thing, as well. Just pass it the element it should work with in the ThingToHandle parameter (Listing 10).
Listing 10. Handling a Thing that is not the current context
<xsl:template match="/">
  <!-- Only handle the first thing: -->
  <xsl:call-template name="HandleThing">
    <!-- [1] selects a single Thing, as the element(Thing) parameter type requires -->
    <xsl:with-param name="ThingToHandle" select="/*/Thing[1]"/>
  </xsl:call-template>
</xsl:template>
Tips and tricks
Here are two final tips for creating safer XSLT stylesheets: asserts and the normalize-space() function.
Most programming languages have something called an assert: a statement that stops the processing when a certain condition is met, such as an unexpected value in an important variable. XSLT lacks asserts but does have two ways to stop the XSLT processor from continuing: <xsl:message terminate="yes"> and the XPath error() function. Combine one of these with an <xsl:if> and you have created an excellent assert. For instance:
<xsl:if test="empty(/*/Thing)">
  <xsl:message terminate="yes">No things found in input document</xsl:message>
</xsl:if>
Or, using the error() function:
<xsl:if test="empty(/*/Thing)">
  <xsl:value-of select="error((), 'No things found in input document')"/>
</xsl:if>
Before deciding which one to use, test how they behave in your IDE and your runtime environment. With my system setup, the error() function provides the most useful result: the error message. An <xsl:message> tells me that an error has occurred on a particular line but does not show me the error message itself.
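If you need many such checks, it may be convenient to wrap the pattern in a reusable named template. The sketch below is one possible shape. The Assert template, its parameter names, and the ThingCount output element are inventions for this example, not a standard XSLT facility.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Hypothetical helper: stops the transformation when Condition is false. -->
  <xsl:template name="Assert">
    <xsl:param name="Condition" as="xs:boolean" required="yes"/>
    <xsl:param name="Message" as="xs:string" required="yes"/>
    <xsl:if test="not($Condition)">
      <xsl:message terminate="yes">
        <xsl:value-of select="concat('Assertion failed: ', $Message)"/>
      </xsl:message>
    </xsl:if>
  </xsl:template>

  <xsl:template match="/">
    <!-- The input must contain at least one Thing below the root element. -->
    <xsl:call-template name="Assert">
      <xsl:with-param name="Condition" select="exists(/*/Thing)"/>
      <xsl:with-param name="Message" select="'No things found in input document'"/>
    </xsl:call-template>
    <ThingCount>
      <xsl:value-of select="count(/*/Thing)"/>
    </ThingCount>
  </xsl:template>

</xsl:stylesheet>
```

Because both parameters are typed and required, a call that misspells or omits one of them is itself trapped. Swapping the xsl:message for a call to error() is a small change if that gives better diagnostics in your environment.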
normalize-space() against pretty-printing
One last tip for making your stylesheets safer is to use the normalize-space() function lavishly. Why? Because the pretty-printing feature of XML editors can introduce unwanted line feeds and (more importantly) white space. For instance, assume that somewhere in the XML input document is an element like this, deeply nested:
<DeeplyNestedElement>This is an example of a pretty print error</DeeplyNestedElement>
Now, an ignorant author clicks the pretty-print button and suddenly your element looks like this:
<DeeplyNestedElement>This is an example of a pretty
                     print error</DeeplyNestedElement>
Now, there is a line feed and a whole bunch of spaces in between pretty and print. No problem if you produce HTML, but that result is not pretty at all if your code relies on the exact content of the element. Using normalize-space() sets this right: It removes leading and trailing white space and turns all other sequences of white space into a single space.
You must be careful: normalize-space() can also remove white space and line feeds that you wanted to keep. Input that relies on the exact white space in text, however, is, at least in my world, rare.
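To see the difference, the following sketch compares the raw and the normalized content of the element against the expected text. The Comparison output element and the assumption that the first DeeplyNestedElement in the document is the one of interest are illustrative only.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:variable name="Expected" as="xs:string"
      select="'This is an example of a pretty print error'"/>

  <xsl:template match="/">
    <!-- Assumption: the first DeeplyNestedElement is the one to inspect. -->
    <xsl:variable name="RawText" as="xs:string"
        select="string((//DeeplyNestedElement)[1])"/>
    <!-- Leading and trailing white space is removed, inner runs collapse
         to a single space. -->
    <xsl:variable name="CleanText" as="xs:string"
        select="normalize-space($RawText)"/>
    <Comparison raw-matches="{$RawText eq $Expected}"
        normalized-matches="{$CleanText eq $Expected}"/>
  </xsl:template>

</xsl:stylesheet>
```

With the pretty-printed version of the element as input, the raw comparison fails while the normalized one still succeeds.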
The world is not perfect, and—especially when programming—you do not expect it to be. This article showed a number of ways to trap or prevent errors in XSLT processing that can go unnoticed otherwise. The type system in particular provides you with probably unintended but quite useful error-trapping possibilities.
- XSLT 2.0 and XPath 2.0, 4th edition (Michael Kay, Wrox, 2008): Read Chapter 5 for more information about the XSLT v2 type system.
- XSL Transformations (XSLT) Version 2.0 (W3C Recommendation, January 2007): Explore the syntax and semantics of XSLT 2.0, a language for transforming XML documents into other XML documents.
- XML area on developerWorks: Find the resources you need to advance your skills in the XML arena, including DTDs, schemas, and XSLT. See the XML technical library for a wide range of technical articles and tips, tutorials, standards, and IBM Redbooks.
- IBM XML certification: Find out how you can become an IBM-Certified Developer in XML and related technologies.