As one of the contributors to Apache's open-source Xalan processor, I've been impressed by the wide range of applications folks are finding for XSLT. Stylesheets have established themselves as a very general-purpose tool, not just for rendering XML documents for display, but for automatically generating new documents.
Of course, this breadth of applications means folks keep coming up with things that XSLT can't quite do for them, and the Xalan team is often torn between wanting to stick with portable solutions and wanting to address those needs.
XSLT does have the concept of extensions (see Resources), which provide an architected way to enhance the stylesheet language. With extensions, Xalan developers can provide some additional features without conflicting with the standard. But we really can't afford to build every requested extension directly into the processor; we'd wind up with a huge library of infrequently used features.
Xalan does let you write and plug in your own extensions. But extensions are usually limited to defining new stylesheet operations rather than altering existing ones, and usually require that someone write code in a traditional programming language. (Future versions of XSLT may let you write extensions in the XSLT language.) Also, user-written extensions aren't supported by all XSLT processors, and the details of writing, accessing, and invoking them vary, so this isn't a very portable solution. (For example, an extension written for the Java-based Xalan-J processor can't be invoked from the C++ version, Xalan-C, or vice versa -- see Resources for links to Xalan.)
In this pair of articles, I'll show you another way to enhance XSLT stylesheets, which can do some things extensions can't and which will work in any XSLT processor: write a stylesheet that compiles custom features into other stylesheets! Essentially, we can leverage the fact that an XSLT stylesheet is itself an XML document and automatically apply a set of modifications to add or modify its behavior.
Motivating use case: "Why did I get that output?"
Here's a real-world example: Stefan Kost, a Xalan-J user, submitted the following enhancement request:
What do you think about adding an attribute to xsl:output, such as xalan:debug, which would cause all <xsl:template> invocations to emit a comment into the resulting tree:
  xml:
    <testtag/>
  xsl:
    ...
    <xsl:template match="testtag">
      <h1>tralala</h1>
    </xsl:template>
  generated html:
    <!-- testtag:beg -->
    <h1>tralala</h1>
    <!-- testtag:end -->
The first question we asked, of course, was whether the user was sure he needed this behavior. Xalan has some debugging features already built into it which can tell you what a stylesheet is doing. For example, in the Java version of Xalan (which I'm a bit more familiar with), you can write and plug in a TraceListener object which is told what Xalan is doing as the stylesheet executes.
TraceListener is built into Xalan's command-line tool, org.apache.xalan.xslt.Process, and is used to support a set of useful command-line options:
-TT (Trace Templates) generates a message each time Xalan starts processing a new template, telling you the template's match pattern, its name (if it has been given one), and its mode (again, if one has been specified). This gives you a basic view of the stylesheet's flow of execution. The messages are written out to the screen using the same format as <xsl:message>, and include information about which stylesheet file this template was found in and where it is within that file, so you can call the stylesheet up in an editor and see exactly which template it was and what it was trying to do. (I should note here that this location information isn't always available, depending on where Xalan loaded the stylesheet from.)
-TTC (Trace Template Children) produces a message for every step of processing within the template, allowing you to see the stylesheet's execution in much more detail. As with -TT, this comes with file/line/column information.
-TS (Trace Selections) produces a message each time a stylesheet directive performs a select operation to retrieve data from the source document. It tells you where the retrieval occurred in the stylesheet (file/line/column), what kind of retrieval was being performed, what the select XPath was, and lists the source-document nodes that were returned by that search. The list of matching nodes is normally rather terse, including just the node name and the node's internal handle. You can improve it by adding the -L option to also track location information for the source document, though that data isn't always available and using -L does consume additional memory.
-TG (Trace Generation) produces a message each time a stylesheet generates (writes) content to the output document. Think of this as a summary of Xalan's output SAX stream.
Obviously, these trace options can be combined if you like, or you can write your own TraceListener to produce more sophisticated logging -- potentially even using the trace events to drive a stylesheet debugger complete with animated execution, breakpoints, and performance profiling.
But Stefan wasn't entirely happy with any of those solutions. Writing a TraceListener takes a bit of work, and obviously isn't portable to other XSLT processors. The Xalan-J trace options also aren't portable, their output tends to be a bit verbose, and it's hard to see from the messages exactly what part of the output was produced where.
Stefan really wanted to have the trace information incorporated into the stylesheet's output, so he could examine it directly in context.
Unfortunately, his proposed solution of adding a nonstandard attribute to <xsl:output> wasn't portable either, and would have complicated the Xalan code a bit. Generally, adding features to the processor's basic behaviors is not desirable unless they're going to be fairly intensively used; every time you have to decide whether or not to do something, it costs some performance, increases the risk of bugs, and (of course) demands a bit more of the user's memory for the additional code.
Stefan might simply rewrite his stylesheet so it generates comments at the start and end of each template's execution. But I have to agree with him that doing so is a hassle, especially if you want to add the trace information only when you're trying to debug a problem. What Stefan wants is a way to automatically add this behavior to each template when he needs it.
Hmm. Automatically add behavior to each template element -- that sounds like something a stylesheet could do! And if you write that stylesheet, you can control exactly where those annotations go and what goes into them, rather than being stuck with someone else's guesses about what information would be useful.
The basics of styling stylesheets
I'll start by creating a sample stylesheet to apply the tracer to. Here's a simple version based on Stefan's illustration:
Listing 1. Sample stylesheet
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
  <!-- For simple examples, you usually want most of the document
       passed through unchanged. This is the standard "identity"
       transformation for doing that. -->
  <xsl:template match="@*|node()" priority="-1">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- Stefan wants to replace <testtag/> elements with a fixed
       header... not very interesting, but OK as an example. -->
  <xsl:template match="testtag">
    <h1>tralala</h1>
  </xsl:template>
</xsl:stylesheet>
Stefan wants a comment to be inserted into the output document every time an <xsl:template> starts and ends. To do that, you want to find each template and make the appropriate change.
Finding templates is easy; you just write a template that matches templates. You can use <xsl:copy> to copy the template elements themselves, along with any namespace declarations on them.
Listing 2. Matching template elements
<xsl:template match="xsl:template">
  <xsl:copy>
    ...
  </xsl:copy>
</xsl:template>
Inserting the comment generators into the template bodies is a bit more complicated, since XSLT has specific rules about the order in which things can be written to the output. First, you need to explicitly copy the template's attributes, which must be set before any children can be added to the element. Next, because XSLT says that any <xsl:param> elements in the altered template must precede all other children (except whitespace), you need to take care of those; otherwise your comment generator might be inserted before an <xsl:param> and break the stylesheet. Only then are you ready to alter the rest of the body (everything that isn't an <xsl:param>) to have it automagically add the comments.
Listing 3. Copying existing template content
<xsl:template match="xsl:template">
  <xsl:copy>
    <xsl:apply-templates select="@*"/>
    <xsl:apply-templates select="xsl:param"/>
    ...
    <xsl:apply-templates select="node()[not(self::xsl:param)]"/>
    ...
  </xsl:copy>
</xsl:template>
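Before adding any trace code, it can be worth checking that the skeleton in Listings 2 and 3 copies a stylesheet faithfully. The Java sketch below does that under a few assumptions of mine: it uses the XSLT processor built into the JDK (through javax.xml.transform) rather than Xalan-J, it needs Java 15 or later for text blocks, it leaves the "..." gaps empty and adds an identity template to copy everything that isn't an xsl:template, and the CopyTemplates class and its parameterized sample stylesheet are hypothetical. If the skeleton is right, the copied stylesheet should produce exactly the same output as the original.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class CopyTemplates {
    // Listings 2 and 3 with the "..." gaps left empty, plus an identity
    // template so that everything other than xsl:template is copied as-is.
    public static final String COPIER = """
        <xsl:stylesheet version="1.0"
            xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:template match="@*|node()">
            <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
          </xsl:template>
          <xsl:template match="xsl:template">
            <xsl:copy>
              <xsl:apply-templates select="@*"/>
              <xsl:apply-templates select="xsl:param"/>
              <!-- begin-of-template trace code would go here -->
              <xsl:apply-templates select="node()[not(self::xsl:param)]"/>
              <!-- end-of-template trace code would go here -->
            </xsl:copy>
          </xsl:template>
        </xsl:stylesheet>
        """;

    // Hypothetical input: a named template with an xsl:param, so the
    // "params first" rule actually matters.
    public static final String SAMPLE = """
        <xsl:stylesheet version="1.0"
            xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:template match="testtag">
            <xsl:call-template name="heading">
              <xsl:with-param name="text" select="'tralala'"/>
            </xsl:call-template>
          </xsl:template>
          <xsl:template name="heading">
            <xsl:param name="text"/>
            <h1><xsl:value-of select="$text"/></h1>
          </xsl:template>
        </xsl:stylesheet>
        """;

    // Applies a stylesheet (as text) to a document (as text) through JAXP.
    public static String transform(String xsl, String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)))
                .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String copy = transform(COPIER, SAMPLE); // style the stylesheet
        System.out.println(copy);
        // These two lines should be identical if the copy is faithful.
        System.out.println(transform(SAMPLE, "<testtag/>"));
        System.out.println(transform(copy, "<testtag/>"));
    }
}
```

Printing the copy shows each <xsl:param> emitted first, with the whitespace text nodes that preceded it (which XSLT ignores inside a stylesheet) now following it.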
Now you need to fill in those remaining gaps with code to generate the appropriate comments into the output document, plus some whitespace wrapped around the comments for the sake of readability. You want something roughly like the following, in order to display the value of the match= attribute as part of the comment's text.
(My version is a bit wordier than Stefan's suggestion because I want to be able to see at a glance which comments have been generated by this tool.)
Listing 4. Unsuccessful first attempt to generate trace comments
<!-- THIS WON'T WORK AS SHOWN. SEE BELOW! -->
<xsl:text>
</xsl:text>
<xsl:comment>
  <xsl:text>[TraceXSL Begin] match="</xsl:text>
  <xsl:value-of select="@match"/>
  <xsl:text>"</xsl:text>
</xsl:comment>
<xsl:text>
</xsl:text>
And similarly for the end-of-template comment.
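But as the comment at the top of Listing 4 warns, this won't work as written. If you tried it, you'd find that the <xsl:text> and <xsl:comment> instructions were executed while you were styling the stylesheet, so the generated stylesheet would contain a finished comment rather than the instructions to produce one. But you want those instructions to be written to the generated stylesheet, and executed only when it runs!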
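The failure is easy to reproduce. The following sketch drops Listing 4's <xsl:comment> straight into the Listing 3 skeleton and styles Stefan's template with it. It assumes the XSLT processor built into the JDK (javax.xml.transform) and Java 15 or later; the NaiveTracer class is hypothetical, and the newline padding from Listing 4 is left out to keep it short. The generated stylesheet ends up containing an ordinary XML comment, which the processor ignores when it later compiles that stylesheet, so running it produces no trace at all.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class NaiveTracer {
    // Listing 3 skeleton with Listing 4's xsl:comment dropped in as-is.
    // This xsl:comment executes NOW, while the stylesheet is being styled.
    public static final String NAIVE_TRACER = """
        <xsl:stylesheet version="1.0"
            xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:template match="@*|node()">
            <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
          </xsl:template>
          <xsl:template match="xsl:template">
            <xsl:copy>
              <xsl:apply-templates select="@*"/>
              <xsl:apply-templates select="xsl:param"/>
              <xsl:comment>
                <xsl:text>[TraceXSL Begin] match="</xsl:text>
                <xsl:value-of select="@match"/>
                <xsl:text>"</xsl:text>
              </xsl:comment>
              <xsl:apply-templates select="node()[not(self::xsl:param)]"/>
            </xsl:copy>
          </xsl:template>
        </xsl:stylesheet>
        """;

    // Stefan's template from Listing 1, on its own.
    public static final String SAMPLE = """
        <xsl:stylesheet version="1.0"
            xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:template match="testtag"><h1>tralala</h1></xsl:template>
        </xsl:stylesheet>
        """;

    // Applies a stylesheet (as text) to a document (as text) through JAXP.
    public static String transform(String xsl, String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)))
                .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String generated = transform(NAIVE_TRACER, SAMPLE);
        // The generated stylesheet holds a finished XML comment, not an
        // xsl:comment instruction...
        System.out.println(generated);
        // ...so nothing is traced when the generated stylesheet runs.
        System.out.println(transform(generated, "<testtag/>"));
    }
}
```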
Luckily, XSLT's designers anticipated this problem and provided two solutions. One is to explicitly build the desired output elements using the <xsl:element> directive (see Resources). This is a very powerful mechanism that allows you to construct just about any document structure you need, and can use the full power of XPath and XSLT to set not only the contents but also the names and namespaces. However, it is somewhat verbose, even if you take advantage of the fact that <xsl:element> assumes the correct namespace based on the context in which the element name was specified:
Listing 5. Generating a trace comment, with assist from <xsl:element>
<!-- THIS ONE WORKS! -->
<xsl:element name="xsl:text" xml:space="preserve">
</xsl:element>
<xsl:element name="xsl:comment">
  <xsl:text>[TraceXSL Begin] match="</xsl:text>
  <xsl:value-of select="@match"/>
  <xsl:text>"</xsl:text>
</xsl:element>
<xsl:element name="xsl:text" xml:space="preserve">
</xsl:element>
The other solution is to use the <xsl:namespace-alias> mechanism (see Resources). With a namespace alias, you can use one namespace for literal result elements (and attributes) inside a stylesheet, and have it converted to another namespace when the generated document is written out. To use this feature, define a temporary namespace at the top of your stylesheet-for-stylesheets, and tell XSLT that when it sees that namespace it should actually output the standard XSL namespace instead.
Listing 6. Using a namespace alias
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tracexsl="http://www.ibm.com/xsl-example/tracexsl">
  <xsl:namespace-alias stylesheet-prefix="tracexsl" result-prefix="xsl"/>
  ....
You can then rewrite your comment generator in a form that is successfully written out to the generated stylesheet:
Listing 7. Generating a trace comment, with assist from <xsl:namespace-alias>
<!-- THIS ONE WORKS TOO! -->
<tracexsl:text xml:space="preserve">
</tracexsl:text>
<tracexsl:comment>
  <xsl:text>[TraceXSL Begin] match="</xsl:text>
  <xsl:value-of select="@match"/>
  <xsl:text>"</xsl:text>
</tracexsl:comment>
<tracexsl:text xml:space="preserve">
</tracexsl:text>
In this pair of articles, I use the <xsl:namespace-alias> approach, since I think it's a bit easier for humans to read. (And as you'll see in Part 2, I have another good use for that alias namespace.) But either approach will work.
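To compare the two approaches side by side, the sketch below splices either a Listing 5 style <xsl:element> generator or a Listing 7 style aliased generator (both without the newline padding) into the same Listing 3 skeleton. It then checks, by namespace URI rather than by prefix, that each generated stylesheet contains a real xsl:comment instruction, and that running it emits the trace comment. It assumes the XSLT processor built into the JDK and Java 15 or later; the TwoWays class and its method names are hypothetical. For brevity the skeleton always declares the tracexsl alias, although only the aliased generator uses it.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;

public class TwoWays {
    static final String XSL_NS = "http://www.w3.org/1999/XSL/Transform";

    // Listing 5 style: build the xsl:comment element explicitly.
    public static final String VIA_ELEMENT = """
        <xsl:element name="xsl:comment">
          <xsl:text>[TraceXSL Begin] match="</xsl:text>
          <xsl:value-of select="@match"/>
          <xsl:text>"</xsl:text>
        </xsl:element>""";

    // Listing 7 style: a literal result element in the alias namespace,
    // which the processor rewrites to the XSL namespace on output.
    public static final String VIA_ALIAS = """
        <tracexsl:comment>
          <xsl:text>[TraceXSL Begin] match="</xsl:text>
          <xsl:value-of select="@match"/>
          <xsl:text>"</xsl:text>
        </tracexsl:comment>""";

    // Identity copy plus the Listing 3 skeleton, with a begin-of-template
    // generator spliced in where the first "..." was.
    public static String tracer(String beginGenerator) {
        return """
            <xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:tracexsl="http://www.ibm.com/xsl-example/tracexsl">
              <xsl:namespace-alias stylesheet-prefix="tracexsl" result-prefix="xsl"/>
              <xsl:template match="@*|node()">
                <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
              </xsl:template>
              <xsl:template match="xsl:template">
                <xsl:copy>
                  <xsl:apply-templates select="@*"/>
                  <xsl:apply-templates select="xsl:param"/>
                  %s
                  <xsl:apply-templates select="node()[not(self::xsl:param)]"/>
                </xsl:copy>
              </xsl:template>
            </xsl:stylesheet>
            """.formatted(beginGenerator);
    }

    public static final String SAMPLE = """
        <xsl:stylesheet version="1.0"
            xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:template match="testtag"><h1>tralala</h1></xsl:template>
        </xsl:stylesheet>
        """;

    // Applies a stylesheet (as text) to a document (as text) through JAXP.
    public static String transform(String xsl, String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)))
                .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    // Counts xsl:comment elements by namespace URI, whatever prefix they use.
    public static int countXslComments(String stylesheet) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(stylesheet)))
                .getElementsByTagNameNS(XSL_NS, "comment").getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String generator : new String[] {VIA_ELEMENT, VIA_ALIAS}) {
            String generated = transform(tracer(generator), SAMPLE);
            System.out.println(countXslComments(generated) + " xsl:comment instruction(s)");
            System.out.println(transform(generated, "<testtag/>"));
        }
    }
}
```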
Note that in both of these solutions, I had to tell the processor to preserve the whitespace in some of the generated <xsl:text> elements. XSLT normally assumes that whitespace in a stylesheet isn't meaningful, with a few specific exceptions. <xsl:text> happens to be one of those exceptions; but at this point in the stylesheet, the XSLT processor doesn't know that either <xsl:element name="xsl:text"> or <tracexsl:text> is going to turn into an <xsl:text>, and would discard the newline if I didn't explicitly tell it not to.
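Here is that whitespace rule in isolation, as a minimal sketch using the XSLT processor built into the JDK (the element names kept and dropped are made up for the demonstration). Two literal result elements each contain nothing but a newline; only the one marked xml:space="preserve" should keep it in the output. This is the same rule Listing 7 relies on for its <tracexsl:text> elements.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;

public class WhitespaceDemo {
    // Two literal result elements whose only content is a newline.
    public static final String STYLESHEET =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:template match='/'><out>"
        + "<kept xml:space='preserve'>\n</kept>" // whitespace-only, but preserved
        + "<dropped>\n</dropped>"                // whitespace-only, so stripped
        + "</out></xsl:template>"
        + "</xsl:stylesheet>";

    // Applies a stylesheet (as text) to a document (as text) through JAXP.
    public static String transform(String xsl, String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)))
                .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns the text content of the first element with the given name.
    public static String textOf(String xml, String name) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getElementsByTagName(name).item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String out = transform(STYLESHEET, "<any/>");
        System.out.println(out);
        System.out.println("kept: " + textOf(out, "kept").length() + " char(s)");
        System.out.println("dropped: " + textOf(out, "dropped").length() + " char(s)");
    }
}
```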
Putting it all together, here's what you get.
- tracexsl1.xsl -- The new stylesheet for tracing stylesheets.
- tracexsl-sample1.xsl -- The runnable version of Stefan's sample stylesheet, as shown in Listing 1.
- When I use Xalan to run tracexsl1.xsl over Stefan's stylesheet, I get tracexsl-sample1.xsl.withTrace.
  Note that in the generated stylesheet, the tracexsl: prefix is bound to the same namespace as xsl:, and these elements will be executed when the stylesheet runs. Other XSLT processors may prefer to change the prefix, or define it at a higher level rather than on each generated tag, but the meaning will be the same.
  This is a slightly ugly stylesheet -- but remember, you didn't have to write it by hand (and you can regenerate it rather than having to maintain it), so that isn't a serious problem.
  The real question is, does it do what you want it to?
- tracexsl-sample1.xml -- Stefan's sample source document, as shown above.
- Finally, I can use Xalan to run the generated tracexsl-sample1.xsl.withTrace over that source document, and it produces tracexsl-sample1.xml.traceResult -- Hey, it works!
For comparison, I've also written a version using <xsl:element>:
- tracexsl1a.xsl -- Trace using <xsl:element>.
- Run that over Stefan's sample stylesheet, and you get tracexsl-sample1a.xsl.withTrace -- the styled stylesheet.
  Note that this version produces a slightly cleaner annotated stylesheet (with fewer namespace declarations), but the downside is that you have to work a bit harder to produce it, and you can't immediately see which directives have been added by your trace generator.
- Run the generated stylesheet over Stefan's sample input file...
- And you get tracexsl-sample1a.xml.traceResult -- which once again shows the results Stefan asked for!
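If you'd like to reproduce the two-step run without the downloadable files, the following sketch chains both transformations in one Java program: it styles Listing 1 with a tracer assembled from Listings 3, 6, and 7 (plus a matching end-of-template comment), then runs the generated stylesheet over <testtag/>. It is my reconstruction, not the actual tracexsl1.xsl; the TraceXsl class is hypothetical, it needs Java 15 or later, and it uses the XSLT processor built into the JDK in place of Xalan-J, so the prefixes in the generated stylesheet may differ from the ones described above even though the namespaces are the same.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class TraceXsl {
    // A reconstruction in the spirit of tracexsl1.xsl (not the actual file).
    public static final String TRACER = """
        <xsl:stylesheet version="1.0"
            xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
            xmlns:tracexsl="http://www.ibm.com/xsl-example/tracexsl">
          <xsl:namespace-alias stylesheet-prefix="tracexsl" result-prefix="xsl"/>
          <!-- Copy everything that isn't an xsl:template unchanged. -->
          <xsl:template match="@*|node()">
            <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
          </xsl:template>
          <!-- Wrap each template body in newline-padded begin/end comments. -->
          <xsl:template match="xsl:template">
            <xsl:copy>
              <xsl:apply-templates select="@*"/>
              <xsl:apply-templates select="xsl:param"/>
              <tracexsl:text xml:space="preserve">&#10;</tracexsl:text>
              <tracexsl:comment>
                <xsl:text>[TraceXSL Begin] match="</xsl:text>
                <xsl:value-of select="@match"/>
                <xsl:text>"</xsl:text>
              </tracexsl:comment>
              <tracexsl:text xml:space="preserve">&#10;</tracexsl:text>
              <xsl:apply-templates select="node()[not(self::xsl:param)]"/>
              <tracexsl:text xml:space="preserve">&#10;</tracexsl:text>
              <tracexsl:comment>
                <xsl:text>[TraceXSL End] match="</xsl:text>
                <xsl:value-of select="@match"/>
                <xsl:text>"</xsl:text>
              </tracexsl:comment>
              <tracexsl:text xml:space="preserve">&#10;</tracexsl:text>
            </xsl:copy>
          </xsl:template>
        </xsl:stylesheet>
        """;

    // Listing 1, minus its explanatory comments.
    public static final String SAMPLE = """
        <?xml version="1.0"?>
        <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
          <xsl:template match="@*|node()" priority="-1">
            <xsl:copy>
              <xsl:apply-templates select="@*|node()"/>
            </xsl:copy>
          </xsl:template>
          <xsl:template match="testtag">
            <h1>tralala</h1>
          </xsl:template>
        </xsl:stylesheet>
        """;

    // Applies a stylesheet (as text) to a document (as text) through JAXP.
    public static String transform(String xsl, String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)))
                .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String generated = transform(TRACER, SAMPLE); // the "withTrace" step
        System.out.println(generated);
        System.out.println(transform(generated, "<testtag/>")); // the "traceResult" step
    }
}
```

Printing the generated stylesheet should show xsl:text and xsl:comment instructions added to both templates; only the testtag template's trace appears in the final result, because that is the only template this input exercises.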
OK, I've illustrated the concept... but it isn't really a usable tool yet.
One concern is that the generated comments are less informative than they might be. If the stylesheet used modes, or invoked templates by name, we might not be able to tell which of several templates was actually executed. And the comments don't tell you what portion of the source document the template was running against. You know roughly what happened... but not exactly what, or why.
A second worry is that any nontrivial stylesheet execution will generate a lot of comments -- perhaps so many that it would be hard to find the ones you're really interested in.
And there's a more serious problem: The generated annotations may break some stylesheets. For example, Stefan apparently wanted line breaks before and after each comment. That's OK for most HTML processing, where extra whitespace is generally discarded during the browser's formatting process -- but it isn't so good for XML, where whitespace may be meaningful. And even the comments inserted before and after the output of each template may break things in some cases, such as when the output of a template appears in a context where comments aren't permitted (for example, inside a comment) or is concatenated with other text to produce a single word in the output document.
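The word-splitting problem is easy to demonstrate. This sketch applies a tracer in the style of Listings 3, 6, and 7 (newline-padded begin and end comments; my reconstruction, not the article's file) to a made-up stylesheet whose templates assemble the word "tralala" from two pieces. It assumes the XSLT processor built into the JDK and Java 15 or later; all class, element, and field names are hypothetical. Untraced, the <word> element's text is "tralala"; traced, the added newlines split it apart, even though the comments themselves don't count toward the element's text.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;

public class WordBreakDemo {
    // Newline-padded begin/end comments around every template body.
    public static final String TRACER = """
        <xsl:stylesheet version="1.0"
            xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
            xmlns:tracexsl="http://www.ibm.com/xsl-example/tracexsl">
          <xsl:namespace-alias stylesheet-prefix="tracexsl" result-prefix="xsl"/>
          <xsl:template match="@*|node()">
            <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
          </xsl:template>
          <xsl:template match="xsl:template">
            <xsl:copy>
              <xsl:apply-templates select="@*"/>
              <xsl:apply-templates select="xsl:param"/>
              <tracexsl:text xml:space="preserve">&#10;</tracexsl:text>
              <tracexsl:comment>[TraceXSL Begin] match="<xsl:value-of select="@match"/>"</tracexsl:comment>
              <tracexsl:text xml:space="preserve">&#10;</tracexsl:text>
              <xsl:apply-templates select="node()[not(self::xsl:param)]"/>
              <tracexsl:text xml:space="preserve">&#10;</tracexsl:text>
              <tracexsl:comment>[TraceXSL End] match="<xsl:value-of select="@match"/>"</tracexsl:comment>
              <tracexsl:text xml:space="preserve">&#10;</tracexsl:text>
            </xsl:copy>
          </xsl:template>
        </xsl:stylesheet>
        """;

    // Hypothetical stylesheet that assembles one word from two templates.
    public static final String SAMPLE = """
        <xsl:stylesheet version="1.0"
            xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:template match="word"><word><xsl:apply-templates/></word></xsl:template>
          <xsl:template match="prefix">tra</xsl:template>
          <xsl:template match="suffix">lala</xsl:template>
        </xsl:stylesheet>
        """;

    public static final String SOURCE = "<word><prefix/><suffix/></word>";

    // Applies a stylesheet (as text) to a document (as text) through JAXP.
    public static String transform(String xsl, String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)))
                .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    // Text content of the first element with the given name (comments excluded).
    public static String textOf(String xml, String name) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getElementsByTagName(name).item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String plain = transform(SAMPLE, SOURCE);
        String traced = transform(transform(TRACER, SAMPLE), SOURCE);
        System.out.println("untraced: [" + textOf(plain, "word") + "]");
        System.out.println("traced:   [" + textOf(traced, "word") + "]");
    }
}
```

Removing all whitespace from the traced text brings back "tralala", which points at the newline padding as the culprit in this particular case.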
But one advantage of writing the annotation tool as a stylesheet rather than relying on something built into the XSLT processor is that you can tweak it to suit your needs. In the second article of this pair, you will:
- Improve the trace messages to tell you much more about which templates are running, why they're running, and what they're processing
- Add selective tracing (controllable from the command line!) so you can trace just the portions of the stylesheet you're interested in
- And, as a bonus, learn how to generate a good approximation of an XPath to a given node.
Watch this space!
Resources
- For general advice on using the XSLT stylesheet language, one of the best places to look is the XSL User's mailing list, at http://www.mulberrytech.com/xsl/xsl-list/index.html. The mailing list's home page also has a link to Dave Pawson's XSLT Frequently Asked Questions (FAQ) Web site, which collects many of the most useful answers.
- For information about the open-source Xalan XSLT processor, which I used to develop and test the examples in this article, see Apache's Web site at http://xml.apache.org. There, you'll find specifics about the Java-based Xalan-J as well as the C++ version, Xalan-C. The best places to ask questions about using Xalan would be the Xalan-J and Xalan-C users' mailing lists; you can find out about them at http://xml.apache.org/mail.html. If you want to get involved in Xalan's development, try the Xalan-Dev mailing list, found at the same place.
- For the official definition of XSLT, including the concepts mentioned in this article (extensions, <xsl:element>, and <xsl:namespace-alias>), check out the XSLT Recommendation on the W3C's Web site.
- The W3C also has an XSL Home Page, which has links not only to the Recommendation but to related information such as the XPath and XSL-FO specifications, lists of software that supports these standards, pointers to interesting articles about XSL and other resources (including most of the ones I've mentioned here), as well as the Working Drafts (WDs) which describe proposed future versions of these tools. Yes, W3C specifications are often a bit hard to read -- both because they were written by experts for experts, and because many expert programmers really don't write very well -- but if you need the official word on exactly what XSLT should be doing in any particular case, this is where you'll find it.
- For an interesting example of using XSLT as a code compiler, take a look at the DOM Test Suite now being developed, http://www.w3.org/DOM/Test/. Because the DOM is available in multiple languages and bindings, the developers chose to write the test suite using an abstract XML-based meta-language, and to use XSLT stylesheets to turn that into executable code. A test case written once in the meta-language can be compiled and executed in any language they have a stylesheet for, ensuring that the same tests are applied everywhere. (I've experimented with some similar code generation myself, but in my case I had the stylesheet produce BML -- IBM's Bean Markup Language -- and then let the BML tools do the work of turning it into executing Java code.)
- And, of course, don't forget to check right here on IBM's developerWorks XML zone for a wide variety of articles, tutorials, tips and tools. You'll also find a number of interesting XML tools on alphaWorks, where you can download experimental versions of some of IBM's very latest ideas.