As one of the contributors to Apache's open-source Xalan processor, I've been impressed by the wide range of applications folks are finding for XSLT. Stylesheets have established themselves as a very general-purpose tool, not just for rendering XML documents for display, but for automatically generating new documents.
Of course, this breadth of applications means folks keep coming up with things that XSLT can't quite do for them, and the Xalan team is often torn between the wish to stick with portable solutions and wanting to address those needs.
XSLT does have the concept of extensions (see Resources), which provide an architected way to enhance the stylesheet language. With the extensions, Xalan developers can provide some additional features without conflicting with the standard. But we really can't afford to build every requested extension directly into the processor; we'd wind up with a huge library of infrequently-used features.
Xalan does let you write and plug in your own extensions. But extensions are usually limited to defining new stylesheet operations rather than altering existing ones, and usually require that someone write code in a traditional programming language. (Future versions of XSLT may let you write extensions in the XSLT language.) Also, user-written extensions aren't supported by all XSLT processors, and the details of writing, accessing, and invoking them vary, so this isn't a very portable solution. (For example, an extension written for the Java-based Xalan-J processor can't be invoked from the C++ version, Xalan-C, nor vice versa -- see Resources for links to Xalan.)
In this pair of articles, I'll show you another way to enhance XSLT stylesheets, which can do some things extensions can't and which will work in any XSLT processor: write a stylesheet that compiles custom features into other stylesheets! Essentially, we can leverage the fact that an XSLT stylesheet is itself an XML document and automatically apply a set of modifications to add or modify its behavior.
Here's a real-world example: Stefan Kost, a Xalan-J user, submitted the following enhancement request:
What do you think about adding an attribute to xsl:output, such as xalan:debug, which would cause all invocations to emit a comment into the resulting tree:

xml:
<testtag/>

xsl:
...
<xsl:template match="testtag">
  <h1>tralala</h1>
</xsl:template>

generated html:
<!-- testtag:beg -->
<h1>tralala</h1>
<!-- testtag:end -->
The first question we asked, of course, was whether the user was sure he needed this behavior. Xalan already has some built-in debugging features that can tell you what a stylesheet is doing. For example, in the Java version of Xalan (which I'm a bit more familiar with), you can write and plug in a TraceListener object, which is told what Xalan is doing as the stylesheet executes. A TraceListener is built into Xalan's command-line tool, org.apache.xalan.xslt.Process, and is used to support a set of useful command-line options:
-TT (Trace Templates) generates a message each time Xalan starts processing a new template, telling you the template's match pattern, its name (if it has been given one) and its mode (again, if one has been specified). This gives you a basic view of the stylesheet's flow of execution. The messages are written out to the screen using the same format as <xsl:message>, and include information about which stylesheet file this template was found in and where it is within that file, so you can call the stylesheet up in an editor and see exactly which template it was and what it was trying to do. (I should note here that this location information isn't always available, depending on where Xalan loaded the stylesheet from.)
-TTC (Trace Template Children) produces a message for every step of processing within the template, allowing you to see the stylesheet's execution in much more detail. As with -TT, this comes with file/line/column information.
-TS (Trace Selections) produces a message each time a stylesheet directive performs a select operation to retrieve data from the source document. It tells you where the retrieval occurred in the stylesheet (file/line/column), what kind of retrieval was being performed, what the select XPath was, and lists the source-document nodes that were returned by that search. The list of matching nodes is normally rather terse, including just the node name and the node's internal handle. You can improve it by adding the -L option to also track location information for the source document, though that data isn't always available and using -L does consume additional memory.
-TG (Trace Generation) produces a message each time a stylesheet generates (writes) content to the output document. Think of this as a summary of Xalan's output SAX stream.
Obviously, these trace options can be combined if you like, or you can write your own TraceListener to produce more sophisticated logging -- potentially even using the trace events to drive a stylesheet debugger with features such as animated execution and breakpoints.
But Stefan wasn't entirely happy with any of those solutions. Writing a TraceListener takes a bit of work, and obviously isn't portable to other XSLT processors. The Xalan-J trace options aren't portable either, their output tends to be a bit verbose, and it's hard to see from the messages exactly which part of the output was produced where. Stefan really wanted to have the trace information incorporated into the stylesheet's output, so he could examine it directly in the context of the generated document.
Unfortunately, his proposed solution of adding a nonstandard attribute to <xsl:output> wasn't portable either, and would have complicated the Xalan code a bit. Generally, adding features to the processor's basic behaviors is not desirable unless they're going to be used fairly intensively; every time the processor has to decide whether or not to do something, it costs some performance, increases the risk of bugs, and (of course) demands a bit more of the user's memory for the additional code.
Stefan might simply rewrite his stylesheet so it generates comments at the start and end of each template's execution. But I have to agree with him that doing so is a hassle, especially if you want to add the trace information only when you're trying to debug a problem. What Stefan wants is a way to automatically add this behavior to each template when he needs it.
Hmm. Automatically add behavior to each template element -- that sounds like something a stylesheet could do! And if you write that stylesheet, you can control exactly where those annotations go and what goes into them, rather than being stuck with someone else's guesses about what information would be useful.
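For reference, this is roughly what Stefan's template would look like if it were instrumented by hand. It's only a sketch of the target that a generated stylesheet has to reproduce, using the comment text from his request:

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- Hand-instrumented version of Stefan's template (illustrative sketch). -->
  <xsl:template match="testtag">
    <xsl:comment> testtag:beg </xsl:comment>
    <h1>tralala</h1>
    <xsl:comment> testtag:end </xsl:comment>
  </xsl:template>
</xsl:stylesheet>
```

Doing this for one template is easy; doing it for every template, and undoing it again afterward, is the chore worth automating.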
I'll start by creating a sample stylesheet to apply the tracer to. Here's a simple version based on Stefan's illustration:
Listing 1. Sample stylesheet
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- For simple examples, you usually want most of the document
       passed through unchanged. This is the standard "identity"
       transformation for doing that. -->
  <xsl:template match="@*|node()" priority="-1">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Stefan wants to replace <testtag/> elements with a fixed
       header... not very interesting, but OK as an example. -->
  <xsl:template match="testtag">
    <h1>tralala</h1>
  </xsl:template>

</xsl:stylesheet>
He wants a comment to be inserted into the output document every time an
<xsl:template> starts and ends. To do that, you want to
find each template and make the appropriate change.
Finding templates is easy; you just write a template that matches
templates. You can use
<xsl:copy> to copy the template elements
themselves, along with any namespace declarations on them.
Listing 2. Matching template elements
<xsl:template match="xsl:template">
  <xsl:copy>
    ...
  </xsl:copy>
</xsl:template>
Inserting the comment generators into the template bodies is a bit
more complicated, since XSLT has specific rules about the order in
which things can be written to the output. First, we need to
explicitly copy the template's attributes, which must
be set before any children can be added to the element. Next, because
XSLT says that any
<xsl:param> elements in the altered template
must precede all other children (except whitespace),
you need to take care of those -- otherwise your comment generator might
be inserted before an
<xsl:param> and break the
stylesheet. Only then are you ready to alter the rest of the body
(everything that isn't an
<xsl:param>) to have it
automagically add the comments.
Listing 3. Copying existing template content
<xsl:template match="xsl:template">
  <xsl:copy>
    <xsl:apply-templates select="@*"/>
    <xsl:apply-templates select="xsl:param"/>
    ...
    <xsl:apply-templates select="node()[not(name()='xsl:param')]"/>
    ...
  </xsl:copy>
</xsl:template>
Now you need to fill in those remaining gaps with code to generate the
appropriate comments into the output document, plus some whitespace
wrapped around the comments for the sake of readability. You want something
roughly like the following, in order to display the value of the template's
match= attribute as part of the comment's text. (My
version is a bit wordier than Stefan's suggestion because I want to be
able to see at a glance which comments the tracer generated.)
Listing 4. Unsuccessful first attempt to generate trace comments
<!-- THIS WON'T WORK AS SHOWN. SEE BELOW! -->
<xsl:text>
</xsl:text>
<xsl:comment>
  <xsl:text>[TraceXsl Begin] match="</xsl:text>
  <xsl:value-of select="@match"/>
  <xsl:text>"</xsl:text>
</xsl:comment>
<xsl:text>
</xsl:text>
And similarly for the end-of-template comment.
But as I indicate above, this won't work as written. If you tried it,
you'd find that directives such as
<xsl:comment> were interpreted while you were styling
the stylesheet. But you want those instructions to be written
to the generated stylesheet, and executed only when it runs!
Luckily, XSLT's designers anticipated this problem, and gave two solutions.
One is to explicitly build the desired output elements using directives
such as <xsl:element> (see Resources). This is a very powerful mechanism that allows you to
construct just about any document structure you need, and can use the
full power of XPath and XSLT to set not only the contents but the
names and namespaces. However, it is somewhat verbose, even if you take
advantage of the fact that
<xsl:element> assumes the correct namespace based
on the context in which the element name was specified:
Listing 5. Generating a trace comment, with assist from <xsl:element>
<!-- THIS ONE WORKS! -->
<xsl:element name="xsl:text" xml:space="preserve">
</xsl:element>
<xsl:element name="xsl:comment">
  <xsl:text>[TraceXSL Begin] match="</xsl:text>
  <xsl:value-of select="@match"/>
  <xsl:text>"</xsl:text>
</xsl:element>
<xsl:element name="xsl:text" xml:space="preserve">
</xsl:element>
The other solution is to use the <xsl:namespace-alias>
mechanism (see Resources). With a namespace alias,
you can use one namespace for literal result elements (and
attributes) inside a stylesheet, and have it converted to another
namespace when the generated document is written out. To use this
feature, define a temporary namespace at the top of your
stylesheet-for-stylesheets, and tell XSLT that when it sees that
namespace it should actually output the standard XSL namespace instead.
Listing 6. Using a namespace alias
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tracexsl="http://www.ibm.com/xsl-example/tracexsl">

  <xsl:namespace-alias stylesheet-prefix="tracexsl" result-prefix="xsl"/>
  ...
You can then rewrite your comment generator in a form that is successfully written out to the generated stylesheet:
Listing 7. Generating a trace comment, with assist from <xsl:namespace-alias>
<!-- THIS ONE WORKS TOO! -->
<tracexsl:text xml:space="preserve">
</tracexsl:text>
<tracexsl:comment>
  <xsl:text>[Tracexsl Begin] match="</xsl:text>
  <xsl:value-of select="@match"/>
  <xsl:text>"</xsl:text>
</tracexsl:comment>
<tracexsl:text xml:space="preserve">
</tracexsl:text>
In this pair of articles, I use the
namespace-alias solution, since I think it's a bit easier
for humans to read. (And as you'll see in Part 2, I have another good use
for the tracexsl: namespace.) But either approach will work.
Note that in both of these solutions, I had to tell the processor to preserve the
whitespace in some of the generated
text elements. XSLT
normally assumes that whitespace-only text in a stylesheet isn't meaningful,
with a few specific exceptions.
<xsl:text> happens to
be one of those exceptions -- but at this point in the stylesheet, the
XSLT processor doesn't know that either
<xsl:element name="xsl:text"> or <tracexsl:text> is going to turn into an
<xsl:text>, and would discard the newline if I didn't
explicitly tell it not to.
Putting it all together, here's what you get.
- tracexsl1.xsl -- The new stylesheet for tracing stylesheets.
- tracexsl-sample1.xsl -- The runnable version of Stefan's sample stylesheet, as shown in Listing 1.
- When I use Xalan to run tracexsl1.xsl over Stefan's stylesheet, I get tracexsl-sample1.xsl.withTrace.
Note that in the generated stylesheet, the tracexsl: prefix is bound to the same namespace as xsl:, and these elements will be executed when the stylesheet runs. Other XSLT processors may prefer to change the prefix, or define it at a higher level rather than on each generated tag, but the meaning will be the same.
This is a slightly ugly stylesheet -- but remember, you didn't have to write it by hand (and you can re-generate it rather than having to maintain it), so that isn't a serious problem.
The real question is, does it do what you want it to?
- tracexsl-sample1.xml -- Stefan's sample source document, as shown above.
- Finally, I can use Xalan to run the generated tracexsl-sample1.xsl.withTrace over that source document, and it produces tracexsl-sample1.xml.traceResult -- Hey, it works!
For comparison, I've also written a version using <xsl:element>:
- tracexsl1a.xsl -- Trace using <xsl:element>.
- Run that over Stefan's sample stylesheet.
- tracexsl-sample1a.xsl.withTrace -- styled stylesheet. Note that this version produces a slightly cleaner annotated stylesheet -- with fewer namespace declarations -- but the downside is that you have to work a bit harder to produce it, and you can't immediately see which directives have been added by your trace generator.
- Run that over Stefan's sample input file...
- And you get tracexsl-sample1a.xml.traceResult -- which once again shows the results Stefan asked for!
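If you don't have the downloadable files handy, the pieces from Listings 3, 6, and 7 might be assembled roughly as follows. This is a minimal sketch and not the actual tracexsl1.xsl. The identity template, the [TraceXSL End] comment text, and the use of not(self::xsl:param) in place of the name() test from Listing 3 are assumptions made for this sketch, and the whitespace-generating text elements are left out for brevity.

```xml
<?xml version="1.0"?>
<!-- Sketch only: an assembly of Listings 3, 6, and 7,
     not the author's tracexsl1.xsl. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tracexsl="http://www.ibm.com/xsl-example/tracexsl">

  <!-- Literal result elements written with the tracexsl prefix are
       output in the XSLT namespace (Listing 6). -->
  <xsl:namespace-alias stylesheet-prefix="tracexsl" result-prefix="xsl"/>

  <!-- Assumed identity transformation: copy everything in the input
       stylesheet that no other template handles. -->
  <xsl:template match="@*|node()" priority="-1">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Wrap each template body in begin/end trace comments. -->
  <xsl:template match="xsl:template">
    <xsl:copy>
      <!-- Attributes first, then params, as XSLT requires. -->
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates select="xsl:param"/>
      <tracexsl:comment>
        <xsl:text>[TraceXSL Begin] match="</xsl:text>
        <xsl:value-of select="@match"/>
        <xsl:text>"</xsl:text>
      </tracexsl:comment>
      <!-- self::xsl:param compares namespaces, so it does not depend
           on which prefix the input stylesheet happens to use. -->
      <xsl:apply-templates select="node()[not(self::xsl:param)]"/>
      <tracexsl:comment>
        <xsl:text>[TraceXSL End] match="</xsl:text>
        <xsl:value-of select="@match"/>
        <xsl:text>"</xsl:text>
      </tracexsl:comment>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Running a sketch like this over Listing 1 should yield a stylesheet whose templates each begin and end with a comment generator. Running that generated stylesheet over a source document then produces the annotated output.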
OK, I've illustrated the concept... but it isn't really a usable tool yet.
One concern is that the generated comments are less informative
than they might be. If the stylesheet used
modes, or invoked templates by
name, we might not be able to tell which of several
templates was actually executed. And the comments don't tell you what portion
of the source document the template was running against.
You know roughly what happened... but not exactly what, or why.
A second worry is the generation of a lot of comments in any nontrivial stylesheet execution -- perhaps so many that it would be hard to find the ones you're really interested in.
And there's a more serious problem: The generated annotations may break some stylesheets. For example, Stefan apparently wanted line breaks before and after each comment. That's OK for most HTML processing, where extra whitespace is generally discarded during the browser's formatting process -- but it isn't so good for XML, where whitespace may be meaningful. And even the comments inserted before and after the output of each template may break things in some cases, such as when the output of a template appears in a context where comments aren't permitted (for example, inside a comment) or is concatenated with other text to produce a single word in the output document.
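To make the "context where comments aren't permitted" case concrete, here's a hypothetical stylesheet in which one template's output is used to build an attribute value. The link, target, and label element names and the url attribute are invented for illustration. Once a tracer wraps the body of the target template in comment generators, those comments would be created inside the <xsl:attribute>, where XSLT 1.0 allows only text nodes; a processor may either report an error or silently drop them.

```xml
<?xml version="1.0"?>
<!-- Hypothetical input stylesheet; element names are invented. -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <xsl:template match="link">
    <a>
      <!-- The attribute value comes from whatever the matching
           template for <target> writes out. -->
      <xsl:attribute name="href">
        <xsl:apply-templates select="target"/>
      </xsl:attribute>
      <xsl:value-of select="label"/>
    </a>
  </xsl:template>

  <!-- After tracing, this body would also emit comments, which are
       not allowed inside the attribute being built above. -->
  <xsl:template match="target">
    <xsl:value-of select="@url"/>
  </xsl:template>

</xsl:stylesheet>
```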
But one advantage of writing the annotation tool as a stylesheet rather than relying on something built into the XSLT processor is that you can tweak it to suit your needs. In the second half of this article, you will:
- Improve the trace messages to tell you much more about which templates are running, why they're running, and what they're processing
- Add selective tracing (controllable from the command line!) so you can just trace the portions of the stylesheet you're interested in
- And, as a bonus, I'll show you how to generate a good approximation of an XPath to a given node.
Watch this space!
- For general advice on using the XSLT stylesheet language, one of
the best places to look is the XSL User's mailing list, at http://www.mulberrytech.com/xsl/xsl-list/index.html. The mailing list's home page also has a link to Dave Pawson's XSLT Frequently Asked Questions (FAQ) Web site, which collects many of the most useful answers.
- For information about the open-source Xalan XSLT processor, which
I used to develop and test the examples in this article, see Apache's
Web site at http://xml.apache.org.
There, you'll find specifics about the Java-based Xalan-J
as well as the C++ version, Xalan-C.
The best places to ask questions about using Xalan would be
the Xalan-J and Xalan-C users' mailing lists; you can find out about
them at http://xml.apache.org/mail.html. If
you want to get involved in Xalan's development, try the Xalan-Dev
mailing list, found at the same place.
- For the official definition of XSLT, including the concepts mentioned
in this article -- extensions, <xsl:element>, and
<xsl:namespace-alias> -- check out the XSLT Recommendation on the W3C's Web site.
- The W3C also has an XSL Home Page, which has links not only to the Recommendation but to related information such as the XPath and XSL-FO specifications,
lists of software that supports these standards, pointers to
interesting articles about XSL and other resources
(including most of the ones I've mentioned here), as well as
the Working Drafts (WDs) which describe proposed future versions of these
tools. Yes, W3C specifications are often a bit hard to read -- both
because they were written by experts for experts, and because many
expert programmers really don't write very well -- but if you need the
official word on exactly what XSLT should be doing in any particular
case, this is where you'll find it.
- For an interesting example of using XSLT as a code compiler, take a look at the DOM Test Suite now being developed. Because
the DOM is available in multiple languages and bindings, the developers chose to
write the test suite using an abstract XML-based meta-language, and
to use XSLT stylesheets to turn that into executable code. A test case
written once in the meta-language can be compiled and executed in any
language they have a stylesheet for, ensuring that the same tests are
applied everywhere. (I've experimented with some similar code
generation myself, but in my case I had the stylesheet produce BML --
IBM's Bean Markup Language -- and then let the BML tools do the work
of turning it into executing Java code.)
- And, of course, don't forget to check right here on IBM's developerWorks XML
zone for a wide variety of articles, tutorials, tips and
tools. You'll also find a number of interesting XML tools on alphaWorks, where you can
download experimental versions of some of IBM's very latest ideas.
Joe Kesselman has been with IBM for over two decades, working on projects ranging from mainframe circuit design, to CAD tools, to research in software development, to Internet standards (he's one of the authors of the W3C's DOM Level 2 Recommendation). Most recently he's been working on XSLT processors, including Apache's Xalan. You can contact Joe at email@example.com.