Style stylesheets to extend XSLT, Part 1

Use XSLT as a macro preprocessor

XSLT isn't just about styling documents for presentation. It's actually a very general-purpose document transformation processor. And as Joe demonstrates in this two-part series, stylesheets are themselves documents, so XSLT can be used as a portable preprocessor to automatically enhance the behavior of a stylesheet.

Joseph Kesselman (keshlam@us.ibm.com), Advisory Scientist, IBM, Software Group

Joe Kesselman has been with IBM for over two decades, working on projects ranging from mainframe circuit design, to CAD tools, to research in software development, to Internet standards (he's one of the authors of the W3C's DOM Level 2 Recommendation). Most recently he's been working on XSLT processors, including Apache's Xalan. You can contact Joe at keshlam@us.ibm.com.



06 May 2003

As one of the contributors to Apache's open-source Xalan processor, I've been impressed by the wide range of applications folks are finding for XSLT. Stylesheets have established themselves as a very general-purpose tool, not just for rendering XML documents for display, but for automatically generating new documents.

Of course, this breadth of applications means folks keep coming up with things that XSLT can't quite do for them, and the Xalan team is often torn between the wish to stick with portable solutions and the desire to address those needs.

XSLT does have the concept of extensions (see Resources), which provide an architected way to enhance the stylesheet language. With extensions, Xalan developers can provide some additional features without conflicting with the standard. But we really can't afford to build every requested extension directly into the processor; we'd wind up with a huge library of infrequently used features.

Xalan does let you write and plug in your own extensions. But extensions are usually limited to defining new stylesheet operations rather than altering existing ones, and usually require that someone write code in a traditional programming language. (Future versions of XSLT may let you write extensions in the XSLT language.) Also, user-written extensions aren't supported by all XSLT processors, and the details of writing, accessing, and invoking them vary, so this isn't a very portable solution. (For example, an extension written for the Java-based Xalan-J processor can't be invoked from the C++ version, Xalan-C, nor vice versa -- see Resources for links to Xalan.)

In this pair of articles, I'll show you another way to enhance XSLT stylesheets, which can do some things extensions can't and which will work in any XSLT processor: write a stylesheet that compiles custom features into other stylesheets! Essentially, we can leverage the fact that an XSLT stylesheet is itself an XML document and automatically apply a set of modifications to add or modify its behavior.

Motivating use case: "Why did I get that output?"

Here's a real-world example: Stefan Kost, a Xalan-J user, submitted the following enhancement request:

What do you think about adding an attribute to xsl:output, such as xalan:debug, which would cause all invocations to emit a comment into the resulting tree:

		xml:
		    <testtag/>
		xsl:
		    ...
		    <xsl:template match="testtag">
		      <h1>tralala</h1>
		    </xsl:template>
		generated html:
		    <!-- testtag:beg -->
		    <h1>tralala</h1>
		    <!-- testtag:end -->

The first question we asked, of course, was whether the user was sure he needed this behavior. Xalan has some debugging features already built into it which can tell you what a stylesheet is doing. For example, in the Java version of Xalan (which I'm a bit more familiar with), you can write and plug in a TraceListener object which is told what Xalan is doing as the stylesheet executes.

A basic TraceListener is built into Xalan's command-line tool, org.apache.xalan.xslt.Process, and is used to support a set of useful command-line options:

  • -TT (Trace Templates) generates a message each time Xalan starts processing a new template, telling you the template's match pattern, its name (if it has been given one) and its mode (again, if one has been specified). This gives you a basic view of the stylesheet's flow of execution. The messages are written out to the screen using the same format as <xsl:message>, and include information about which stylesheet file this template was found in and where it is within that file, so you can call the stylesheet up in an editor and see exactly which template it was and what it was trying to do. (I should note here that this location information isn't always available, depending on where Xalan loaded the stylesheet from.)
  • -TTC (Trace Template Children) produces a message for every step of processing within the template, allowing you to see the stylesheet's execution in much more detail. As with -TT, this comes with file/line/column information.
  • -TS (Trace Selections) produces a message each time a stylesheet directive performs a select operation to retrieve data from the source document. It tells you where the retrieval occurred in the stylesheet (file/line/column), what kind of retrieval was being performed, what the select XPath was, and lists the source-document nodes that were returned by that search. The list of matching nodes is normally rather terse, including just the node name and the node's internal handle. You can improve it by adding the -L option to also track location information for the source document, though that data isn't always available and using -L does consume additional memory.
  • -TG (Trace Generation) produces a message each time a stylesheet generates (writes) content to the output document. Think of this as a summary of Xalan's output SAX stream.

Obviously, these trace options can be combined if you like, or you can write your own TraceListener to produce more sophisticated logging -- potentially even using the trace events to drive a stylesheet debugger complete with animated execution, breakpoints, and performance analysis.

But Stefan wasn't entirely happy with any of those solutions. Writing a good TraceListener takes a bit of work, and obviously isn't portable to other XSLT processors. The Xalan-J trace options also aren't portable, their output tends to be a bit verbose, and it's hard to see from the messages exactly what part of the output was produced where. Stefan really wanted to have the trace information incorporated into the stylesheet's output, so he could examine it directly in the context of the generated document.

Unfortunately, his proposed solution of adding a nonstandard directive to <xsl:output> wasn't portable either, and would have complicated the Xalan code a bit. Generally, adding features to the processor's basic behaviors is not desirable unless they're going to be fairly intensively used; every time you have to decide whether or not to do something, it costs some performance, increases the risk of bugs, and (of course) demands a bit more of the user's memory for the additional code.

Stefan might simply rewrite his stylesheet so it generates comments at the start and end of each template's execution. But I have to agree with him that doing so is a hassle, especially if you want to add the trace information only when you're trying to debug a problem. What Stefan wants is a way to automatically add this behavior to each template when he needs it.

Hmm. Automatically add behavior to each template element -- that sounds like something a stylesheet could do! And if you write that stylesheet, you can control exactly where those annotations go and what goes into them, rather than being stuck with someone else's guesses about what information would be useful.


The basics of styling stylesheets

I'll start by creating a sample stylesheet to apply the tracer to. Here's a simple version based on Stefan's illustration:

Listing 1. Sample stylesheet
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- For simple examples, you usually want most of the document
       passed through unchanged. This is the standard "identity"
       transformation for doing that. -->
  <xsl:template match="@*|node()" priority="-1">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Stefan wants to replace <testtag/> elements with a fixed header...
       not very interesting, but OK as an example. -->
  <xsl:template match="testtag">
    <h1>tralala</h1>
  </xsl:template>

</xsl:stylesheet>

He wants a comment to be inserted into the output document every time an <xsl:template> starts and ends. To do that, you want to find each template and make the appropriate change.
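To make the goal concrete, here is roughly what the sample stylesheet would look like if the comments were added by hand. This is only a sketch of the target: it uses the testtag:beg and testtag:end wording from Stefan's request, and for brevity it annotates only the testtag template, not the identity template.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- Identity transformation, unchanged from Listing 1 -->
  <xsl:template match="@*|node()" priority="-1">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Hand-annotated version of Stefan's template: each xsl:comment
       writes a comment node into the output document at run time. -->
  <xsl:template match="testtag">
    <xsl:comment> testtag:beg </xsl:comment>
    <h1>tralala</h1>
    <xsl:comment> testtag:end </xsl:comment>
  </xsl:template>

</xsl:stylesheet>
```

Because whitespace-only text between instructions in a stylesheet is stripped, this version emits the two comments directly around the h1 element, with no line breaks between them. The rest of this article is about getting a stylesheet to make this kind of edit for you.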

Finding templates is easy; you just write a template that matches templates. You can use <xsl:copy> to copy the template elements themselves, along with any namespace declarations on them.

Listing 2. Matching template elements
  <xsl:template match="xsl:template">
    <xsl:copy>
      ...
    </xsl:copy>
  </xsl:template>

Inserting the comment generators into the template bodies is a bit more complicated, since XSLT has specific rules about the order in which things can be written to the output. First, you need to explicitly copy the template's attributes, which must be set before any children can be added to the element. Next, because XSLT says that any <xsl:param> elements in a template must precede all other children (except whitespace), you need to copy those before anything else -- otherwise your comment generator might be inserted before an <xsl:param> and break the stylesheet. Only then are you ready to alter the rest of the body (everything that isn't an <xsl:param>) to have it automagically add the comments.

Listing 3. Copying existing template content
  <xsl:template match="xsl:template">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates select="xsl:param"/>
      ...
      <xsl:apply-templates select="node()[not(name()='xsl:param')]"/>
      ...
    </xsl:copy>
  </xsl:template>
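One caveat about the name() test in Listing 3: it compares the prefixed name as a string, so it assumes that the stylesheet being processed uses the xsl: prefix for the XSLT namespace. A namespace-aware alternative, sketched below, is to test with the self:: axis instead. The identity template is included only so that the sketch runs on its own; it is not part of Listing 3.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- Copy anything that has no more specific template -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="xsl:template">
    <xsl:copy>
      <!-- Attributes first, then parameters, then everything else -->
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates select="xsl:param"/>
      <!-- self::xsl:param compares namespace URI and local name, so it
           still works if the input stylesheet binds the XSLT namespace
           to a prefix other than xsl: -->
      <xsl:apply-templates select="node()[not(self::xsl:param)]"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

With no trace generators filled in yet, running this over a stylesheet simply copies it, with each template's parameters moved ahead of its other children.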

Now you need to fill in those remaining gaps with code to generate the appropriate comments into the output document, plus some whitespace wrapped around the comments for the sake of readability. You want something roughly like the following, in order to display the value of the template's match= attribute as part of the comment's text. (My version is a bit wordier than Stefan's suggestion because I want to be able to see at a glance which comments have been generated by this process.)

Listing 4. Unsuccessful first attempt to generate trace comments
      <!-- THIS WON'T WORK AS SHOWN.  SEE BELOW! -->
      <xsl:text>
</xsl:text>
      <xsl:comment>
         <xsl:text>[TraceXSL Begin] match="</xsl:text>
         <xsl:value-of select="@match"/>
         <xsl:text>"</xsl:text>
      </xsl:comment>
      <xsl:text>
</xsl:text>

And similarly for the end-of-template comment.

But as I indicate above, this won't work as written. If you tried it, you'd find that the <xsl:text> and <xsl:comment> instructions were executed while you were styling the stylesheet, so the generated stylesheet would contain a literal XML comment and newlines rather than the instructions themselves. But you want some of those instructions to be written to the generated stylesheet, and executed only when it runs!

Luckily, XSLT's designers anticipated this problem, and provided two solutions.

One is to explicitly build the desired output elements using the <xsl:element> and <xsl:attribute> directives (see Resources). This is a very powerful mechanism that allows you to construct just about any document structure you need, and can use the full power of XPath and XSLT to set not only the contents but the names and namespaces. However, it is somewhat verbose, even if you take advantage of the fact that <xsl:element> resolves a prefixed name such as xsl:text using the namespace declarations in scope where the name is written, so you don't have to supply the namespace explicitly:

Listing 5. Generating a trace comment, with assist from <xsl:element>
      <!-- THIS ONE WORKS! -->
      <xsl:element name="xsl:text" xml:space="preserve">
</xsl:element>
      <xsl:element name="xsl:comment">
         <xsl:text>[TraceXSL Begin] match="</xsl:text>
         <xsl:value-of select="@match"/>
         <xsl:text>"</xsl:text>
      </xsl:element>
      <xsl:element name="xsl:text" xml:space="preserve">
</xsl:element>

The other solution is to use the <xsl:namespace-alias> mechanism (see Resources). With a namespace alias, you can use one namespace for literal result elements (and attributes) inside a stylesheet, and have it converted to another namespace when the generated document is written out. To use this feature, define a temporary namespace at the top of your stylesheet-for-stylesheets, and tell XSLT that when it sees that namespace it should actually output the standard XSL namespace instead.

Listing 6. Using a namespace alias
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tracexsl="http://www.ibm.com/xsl-example/tracexsl">

    <xsl:namespace-alias stylesheet-prefix="tracexsl" result-prefix="xsl"/>
    ....

You can then rewrite your comment generator in a form that is successfully written out to the generated stylesheet:

Listing 7. Generating a trace comment, with assist from <xsl:namespace-alias>
      <!-- THIS ONE WORKS TOO! -->
      <tracexsl:text xml:space="preserve">
</tracexsl:text>
      <tracexsl:comment>
         <xsl:text>[TraceXSL Begin] match="</xsl:text>
         <xsl:value-of select="@match"/>
         <xsl:text>"</xsl:text>
      </tracexsl:comment>
      <tracexsl:text xml:space="preserve">
</tracexsl:text>

In this pair of articles, I use the namespace-alias solution, since I think it's a bit easier for humans to read. (And as you'll see in Part 2, I have another good use for that tracexsl: namespace.) But either approach will work.

Note that in both of these solutions, I had to tell the processor to preserve the whitespace in some of the generated text elements. XSLT normally assumes that whitespace in a stylesheet isn't meaningful, with a few specific exceptions. <xsl:text> happens to be one of those exceptions -- but at this point in the stylesheet, the XSLT processor doesn't know that either <xsl:element> or <tracexsl:text> is going to turn into an <xsl:text>, and would discard the newline if I didn't explicitly tell it not to.

Putting it all together, here's what you get.

  1. tracexsl1.xsl -- The new stylesheet for tracing stylesheets.
  2. tracexsl-sample1.xsl -- The runnable version of Stefan's sample stylesheet, as shown in Listing 1.
  3. When I use Xalan to run tracexsl1.xsl over Stefan's stylesheet, I get tracexsl-sample1.xsl.withTrace.

    Note that in the generated stylesheet, the tracexsl: prefix is bound to the same namespace as xsl:, and these elements will be executed when the stylesheet runs. Other XSLT processors may prefer to change the prefix, or define it at a higher level rather than on each generated tag, but the meaning will be the same.

    This is a slightly ugly stylesheet -- but remember, you didn't have to write it by hand (and you can re-generate it rather than having to maintain it), so that isn't a serious problem.

    The real question is, does it do what you want it to?

  4. tracexsl-sample1.xml -- Stefan's sample source document, as shown above.
  5. Finally, I can use Xalan to run the generated tracexsl-sample1.xsl.withTrace over that source document, and it produces tracexsl-sample1.xml.traceResult -- Hey, it works!
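If you don't have the download at hand, the pieces from Listings 3, 6, and 7 can be assembled along the following lines. This is a sketch for illustration, not necessarily identical to the tracexsl1.xsl file in the download: the wording of the End comment, the identity template, and the use of self::xsl:param in place of the name() test are all assumptions.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tracexsl="http://www.ibm.com/xsl-example/tracexsl">

  <!-- Elements written with the tracexsl: prefix are output in the
       XSLT namespace, so they become instructions in the generated
       stylesheet instead of being executed now. -->
  <xsl:namespace-alias stylesheet-prefix="tracexsl" result-prefix="xsl"/>

  <!-- Copy everything that is not explicitly changed below. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Rebuild each template with trace comments around its body. -->
  <xsl:template match="xsl:template">
    <xsl:copy>
      <!-- Attributes first, then parameters, as XSLT requires. -->
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates select="xsl:param"/>

      <!-- Begin-of-template comment, with a newline on either side. -->
      <tracexsl:text xml:space="preserve">
</tracexsl:text>
      <tracexsl:comment>
        <xsl:text>[TraceXSL Begin] match="</xsl:text>
        <xsl:value-of select="@match"/>
        <xsl:text>"</xsl:text>
      </tracexsl:comment>
      <tracexsl:text xml:space="preserve">
</tracexsl:text>

      <!-- The original template body, minus its parameters.
           (Assumption: self::xsl:param stands in for the name() test.) -->
      <xsl:apply-templates select="node()[not(self::xsl:param)]"/>

      <!-- End-of-template comment (assumed wording). -->
      <tracexsl:text xml:space="preserve">
</tracexsl:text>
      <tracexsl:comment>
        <xsl:text>[TraceXSL End] match="</xsl:text>
        <xsl:value-of select="@match"/>
        <xsl:text>"</xsl:text>
      </tracexsl:comment>
      <tracexsl:text xml:space="preserve">
</tracexsl:text>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

With Xalan-J's command-line tool, a run over the sample stylesheet would look something like java org.apache.xalan.xslt.Process -IN tracexsl-sample1.xsl -XSL tracexsl1.xsl -OUT tracexsl-sample1.xsl.withTrace, followed by a second run that applies the generated stylesheet to the sample source document. Note that a template with no match= attribute (one invoked only by name) would get an empty match="" in its comments with this version.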

For comparison, I've also written a version using <xsl:element>:

  1. tracexsl1a.xsl -- Trace using <xsl:element>.
  2. Run that over Stefan's sample stylesheet.
  3. tracexsl-sample1a.xsl.withTrace -- styled stylesheet.

    Note that this version produces a slightly cleaner annotated stylesheet -- with fewer namespace declarations -- but the downside is that you have to work a bit harder to produce it, and you can't immediately see which directives have been added by your trace generator.

  4. Run that over Stefan's sample input file...
  5. And you get tracexsl-sample1a.xml.traceResult -- which once again shows the results Stefan asked for!

Going further

OK, I've illustrated the concept... but it isn't really a usable tool yet.

One concern is that the generated comments are less informative than they might be. If the stylesheet used modes, or invoked templates by name, you might not be able to tell which of several templates was actually executed. And the comments don't tell you what portion of the source document the template was running against. You know roughly what happened... but not exactly what, or why.

A second worry is that any nontrivial stylesheet execution will generate a lot of comments -- perhaps so many that it would be hard to find the ones you're really interested in.

And there's a more serious problem: The generated annotations may break some stylesheets. For example, Stefan apparently wanted line breaks before and after each comment. That's OK for most HTML processing, where extra whitespace is generally discarded during the browser's formatting process -- but it isn't so good for XML, where whitespace may be meaningful. And even the comments inserted before and after the output of each template may break things in some cases, such as when the output of a template appears in a context where comments aren't permitted (for example, inside a comment) or is concatenated with other text to produce a single word in the output document.

But one advantage of writing the annotation tool as a stylesheet rather than relying on something built into the XSLT processor is that you can tweak it to suit your needs. In Part 2 of this series, I'll show you how to:

  • Improve the trace messages to tell you much more about which templates are running, why they're running, and what they're processing
  • Add selective tracing (controllable from the command line!) so you can trace just the portions of the stylesheet you're interested in
  • Generate a good approximation of an XPath to a given node (as a bonus)

Watch this space!


Download

Description     Name                         Size
Code samples    x-styless1/x-styless1.zip    5KB

Resources

  • For general advice on using the XSLT stylesheet language, one of the best places to look is the XSL User's mailing list, at http://www.mulberrytech.com/xsl/xsl-list/index.html. The mailing list's home page also has a link to Dave Pawson's XSLT Frequently Asked Questions (FAQ) Web site, which collects many of the most useful answers.
  • For information about the open-source Xalan XSLT processor, which I used to develop and test the examples in this article, see Apache's Web site at http://xml.apache.org. There, you'll find specifics about the Java-based Xalan-J as well as the C++ version, Xalan-C. The best places to ask questions about using Xalan would be the Xalan-J and Xalan-C users' mailing lists; you can find out about them at http://xml.apache.org/mail.html. If you want to get involved in Xalan's development, try the Xalan-Dev mailing list, found at the same place.
  • For the official definition of XSLT, including the concepts mentioned in this article -- extensions, <xsl:element>, <xsl:attribute>, and <xsl:namespace-alias> -- check out the XSLT Recommendation on the W3C's Web site.
  • The W3C also has an XSL Home Page, which has links not only to the Recommendation but to related information such as the XPath and XSL-FO specifications, lists of software that supports these standards, pointers to interesting articles about XSL and other resources (including most of the ones I've mentioned here), as well as the Working Drafts (WDs) which describe proposed future versions of these tools. Yes, W3C specifications are often a bit hard to read -- both because they were written by experts for experts, and because many expert programmers really don't write very well -- but if you need the official word on exactly what XSLT should be doing in any particular case, this is where you'll find it.
  • For an interesting example of using XSLT as a code compiler, take a look at the DOM Test Suite now being developed, http://www.w3.org/DOM/Test/. Because the DOM is available in multiple languages and bindings, the developers chose to write the test suite using an abstract XML-based meta-language, and to use XSLT stylesheets to turn that into executable code. A test case written once in the meta-language can be compiled and executed in any language they have a stylesheet for, ensuring that the same tests are applied everywhere. (I've experimented with some similar code generation myself, but in my case I had the stylesheet produce BML -- IBM's Bean Markup Language -- and then let the BML tools do the work of turning it into executing Java code.)
  • And, of course, don't forget to check right here on IBM's developerWorks XML zone for a wide variety of articles, tutorials, tips and tools. You'll also find a number of interesting XML tools on alphaWorks, where you can download experimental versions of some of IBM's very latest ideas.
