Ever since XSLT, the XSL Transformations language, was introduced in November 1999, I have found it to be one of the most useful tools, if not the most useful, for manipulating XML documents. Many APIs and tools work with XML documents from Java or other languages, and I have used many of them in different projects, but I cannot recall an XML project that did not use at least some XSLT.
It should come as no surprise, then, that I have followed the development of XSLT 2.0 with great interest. XSLT is a powerful language, sophisticated enough to handle even the most complex manipulation, but it is also very verbose, which makes large stylesheets difficult to debug and maintain. The W3C hopes to address this, and other problems, when it releases two languages: XSLT 2.0 and XQuery 1.0. This article compares the two upcoming languages and provides some pointers on how best to use them.
As of this writing, XSLT 2.0 and XQuery 1.0 are at Candidate Recommendation status, which is W3C jargon for a standard that is close to adoption. How soon? It depends on the review process.
By design, XSLT 2.0 and XQuery 1.0 have a lot in common. Both languages are based on the same foundation: XPath 2.0. Both languages are intended to manipulate XML documents. Both borrow from scripting the idea of using an interpreted language for simple tasks. In practice, you could use either language to achieve a given result. One is not more powerful than the other. And yet, each language has a distinct personality. I suspect that, depending on the task at hand, and maybe depending on your personality, you might be more at ease with one or the other, so it is worth learning about both.
The remainder of this article sheds some light on the commonality and uniqueness of each language so you can choose the one that fits both your style and the tasks you are working on.
Let's start at the beginning: XPath 2.0. XSLT has always been two languages packaged as one: the XPath language and XSLT itself. XPath is the language you use to query elements (for example, in select attributes) or to specify template matches (the match attribute in XSLT). The XSLT language proper is the language used to specify the results of the transformation with instructions like xsl:apply-templates, xsl:value-of, and xsl:for-each.
Consider the XSLT template in Listing 1. It mixes some XPath (the values of the match and select attributes), elements from XSLT itself (the xsl:template and xsl:value-of instructions), and elements from the result vocabulary (html and head):
Listing 1. XSLT template
<xsl:template match="rss">
  <html>
    <head><title><xsl:value-of select="channel/title"/></title></head>
    <xsl:apply-templates/>
  </html>
</xsl:template>
The separation of tasks between XPath and XSLT is not new but, since no other languages were based on XPath, it was mostly an academic distinction until the arrival of XQuery.
XPath 2.0 is a serious upgrade
XPath 2.0 is a very significant upgrade. XPath 1.0 could query nodes from XML documents only. XPath 2.0 works on sequences, and sequences can contain nodes (so you're not losing anything) as well as strings, integers, and other atomic values. The difference is subtle, but fundamental: it is now possible to further refine the result of a query through a second query. Previously, in XSLT 1.0, chaining two queries required a proprietary function.
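To make the idea of chaining queries concrete, here is a minimal XSLT 2.0 sketch. It assumes an RSS-like input with rss/channel/item/title elements; the variable name titles and the element name t are invented for the illustration. The first query builds a temporary tree and the second query filters that result directly. In XSLT 1.0, the second step would typically have needed a vendor extension such as a node-set() function, presumably the kind of proprietary function referred to above.

```xml
<?xml version="1.0"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- First query: collect every item title into a temporary tree.
         The names "titles" and "t" are hypothetical. -->
    <xsl:variable name="titles">
      <xsl:for-each select="rss/channel/item">
        <t><xsl:value-of select="title"/></t>
      </xsl:for-each>
    </xsl:variable>
    <!-- Second query: refine the first result with an ordinary path
         expression; no extension function is involved. -->
    <xsl:value-of select="$titles/t[contains(., 'XML')]" separator="; "/>
  </xsl:template>
</xsl:stylesheet>
```

The separator attribute on xsl:value-of is itself an XSLT 2.0 addition: because the select expression now returns a sequence, the instruction can output every item in it rather than only the first.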
Another significant change is that XPath 2.0 supports loops and variables. In other words, you can declare a variable inside an XPath expression and assign it a value computed by that expression. Likewise, you can run a loop inside the XPath expression to compute the return value, for example, multiplying quantity by value when processing invoice lines. You can still declare variables in XSLT, and looping in XSLT has been enhanced as well. That's a lot of power packed into the upgrade!
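As a rough illustration of a loop and a variable living inside XPath, the following XSLT 2.0 sketch totals an invoice. The input vocabulary is an assumption: an invoice element containing line elements, each with a quantity and a value child. The for expression binds the variable $l to each line in turn and returns one product per line; sum() then adds up the resulting sequence of numbers.

```xml
<?xml version="1.0"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:output method="text"/>
  <!-- Assumed input: <invoice><line><quantity/><value/></line>...</invoice> -->
  <xsl:template match="invoice">
    <!-- $l is declared and used entirely within the XPath expression.
         Without a schema, quantity and value are untyped, so the
         multiplication treats them as xs:double. -->
    <xsl:value-of
        select="sum(for $l in line return $l/quantity * $l/value)"/>
  </xsl:template>
</xsl:stylesheet>
```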
The new options simplify XSLT stylesheets greatly. In XSLT 1.0, you must write many algorithms, even simple ones, recursively. For example, something as common as computing the total of an invoice might require a recursive algorithm. This is no longer true with XSLT 2.0, thanks largely to the new features in XPath 2.0.
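For comparison, here is a sketch of how the invoice total might be written recursively in XSLT 1.0. The input vocabulary (invoice, line, quantity, value) and the names total, lines, and subtotal are assumptions made for the example, and this is only one of several possible formulations.

```xml
<?xml version="1.0"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text"/>
  <xsl:template match="invoice">
    <xsl:call-template name="total">
      <xsl:with-param name="lines" select="line"/>
    </xsl:call-template>
  </xsl:template>
  <!-- Adds quantity * value for the first line to the running subtotal,
       then calls itself on the remaining lines. -->
  <xsl:template name="total">
    <xsl:param name="lines"/>
    <xsl:param name="subtotal" select="0"/>
    <xsl:choose>
      <xsl:when test="$lines">
        <xsl:call-template name="total">
          <xsl:with-param name="lines" select="$lines[position() &gt; 1]"/>
          <xsl:with-param name="subtotal"
              select="$subtotal + $lines[1]/quantity * $lines[1]/value"/>
        </xsl:call-template>
      </xsl:when>
      <!-- No lines left: the running subtotal is the answer -->
      <xsl:otherwise>
        <xsl:value-of select="$subtotal"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

With XPath 2.0, the whole named template can collapse into a single select expression built from sum() and a for loop, which is where most of the simplification comes from.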
XSLT stylesheets are used for just about any XML manipulation. Personally, I have used XSLT to publish Web sites, compile reports, compute statistics, pre-process XML files, convert between different vocabularies or software, prepare data for import into databases, process database exports, and even respond to Web service requests.
In practice, that's stretching the language. Sure, it works, but you sometimes end up fighting against the language. The resulting stylesheets have many recursive functions and are difficult to debug.
The problem is that, as the name implies, XSLT was designed to transform documents for publishing and not much else. XSLT was not designed as a generic querying language, even though it has often been used as one for lack of a better alternative. XQuery, in contrast, is designed to query documents and prepare reports.
Admittedly, transform and query are generic words that cover many applications. As already discussed, the two languages are equally capable because they are both based on XPath 2.0. But they emphasize different usage patterns, which makes them better suited for different tasks.
The designers of XSLT assumed that you were interested in processing most of the document, which is reasonable when publishing. Therefore XSLT has a built-in tree-walking engine that, by default, processes the entire document. They also assumed that you worked mostly with textual information, so XSLT is not a strongly typed language. Finally, they assumed that you would generate documents in a markup language, so XSLT is written as an XML vocabulary.
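The default behavior comes from the built-in template rules. Spelled out explicitly, they are roughly equivalent to the stylesheet below. This is a sketch rather than an exact restatement: the real built-in rules always rank below any template you write, and the XSLT 2.0 versions also pass template parameters through unchanged.

```xml
<?xml version="1.0"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <!-- Root and elements: keep walking down the tree -->
  <xsl:template match="*|/">
    <xsl:apply-templates/>
  </xsl:template>
  <!-- Text and attributes: copy the string value to the output -->
  <xsl:template match="text()|@*">
    <xsl:value-of select="."/>
  </xsl:template>
  <!-- Comments and processing instructions: produce nothing -->
  <xsl:template match="processing-instruction()|comment()"/>
</xsl:stylesheet>
```

Applied to any XML document, a stylesheet containing only these rules visits every element and emits just the text content. Every template you add overrides this walk for the nodes it matches and nothing else.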
The designers of XQuery made different assumptions. They assumed that when running a query you would want to zoom into a few sections of the document, so they built the language around XPath queries. XQuery has no default behavior; you're in charge.
They also assumed that you worked with typed data, such as a database extract, so XQuery is a strongly typed language. Finally, the syntax is decidedly not XML.
To illustrate the differences between XSLT 2.0 and XQuery 1.0, let's review an example.
To compare two languages, one typically would write the same algorithm in the two languages, but in this particular case, it would not be fair. Because each language is at its best doing tasks where the other underperforms, one example would give a biased view. So I decided to prepare two examples: a transformation example for XSLT and a query for XQuery. By comparing them you will see why the languages are suitable for different needs.
Both examples run on Listing 2, which is a list of articles in RSS.
Listing 2. List of articles
<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Projects in the "Working XML" column</title>
    <description>A selection of ben's article on developerWorks.</description>
    <language>en</language>
    <link>http://www.ibm.com/developerWorks/xml</link>
    <item>
      <title>Safe coding practices</title>
      <pubDate>Fri, 6 May 2005 00:00:00 GMT</pubDate>
      <link>http://www.ibm.com/developerworks/xml/library/x-wxxm30.html</link>
      <description>Most common pitfalls and how to avoid them (4 parts).</description>
    </item>
    <item>
      <title>The Eclipse task list</title>
      <pubDate>Fri, 22 Oct 2004 00:00:00 GMT</pubDate>
      <link>http://www.ibm.com/developerworks/library/x-wxxm27/</link>
      <description>Various techniques on integrating XML and Eclipse (4 parts).</description>
    </item>
    <item>
      <title>XML, XMI and code generation</title>
      <pubDate>Wed, 31 Mar 2004 00:00:00 GMT</pubDate>
      <link>http://www.ibm.com/developerworks/xml/library/x-wxxm23/</link>
      <description>Using modeling for XML development (4 parts).</description>
    </item>
  </channel>
</rss>
Listing 3 is the XSLT stylesheet. It publishes the RSS document in HTML. The stylesheet contains a template (xsl:template) for every element in Listing 2 and it relies on the XSLT processor to select the most appropriate template. You can see that the stylesheet assumes you'll process the entire document. XSLT would also work well with mixed content, where text and elements are interleaved.
Listing 3. XSLT stylesheet
<?xml version="1.0"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:output method="html"/>
  <!-- Document element: emit the HTML skeleton, then walk the children -->
  <xsl:template match="rss">
    <html>
      <head><title>
        <xsl:value-of select="channel/title"/>
      </title></head>
      <xsl:apply-templates/>
    </html>
  </xsl:template>
  <!-- Channel title: top-level heading linked to the channel -->
  <xsl:template match="channel/title">
    <h1><a href="{../link}"><xsl:apply-templates/></a></h1>
  </xsl:template>
  <!-- Any other title (the items): second-level heading linked to the article -->
  <xsl:template match="title">
    <h2><a href="{../link}"><xsl:apply-templates/></a></h2>
  </xsl:template>
  <xsl:template match="description">
    <p><xsl:apply-templates/></p>
  </xsl:template>
  <!-- Elements that should not appear in the output -->
  <xsl:template match="pubDate | link | language"/>
</xsl:stylesheet>
Note the declarative approach: you specify what to do with the elements and the processor (through its built-in tree walking logic) decides when to apply a template. It's easy to add and remove templates. The code contains no loops because the processor already knows how to walk through the document.
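To see how cheaply a template can be added, consider this sketch, which publishes the dates that Listing 3 suppresses. It assumes Listing 3 has been saved as rss2html.xsl (a hypothetical file name) and imports it. A template in the importing stylesheet takes precedence over the imported ones, so the new pubDate rule replaces the empty one without touching the original file.

```xml
<?xml version="1.0"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <!-- rss2html.xsl is an assumed file name for the stylesheet in Listing 3 -->
  <xsl:import href="rss2html.xsl"/>
  <!-- Overrides the empty pubDate rule from the imported stylesheet -->
  <xsl:template match="pubDate">
    <p><em><xsl:apply-templates/></em></p>
  </xsl:template>
</xsl:stylesheet>
```

Nothing else changes: the processor still decides when the rule fires, so the date appears wherever a pubDate element turns up during the tree walk.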
Listing 4 is the XQuery example. It runs two queries: one to extract the list of articles published on a Friday and the second one to list the articles that discuss XMI.
Listing 4. XQuery
<result>
  {
    (: Articles published on a Friday :)
    for $i in doc("rss.xml")/rss/channel/item
    where starts-with($i/pubDate/text(), "Fri")
    return
      <friday>
        { $i/title/text() }
      </friday>
  }
  {
    (: Articles whose title mentions XMI :)
    for $i in doc("rss.xml")/rss/channel/item
    where contains($i/title/text(), "XMI")
    return
      <xmi>
        { $i/title/text() }
      </xmi>
  }
</result>
Compare Listing 4 and Listing 3. In XQuery, the code relies on XPath expressions to point directly to the interesting elements, and loops must be written explicitly. This is ideal for extracting reports from the document.
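Since either language can produce a given result, it may help to see roughly what the queries of Listing 4 look like as an XSLT 2.0 stylesheet. This is a sketch under one assumption: the RSS document is supplied as the source document of the transformation instead of being loaded with doc().

```xml
<?xml version="1.0"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:output method="xml" indent="yes"/>
  <!-- A single template on the root bypasses the default tree walk -->
  <xsl:template match="/">
    <result>
      <!-- Articles published on a Friday -->
      <xsl:for-each select="rss/channel/item[starts-with(pubDate, 'Fri')]">
        <friday><xsl:value-of select="title"/></friday>
      </xsl:for-each>
      <!-- Articles whose title mentions XMI -->
      <xsl:for-each select="rss/channel/item[contains(title, 'XMI')]">
        <xmi><xsl:value-of select="title"/></xmi>
      </xsl:for-each>
    </result>
  </xsl:template>
</xsl:stylesheet>
```

It works, but the stylesheet has to switch off the template-driven walk that makes XSLT distinctive, and the XML syntax adds bulk around what are essentially two path expressions.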
I'm beginning to use XSLT 2.0 and XQuery 1.0 as different dialects of XPath 2.0. Each dialect is optimized for certain applications. Only time will tell whether both these dialects will grow and prosper. For the time being, I plan to concentrate my efforts on XPath 2.0 and use the most suitable dialect depending on the application at hand.
Learn
- What kind of language is XSLT? by Michael Kay: Put XSLT in context: where the language comes from, what it's good at, and why you should use it (developerWorks, April 2005).
- Influences on the design of XQuery: Read about the emergence of the XQuery language -- the need for a query language for XML data, and the basic principles behind it -- in this article by XQuery pioneer Don Chamberlin (developerWorks, September 2003).
- XML for Data: An early look at XQuery by Kevin Williams: Learn how to use the FLWR ("flower") clauses at the heart of XQuery (developerWorks, February 2002).
- How an XSLT processor works: Join Benoît Marchal and examine the tree walking logic that makes XSLT unique in this article (developerWorks, March 2004).
- developerWorks XML zone: Expand your XML skills with articles and tutorials.
- IBM XML certification: Find out how you can become an IBM Certified Developer in XML and related technologies.
Get products and technologies
- Saxon: Combine XSLT and XQuery processing in Saxon.

Benoît Marchal is a consultant and writer based in Namur, Belgium. He is the author of XML by Example, Applied XML Solutions, and XML and the Enterprise. Details on his latest projects are at marchal.com.