A major update to a familiar standard
Although still a Candidate Recommendation, XPath 2.0 is moving towards formal approval. The first update to the XPath recommendation since 1999 is eagerly anticipated by the market and, indeed, several tools have already begun to implement the latest drafts. The changes are so fundamental that I expect, in time, the world will come to see XPath 1.0 as a draft for XPath 2.0.
The XPath 2.0 recommendation serves as the basis for XSLT 2.0 and XQuery 1.0. Both languages use XPath as their core querying engine and augment it with statements to format the results (see Resources).
The many changes between XPath 1.0 and XPath 2.0 include:
- A new data model, based on sequences instead of node sets
- The ability to bind variables; previously variables were bound in the host language (XSLT)
- Full support for XML Schema datatypes
- Many new functions, including regular expressions, date/time, and string manipulations
- Comments; although not a major feature, they are handy when you debug queries: just comment out portions of the path for testing
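Two of the smaller items in this list are easy to try out. The following is a minimal sketch, not taken from the recommendation: the first expression wraps a predicate in an XPath 2.0 comment to disable it while testing, and the second calls the regular expression function fn:matches(). The po prefix is assumed to be bound to the namespace of the purchase order shown in Listing 2, and the ISBN pattern is a simplified illustration, not a real validation rule. Each expression is meant to be evaluated on its own.

```xpath
(: Debugging: the predicate is commented out, so the price of every line is selected :)
/po:PurchaseOrder/po:OrderLines/po:Line (: [po:Code] :) /po:Price

(: Regular expression test on a string; true for this value :)
fn:matches('0-7897-2504-5', '^[0-9]-[0-9]{4}-[0-9]{4}-[0-9]$')
```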
In this article, I concentrate on the new data model and, more specifically, the use of sequences because it is the most fundamental change in terms of expressiveness.
Sequences in XPath 2.0
XPath 2.0 processes everything as a sequence. A sequence is an ordered, heterogeneous collection of items. The items can be either nodes from an XML document or atomic values. Atomic values can be of any atomic (simple) type defined in the XML Schema recommendation, such as strings, decimals, or dates; complex types describe element content and are not atomic values. To declare a sequence in an XPath, just separate the items with commas and enclose the whole sequence in parentheses:
(2, 'declencheur', 5.10)
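A few properties of sequences are worth seeing on literal values. The expressions below are a small sketch that needs no input document; each one is evaluated on its own, and the trailing comments give the result.

```xpath
(: Sequences never nest: inner parentheses are flattened and () adds nothing :)
(1, (2, 3), ())                          (: same as (1, 2, 3) :)

(: Items keep their order and are addressed by position, starting at 1 :)
(2, 'declencheur', 5.10)[2]              (: 'declencheur' :)

(: Functions receive the whole sequence as one argument, hence the double parentheses :)
fn:count((2, 'declencheur', 5.10))       (: 3 :)

(: A range expression builds a sequence of consecutive integers :)
1 to 4                                   (: same as (1, 2, 3, 4) :)
```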
In practice, almost every valid XPath 1.0 request remains valid in XPath 2.0.
In other words, XPath 2.0 retains the familiar XPath 1.0 syntax: a path is still made of location steps separated by a forward slash (/), such as /po:PurchaseOrder/po:OrderLines/po:Line, the path used in Listing 1.
However, the location steps in XPath 2.0 identify items in a sequence (again, those items might be XML nodes) instead of nodes in a tree (the XPath 1.0 data model).
Every concept in XPath 2.0 has been reworked around sequences. For example, functions that expected node sets in XPath 1.0 now work with sequences.
Given that XML documents are hierarchical, the XPath 1.0 model (a tree structure) is sensible. But it is also limiting because XPath cannot generate trees and, therefore, it is impossible to pass the result from a request to another request for further processing. Complex requests, à la SQL, are impossible to write.
As noted earlier, the XPath 1.0 syntax remains in use, but XPath 2.0 also introduces several new expressions specifically to work with sequences. I will first review the for expression which, as the name implies, loops over the items in a sequence. A for expression looks like Listing 1:
Listing 1. XPath 2.0 sample
for $line in /po:PurchaseOrder/po:OrderLines/po:Line return $line/po:Price * $line/po:Quantity
The preceding XPath would be executed against a purchase order like Listing 2. It computes the total of each order line (price multiplied by quantity) and returns the following sequence:
(29.99, 89.98, 80, 3.1)
Listing 2. Purchase order (sample XML document)
<?xml version="1.0" encoding="ISO-8859-1"?>
<po:PurchaseOrder xmlns:po="http://www.marchal.com/2006/po">
  <po:Buyer>Pineapplesoft</po:Buyer>
  <po:Seller>Bookstore</po:Seller>
  <po:OrderLines>
    <po:Line>
      <po:Code type="ISBN">0-7897-2504-5</po:Code>
      <po:Quantity>1</po:Quantity>
      <po:Description>XML by Example</po:Description>
      <po:Price>29.99</po:Price>
    </po:Line>
    <po:Line>
      <po:Code type="ISBN">0-672-32054-1</po:Code>
      <po:Quantity>2</po:Quantity>
      <po:Description>Applied XML Solutions</po:Description>
      <po:Price>44.99</po:Price>
    </po:Line>
    <po:Line>
      <po:Code type="ISBN">2-10-005763-4</po:Code>
      <po:Quantity>2</po:Quantity>
      <po:Description>Huit Solutions Concrètes avec XML et Java</po:Description>
      <po:Price>40.00</po:Price>
    </po:Line>
    <po:Line>
      <po:Quantity>1</po:Quantity>
      <po:Description>Internet Magazine</po:Description>
      <po:Price>3.10</po:Price>
    </po:Line>
  </po:OrderLines>
</po:PurchaseOrder>
In Listing 1, for is the keyword; it loops over a sequence of lines and binds each item (each line) to the variable $line. To select the sequence, the path uses location steps like XPath 1.0 (/po:PurchaseOrder/po:OrderLines/po:Line).
Next is the return portion of the expression. The return creates a sequence dynamically. Essentially, it adds zero, one, or more items to the output sequence for every item in the loop.
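The fact that return can contribute zero, one, or several items per iteration is easiest to see on literal sequences. The following expressions are a minimal sketch; they do not come from the listings in this article and need no input document. Each one stands on its own, and the trailing comments give the result.

```xpath
(: One item per iteration :)
for $n in (1, 2, 3) return $n * 10           (: (10, 20, 30) :)

(: Two items per iteration; the result is still one flat sequence :)
for $n in (1, 2, 3) return ($n, $n * $n)     (: (1, 1, 2, 4, 3, 9) :)

(: Zero, one, then two items: the range 1 to $n is empty when $n is 0 :)
for $n in (0, 1, 2) return 1 to $n           (: (1, 1, 2) :)
```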
Returning sequences is essential because sequences can be further processed through XPath. For example, it is trivial to compute the total of the purchase order by passing the returned sequence to the sum() function. sum() is an XPath 1.0 function that has been extended to work with sequences, as Listing 3 illustrates:
Listing 3. Processing the result of an XPath
fn:sum(for $line in /po:PurchaseOrder/po:OrderLines/po:Line return $line/po:Price * $line/po:Quantity)
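To check the arithmetic, the line totals from Listing 1 can be fed to fn:sum() as a literal sequence. This is only a sketch for verification. Because the values are written as literals, they are xs:decimal and xs:integer values and the addition is exact; when the values come from a document without a schema, such as Listing 2, a processor converts them to xs:double, so the printed total may carry a small rounding error.

```xpath
(: 29.99 + 89.98 + 80 + 3.1 :)
fn:sum((29.99, 89.98, 80, 3.1))     (: 203.07 :)

(: The sum of an empty sequence is zero, not an error :)
fn:sum(())                          (: 0 :)
```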
What it used to be like
Listing 4 is essentially the same algorithm as Listing 3, but implemented in XPath 1.0 and XSLT 1.0:
Listing 4. Computing the total with XPath 1.0
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:po="http://www.marchal.com/2006/po"
                xmlns:exslt="http://exslt.org/common"
                version="1.0">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:variable name="lines">
      <xsl:for-each select="/po:PurchaseOrder/po:OrderLines/po:Line">
        <line-total><xsl:value-of select="po:Price * po:Quantity"/></line-total>
      </xsl:for-each>
    </xsl:variable>
    <xsl:value-of select="sum(exslt:node-set($lines)/line-total)"/>
  </xsl:template>
</xsl:stylesheet>
Listing 4 starts by computing the individual line totals and passes the result to the sum() function. However, in XPath 1.0, variables must be declared in the host language (XSLT in this case), so a temporary result set is built into the variable lines. Next, the content of the variable, converted to a node set with the exslt:node-set() extension function, is fed to a second XPath that computes the purchase order total.
When comparing Listing 3 and Listing 4, the increased expressiveness of XPath 2.0 is obvious. Listing 4 has two XPath statements instead of one, and it relies on the host language (XSLT) to communicate intermediate results. Listing 4 is less readable and, by breaking the request over two XPath statements, it limits opportunities for query optimization.
XPath 2.0 also introduces a conditional expression (if), shown in Listing 5. The syntax is self-explanatory: depending on whether the expression in parentheses evaluates to true or false, the expression returns the value of the then branch or of the else branch.
Listing 5. Conditional expression
if(/po:PurchaseOrder/po:Seller = 'Bookstore') then 'ok' else 'ko'
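Two details are worth a quick sketch: the else branch is mandatory in XPath 2.0, and the test does not have to be a comparison. The first expression below works on a literal sequence of quantities chosen for illustration; the second assumes the document in Listing 2 with the po prefix bound to its namespace.

```xpath
(: The test can be a comparison, evaluated once per item of the loop :)
for $q in (1, 2, 2, 1) return if ($q gt 1) then 'multiple' else 'single'
(: ('single', 'multiple', 'multiple', 'single') :)

(: The test can also be a path: an empty sequence counts as false :)
if (/po:PurchaseOrder/po:OrderLines/po:Line[4]/po:Code) then 'has a code' else 'no code'
(: 'no code' for Listing 2, because the fourth line has no po:Code element :)
```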
A discussion of sequences is not complete without quantified expressions. In a nutshell, quantified expressions are tests that apply to a sequence as a whole. The two quantified expressions are:
- some
- every
Listing 6 is an every expression. It consists of two sections: first binding a variable to a sequence (just like a loop) and then specifying a condition that items in the sequence must meet. The difference between a quantified expression and a loop is that the quantified expression returns a Boolean value, whereas the loop returns a sequence. The every expression returns true if the condition is true for every item in the sequence; the some expression returns true if the condition is true for at least one item in the sequence.
Listing 6. Quantified expressions
every $line in /po:PurchaseOrder/po:OrderLines/po:Line satisfies $line/po:Code
Running Listing 6 against the document in Listing 2 returns false because the fourth line does not have a po:Code element. If you were to replace the every keyword with some, then the expression would return true because at least one line has a po:Code element.
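The some variant and a few literal cases can be sketched as follows. The first expression assumes the document in Listing 2; the others need no input document. Note the edge case at the end: over an empty sequence, every is true and some is false.

```xpath
(: The some counterpart of Listing 6; true for the document in Listing 2 :)
some $line in /po:PurchaseOrder/po:OrderLines/po:Line satisfies $line/po:Code

(: On literal sequences :)
every $n in (2, 4, 6) satisfies $n mod 2 = 0     (: true :)
some  $n in (1, 3, 6) satisfies $n mod 2 = 0     (: true :)

(: Edge case: nothing to test :)
every $n in () satisfies $n gt 0                 (: true :)
some  $n in () satisfies $n gt 0                 (: false :)
```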
The power of XPath 2.0 comes from the ability to combine expressions to create sophisticated requests. Listing 7 computes the purchase order total with a different formula: only those lines that include a product code are counted; the other lines are silently ignored (presumably it is not possible to ship those products). The coding is simple because it suffices to add an if expression that returns an empty sequence, written (), if the condition is not met.
Listing 7. Combining expressions
fn:sum(for $line in /po:PurchaseOrder/po:OrderLines/po:Line return if($line/po:Code) then $line/po:Price * $line/po:Quantity else ())
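The same filter can also be written with a predicate instead of an if expression. The two variants below are a sketch of alternatives, not the formulation used in Listing 7; both assume the document in Listing 2. For that document, the lines with a code contribute 29.99, 89.98, and 80, so the total is about 199.97 (the values are untyped, so a processor computes with xs:double).

```xpath
(: Filter the lines first, then loop over the remaining ones :)
fn:sum(for $line in /po:PurchaseOrder/po:OrderLines/po:Line[po:Code]
       return $line/po:Price * $line/po:Quantity)

(: XPath 2.0 also accepts an expression as the last step of a path :)
fn:sum(/po:PurchaseOrder/po:OrderLines/po:Line[po:Code]/(po:Price * po:Quantity))
```

If no line carries a code, both variants pass an empty sequence to fn:sum(), which returns 0.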
In conclusion, XPath 2.0, thanks to its new data model based on sequences, greatly simplifies writing complex requests. Requests that previously required a lot of XSLT code can now be written exclusively in XPath.
Resources
- What kind of language is XSLT? (developerWorks, February 2001 updated April 2005): Join author Michael Kay as he puts XSLT in context -- where the language comes from, what it's good at, and why you should use it.
- Influences on the design of XQuery (developerWorks, September 2003): Read what XQuery pioneer Don Chamberlin has to say on the emergence of the XQuery language, the need for a query language for XML data, and the basic principles behind it.
- An early look at XQuery (developerWorks, February 2002): Learn how to use the FLWR ("flower") clauses at the heart of XQuery with Kevin Williams.
- Comparing XSLT 2.0 and XQuery (developerWorks, April 2006): With author Benoît Marchal, compare the two host languages for XPath 2.0.
- An introduction to XQuery (developerWorks, June 2001 updated January 2006): Explore XQuery with Howard Katz, including background history, a road map into the documentation, and a snapshot of the current state of the specification.
- developerWorks XML zone: Expand your XML skills with articles and tutorials.
- IBM XML certification: Find out how you can become an IBM Certified Developer in XML and related technologies.
Get products and technologies
- Saxon: Get combined XSLT and XQuery processing in Saxon.