Working XML: Get started with XPath 2.0

Understand the new data model

XPath 2.0 is the foundation of two essential recommendations currently in the final stages of development at W3C: XSLT 2.0 and XQuery. It is a major rewrite designed to significantly increase the power and efficiency of the language. In this article, Benoît Marchal shows how the new data model enables you to easily write more sophisticated requests.

Benoit Marchal (bmarchal@pineapplesoft.com), Consultant, Pineapplesoft

Benoît Marchal is a consultant and writer based in Namur, Belgium. He is the author of XML by Example, Applied XML Solutions, and XML and the Enterprise. Details on his latest projects are at marchal.com.



30 May 2006

A major update to a familiar standard

Although still a Candidate Recommendation, XPath 2.0 is moving towards formal approval. The first update to the XPath recommendation since 1999 is eagerly anticipated by the market and, indeed, several tools have already begun to implement the latest drafts. The changes are so fundamental, I expect that in time the world will come to see XPath 1.0 as a draft for XPath 2.0.

The XPath 2.0 recommendation serves as the basis for XSLT 2.0 and XQuery 1.0. Both languages use XPath as their core querying engine and augment it with statements to format the results (see Resources).

The many changes between XPath 1.0 and XPath 2.0 include:

  • A new data model, based on sequences instead of node sets
  • The ability to bind variables; previously variables were bound in the host language (XSLT)
  • Full support for XML Schema datatypes
  • Many new functions, including regular-expression, date/time, and string-manipulation functions
  • Comments; although not a major feature, they are handy when you debug queries: just comment out portions of the path for testing

In this article, I concentrate on the new data model and, more specifically, the use of sequences because it is the most fundamental change in terms of expressiveness.


Sequences in XPath 2.0

XPath 2.0 treats every value as a sequence. A sequence is an ordered collection of items, and the items can be heterogeneous: each one is either a node from an XML document or an atomic value. Atomic values can have any of the atomic simple types defined in the XML Schema recommendation (xs:string, xs:decimal, xs:date, and so on) or a type derived from them; complex types describe elements, not atomic values. To write a sequence in an XPath, just separate the items with commas and enclose the whole sequence in parentheses:

(2, 'declencheur', 5.10)

In practice, almost every valid XPath 1.0 request remains valid in XPath 2.0. In other words, XPath 2.0 retains the familiar XPath 1.0 syntax: a path still is made of location steps separated by a forward slash (/), such as:

/po:PurchaseOrder/po:ProductList/po:Name

However, the location steps in XPath 2.0 identify items in a sequence (again, those items might be XML nodes) instead of nodes in a tree (the XPath 1.0 data model).

Every concept in XPath 2.0 has been reworked around sequences. For example, functions that expected node sets in XPath 1.0 now work with sequences.

Given that XML documents are hierarchical, the XPath 1.0 model (a tree structure) is sensible. But it is also limiting because XPath cannot generate trees and, therefore, it is impossible to pass the result from a request to another request for further processing. Complex requests, à la SQL, are impossible to write.

Using sequences

As noted earlier, the XPath 1.0 syntax remains in use, but XPath 2.0 also introduces several new expressions specifically to work with sequences. I will first review the for expression, which, as the name implies, loops over the items in a sequence.

A typical for expression looks like Listing 1:

Listing 1. XPath 2.0 sample
for $line in /po:PurchaseOrder/po:OrderLines/po:Line
   return $line/po:Price * $line/po:Quantity

The preceding XPath would be executed against a purchase order like Listing 2. It computes the total of each order line (price multiplied by quantity) and returns the following sequence:

(29.99, 89.98, 80, 3.1)

Listing 2. Purchase order (sample XML document)
<?xml version="1.0" encoding="ISO-8859-1"?>
<po:PurchaseOrder xmlns:po="http://www.marchal.com/2006/po">
   <po:Buyer>Pineapplesoft</po:Buyer>
   <po:Seller>Bookstore</po:Seller>
   <po:OrderLines>
      <po:Line>
         <po:Code type="ISBN">0-7897-2504-5</po:Code>
         <po:Quantity>1</po:Quantity>
         <po:Description>XML by Example</po:Description>
         <po:Price>29.99</po:Price>
      </po:Line>
      <po:Line>
         <po:Code type="ISBN">0-672-32054-1</po:Code>
         <po:Quantity>2</po:Quantity>
         <po:Description>Applied XML Solutions</po:Description>
         <po:Price>44.99</po:Price>
      </po:Line>
      <po:Line>
         <po:Code type="ISBN">2-10-005763-4</po:Code>
         <po:Quantity>2</po:Quantity>
         <po:Description>Huit Solutions Concrètes avec XML et Java</po:Description>
         <po:Price>40.00</po:Price>
      </po:Line>
      <po:Line>
         <po:Quantity>1</po:Quantity>
         <po:Description>Internet Magazine</po:Description>
         <po:Price>3.10</po:Price>
      </po:Line>
   </po:OrderLines>
</po:PurchaseOrder>

In Listing 1, for is the keyword; it loops over a sequence of lines and binds each item (each line) in turn to the variable $line. To select the sequence, the path uses location steps like XPath 1.0 (/po:PurchaseOrder/po:OrderLines/po:Line).

Next is the return portion of the expression. The return creates a sequence dynamically. Essentially, it adds zero, one, or more items to the output sequence for every item in the loop.

Returning sequences is essential because sequences can be further processed through XPath. For example, it is trivial to compute the total of the purchase order by passing the returned sequence to the sum() function. sum() is an XPath 1.0 function that has been extended to work with sequences, as Listing 3 illustrates:

Listing 3. Processing the result of an XPath
fn:sum(for $line in /po:PurchaseOrder/po:OrderLines/po:Line
   return $line/po:Price * $line/po:Quantity)
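
To check the arithmetic without an XPath 2.0 processor, the logic of Listings 1 and 3 can be approximated in Python. This is only a sketch, not XPath: the standard xml.etree.ElementTree module supports a small XPath 1.0-style subset, so the loop and the multiplication are done in Python. The names PO_XML, NS, and line_totals are illustrative assumptions, the XML is an abridged copy of Listing 2, and Decimal is used to keep the sums exact (an XPath 2.0 processor would typically use xs:double for untyped data).

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Abridged copy of Listing 2: only the elements needed for the totals.
PO_XML = """\
<po:PurchaseOrder xmlns:po="http://www.marchal.com/2006/po">
   <po:OrderLines>
      <po:Line><po:Quantity>1</po:Quantity><po:Price>29.99</po:Price></po:Line>
      <po:Line><po:Quantity>2</po:Quantity><po:Price>44.99</po:Price></po:Line>
      <po:Line><po:Quantity>2</po:Quantity><po:Price>40.00</po:Price></po:Line>
      <po:Line><po:Quantity>1</po:Quantity><po:Price>3.10</po:Price></po:Line>
   </po:OrderLines>
</po:PurchaseOrder>
"""

# Prefix-to-namespace mapping used by the ElementTree path lookups.
NS = {"po": "http://www.marchal.com/2006/po"}


def line_totals(xml_text):
    """Return price * quantity for each po:Line, in document order."""
    root = ET.fromstring(xml_text)  # root is the po:PurchaseOrder element
    return [
        Decimal(line.findtext("po:Price", namespaces=NS))
        * Decimal(line.findtext("po:Quantity", namespaces=NS))
        for line in root.findall("po:OrderLines/po:Line", NS)
    ]


totals = line_totals(PO_XML)  # counterpart of the for ... return in Listing 1
total = sum(totals)           # counterpart of fn:sum() in Listing 3
print(total)                  # 203.07
```

The list comprehension plays the role of the for expression, and passing its result straight to sum() mirrors the way Listing 3 feeds one expression's sequence into another.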

What it used to be like

Listing 4 is essentially the same algorithm as Listing 3, but implemented in XPath 1.0 and XSLT 1.0:

Listing 4. Computing the total with XPath 1.0
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:po="http://www.marchal.com/2006/po"
                xmlns:exslt="http://exslt.org/common"
                version="1.0">
<xsl:output method="text"/>
<xsl:template match="/">
   <xsl:variable name="lines">
      <xsl:for-each select="/po:PurchaseOrder/po:OrderLines/po:Line">
         <line-total><xsl:value-of select="po:Price * po:Quantity"/></line-total>
      </xsl:for-each>
   </xsl:variable>
   <xsl:value-of select="sum(exslt:node-set($lines)/line-total)"/>
</xsl:template>
</xsl:stylesheet>

Listing 4 starts by computing the individual line totals and passes the result to the sum() function. However, in XPath 1.0, variables must be declared in the host language (XSLT in this case), so a temporary result tree is built in the variable lines. Next, the content of the variable is fed to a second XPath that computes the purchase order total. Because XSLT 1.0 cannot query a result tree fragment directly, the second XPath first converts it to a node set with the EXSLT node-set() extension function.

When comparing Listing 3 and Listing 4, the increased expressiveness of XPath 2.0 is obvious. Listing 4 has two XPath statements instead of one, and it relies on the host language (XSLT) to communicate intermediate results. Listing 4 is less readable and, by breaking the request over two XPath statements, it limits opportunities for query optimization.

Conditional expression

XPath 2.0 also introduces a conditional expression (if), shown in Listing 5. The syntax is self-explanatory: depending on whether the expression in parentheses evaluates to true or false, the expression returns the value of the then or the else branch. Unlike in many programming languages, the else branch is mandatory.

Listing 5. Conditional expression
if(/po:PurchaseOrder/po:Seller = 'Bookstore') then 'ok' else 'ko'

Quantified expressions

A discussion of sequences is not complete without quantified expressions. In a nutshell, quantified expressions are tests that apply to a sequence as a whole. The two quantified expressions are: every and some.

Listing 6 is an every expression. It consists of two sections: first binding a variable to a sequence (just like a loop) and then specifying a condition that items in the sequence must meet. The difference between a quantified expression and a loop is that the quantified expression returns a Boolean value, whereas the loop returns a sequence.

Specifically, an every expression returns true if the condition is true for every item in the sequence; a some expression returns true if the condition is true for at least one item in the sequence.

Listing 6. Quantified expressions
every $line in /po:PurchaseOrder/po:OrderLines/po:Line satisfies $line/po:Code

Running Listing 6 against the document in Listing 2 returns false because the fourth line does not have a po:Code element. If you were to replace the every keyword with some, then the expression would return true because at least one line has a po:Code element.
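
The same test can be approximated in Python to confirm both results. Again, this is a sketch rather than XPath: Python's built-in all() and any() stand in for every and some, and the names PO_XML, NS, has_code, every_has_code, and some_has_code are illustrative assumptions. The XML is an abridged copy of Listing 2 in which, as in the original, the fourth line has no po:Code element.

```python
import xml.etree.ElementTree as ET

# Abridged copy of Listing 2: the fourth line has no po:Code element.
PO_XML = """\
<po:PurchaseOrder xmlns:po="http://www.marchal.com/2006/po">
   <po:OrderLines>
      <po:Line><po:Code type="ISBN">0-7897-2504-5</po:Code></po:Line>
      <po:Line><po:Code type="ISBN">0-672-32054-1</po:Code></po:Line>
      <po:Line><po:Code type="ISBN">2-10-005763-4</po:Code></po:Line>
      <po:Line><po:Description>Internet Magazine</po:Description></po:Line>
   </po:OrderLines>
</po:PurchaseOrder>
"""

# Prefix-to-namespace mapping used by the ElementTree path lookups.
NS = {"po": "http://www.marchal.com/2006/po"}


def has_code(line):
    """True if the line has a po:Code child.

    In XPath, $line/po:Code used as a condition is true when it selects at
    least one node. With ElementTree, compare against None explicitly,
    because an element without children is falsy.
    """
    return line.find("po:Code", NS) is not None


lines = ET.fromstring(PO_XML).findall("po:OrderLines/po:Line", NS)

every_has_code = all(has_code(line) for line in lines)  # like "every ... satisfies"
some_has_code = any(has_code(line) for line in lines)   # like "some ... satisfies"

print(every_has_code, some_has_code)  # False True
```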


Infinite combinations

The power of XPath 2.0 comes from the ability to combine expressions to create sophisticated requests. Listing 7 computes the purchase order total with a different formula: only those lines that include a product code are counted; the other lines are silently ignored (presumably it is not possible to ship those products). The coding is simple because it suffices to add an if expression that returns an empty sequence if the condition is not met.

Listing 7. Combining expressions
fn:sum(for $line in /po:PurchaseOrder/po:OrderLines/po:Line
   return if($line/po:Code) then $line/po:Price * $line/po:Quantity else ())
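
As a cross-check of Listing 7, here is a Python approximation of the same combination. It is a sketch under the same caveats as any emulation: the filtering and arithmetic happen in Python, not in XPath; the names PO_XML, NS, and billable_totals are illustrative assumptions; the XML is an abridged copy of Listing 2; and Decimal is used only to keep the sum exact.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Abridged copy of Listing 2: the fourth line has no po:Code element.
PO_XML = """\
<po:PurchaseOrder xmlns:po="http://www.marchal.com/2006/po">
   <po:OrderLines>
      <po:Line><po:Code type="ISBN">0-7897-2504-5</po:Code>
         <po:Quantity>1</po:Quantity><po:Price>29.99</po:Price></po:Line>
      <po:Line><po:Code type="ISBN">0-672-32054-1</po:Code>
         <po:Quantity>2</po:Quantity><po:Price>44.99</po:Price></po:Line>
      <po:Line><po:Code type="ISBN">2-10-005763-4</po:Code>
         <po:Quantity>2</po:Quantity><po:Price>40.00</po:Price></po:Line>
      <po:Line>
         <po:Quantity>1</po:Quantity><po:Price>3.10</po:Price></po:Line>
   </po:OrderLines>
</po:PurchaseOrder>
"""

# Prefix-to-namespace mapping used by the ElementTree path lookups.
NS = {"po": "http://www.marchal.com/2006/po"}


def billable_totals(xml_text):
    """Yield price * quantity for each line that carries a po:Code."""
    root = ET.fromstring(xml_text)  # root is the po:PurchaseOrder element
    for line in root.findall("po:OrderLines/po:Line", NS):
        if line.find("po:Code", NS) is not None:
            yield (Decimal(line.findtext("po:Price", namespaces=NS))
                   * Decimal(line.findtext("po:Quantity", namespaces=NS)))
        # Lines without a code yield nothing, which plays the role of
        # the empty sequence () in the else branch of Listing 7.


total = sum(billable_totals(PO_XML))
print(total)  # 199.97
```

The magazine line (3.10) contributes nothing, so the result is smaller than the full purchase order total by exactly that amount.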

In conclusion, XPath 2.0, thanks to its new data model based on sequences, greatly simplifies writing complex requests. Requests that previously required a lot of XSLT code can now be written entirely in XPath.

Resources

Learn

  • What kind of language is XSLT? (developerWorks, February 2001 updated April 2005): Join author Michael Kay as he puts XSLT in context -- where the language comes from, what it's good at, and why you should use it.
  • Influences on the design of XQuery (developerWorks, September 2003): Read what XQuery pioneer Don Chamberlin has to say on the emergence of the XQuery language, the need for a query language for XML data, and the basic principles behind it.
  • An early look at XQuery (developerWorks, February 2002): Learn how to use the FLWR ("flower") clauses at the heart of XQuery with Kevin Williams.
  • Comparing XSLT 2.0 and XQuery (developerWorks, April 2006): With author Benoît Marchal, compare the two host languages for XPath 2.0.
  • An introduction to XQuery (developerWorks, June 2001 updated January 2006): Explore XQuery with Howard Katz, including background history, a road map into the documentation, and a snapshot of the current state of the specification.
  • developerWorks XML zone: Expand your XML skills with articles and tutorials.
  • IBM XML certification: Find out how you can become an IBM Certified Developer in XML and related technologies.

Get products and technologies

  • Saxon: Get combined XSLT and XQuery processing in Saxon.
