XML for Data: An early look at XQuery

A review of the XQuery working draft and how it applies to XML for data

Kevin Williams (kevin@blueoxide.com), CEO, Blue Oxide Technologies, LLC
Kevin Williams is the CEO of Blue Oxide Technologies, LLC, a company that designs software tools that help companies take advantage of the service-oriented Internet. Visit their Web site at http://www.blueoxide.com. He can be reached for comment at kevin@blueoxide.com.

Summary:  This column looks at the current state of the XML Query (a.k.a. XQuery) working draft. Kevin Williams shows how to use the FLWR ("flower") clauses, introduces the distinct-values function (which lends itself to pivoting data relationships), and offers his assessment of how XQuery will affect data document manipulation. Code samples in XQuery and XML demonstrate the use of for, let, where, and return (FLWR) clauses.

Date:  01 Feb 2002
Level:  Introductory

If you've ever tried to use XSL to do any sort of complex data operation, such as joining two sets of elements, pivoting a relationship, or even performing relatively simple mathematical calculations, you know it falls a bit short in the feature department. To get around this problem, style sheet authors have had to use every trick in the book -- chaining multiple style sheets together, nesting xsl:for-each elements hip deep, and writing syntactically obscure code that would give a Perl guru a headache. Fear not, for help is on the way. The XQuery specification, currently tracking toward a summer 2002 release, addresses all of these problems and more.

So what is XML Query?

XML Query, often abbreviated as XQuery, is a specification that's been around in one form or another for a few years now. The XML Query working group, chartered in September 1999, was tasked with creating a flexible query language to extract data from XML documents. The latest working drafts (see Resources) go a long way toward achieving this goal.

XQuery builds on the XPath specification. In fact, some of the features of XQuery have been acknowledged as being so fundamental that they have been incorporated into the XPath 2.0 specification, and this specification is now co-owned by the W3C's XML Query and XSL working groups. This is good news, as it means that style sheet authors will soon be able to take advantage of features like sequences, quantification, and stronger type control. Also, conditional expressions and iterators have been added to the XPath language, where previously they were part of the XSL language. This should allow for cleaner code in style sheets and fewer headaches for the style sheet creators.


FLWR expressions

The most powerful new feature in XQuery is the FLWR expression. FLWR (pronounced flower) is an acronym for For-Let-Where-Return, the clauses allowed in one of these expressions. FLWR expressions can perform many tasks you'd never dream of undertaking in XSL style sheets.

Each FLWR expression has one or more for or let clauses (in any combination), an optional where clause, and a return clause.

for clauses

You use for clauses to generate the tuples against which the rest of the expression is evaluated. A single for clause, as in Listing 1, binds its variable to each item of a sequence in turn. Tuples are produced in the order in which the for clauses are written, with the first clause acting as the outermost loop.


Listing 1. Single for clause
for $exp1 in (<a/>, <b/>)

The running program evaluates the expression in Listing 1 twice, with the $exp1 variable set first to <a/> and then to <b/>. If you introduce another for clause, the program evaluates the expression once for each tuple in the Cartesian product of the two sequences. Take a look at the example in Listing 2, which uses more than one for clause.


Listing 2. Multiple for clauses
for $exp1 in (<a/>, <b/>)
for $exp2 in (<c/>, <d/>)

In Listing 2, the program will evaluate the expression four times, once for each tuple:

  • (<a/>, <c/>)
  • (<a/>, <d/>)
  • (<b/>, <c/>)
  • (<b/>, <d/>)
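The tuple generation in Listing 2 behaves like a pair of nested loops. As a rough sketch of that behavior in Python (not XQuery), the snippet below uses plain strings as stand-ins for the elements; the variable names are assumptions chosen to mirror Listing 2.

```python
from itertools import product

# Stand-ins for the two sequences bound in Listing 2.
exp1_values = ["<a/>", "<b/>"]
exp2_values = ["<c/>", "<d/>"]

# product() varies the rightmost input fastest, which matches nested
# for clauses: the first clause acts as the outer loop.
tuples = list(product(exp1_values, exp2_values))

for exp1, exp2 in tuples:
    print(f"({exp1}, {exp2})")
```

Swapping the two inputs to product() would change the order of the tuples, just as swapping the two for clauses would.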

let clauses

The let clause assigns a value or sequence to a variable. This can be useful shorthand for use in the where or return clauses.

where and return clauses

The where clause directs the program to discard particular tuples if they do not meet particular conditions. The return clause defines what to return for each tuple.

In this example, the query returns the names of all authors in the document who have written three or more books. It starts with an example document on which the expression operates, shown in Listing 3.


Listing 3. Example document authorList.xml
<authorList>
  <author name="Kevin Williams">
    <book>Professional XML, 2nd Edition</book>
    <book>Professional XML Databases</book>
    <book>Professional XML Schemas</book>
  </author>
  <author name="John Q. Somebody">
    <book>Esoteric Topics in Programming, Vol. 1</book>
    <book>Esoteric Topics in Programming, Vol. 2</book>
  </author>
</authorList>


Listing 4. Sample query to select authors of three or more books
<frequentWriters>
{
  let $inDoc := document("authorList.xml")
  for $author in ($inDoc//author)
  let $cb := count($author/book)
  where ($cb >= 3)
  return
    <author>{ string($author/@name) }</author>
}
</frequentWriters>

The XQuery in Listing 4 would return the contents of Listing 5.


Listing 5. Results of the prolific author query
<frequentWriters>
  <author>Kevin Williams</author>
</frequentWriters>
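For readers who want to trace what each clause contributes, here is a minimal Python sketch (not XQuery) that mimics the query in Listing 4 using the standard library's ElementTree module. The function name frequent_writers and the minimum_books parameter are assumptions made for illustration; the document from Listing 3 is inlined so the sketch stands alone.

```python
import xml.etree.ElementTree as ET

# The document from Listing 3, inlined so the sketch is self-contained.
AUTHOR_LIST = """
<authorList>
  <author name="Kevin Williams">
    <book>Professional XML, 2nd Edition</book>
    <book>Professional XML Databases</book>
    <book>Professional XML Schemas</book>
  </author>
  <author name="John Q. Somebody">
    <book>Esoteric Topics in Programming, Vol. 1</book>
    <book>Esoteric Topics in Programming, Vol. 2</book>
  </author>
</authorList>
"""


def frequent_writers(xml_text, minimum_books=3):
    """Return a <frequentWriters> element naming authors with enough books."""
    in_doc = ET.fromstring(xml_text)               # let $inDoc := ...
    result = ET.Element("frequentWriters")
    for author in in_doc.iter("author"):           # for $author in $inDoc//author
        book_count = len(author.findall("book"))   # let $cb := count($author/book)
        if book_count >= minimum_books:            # where $cb >= 3
            # return: one <author> element holding the name as text
            ET.SubElement(result, "author").text = author.get("name")
    return result


writers = frequent_writers(AUTHOR_LIST)
print(ET.tostring(writers, encoding="unicode"))
```

The comments map each Python statement to the clause it stands in for: the loop plays the role of the for clause, the local variable the let clause, the if test the where clause, and the element construction the return clause.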


The distinct-values function

XQuery also introduces a function that comes in very handy when performing data manipulations: the distinct-values function (also found in XPath 2.0). This function allows you to easily pivot a relationship in a document. For example, say you had the list of your software company's customers and the products they had purchased shown in Listing 6.


Listing 6. Sample customer data, customerList.xml
<customerList>
  <customer name="Big Bank, Inc.">
    <product name="MyDataMinder" />
    <product name="MyDataFinder" />
  </customer>
  <customer name="PharmaCorp, Inc.">
    <product name="MyDataFinder" />
    <product name="MyDataBinder" />
  </customer>
</customerList>

If you wanted to transform this document into a document that lists all the products, along with a list of customers for each product, you would have a major task on your hands. It's possible, but very ugly to code. Using XQuery, though, the problem becomes a simple one, as shown in Listing 7.


Listing 7. Code to pivot the customer-product relationship
<productList>
{
  let $inDoc := document("customerList.xml")
  for $product in distinct-values($inDoc//customer/product/@name)
  return
    <product name={$product}>
    {
      for $customer in $inDoc//customer
      where $customer/product/@name = $product
      return
        <customer name={$customer/@name} />
    }
    </product>
}
</productList>

Listing 7 would produce the output shown in Listing 8.


Listing 8. Results of pivot code
<productList>
  <product name="MyDataMinder">
    <customer name="Big Bank, Inc." />
  </product>
  <product name="MyDataFinder">
    <customer name="Big Bank, Inc." />
    <customer name="PharmaCorp, Inc." />
  </product>
  <product name="MyDataBinder">
    <customer name="PharmaCorp, Inc." />
  </product>
</productList>

Powerful, simple to use, and easy to understand: XQuery makes that kind of data manipulation easy.
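To see the pivot spelled out step by step, the following Python sketch (not XQuery) produces the same grouping as Listing 8 from the data in Listing 6. It is only an illustration of the idea: the function names pivot_products and to_product_list are assumptions, and an ordinary dictionary stands in for distinct-values.

```python
import xml.etree.ElementTree as ET

# The document from Listing 6, inlined so the sketch is self-contained.
CUSTOMER_LIST = """
<customerList>
  <customer name="Big Bank, Inc.">
    <product name="MyDataMinder" />
    <product name="MyDataFinder" />
  </customer>
  <customer name="PharmaCorp, Inc.">
    <product name="MyDataFinder" />
    <product name="MyDataBinder" />
  </customer>
</customerList>
"""


def pivot_products(xml_text):
    """Map each distinct product name to the customers that bought it."""
    in_doc = ET.fromstring(xml_text)
    pivot = {}  # dict keys are unique and keep first-seen order
    for customer in in_doc.iter("customer"):
        for product in customer.findall("product"):
            buyers = pivot.setdefault(product.get("name"), [])
            if customer.get("name") not in buyers:  # list each customer once
                buyers.append(customer.get("name"))
    return pivot


def to_product_list(pivot):
    """Build a <productList> element from the pivoted mapping."""
    root = ET.Element("productList")
    for product_name, buyers in pivot.items():
        product = ET.SubElement(root, "product", name=product_name)
        for customer_name in buyers:
            ET.SubElement(product, "customer", name=customer_name)
    return root


pivot = pivot_products(CUSTOMER_LIST)
product_list = to_product_list(pivot)
print(ET.tostring(product_list, encoding="unicode"))
```

The Python version needs explicit bookkeeping for uniqueness and grouping, which is the work that distinct-values and the nested FLWR expression take care of in Listing 7.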


When should you use XQuery?

When it's sensible to begin using XQuery really depends on when you're reading this column and how eager you are to start using a new spec. As of February 2002 the specification is still in Working Draft status, which means it can change significantly between now and the time it is released. Once it reaches Proposed Recommendation status, it's generally considered stable enough to try out -- in fact, the W3C encourages developers to use specs at this point to generate the feedback required to fine-tune the spec before it's blessed with Recommendation status. Spring 2002 would be a good time to get to know the spec if you think it will offer big enough benefits that you want to try it out as soon as it reaches Proposed Recommendation status.

No matter when you decide that XQuery might be a viable solution for you, here are a few guidelines to keep in mind regarding when it may be an appropriate part of your solution.

First of all, XQuery isn't a magic bullet. Even though syntactically it's much better than XSL for data manipulation (and it allows some things that XSL doesn't allow directly), the engine underneath still has to read each document, parse it, and then manipulate it using the query language. This makes XQuery a good solution for indexed document repositories (so-called XML "databases") that can quickly access atoms of document content, but it's not as good a solution for unindexed documents.

Second, XQuery contains some mechanisms for accessing multiple documents in a repository. The document function allows you to programmatically access multiple documents in the same query. However, the same problem applies: You still need to load and parse every document. For best performance, then, you're still better off using an XML database or some other sort of indexing model.
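To make the cost argument concrete, here is a small Python sketch (not XQuery). It contrasts answering a lookup by parsing every document on each request with answering it from an index built once. The two inline documents, their file names, the function names, and the dictionary-based index are all assumptions made for illustration; a real XML repository would use its own indexing scheme.

```python
import xml.etree.ElementTree as ET

# Hypothetical repository: two small documents held as strings.
DOCUMENTS = {
    "east.xml": '<customerList><customer name="Big Bank, Inc.">'
                '<product name="MyDataFinder"/></customer></customerList>',
    "west.xml": '<customerList><customer name="PharmaCorp, Inc.">'
                '<product name="MyDataFinder"/><product name="MyDataBinder"/>'
                '</customer></customerList>',
}


def customers_by_scan(product_name):
    """Parse every document on every call, as an unindexed query must."""
    found = []
    for text in DOCUMENTS.values():
        root = ET.fromstring(text)
        for customer in root.iter("customer"):
            products = customer.findall("product")
            if any(p.get("name") == product_name for p in products):
                found.append(customer.get("name"))
    return found


def build_index():
    """Parse each document once and record product name -> customer names."""
    index = {}
    for text in DOCUMENTS.values():
        root = ET.fromstring(text)
        for customer in root.iter("customer"):
            for product in customer.findall("product"):
                index.setdefault(product.get("name"), []).append(customer.get("name"))
    return index


INDEX = build_index()


def customers_by_index(product_name):
    """Answer the same question without touching the documents again."""
    return INDEX.get(product_name, [])


print(customers_by_scan("MyDataFinder"))
print(customers_by_index("MyDataFinder"))
```

Both functions return the same answer; the difference is that the scan repeats the load-and-parse step for every query, while the index pays that cost once.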

Finally, XQuery works best on "hybrid" documents -- documents that contain both narrative flow and quantified data. For example, a medical-transcription document might contain both a narrative of the surgeon's actions during an operation, as well as specific amounts of medicine, blood, and other supplies used during the operation. That document would be ill suited to storage in a relational database, but XQuery would do a good job of extracting the quantified information from the XML document directly. If your document is pure data, however, it still makes more sense to bring it into a relational database for manipulation.
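As an illustration of pulling quantified data out of a hybrid document, the Python sketch below (not XQuery) totals the supplies mentioned inside a narrative. The document structure, the element and attribute names (operationNote, narrative, supply, amount, unit), and the values are entirely hypothetical and exist only to show the shape of the problem.

```python
import xml.etree.ElementTree as ET

# Hypothetical hybrid document: narrative text with quantified data inline.
OPERATION_NOTE = """
<operationNote>
  <narrative>The surgeon administered
    <supply name="anesthetic" amount="10" unit="ml"/> before the procedure,
    used <supply name="suture" amount="4" unit="each"/> to close, and gave a
    further <supply name="anesthetic" amount="5" unit="ml"/> afterward.
  </narrative>
</operationNote>
"""


def supply_totals(xml_text):
    """Sum the amounts of each supply, keyed by (name, unit)."""
    totals = {}
    for supply in ET.fromstring(xml_text).iter("supply"):
        key = (supply.get("name"), supply.get("unit"))
        totals[key] = totals.get(key, 0) + int(supply.get("amount"))
    return totals


totals = supply_totals(OPERATION_NOTE)
for (name, unit), amount in totals.items():
    print(f"{name}: {amount} {unit}")
```

The narrative text stays where it is; only the embedded quantities are extracted, which is the kind of task the paragraph above describes.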


Conclusion

XQuery provides a strong grammar for the manipulation of data inside XML documents. It is best suited to documents that contain both narrative text and quantified data. For the best performance using XQuery on these types of documents, load them into some sort of indexed XML repository.

Whether the W3C will release the specification by summer still remains to be seen; there are some gigantic unresolved issues at this time, including whether there should be reserved words in XPath 2.0 expressions. These issues will almost certainly take some time to resolve. However, being aware of your document needs now can position you to best take advantage of this technology when it becomes widely available.


Resources

  • Learn more about the new functions and operators in XML Query and XPath in the XQuery 1.0 and XPath 2.0 Functions and Operators specification.

  • Read about the new combined data model for XQuery 1.0 and XPath 2.0 in the XQuery 1.0 and XPath 2.0 Data Model specification.

  • Finally, read the specification for XQuery 1.0.

  • Developer Howard Katz takes a look at the W3C's proposed standard for an XML query language in "An introduction to XQuery." (developerWorks, updated January 2002)

  • IBM's DB2 Extender page gives a basic overview of how DB2 works with XML, with links to a detailed white paper on querying with XML, viewable as a PDF file, and to DB2 Extender downloads.

  • Need some detail about working with XML and IBM's DB2 and WebSphere Application Server? The IBM Redbook Integrating XML with DB2 XML Extender and DB2 Text Extender shows how to use XML technology efficiently in business applications and explains how to integrate it with DB2 Universal Database, DB2 XML Extender and Text Extender, and WebSphere Application Server. This book will help developers to set up the environment and to create and process XML documents that can be stored and recovered using SQL.

  • Check out professional XML developer certification, part of IBM's professional developer program. Also see XML@Whiz, a third-party certification training site that has a $50 tutorial with mock certification test, plus a free demo (with fewer tests and questions).

  • Find other articles in Kevin Williams's XML for Data column.
