Query DB2 XML data with SQL


Content series:

This content is part # of # in the series: Get off to a fast start with DB2 9 pureXML, Part 3

Stay tuned for additional content in this series.

This content is part of the series:Get off to a fast start with DB2 9 pureXML, Part 3

Stay tuned for additional content in this series.

Although DB2's hybrid architecture represents a significant departure from previous releases, exploiting its new XML capabilities doesn't have to be a painful process. If you're already familiar with SQL, you can immediately apply your skills to working with XML data stored natively in DB2. See how in this article.

The XML features in DB2 9 include new storage management, indexing, and query language support. In this article, learn how to query data in DB2 XML columns using SQL or SQL with XML extensions (SQL/XML). The next article in the series will discuss DB2's new support for XQuery, an emerging industry standard, and explore when it can be most useful.

You may be surprised to learn that DB2 also supports bilingual queries -- that is, queries that combine expressions from both SQL and XQuery. Which language (or combination of languages) you should use depends on your application needs, as well as your skills. Combining elements of two query languages into one query isn't as tough as you may think. And doing so can offer you powerful capabilities for searching and integrating traditional SQL and XML data.

Sample database

The queries in this article will access the sample tables created in "Getting off to a fast start with DB2 9, Part 2" (developerWorks, March 2006). As a quick review, the sample "items" and "clients" tables are defined as follows:

Listing 1. Table definitions
create table items (
id 		int primary key not null, 
brandname 	varchar(30), 
itemname 	varchar(30), 
sku 		int, 
srp 		decimal(7,2), 
comments 	xml

create table clients(
id 		int primary key not null, 
name 		varchar(50), 
status 		varchar(10), 
contactinfo 	xml

Sample XML data included in the "items.comments" column is shown in Listing 2, while sample XML data included in the "clients.contactinfo" column is shown in Listing 3. Subsequent query examples will reference specific elements in one or both of these XML documents.

Listing 2. Sample XML document stored in "comments" column of "items" table
		<Message>Heels on shoes wear out too quickly.</Message>
		<Message>Where can I find a supplier in San Jose?</Message>
Listing 3. Sample XML document stored in "contactinfo" column of the "clients" table
		<street>5401 Julio Ave.</street>
		<city>San Jose</city>

Query environment

All the queries in this article are designed to be issued interactively, which you can do through the DB2 command line processor or the DB2 Command Editor of the DB2 Control Center. The screen images and instructions in this article focus on the latter. (IBM Data Studio and IBM Optim Development Studio also ship with an Eclipse-based Developer Workbench that can help programmers graphically construct queries. However, this article does not discuss application development issues or the Development Studio.)

To use the DB2 Command Editor, launch the Control Center and select Tools > Command Editor. A window similar to Figure 1 will appear. Type your queries in the upper pane, click on the green arrow in the upper left corner to run them, and view your output in the lower pane or in the separate "Query results" tab.

Figure 1. The DB2 Command Editor, which can be launched from the DB2 Control Center
Upper pane shows simple SQL query.  Lower pane shows query succesful.
Upper pane shows simple SQL query. Lower pane shows query succesful.

SQL-only queries

Even if your knowledge of SQL is limited, you'll still be able to query XML data with little effort. For example, the following query selects the full contents of the "clients" table, including the XML information stored in the "contactinfo" column:

Listing 4. Simple SELECT statement
select * from clients

Of course, you can write more selective SQL queries that incorporate relational projection and restriction operations. The following query retrieves the IDs, names, and contact information for all customers with a "Gold" status. Note that "contactinfo" contains XML data, while the other two columns do not:

Listing 5. Simple SELECT statement with projection and restriction
select id, name, contactinfo 
from clients
where status = 'Gold'

And, as you might expect, you can create views based upon such queries, as seen here with "goldview":

Listing 6. Creating a view that contains an XML column
create view goldview as 
select id, name, contactinfo 
from clients
where status = 'Gold'

Unfortunately, there's a lot you can't do with just SQL. Plain SQL statements enable you to retrieve full XML documents (as you've just seen), but you can't specify XML-based query predicates and you can't retrieve partial XML documents or specific element values from an XML document. In other words, you can't project, restrict, join, aggregate, or order by fragments of XML documents using plain SQL. For example, you can't retrieve just the email addresses of your Gold customers or the names of clients who live in zip code "95116." To express these types of queries, you need to use SQL with XML extensions (SQL/XML), XQuery, or a combination of both.

The next section explores several fundamental features of SQL/XML. And in a subsequent article, learn how to write XQuery as well as how to combine XQuery with SQL.

SQL/XML queries

As the name implies, SQL/XML is designed to bridge between the SQL and XML worlds. It evolved as part of the SQL standard effort and now includes specifications for embedding XQuery or XPath expressions within SQL statements. XPath is a language for navigating XML documents to find elements or attributes. XQuery includes support for XPath.

It's important to note that XQuery (and XPath) expressions are case-sensitive. For example, XQuery that references the XML element "zip" will not apply to XML elements named "ZIP" or "Zip." Case sensitivity is sometimes difficult for SQL programmers to remember, as SQL query syntax permits them to use "zip," "ZIP," and "Zip" to refer to the same column name.

DB2 9 features more than 15 SQL/XML functions that enable you to search for specific data within XML documents, convert relational data into XML, convert XML data into relational data, and perform other useful tasks. This article does not cover the full breadth of SQL/XML. However, it reviews several common query challenges and how key SQL/XML functions can address these challenges.

"Restricting" results based on XML element values

SQL programmers often write queries that restrict the rows returned from the DBMS based on some condition. For example, the SQL query in Listing 5 restricts the rows retrieved from the "clients" table to include only those customers with a "Gold" status. In this case, the customer's status is captured in an SQL VARCHAR column. But what if you want to restrict your search based on some condition that applies to data in an XML column? The XMLExists function of SQL/XML provides one means to do this.

XMLExists enables you to navigate to an element within your XML document and test for a specific condition. When specified as part of the WHERE clause, XMLExists restricts the returned results to only those rows that contain an XML document with the specific XML element value (in other words, where the specified value evaluates to "true").

Let's look at a sample query problem raised earlier. Imagine that you need to locate the names of all clients who live in a specific zip code. As you may recall, the "clients" table stores customers addresses (including zip codes) in an XML column. (See Listing 3.) Using XMLExists, you can search the XML column for the target zip code and restrict the returned result set accordingly. The following SQL/XML query returns the names of clients who live in zip code 95116:

Listing 7. Restricting results based on an XML element value
select name from clients
where xmlexists('$c/Client/Address[zip="95116"]'
passing clients.contactinfo as "c")

The first line is an SQL clause specifying that you only want to retrieve information in the "name" column of the "clients" table. The WHERE clause invokes the XMLExists function, specifying an XPath expression that prompts DB2 to navigate to the "zip" element and check for a value of 95116. The "$c/Client/Address" clause indicates the path in the XML document hierarchy where DB2 can locate the "zip" element. Using data accessible from node "$c" (which we'll explain shortly), DB2 will navigate through the "Client" element to its "Address" sub-element to inspect zip code ("zip" values). The final line resolves the value of "$c": it's the "contactinfo" column of the "clients" table. Thus, DB2 inspects the XML data contained in the "contactinfo" column, navigates from the root "Client" element to "Address" and then to "zip," and determines if the customer lives in the target zip code. If so, the XMLExists function evaluates to "true," and DB2 returns the name of the client associated with that row.

A common mistake involves formulating the XMLExists query predicate, as shown in Listing 8.

Listing 8. Incorrect syntax for restricting results based on an XML element value
select name from clients
where xmlexists('$c/Client/Address/zip="95116" '
passing clients.contactinfo as "c")

While this query will execute successfully, it will not restrict the results to clients living in zip code 95116. (This is due to the semantics specified in the standard; it's not unique to DB2.) To restrict results to clients living in zip code 95116, you need to use the syntax shown earlier in Listing 7.

You may be curious how to include a query that restricts XML data in an application. While this article does not discuss application development topics in detail, it includes a simple Java example that uses a parameter marker within an SQL/XML statement to restrict output to information about customers who live in a given zip code.

"Projecting" XML element values

Now let's consider a slightly different situation, in which you want to project XML values into your returned result set. In other words, we want to retrieve one or more element values from our XML documents. There are multiple ways to do this. Let's first use the XMLQuery function to retrieve a value for one element, and then use the XMLTable function to retrieve values for multiple elements and map these into columns of an SQL result set.

Let's consider how to solve a problem posed earlier: how to create a report listing the email addresses of the Gold customers. The following query in Listing 9 invokes the XMLQuery function to accomplish this task:

Listing 9. Retrieving email information for qualifying customers
select xmlquery('$c/Client/email' 
passing contactinfo as "c") 
from clients 
where status = 'Gold'

The first line specifies that you want to return values for the "email" sub-element of the root "Client" element. The second and third lines indicate where DB2 can find this information -- in the "contactinfo" column of the "clients" table. The fourth line further qualifies your query to indicate that you're only interested in email addresses of Gold customers. This query will return a set of XML elements and values. For example, if you had 500 Gold customers, each with one email address, your output would be a one-column result set with 500 rows, as shown in Listing 10:

Listing 10. Sample output for previous query

. . . 

If you have multiple email addresses for individual Gold customers, you may want to instruct DB2 to return only the primary address (that is, the first email address found in the customer's "contactinfo" document). You can modify the XPath expression in the first line of your query to do so:

Listing 11. Retrieving the first email address for each qualifying customer
select xmlquery('$c/Client/email[1]' 
passing contactinfo as "c") 
from clients 
where status = 'Gold'

Finally, if you lack email addresses for some Gold customers, you may want to write a query to exclude nulls from the result set. To do so, modify the previous query by adding another predicate to the WHERE clause to test for missing email information. You're already familiar with the SQL/XML function that enables you to do that -- it's XMLExists. Listing 12 shows how you can rewrite the previous query to filter out any rows for Gold customers whose contact information (stored as XML) lacks an email address:

Listing 12. Retrieving the first email address for each qualifying customer for whom we have at least one email address
select xmlquery('$c/Client/email[1]' 
passing contactinfo as "c") 
from clients 
where status = 'Gold'
and xmlexists('$c/Client/email' passing contactinfo as "c")

Now let's consider a slightly different situation, in which you need to retrieve multiple XML element values. XMLTable generates tabular output from data stored in XML columns and is quite useful for providing programmers with a "relational" view of XML data. Like XMLExists and XMLQuery, the XMLTable function causes DB2 to navigate through the XML document hierarchy to locate the data of interest. However, XMLTable also includes clauses to map the target XML data into result set columns of SQL data types.

Consider the following query (Listing 13), which projects columns from both relational data and XML data stored in the "items" table. (See Listing 2 to review the "items" table.) The comment IDs, customer IDs, and messages are stored in XML documents in the "comments" column. The item names are stored in an SQL VARCHAR column.

Listing 13. Retrieving multiple XML elements and converting each to a traditional SQL data type
select t.Comment#, i.itemname, t.CustomerID, Message from items i, 
xmltable('$c/Comments/Comment' passing i.comments as "c"
columns Comment# integer path 'CommentID',
	CustomerID integer path 'CustomerID',
	Message varchar(100) path 'Message') as t

The first line specifies the columns to be included in your result set. Columns surrounded by quotation marks and prefixed with the "t" variable are based on XML element values, as the subsequent lines of the query indicate. The second line invokes the XMLTable function to specify the DB2 XML column containing the target data ("i.comments") and the path within the column's XML documents where the elements of interest are located (within the "Comment" sub-element of the root "Comments" element). The "columns" clause, spanning lines 3 to 5, identifies the specific XML elements that will be mapped to output columns in the SQL result set, specified on line 1. Part of this mapping involves specifying the data types to which the XML element values will be converted. In this example, all XML data is converted to traditional SQL data types.

Figure 2 shows sample results from running this query. As you can see, the output is a simple SQL result set. Note that the column names have been folded into upper case -- a normal occurrence with SQL.

Figure 2. Sample output from query using the XMLTable function
Query results screen showing four columns of data
Query results screen showing four columns of data

If desired, you can use XMLTable to create result sets that include XML columns as well. For example, the following statement produces a result set similar to the previous one, except that "Message" data is contained in an XML column rather than an SQL VARCHAR column.

Listing 14. Retrieving multiple XML elements and converting them to traditional SQL or XML data types
select t.Comment#, i.itemname, t.CustomerID, Message from items i, 
xmltable('$c/Comments/Comment' passing i.comments as "c"
columns Comment# integer path 'CommentID',
	CustomerID integer path 'CustomerID',
	Message XML by ref path 'Message') as t

Creating relational views of XML data

As you might imagine, SQL/XML functions can be used to define views. This is particularly useful if you'd like to present your SQL application programmers with a relational model of your native XML data.

Creating a relational view over data in an XML column isn't much more complicated than projecting XML element values. You simply write an SQL/XML SELECT statement that invokes the XMLTable function and use this as a basis for your view definition. The following example in Listing 15 creates a view based on information in XML and non-XML columns of the "items" table. (It's similar to the query in Listing 13.)

Listing 15. Creating a view, based on the output of XMLTable
create view commentview(itemID, itemname, commentID, message, mustrespond) as 
select, i.itemname, t.CommentID, t.Message, t.ResponseRequested from items i, 
xmltable('$c/Comments/Comment' passing i.comments as "c"
columns CommentID integer path 'CommentID',
	Message varchar(100) path 'Message',
	ResponseRequested varchar(100) path 'ResponseRequested') as t;

Although it's easy to create relational views over XML column data, you should consider their use carefully if you are not on V9.7. Prior to V9.7, DB2 didn't use XML column indexes when queries were issued against such views. Thus, if you indexed the ResponseRequested element and issued an SQL query that restricted the results of the "mustrespond" column to a certain value, DB2 would read all the XML documents and search for the appropriate "ResponseRequested" value. Unless you have a small amount of data, this would slow runtime performance. So, until you upgrade to V9.7, when DB2 will use XML indexes on SQL predicates, be careful here.

Joining XML and relational data

By now, you may be wondering about joining XML data with non-XML data (relational data based on traditional SQL types, for example). DB2 enables you to do this with a single SQL/XML statement. While there are different ways to formulate such joins, depending on your database schema and workload requirements, we'll cover one example here. And you may be surprised to learn that you already know enough about SQL/XML to get the job done.

Recall that the XML column in the "items" table contains a "CustomerID" element. This can serve as a join key for the integer-based "id" column in the "clients" table. So, if you want a report of the names and status of clients who've commented on one or more of your products, you'd have to join XML element values from one table with SQL integer values from another. And one way to accomplish this is to use the XMLExists function, as shown in Listing 16:

Listing 16. Joining XML and non-XML data
select, clients.status from items, clients
where xmlexists('$c/Comments/Comment[CustomerID=$p]'
passing items.comments as "c", as "p")

The first line identifies the SQL columns to be included in the query result set and the source tables referenced in the query. The second line includes your join clause. Here, XMLExists determines if the "CustomerID" value in one target source is equal to a value derived from another target source. The third line specifies these sources: the first is the "comments" XML column in the "items" table, and the second is the integer "id" column in the "clients" table. Thus, if customers have commented on any item and information about this customer is available in the "clients" table, the XMLExists expression will evaluate to "true" and the client's name and status information will be included in the report.

Using "FLWOR" expressions in SQL/XML

Although we've only discussed a few functions, SQL/XML provides many powerful capabilities for querying XML data and integrating that data with relational data. Indeed, you've already seen some examples of how to do that, but we'll discuss a few more here.

Both the XMLExists and XMLQuery functions enable you to incorporate XQuery into SQL. Our previous examples show how to use these functions with simple XPath expressions to navigate to a portion of an XML document of interest. Now let's consider a simple example in which you include XQuery in your SQL queries.

XQueries may contain some or all of the following clauses: "for," "let," "where," "order by", and "return." Collectively, they form FLWOR (pronounced flower) expressions. SQL programmers may find it convenient to incorporate XQueries into their SELECT lists to extract (or project) fragments of XML documents into their result sets. And while that's not the only way the XMLQuery function can be used, it's the scenario this article covers. (A later article in this series discusses XQuery in greater depth.)

Let's imagine that you want to retrieve the names and primary email addresses of your "Gold" customers. In some respects, this task is similar to one we undertook earlier (see Listing 11), when we explored how to project XML element values. Here, you pass XQuery (with "for" and "return" clauses) as input to the XMLQuery function:

Listing 17. Retrieving XML data using "for" and "return" clauses of XQuery
select name, xmlquery('for $e in $c/Client/email[1] return $e' 
passing contactinfo as "c") 
from clients
where status = 'Gold'

The first line specifies that customer names and output from the XMLQuery function will be included in the result set. The second line indicates that the first "email" sub-element of the "Client" element is to be returned. The third line identifies the source of our XML data -- the "contactinfo" column. Line 4 tells us that this column is in the "clients" table. Finally, the fifth line indicates that only "Gold" customers are of interest to us.

Because this example was so simple, you could write the same query here. Instead, you could write the same query in a more compact manner, much as you did previously:

Listing 18. Rewriting the previous query in a more compact manner
select name, xmlquery('$c/Client/email[1]' 
passing contactinfo as "c") 
from clients
where status = 'Gold'

However, the return clause of XQuery enables you to transform XML output as needed. For example, you can extract email element values and publish these as HTML. The following query will produce a result set in which the first email address of each Gold customer is returned as an HTML paragraph.

Listing 19. Retrieving and transforming XML into HTML
select xmlquery('for $e in $c/Client/email[1]/text() 
return <p>{$e}</p>' 
passing contactinfo as "c") 
from clients
where status = 'Gold'

The first line indicates that you're interested in the text representation of the first email address of qualifying customers. The second line specifies that this information is to be surrounded by HTML paragraph tags before return. In particular, the curly brackets ( { } ) instruct DB2 to evaluate the enclosed expression (in the case, "$e") rather than treat it as a literal string. If you omit the curly brackets, DB2 would return a result set containing "<p>$e</p>" for every qualifying customer record.

Publishing relational data as XML

Up until now, we've concentrated on ways to query, extract, or transform data contained within a DB2 XML column. And, as you've seen, these capabilities are all available through SQL/XML.

SQL/XML provides other handy features as well. Among these is the ability to convert or publish relational data as XML. This article only covers three SQL/XML functions in this regard: XMLElement, XMLAgg, and XMLForest.

XMLElement lets you convert data stored in traditional SQL columns into XML fragments. That is, you can construct XML elements (with or without XML attributes) from your base SQL data. The following example nests its use of the XMLElement function to create a series of item elements, each of which contain sub-elements for the ID, brand name, and stock keeping unit ("sku") values obtained from the "items" table:

Listing 20. Using XMLElement to publish relational data as XML
select xmlelement (name "item", 
	xmlelement (name "id", id), 
	xmlelement (name "brand", brandname), 
	xmlelement (name "sku", sku) ) from items 
where srp < 100

Running this query will produce a result similar to:

Listing 21. Sample output from previous query
. . . 

You can combine XMLElement with other SQL/XML publishing functions to construct and group XML values together, nesting them in hierarchies as desired. The example in Listing 22 uses XMLElement to create customerList elements whose contents are grouped by values in the "status" column. For each "customerList" record, the XMLAgg function returns a sequence of customer elements, each of which include sub-elements based on our "name" and "status" columns. Furthermore, you see that customer element values are ordered by customer name.

Listing 22. Aggregating and grouping data
select xmlelement(name "customerList",
xmlagg (xmlelement (name "customer", 
xmlforest (name as "fullName", status as "status") )
order by name ) )
from clients 
group by status

Let's assume our "clients" table contains three distinct "status" values: "Gold," "Silver," and "Standard." Running the previous query will cause DB2 to return three customerList elements, each of which may contain multiple customer sub-elements that further contain name and status information. Thus, the output will appear similar to:

Listing 23. sample output from previous query
    <fullName>Chris Bontempo</fullname>
    <fullName>Ella Kimpton</fullName> 
. . . 
    <fullName>Lisa Hansen</fullName>
. . .
    <fullName>Rita Gomez</fullName>
. . .

Update and delete operations

Although the focus of this article is on searching and retrieving data stored in XML columns using SQL, it's worth spending a few moments considering two other common tasks: updating and deleting data in XML columns.

DB2 9 enabled users to update and delete XML data using SQL and SQL/XML statements. Indeed, because the initial draft of the XQuery standard did not address these issues, DB2 users had to rely on SQL for these tasks. However, the W3C was working on an XQuery Update Facility, which was implemented in DB2 9.5. The addition of the XQuery Update Facility (initially called TRANSFORM) greatly simplified updating attributes and elements in an XML document, as well as established a standard for doing so. The XQuery Update Facility is now in Candidate Recommendation Status.

Updating XML data

While DB2 9 enabled you to update an XML column with an SQL UPDATE statement or through the use of a system-supplied stored procedure (DB2XMLFUNCTIONS.XMLUPDATE), with DB2 9.5, the new XQuery Update Facility can be used. This allows updating, inserting, deleting, and creating a new element or attribute within an existing XML document without having to recreate an entire document. The Update facility can also be used to modify multiple nodes in the same transaction.

For example, if you want to issue an UPDATE statement to change the e-mail address of a particular client's contact information, you simply have to supply the new e-mail address.

Consider the following statement:

Listing 24. Sample UPDATE statement
update clients
set contactinfo = xmlquery( '
          copy $new := $CONTACTINFO
          modify do replace value of $new/client/email with ""
          return  $new' ) 
where id = 3227;

The "copy $new", "modify do replace of $new," and "return $new" are required clauses of the XQuery Update facilty. You can learn more regarding the exact syntax and options in the Related topics section below. We have included both the site for the XQuery specification as well as a developerWorks article giving more details on the XQuery Update Facility.

Deleting XML data

Deleting rows that contain XML columns is a straightforward process. The SQL DELETE statement enables you to identify (or restrict) the rows you want to delete through a WHERE clause. This clause may include simple predicates to identify non-XML column values or SQL/XML functions to identify XML element values contained within XML columns.

For example, here's how you can delete all customer information for customer ID 3227:

Listing 25. Deleting data for a specific client
delete from clients 
where id = 3227

Do you remember how to restrict SQL SELECT statements to return only rows for customers living in zip code 95116? If so, you can easily apply that knowledge to deleting rows that track those customers. Here's how to do so using XMLExists:

Listing 26. Deleting data for clients within a specific zip code
delete from clients
where xmlexists('$c/Client/Address[zip="95116"]'
passing clients.contactinfo as "c");


Finally, it's worth noting that you can create specialized XML indexes to speed access to data stored in XML columns. Because this is an introductory article and the sample data is small, this article does not cover that topic here. However, in production environments, defining appropriate indexes can be critical to achieving optimal performance. The Related topics section of this article can help you learn more about new DB2 indexing technology.


This article covered a lot of ground, highlighting several key aspects of SQL/XML and how you can use it to query data in XML columns. There's certainly more you can do with SQL and SQL/XML functions than we've discussed here. This article includes a simple Java example that illustrates how you can use parameter markers with SQL/XML to query data in XML columns. We'll discuss application development issues in greater detail in a future article. However, the next article will explore some interesting aspects of XQuery, a new query language supported by DB2 9.


Thanks to George Lapis, Matthias Nicola, Sriram Padmanabhan, Gary Robinson, Hardeep Singh, and Bert Van der Linden for their help with this article.

Downloadable resources

Related topics


Sign in or register to add and subscribe to comments.

Zone=Information Management, XML
ArticleTitle=Get off to a fast start with DB2 9 pureXML, Part 3: Query DB2 XML data with SQL