SQL was invented so programmers can abstract out code algorithms meant to handle, manipulate, or transform the data, and not lose the implementation of the algorithm in the details of getting to the data or persisting it. In this goal, SQL has been tremendously successful. It allows the creation of very complex OLTP applications, reporting systems, data warehouses, and business analytics. SQL and the relational model of database design go hand in hand, and the power and flexibility of SQL have made the relational model the dominant data model type for the past many decades.
Lost in this story of success is the story of hierarchical databases. These databases had the data structured into forests of data—where record types could have parent-child relationships with each other. The programming languages used with these databases weren't as successful as SQL in allowing programmers to abstract out the mechanics of data access and persistence, and this limitation led to their decline.
XML documents represent relationships between entities using a hierarchy. XQuery is used to work with XML data in one or more documents. And XQuery lets you abstract away the mechanics of data access and persistence just as well as SQL, while providing programming facilities that are better than those of SQL. Technology has come full circle in terms of how to work with data at a high level. XQuery has some interesting aspects that make it more attractive to the programmer. In this article, I explore these benefits.
Working with hierarchical data
First, look at some situations where XQuery has an obvious advantage over SQL. In an RDBMS, it is common to find parent-child relationships represented in two separate tables of a database. For example, a purchase order with multiple items being purchased might be represented in the database through a purchase_order table and an items table. (The purchase_order table includes both order numbers and customer numbers. The items table includes order numbers and item numbers.) The items table also has a foreign key relationship on the purchase_order table. SQL handles this sort of relationship very well. See Figure 1 and Listing 1.
Figure 1. Parent-child table design for purchase order
Listing 1. SQL statement to find all items in a given purchase order
select item_no, item_desc from purchase_order po, items i where ord_no = 'A12345' and po.ord_no = i.ord_no |
SQL shows its weakness when the data has a chain of relationships that are of unpredictable length. For example, a manager can have many employees, who might, in turn, each have many employees themselves, and so on. In this case, representing the data in separate parent-child tables is not practical. For one, you have to design the tables to the maximum length of the relationship—if the management chain can be at most six levels tall, then you must design six tables. This practice is ugly. What is worse is that to get to all the employees at level n from a manager at level x, you must do a join of the tables from level x through level n. This approach will be expensive in terms of the resources used.
The typical solution adopted is to have a single table where the rows of the table have a parent-child relationship with one another as in Figure 2. Columns for employee ID, manager ID, and employee name indicate the relationships. In this example, Jack Brown manages 2 employees, John Silver and Ron McDonald. John Silver manages Jon Carino. Unfortunately, querying data like this in XML results in queries that are hard to write and maintain. See Listing 2 for the code to create a SQL statement used to find employees reporting directly or indirectly to a manager.
Figure 2. Single table design for employee data with interrelated rows
Listing 2. SQL statement to find employees reporting directly or indirectly to a manager
with reporting_to(emp_id, emp_name) as
(
select emp_id, emp_name
from employees
where emp_name = 'John Silver'
union all
select direct_reports.emp_id, direct_reports.emp_name
from employees direct_reports, reporting_to
where reporting_to.emp_id = direct_reports.mgr_id
)
select emp_id, emp_name
from reporting_to
|
As you can see, this query uses recursion and requires significant effort to understand. This problem is solved much better using XML data types and XQuery. In this case, you represent the entire organization in a single XML document in one row of the table. Listings 3 and 4 show the resulting implementation, and you get to judge which one is programmatically easier to construct and maintain.
Listing 3. XML representation of employee data
<?xml version="1.0"?>
<org>
<employee id="0001">
<name>Jack Brown</name>
<employee id="0002">
<name>John Silver</name>
<employee id="0004">
<name>John Silver</name>
</employee>
</employee>
<employee id="0003">
<name>Ron McDonald</name>
</employee>
</employee>
</org>
|
Listing 4. Finding the list of employees when using an XML data type
select emp.emp_id, emp.emp_name
from employees, xmltable(
'$ORG/org//employee[name = "John Silver"]/descendant-or-self::employee'
columns
emp_id char(4) path '@id',
emp_name varchar(254) path 'name/text()'
) emp
|
Also note that hierarchical data is more common than you might think. For example, in the previous example, purchase orders can't contain other purchase orders, so on the surface, it seems that this example would never be hierarchical. A purchase order can fail; that is, out of a list of 20 items purchased, 2 items might not be available or might not be available in sufficient quantity. Thus, the existing purchase order closes with the available items being supplied and a new purchase order is generated for the unavailable items. In such situations, it is desirable for the new purchase order to contain a link to the old one. Then you have a chain of purchase orders whose relationship links look very much like the employee data just discussed.
Working with non-hierarchical data
Non-hierarchical data represents the larger chunk of data that is modeled in the relational world. It is important that any programming language used to access the database or to write to the database be able to handle these data entities very well. In the examples that follow, I describe how XQuery handles such data very well and, from a programmer's standpoint, provides a superior solution to SQL.
Case study 1: Finding the closest match and using it
This example deals with the case in which programming tasks are organized into competitions ("events"). For each such event, the creator or manager of the event assigns a certain amount of "points" to it. A look-up table of price points is provided, with each price point given an integer label called a "level." The problem is to find the price point that has the closest match to the "points" value of the event and to use that value as the "price" of the level (how much the winning competitor of the contest gets paid). If two levels become the closest match, then you use the higher level to determine the price point that applies.
Listings 5 and 6 show the standard database design and a sampling of the data.
Listing 5. Database table design using non-XML data types
[db2pe@lc4eb4168274532 code]$ db2 describe table pricing
Data type Column
Column name schema Data type name Length Scale Nulls
------------------------------- --------- ------------------- ---------- ----- ------
EVENT_ID SYSIBM INTEGER 4 0 No
POINTS SYSIBM INTEGER 4 0 Yes
PRICE SYSIBM INTEGER 4 0 Yes
3 record(s) selected.
[db2pe@lc4eb4168274532 code]$ db2 describe table pricing_tier
Data type Column
Column name schema Data type name Length Scale Nulls
------------------------------- --------- ------------------- ---------- ----- ------
LEVEL SYSIBM INTEGER 4 0 No
PAYMENT SYSIBM INTEGER 4 0 Yes
2 record(s) selected.
|
Listing 6. Sample data from the tables
[db2pe@lc4eb4168274532 code]$ db2 "select * from pricing fetch first 5 rows only"
EVENT_ID POINTS PRICE
---------- ---------- -----------
10472 640 0
10471 220 0
10470 190 0
10469 180 0
10466 780 0
5 record(s) selected.
[db2pe@lc4eb4168274532 code]$ db2 "select * from pricing_tier"
LEVEL PAYMENT
----------- -----------
0 60
1 120
2 240
3 360
4 480
5 600
6 720
7 840
8 record(s) selected.
|
Listing 7 shows the required SQL update statement.
Listing 7. Update statement to update the pricing table using standard SQL
update pricing o
set (price) = (
select payment
from (
select pricing_info.event_id event_id , max(level) matched_level
-- "min_values" will have the smallest difference between the current
-- price and all the pre-defined price points for each event. "pricing_info"
-- will contain an index of all the price point differences tabulated
-- by the price point level. A join of these two tables by event_id
-- and price point difference, should get you the price point level that
-- you are seeking for each event.
from (
select event_id, min(absdiff)
from (
-- For each event_id, calculate the absolute difference
-- between the existing price and the price points
select event_id, abs(points - payment) absdiff, level
from pricing, pricing_tier
)
group by event_id
) as min_values(event_id, closest_match), (
-- For each event_id, calculate the absolute difference
-- between the existing price and the price points
select event_id, abs(points - payment) absdiff, level
from pricing, pricing_tier
) as pricing_info(event_id, absdiff, level)
where min_values.event_id = pricing_info.event_id
and closest_match = absdiff
group by pricing_info.event_id
)x , pricing_tier y
where x.matched_level = y.level
and x.event_id = o.event_id
)
|
Now look at an implementation using XML data types. The pricing_tier table and event details can be represented with the XML documents in Listings 8 and 9.
Listing 8. Pricing information as an XML document
<pricing>
<pricingtier level="0"><price>60</price></pricingtier>
<pricingtier level="1"><price>120</price></pricingtier>
<pricingtier level="2"><price>240</price></pricingtier>
<pricingtier level="3"><price>360</price></pricingtier>
<pricingtier level="4"><price>480</price></pricingtier>
<pricingtier level="5"><price>600</price></pricingtier>
<pricingtier level="6"><price>720</price></pricingtier>
<pricingtier level="7"><price>840</price></pricingtier>
</pricing>
|
Listing 9. Event details as an XML document
<event id="9083" eventstate="Cancelled: Client request">
<title>UCD research 5</title>
<points>170</points>
</event>
|
Listing 10 shows the table design used.
Listing 10. Database table design when using XML data types
[db2pe@lc4eb4168274532 code]$ db2 describe table events
Data type Column
Column name schema Data type name Length Scale Nulls
------------------------------- --------- ------------------- ---------- ----- ------
EVENT_ID SYSIBM INTEGER 4 0 No
EVENT SYSIBM XML 0 0 No
2 record(s) selected.
[db2pe@lc4eb4168274532 code]$ db2 describe table pricing
Data type Column
Column name schema Data type name Length Scale Nulls
------------------------------- --------- ------------------- ---------- ----- ------
PRICING SYSIBM XML 0 0 Yes
1 record(s) selected.
|
This design assumes that the pricing information is inserted into the document as a peer element of the "points" element. Listing 11 shows the update statement that does this.
Listing 11. Update statement using XML data types and XQuery
update events o
set (event) = (
select xmlquery('
(:
: First build a series of "pair" elements which map a pricing level number
: to a value that is the absolute value of the difference between the
: event points and the price point. Once you have this sequence, sort by
: the difference and pick the lowest value. This results in you picking the
: closest price match. Because you break ties for the closest value by picking
: the higher level, you add a second sort key (level) in descending order.
:)
let $closestMatch :=
(
for $pair in ( for $p in $PRICING/pricing/pricingtier
let $lp := xs:int($EVENT/event/points)
let $absdiff := abs($lp - xs:int($p/price))
return
<pair>
<level>{$p/@level}</level>
<diff>{$absdiff}</diff>
</pair>
)
order by xs:int($pair/diff/text()),
xs:int($pair/level/text()) descending
return $pair
)[1]
return
transform
copy $e := $EVENT
modify
do insert
<pricing>{
$PRICING/pricing/pricingtier[@level = $closestMatch/level/@level]/
price/text()
}</pricing> after $e/event/points
return $e
')
from events a, (select pricing from pricing fetch first row only) b
where a.event_id = o.event_id
)
|
The main thing to note here is that the programming style very much resembles procedural programming—where intermediate calculations are done and the results are assigned to variables, which are then reused elsewhere. There are even nested for loops, which come naturally to a procedural programmer. The SQL statement in Listing 7 is identical in function and methodology but is harder to follow because the programming style is far removed from the normal procedural style. The XQuery style is quite easy to construct and maintain, leading to improved programmer productivity.
Case study 2: Finding related events in temporal data
Temporal data is a representation of the state of an entity as one or more aspects of it that vary over time. There are many ways to capture this changing state. For the sake of simplicity, focus on change in a single variable in the example that follows. Look at a data operation that is quite common in the world of data mining: At a high level, you are trying to identify a set of entities whose changing state has certain characteristics of interest. This set might be, for example, a customer who purchases detergent soon after purchasing soup. In this case, you will look at the extension of an earlier example. You are looking for employees who had more than one promotion during the 2000 through 2009 time period. See Listing 12.
Listing 12. Table structure capturing employee promotion history
[db2pe@lc4eb4168274532 code]$ db2 describe table pay_history
Data type Column
Column name schema Data type name Length Scale Nulls
------------------------------- --------- ------------------- ---------- ----- ------
EMP_ID SYSIBM CHARACTER 4 0 No
START_TIME SYSIBM TIMESTAMP 10 6 No
END_TIME SYSIBM TIMESTAMP 10 6 Yes
JOB_LEVEL SYSIBM INTEGER 4 0 No
4 record(s) selected.
|
Listing 13 shows the SQL statement that you need.
Listing 13. Select statement to identify employees with more than one promotion in specified period
select emp_id, count(emp_id) from pay_history where start_time > '2000-01-01-00.00.00.000000' and end_time < '2010-01-01-00.00.00.000000' group by emp_id having count(emp_id) > 1 |
Listing 14 shows the XML datatype-based table design.
Listing 14. Table structure capturing employee promotion history using XML data types
[db2pe@lc4eb4168274532 code]$ db2 describe table pay_history
Data type Column
Column name schema Data type name Length Scale Nulls
------------------------------- --------- ------------------- ---------- ----- ------
EMP_ID SYSIBM CHARACTER 4 0 No
HISTORY SYSIBM XML 0 0 No
2 record(s) selected.
|
Listing 15 shows the XML data representing the job history.
Listing 15. Table structure capturing employee promotion history using XML data types
<employee id="0001">
<pay startDate="2001-11-23T00:00:00.000000Z" endDate="2002-10-07T00:00:00.000000Z"
level="4">70500</pay>
<pay startDate="2002-10-07T00:00:00.000000Z" endDate="2005-06-18T00:00:00.000000Z"
level="5">81500</pay>
<pay startDate="2005-06-18T00:00:00.000000Z" endDate="2007-06-01T00:00:00.000000Z"
level="6">96700</pay>
<pay startDate="2007-06-01T00:00:00.000000Z" level="7">120000</pay>
</employee>
|
Listing 16 shows the XQuery-based select statement that is the equivalent of the select statement in Listing 13.
Listing 16. Table structure capturing employee promotion history using XML data types
select emp_id
from pay_history
where xmlexists('
let $numPromotions := count(
$HISTORY/employee/pay[@startDate gt "2000-01-01T00:00:00.000000Z"
and @endDate lt "2010-01-01T00:00:00.000000Z"]
)
return
if($numPromotions gt 1)then
(
true()
)else()
')
|
There doesn't seem to be any improvement in the XQuery form. Let's consider an additional requirement. Employees can be demoted or promoted. Each job history record might represent a demotion as well as a promotion. If you focused only on promotions and excluded all demotions, the XQuery statement now changes to the code in Listing 17.
Listing 17. XQuery focusing only on promotions processing the job history
select emp_id
from pay_history
where xmlexists('
let $i := 0
(:
: You have to work around the fact that the DB2 pureXML implementation of XQuery
: does not support the preceding-sibling axis. To do this, iterate through the
: "pay" elements for a given employee, and tack on the position of the element.
: Later on, use this position to determine the previous "pay" element.
:)
let $pairs := (
for $jobChanges in $HISTORY/employee/pay[@startDate gt
"2000-01-01T00:00:00.000000Z" and @endDate lt "2010-01-01T00:00:00.000000Z"]
let $i := $i + 1
return
<pair><position>{$i}</position>{$jobChanges}</pair>
)
let $numPromotions := count(
for $p in $pairs
let $currentPos := xs:int($p/position)
return
(: If this is the first "pay" element, its not a demotion :)
if($currentPos eq 1)then
(
$p/pay
)else(
if($pairs[$currentPos - 1]/data(@level) lt $p/data(@level))then
(
$p/pay
)else()
)
)
return
if($numPromotions gt 1)then
(
true()
)else()
')
|
The equivalent SQL statement, using non-XML data types, now requires a self-join to implement. Even then it will be quite complex and hard to construct. A mitigation might be to add a "job history row number" column to the pay_history table and then allow the self-join to be done on this column. This approach is left as an exercise for the reader.
The ideas conveyed in this article are not meant to be an enumeration of what is possible with XQuery. Rather, the examples are meant to illustrate how the powerful facilities provided by the language enable you to develop applications faster and to make them easier to maintain. This is a good reason, in this author's opinion, to encourage the increasing adoption of XML data types in databases.
Learn
- Port CONNECT BY to DB2 (Serge Rielau, developerWorks, Oct. 2005): Read an excellent introduction to using recursive queries for working with hierarchical data.
- pureXML in DB2 9: Which way to query your XML data? (Matthias Nicola and Fatma Ozcan, developerWorks, Aug. 2007): Learn more in this very thorough tutorial on the various facilities that DB2 pureXML provides to work with XML data types.
- New to XML? Get the resources you need to learn XML.
- XML area on developerWorks: Find the resources you need to advance your skills in the XML arena, including DTDs, schemas, and XSLT. See the XML technical library for a wide range of technical articles and tips, tutorials, standards, and IBM Redbooks.
- IBM XML certification: Find out how you can become an IBM-Certified Developer in XML and related technologies.
- developerWorks technical events and webcasts: Stay current with technology in these sessions.
- developerWorks on-demand demos: Watch demos ranging from product installation and setup demos for beginners to advanced functionality for experienced developers.
- developerWorks on Twitter: Join today to follow developerWorks tweets.
- developerWorks podcasts: Listen to interesting interviews and discussions for software developers.
- developerWorks on-demand demos: Watch demos ranging from product installation and setup for beginners to advanced functionality for experienced developers.
Get products and technologies
- IBM product evaluation versions: Download or explore the online trials in the IBM SOA Sandbox and get your hands on application development tools and middleware products from DB2®, Lotus®, Rational®, Tivoli®, and WebSphere®.
Discuss
- developerWorks profile: Create your profile today and set up a watchlist.
- XML zone discussion forums: Participate in any of several XML-related discussions.
- The developerWorks community: Connect with other developerWorks users while exploring the developer-driven blogs, forums, groups, and wikis.

Kenneth Stephen is an application architect who has 20 years of experience designing and implementing applications on platforms ranging from the PC to the mainframe. He has lots of experience designing and implementing applications using XML technologies, including XSLT and XQuery. Kenneth currently works in IBM Software Services for WebSphere.



