Program with XML for DB2, Part 2: Leverage database support for XML in your application architecture

Learn how the new XML storage and query environment of IBM® DB2® 9 for Linux®, UNIX®, and Windows® plays into the XML data model described in Part 1 of this series. Part 2 focuses on how to exploit the improved database support for XML in your application architecture.

Hardeep Singh (hardeep@us.ibm.com), Architect Advanced Technologies, IBM, Software Group

Photo: Hardeep SinghHardeep Singh is a member of the advanced technologies group. He is the architect for DB2 XML tooling and XML migration. He has over 23 years of industry experience.



02 August 2007

Also available in Chinese Russian Vietnamese

Introduction

XML's status in databases has changed in the last couple of years from a temporary worker to that of a first class citizen. No longer does it need to morph its identity in order to fit into the relational world. It proudly maintains its hierarchical heritage, even while exploiting the power and stability of the relational database world. In fact, some of its relational neighbors have adapted techniques that make them look like XML in order to exploit the richness of the hierarchical XML model.

This article showcases how the new XML storage and query environment plays into the XML data model from Part 1. It shows how once you adapt to the new XML-based application development architecture, your database schemas become much simpler and more natural. It also demonstrates how querying the XML data in the database is no different from querying the data in the application. Finally, it shows you how to marry the relational data with the XML data to get best of both the worlds.


XML database basics

While most of the major relational databases have some support for XML, DB2's pureXML™ support is much more robust and efficient, making it an ideal database to test out the XML programming model. This article I focuses on how to exploit the improved database support for XML in your application architecture.

DB2 allows you to store, query, manipulate, and publish:

  • Relational data — SQL
  • Relational data as XML — SQL/XML
  • XML data — XQuery
  • Hybrid data (Relational & XML) — SQL/XML and XQuery
Figure 1. DB2 hybrid storage
DB2 hybrid storage

Store XML in the database

The main advantage of the XML support in the relational database is that you can save both relational and XML data in the same table. And although internally the XML is stored in a hierarchical (tree) format, logically in the database table it appears to be stored in a single column (like a CLOB or BLOB).

From the data objects in Part 1, you see that there are two tables with at least two columns each.

Listing 1. Tables
CREATE TABLE CUSTOMER_TABLE (
CUSTOMERID CHARACTER (12) NOT NULL,
CUSTXML XML NOT NULL ,
CONSTRAINT CC1183665042494 PRIMARY KEY ( CUSTOMERID) )

CREATE TABLE PURCHASE_TABLE (
CUSTOMERID CHARACTER (12) NOT NULL ,
ITEMXML XML NOT NULL ,
CONSTRAINT CC1183665244645 FOREIGN KEY
(CUSTOMERID) REFERENCES CUSTOMER_TABLE (CUSTOMERID)
ON DELETE CASCADE ON UPDATE NO ACTION
ENFORCED ENABLE QUERY OPTIMIZATION )

It is obvious from the above statements that by storing an application's data object as XML data, the relational schemas simplify to a greatly. Plus, the fact that the infrastructure is still relational allows the XML data to leverage the proven capabilities of the relational database like triggers, constraints, and foreign key relationships.

Since logically the XML column appears the same as a VARCHAR, CLOB, or BLOB column, the INSERT statements are also similar.

insert into CUSTOMER_TABLE values('hardeep',
'<Customer customerid="hardeep" firstname="hardeep" lastname="singh"/>')

Or if you were inserting from a Java™ program:

Listing 2. Inserting from a Java program
String insertsql= "insert into PURCHASE_TABLE values(?,?)";
PreparedStatement iStmt=connection.prepareStatement(insertsql);
File inputfile= new File(filename); //filename is the path of the XML file
long filesize=inputfile.length();
BufferedReader in = new BufferedReader(new FileReader(inputfile));
iStmt.setCharacterStream(1,in,(int)filesize);
int rc= iStmt.executeUpdate();

In order to better understand the hybrid storage, look at a view of how the XML data logically appears to be stored inside a relational database.

Note: Even if the physical storage technology for XML might differ for different relational database vendors, the logical view is similar.

Figure 2. DB2 Hybrid storage logical view
DB2 Hybrid storage logical view

Query the XML

When you expand the database schema model you can see the relational tables and columns. If you drill further into an XML column, the schema transitions from the relational model to the hierarchical one for XML. Now, if you get over the fact that there are two schemas, a relational schema and an XML schema, and just consider them to be one, then you can navigate and query into the unified schema in a more natural manner.

In the unified schema shown in Listing 1, if you wanted to get the data in the CUSTXML column of the CUSTOMER_TABLE, you would identify the path to the CUSTXML column as your target in your query.

SELECT CUSTXML FROM CUSTOMER_TABLE where customerid='hardeep';

This returns the customer data inside the CUSTXML column for hardeep.

Now consider the case where you want customer data where lastname of the customer is singh. In this case, you need to identify the path to the lastname attribute in each XML document (CUSTOMER_TABLE.CUSTXML/Customer/@lastname) and check to see if it is singh.

In a perfect world, the query would be Select * from CUSTOMER_TABLE where CUSTXML/Customer/@lastname='singh'. However, in the real world you need to formulate it in a syntax that is understood by the database query engine. A new language called XQuery, which can be used to query XML documents, has been introduced to the database world. SQL has been updated to add new functions that can understand this new language and bridge the two worlds. So a query that searches for customers with the last name singh would now look like:

select CUSTXML from CUSTOMER_TABLE
where xmlexists ('$cust/Customer[@lastname= "singh" ]' passing CUSTXML AS "cust" )

Or if you were making this call from a Java program using a parametrised query:

select CUSTXML from CUSTOMER_TABLE
where xmlexists ('$cust/Customer[@lastname= $lname ]'
passing CUSTXML AS "cust" , cast(? as VARCHAR(12)) as "lname")

Once you get over the funny syntax of passing parameters to the SQL/XML functions, you will find that for basic hybrid queries over relational and XML data, the XML queries contain mostly XPath statements. This is quite similar to what you were doing in the application layer (in Part 1) for the XML data model, where much of your code was making XPath calls to the document object model (DOM) wrapper to query and manipulate the XML data.

Note: In Viper 2, some simplification has been done regarding parameter passing to some of the SQL/XML function. For example, in the previous query the XMLExists passing clause does not specify the CUSTXML column.

select CUSTXML from CUSTOMER_TABLE
where xmlexists ('$CUSTXML/Customer[@lastname= $lname ]'
passing cast(? as VARCHAR(12)) as "lname")

Push application logic to the database

XQuery has all the rudimentary functionality of most high level languages (if-then-else, for, variables, functions, and arithmetic operators). This makes it possible to embed business logic inside the query. Plus, it has a lot of common functionality mapping to XSLT making it possible to not only query but also transform the XML output right in the database.

Take the Customer example for the XML data model from Part 1.

<Customer customerid ="" firstname="" lastname="" >
<Items><Item ID="" description="" purchaseDate="" price="" /></Items>
</Customer>

Replace application code with DB2 query

Instead of merging the XML data from the two tables in the application layer, you can do the same thing in the database using a single SQL/XML query. A one-to-many join of CUSTOMER_TABLE.CUSTXML/Customer/@customerid to PURCHASE_TABLE.ITEMXML/Customer/@customerid.

Figure 3. Join two XML columns
Join two XML columns
Listing 3. Query two XML columns
values(xmlquery('


for $Customer in db2-fn:xmlcolumn( "CUSTOMER_TABLE.CUSTXML")/Customer

where $Customer/@customerid= $customerid

return
<Customer customerid ="{$Customer/@customerid}"
firstname ="{$Customer/@firstname}" lastname ="{$Customer/@lastname}" >{

for $Customer0 in db2-fn:xmlcolumn("PURCHASE_TABLE.ITEMXML")/Customer
where $Customer0/@customerid= $Customer/@customerid
return $Customer0/Item

}</Customer>

' passing cast( ? AS varchar(255) ) as "customerid" ))

The resulting XML for all items purchased by customer hardeep would be:

Figure 4. Query result
Query result

In the above query, you had to construct the outer Customer element and add the attributes from the CUSTXML column data. DB2 Viper 2 (beta) has support for XQuery updating expressions that enable modifications of the XML document so there is no need to construct the outer Customer element. Instead, you could use the one from the customer table and insert the items from the purchase table as children.

Listing 4. Viper 2 query for two XML columns
values(xmlquery('
				

for $Customer in db2-fn:xmlcolumn( "CUSTOMER_TABLE.CUSTXML")/Customer

let $items:=(<Items>{

for $Customer0 in db2-fn:xmlcolumn("PURCHASE_TABLE.ITEMXML")/Customer
where $Customer0/@customerid= $Customer/@customerid
return $Customer0/Item

}</Items>)

where $Customer/@customerid= $customerid

return
transform
copy $cust:=$Customer
modify(
do insert $items as last into $cust)
return $cust

' passing hardeep as "customerid" ))

In the above queries, you not only searched, retrieved, and merged parts of XML documents stored in the database but you also transformed the resulting XML by adding new elements to it. Also, hardeep was implicitly cast to XML type (xs:string).

Comparison between the database query and Java application code

If you compare the above queries to the Java code (Listing 6. Rewriting the application to use the XML model) in Part 1, you find that the logic is quite similar.

  1. Select the Customer info from CUSTOMER_TABLE.
  2. Construct an Items element and search for all the items purchased by that customer from PURCHASE_TABLE.
  3. Iterate over each item in the selected list and insert it into the Items element.
  4. Insert the Items element into the Customer element.

Create a stored procedure

To separate the business logic in the database from the application code, it is a good idea to create a stored procedure for this query.

Listing 5. Create procedure
CREATE PROCEDURE customerItems(IN custid varchar(12))
DYNAMIC RESULT SETS 1
LANGUAGE SQL
BEGIN
DECLARE c_cur CURSOR WITH RETURN FOR

values(xmlquery('

for $Customer in db2-fn:xmlcolumn( "CUSTOMER_TABLE.CUSTXML")/Customer

let $items:=(<Items>{

for $Customer0 in db2-fn:xmlcolumn("PURCHASE_TABLE.ITEMXML")/Customer
where $Customer0/@customerid= $Customer/@customerid
return $Customer0/Item

}</Items>)

where $Customer/@customerid= $customerid

return
transform
copy $cust:=$Customer
modify(
do insert $items as last into $cust)
return $cust

' passing custid as "customerid" ))
OPEN c_cur;
END

Replace the application code with stored procedure call

The application code now makes a stored procedure call to DB2 and passes the XML to the DOM wrapper. The application code for the XML model (Listing 6. Rewriting the application to use the XML model lines 2-8) in Part 1 would reduce to:

2. ResultSet dbResult = dbstmt.executeQuery("call customerItems ("+custid+")"
3. XMLParse customerXML = new XMLParse(dbResult. getString(1));

A more elaborate example

Consider a little more elaborate scenario that also calculates the insurance on each item. To make it a little more challenging, the insurance not only varies on a daily basis but also changes with price. This means that you have to pass to the query not only the customerid but also the insurance rates. Now assume that you query the latest insurance rates every day from a Web service provided by the insurance company. The insurance rate information comes as an XML document.

<insurance>
<rate price="100" currency="$" rate=".02"/>
<rate price="500" currency="$" rate=".018"/>
<rate price="" currency="$" rate=".015"/>
</insurance>

You can modify the previous stored procedure to calculate insurance rates.

Listing 6. Stored procedure that also calculates insurance for each item
CREATE PROCEDURE customerItemsWithInsurance(IN custid varchar(12), rate XML)
DYNAMIC RESULT SETS 1
LANGUAGE SQL
BEGIN
DECLARE c_cur CURSOR WITH RETURN FOR

values(xmlquery('

for $Customer in db2-fn:xmlcolumn( "CUSTOMER_TABLE.CUSTXML")/Customer

let $items:=(
<Items>{
for $Customer0 in db2-fn:xmlcolumn("PURCHASE_TABLE.ITEMXML")/Customer

let $insurance:=<insurance currency="{($rate//rate[@price=""]/@currency)}">
{(
if($Customer0/Item/@price > 500) then (
$Customer0/Item/@price * $rate//rate[@price=""]/@rate
)
else (

if($Customer0/Item/@price > 100) then (
$Customer0/Item/@price * $rate//rate[@price="500"]/@rate
)
else (
$Customer0/Item/@price * $rate//rate[@price="100"]/@rate
)

)
)}</insurance>

where $Customer0/@customerid= $Customer/@customerid

return
transform
copy $item:=$Customer0/Item
modify( do insert $insurance as last into $item)
return $item
}</Items>
)

where $Customer/@customerid= $customerid

return
transform
copy $cust:=$Customer
modify(do insert $items as last into $cust)
return $cust

' passing custid as "customerid", rate as "rate" ));

OPEN c_cur;
END

The call to the stored procedure takes in two runtime parameters, the customerid and the insurance XML.

call customerItemsWithInsurance(?,?)

It is obvious from the above example that if the data being manipulated in the database is in XML format, then the power of XQuery can be leveraged to implement more of the business logic than was previously possible using SQL alone. Also, it is clear that the XML that is being used in the query does not even need to exist in the database. Thus, XML data participating in a SQL/XML query can either be stored in the database in its pure (hierarchical) form, it can be generated by using SQL/XML function, or it can even be passed as a runtime parameter to the query. The distinction between the database and an application server is gradually being blurred.


Pros and cons

Like every new technology, there will be teething problems. Some are due to fact that the implementations are in their first version and others because of inertia to change from the true and tried methodologies that you are comfortable with.

  1. Performance though improving is still not at par with the relational data.
  2. XQuery is a new language and some of the SQL/XML functions have a syntax that takes getting used to.
  3. There is a lot of legacy data already in relational format.
  4. Most critical is the fact that this is a new way of creating business applications and data schemas, different from the current way of object-oriented-based applications and normalized relation schemas.
  5. There are not many tools that can debug and optimize these kinds of queries for better performance.

Pitted against these odds are the facts that the new model is more natural in the way it manages the data. The business data information is maintained and manipulated intact in both the application and database layers, and as you will see in Part 3, even in the client layer.

  • Even though the surrounding languages might be different (Java, XQuery, JavaScript, PHP) the language used to traverse the XML document is the same (XPath) in all the layers.
  • Although legacy data is relational, it can easily be queried and morphed to XML using some of the new SQL/XML function introduced in Viper II . Looking at the example from Part 1, "Case II all data stored in the database as relational." The query can be simplified using the new XMLROW function introduced in Viper 2.
    Select XMLROW (customerid, firstname, lastname OPTION as attributes ROW Customer)
    from customer_table where customerid=?

    You can create joins between relational and XML data also. In this example scenario, if you had a third table containing product description of the purchased items and this was a relational table, then you could get the product description for each item purchased by doing a join using the item ID.

    Figure 5. Joining relational and XML columns
    Joining relational and XML columns
    Select details, weight from SQLPRODUCT, ITEM_TABLE
    where xmlexists ('$itemxml/item[@itemid=$pid]'
    passing ITEM_TABLE.ITEMXML AS "itemxml", SQLPRODUCT.PID AS "pid" )

    In DB2 9, you are able to pass runtime parameters to the XQuery embedded in the SQL statement using the passing clause but you could not do the same for SQL embedded inside an XQuery. In Viper 2, this limitation has been removed and now you are able to pass a runtime variable to a relational query embedded inside the XQuery.

    Listing 7. Pass the runtime variable to SQL embedded inside an XQuery
    values(xmlquery('
    
    for $Customer0 in db2-fn:xmlcolumn("PURCHASE_TABLE.ITEMXML")/Customer
    where $Customer0/@customerid= $custid
    return (
    $Customer0/Item,
    db2-fn:sqlquery(
    ''select xmlrow(details, description, weight option ROW "description")
    from sqlproduct where pid= parameter(1)'', $Customer0/Item/@ID))
    
    ' passing cast( ? AS varchar(255) ) as "custid" ))

    Thus, even if some of the data is in relational tables and some is in XML you can now make dynamic joins between the XML and relational data from inside either the SQL query or the XQuery or both.

  • Even performance might not be a big issue in some cases since:
    • You are able to create XPath expression-based indexes on the XML documents stored in the database.
      create index custfname on customer_table(info) generate key
      using xmlpattern '/Customer/@firstname' as sql varchar(64)
    • The numbers of joins required are reduced since the databases schemas are simpler.
    • I/O may be reduced since now you can massage the data inside the query before you send it to the application.
    • You can always extract out key information from an XML document to relational columns using SQL/XML functions like XMLTable and create relational indexes on them.
    • You can create text search indexes on the XML document.

Conclusion

XML is here to stay. Most industries and government organizations are standardizing their XML schemas and are insisting on dealing with electronic documents that conform to these schemas. Since B2B data exchanged over the wire is now in XML, why not store that data as is (pureXML) in the database? Once you store the data as XML, you can index, query, validate, manipulate, transform, and update it using XQuery and standard SQL/XML. As you push more application logic into the query, your database becomes an active participant in the service-oriented architecture (SOA) world by publishing its stored procedures as Web services and feeds.

"The old order changeth, yielding place to new." Morte d'Arthur

Resources

Learn

Get products and technologies

Discuss

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into Information management on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management, XML, SOA and web services, Architecture
ArticleID=244162
ArticleTitle=Program with XML for DB2, Part 2: Leverage database support for XML in your application architecture
publish-date=08022007