Skip to main content

Program with XML for DB2, Part 2: Leverage database support for XML in your application architecture

Hardeep Singh (hardeep@us.ibm.com), Architect Advanced Technologies, IBM, Software Group
Photo: Hardeep Singh
Hardeep Singh is a member of the advanced technologies group. He is the architect for DB2 XML tooling and XML migration. He has over 23 years of industry experience.

Summary:  Learn how the new XML storage and query environment of IBM® DB2® 9 for Linux®, UNIX®, and Windows® plays into the XML data model described in Part 1 of this series. Part 2 focuses on how to exploit the improved database support for XML in your application architecture.

View more content in this series

Date:  02 Aug 2007
Level:  Intermediate
Activity:  1274 views

Introduction

XML's status in databases has changed in the last couple of years from a temporary worker to that of a first class citizen. No longer does it need to morph its identity in order to fit into the relational world. It proudly maintains its hierarchical heritage, even while exploiting the power and stability of the relational database world. In fact, some of its relational neighbors have adapted techniques that make them look like XML in order to exploit the richness of the hierarchical XML model.

This article showcases how the new XML storage and query environment plays into the XML data model from Part 1. It shows how once you adapt to the new XML-based application development architecture, your database schemas become much simpler and more natural. It also demonstrates how querying the XML data in the database is no different from querying the data in the application. Finally, it shows you how to marry the relational data with the XML data to get best of both the worlds.


XML database basics

While most of the major relational databases have some support for XML, DB2's pureXML™ support is much more robust and efficient, making it an ideal database to test out the XML programming model. This article I focuses on how to exploit the improved database support for XML in your application architecture.

DB2 allows you to store, query, manipulate, and publish:

  • Relational data — SQL
  • Relational data as XML — SQL/XML
  • XML data — XQuery
  • Hybrid data (Relational & XML) — SQL/XML and XQuery

Figure 1. DB2 hybrid storage
DB2 hybrid storage

Store XML in the database

The main advantage of the XML support in the relational database is that you can save both relational and XML data in the same table. And although internally the XML is stored in a hierarchical (tree) format, logically in the database table it appears to be stored in a single column (like a CLOB or BLOB).

From the data objects in Part 1, you see that there are two tables with at least two columns each.


Listing 1. Tables
			
CREATE TABLE CUSTOMER_TABLE (
CUSTOMERID CHARACTER (12) NOT NULL,
CUSTXML XML NOT NULL ,
CONSTRAINT CC1183665042494 PRIMARY KEY ( CUSTOMERID) )

CREATE TABLE PURCHASE_TABLE (
CUSTOMERID CHARACTER (12) NOT NULL ,
ITEMXML XML NOT NULL ,
CONSTRAINT CC1183665244645 FOREIGN KEY
(CUSTOMERID) REFERENCES CUSTOMER_TABLE (CUSTOMERID)
ON DELETE CASCADE ON UPDATE NO ACTION
ENFORCED ENABLE QUERY OPTIMIZATION )

It is obvious from the above statements that by storing an application's data object as XML data, the relational schemas simplify to a greatly. Plus, the fact that the infrastructure is still relational allows the XML data to leverage the proven capabilities of the relational database like triggers, constraints, and foreign key relationships.

Since logically the XML column appears the same as a VARCHAR, CLOB, or BLOB column, the INSERT statements are also similar.

insert into CUSTOMER_TABLE values('hardeep',
'<Customer customerid="hardeep" firstname="hardeep" lastname="singh"/>')

Or if you were inserting from a Java™ program:


Listing 2. Inserting from a Java program
String insertsql= "insert into PURCHASE_TABLE values(?,?)";
PreparedStatement iStmt=connection.prepareStatement(insertsql);
File inputfile= new File(filename); //filename is the path of the XML file
long filesize=inputfile.length();
BufferedReader in = new BufferedReader(new FileReader(inputfile));
iStmt.setCharacterStream(1,in,(int)filesize);
int rc= iStmt.executeUpdate();

In order to better understand the hybrid storage, look at a view of how the XML data logically appears to be stored inside a relational database.

Note: Even if the physical storage technology for XML might differ for different relational database vendors, the logical view is similar.


Figure 2. DB2 Hybrid storage logical view
DB2 Hybrid storage logical view

Query the XML

When you expand the database schema model you can see the relational tables and columns. If you drill further into an XML column, the schema transitions from the relational model to the hierarchical one for XML. Now, if you get over the fact that there are two schemas, a relational schema and an XML schema, and just consider them to be one, then you can navigate and query into the unified schema in a more natural manner.

In the unified schema shown in Listing 1, if you wanted to get the data in the CUSTXML column of the CUSTOMER_TABLE, you would identify the path to the CUSTXML column as your target in your query.

SELECT CUSTXML FROM CUSTOMER_TABLE where customerid='hardeep';

This returns the customer data inside the CUSTXML column for hardeep.

Now consider the case where you want customer data where lastname of the customer is singh. In this case, you need to identify the path to the lastname attribute in each XML document (CUSTOMER_TABLE.CUSTXML/Customer/@lastname) and check to see if it is singh.

In a perfect world, the query would be Select * from CUSTOMER_TABLE where CUSTXML/Customer/@lastname='singh'. However, in the real world you need to formulate it in a syntax that is understood by the database query engine. A new language called XQuery, which can be used to query XML documents, has been introduced to the database world. SQL has been updated to add new functions that can understand this new language and bridge the two worlds. So a query that searches for customers with the last name singh would now look like:

select CUSTXML from CUSTOMER_TABLE
where xmlexists ('$cust/Customer[@lastname= "singh" ]' passing CUSTXML AS "cust" )

Or if you were making this call from a Java program using a parametrised query:

select CUSTXML from CUSTOMER_TABLE
where xmlexists ('$cust/Customer[@lastname= $lname ]'
passing CUSTXML AS "cust" , cast(? as VARCHAR(12)) as "lname")

Once you get over the funny syntax of passing parameters to the SQL/XML functions, you will find that for basic hybrid queries over relational and XML data, the XML queries contain mostly XPath statements. This is quite similar to what you were doing in the application layer (in Part 1) for the XML data model, where much of your code was making XPath calls to the document object model (DOM) wrapper to query and manipulate the XML data.

Note: In Viper 2, some simplification has been done regarding parameter passing to some of the SQL/XML function. For example, in the previous query the XMLExists passing clause does not specify the CUSTXML column.

select CUSTXML from CUSTOMER_TABLE
where xmlexists ('$CUSTXML/Customer[@lastname= $lname ]'
passing cast(? as VARCHAR(12)) as "lname")


Push application logic to the database

XQuery has all the rudimentary functionality of most high level languages (if-then-else, for, variables, functions, and arithmetic operators). This makes it possible to embed business logic inside the query. Plus, it has a lot of common functionality mapping to XSLT making it possible to not only query but also transform the XML output right in the database.

Take the Customer example for the XML data model from Part 1.

<Customer customerid ="" firstname="" lastname="" >
<Items><Item ID="" description="" purchaseDate="" price="" /></Items>
</Customer>

Replace application code with DB2 query

Instead of merging the XML data from the two tables in the application layer, you can do the same thing in the database using a single SQL/XML query. A one-to-many join of CUSTOMER_TABLE.CUSTXML/Customer/@customerid to PURCHASE_TABLE.ITEMXML/Customer/@customerid.


Figure 3. Join two XML columns
Join two XML columns

Listing 3. Query two XML columns
values(xmlquery('

for $Customer in db2-fn:xmlcolumn( "CUSTOMER_TABLE.CUSTXML")/Customer

where $Customer/@customerid= $customerid

return
<Customer customerid ="{$Customer/@customerid}"
firstname ="{$Customer/@firstname}" lastname ="{$Customer/@lastname}" >{

for $Customer0 in db2-fn:xmlcolumn("PURCHASE_TABLE.ITEMXML")/Customer
where $Customer0/@customerid= $Customer/@customerid
return $Customer0/Item

}</Customer>

' passing cast( ? AS varchar(255) ) as "customerid" ))

The resulting XML for all items purchased by customer hardeep would be:


Figure 4. Query result
Query result

In the above query, you had to construct the outer Customer element and add the attributes from the CUSTXML column data. DB2 Viper 2 (beta) has support for XQuery updating expressions that enable modifications of the XML document so there is no need to construct the outer Customer element. Instead, you could use the one from the customer table and insert the items from the purchase table as children.


Listing 4. Viper 2 query for two XML columns
values(xmlquery('
				
for $Customer in db2-fn:xmlcolumn( "CUSTOMER_TABLE.CUSTXML")/Customer

let $items:=(<Items>{

for $Customer0 in db2-fn:xmlcolumn("PURCHASE_TABLE.ITEMXML")/Customer
where $Customer0/@customerid= $Customer/@customerid
return $Customer0/Item

}</Items>)

where $Customer/@customerid= $customerid

return
transform
copy $cust:=$Customer
modify(
do insert $items as last into $cust)
return $cust

' passing hardeep as "customerid" ))

In the above queries, you not only searched, retrieved, and merged parts of XML documents stored in the database but you also transformed the resulting XML by adding new elements to it. Also, hardeep was implicitly cast to XML type (xs:string).

Comparison between the database query and Java application code

If you compare the above queries to the Java code (Listing 6. Rewriting the application to use the XML model) in Part 1, you find that the logic is quite similar.

  1. Select the Customer info from CUSTOMER_TABLE.
  2. Construct an Items element and search for all the items purchased by that customer from PURCHASE_TABLE.
  3. Iterate over each item in the selected list and insert it into the Items element.
  4. Insert the Items element into the Customer element.

Create a stored procedure

To separate the business logic in the database from the application code, it is a good idea to create a stored procedure for this query.


Listing 5. Create procedure
CREATE PROCEDURE customerItems(IN custid varchar(12))
DYNAMIC RESULT SETS 1
LANGUAGE SQL
BEGIN
DECLARE c_cur CURSOR WITH RETURN FOR

values(xmlquery('

for $Customer in db2-fn:xmlcolumn( "CUSTOMER_TABLE.CUSTXML")/Customer

let $items:=(<Items>{

for $Customer0 in db2-fn:xmlcolumn("PURCHASE_TABLE.ITEMXML")/Customer
where $Customer0/@customerid= $Customer/@customerid
return $Customer0/Item

}</Items>)

where $Customer/@customerid= $customerid

return
transform
copy $cust:=$Customer
modify(
do insert $items as last into $cust)
return $cust

' passing custid as "customerid" ))
OPEN c_cur;
END

Replace the application code with stored procedure call

The application code now makes a stored procedure call to DB2 and passes the XML to the DOM wrapper. The application code for the XML model (Listing 6. Rewriting the application to use the XML model lines 2-8) in Part 1 would reduce to:

2. ResultSet dbResult = dbstmt.executeQuery("call customerItems ("+custid+")"
3. XMLParse customerXML = new XMLParse(dbResult. getString(1));


A more elaborate example

Consider a little more elaborate scenario that also calculates the insurance on each item. To make it a little more challenging, the insurance not only varies on a daily basis but also changes with price. This means that you have to pass to the query not only the customerid but also the insurance rates. Now assume that you query the latest insurance rates every day from a Web service provided by the insurance company. The insurance rate information comes as an XML document.

<insurance>
<rate price="100" currency="$" rate=".02"/>
<rate price="500" currency="$" rate=".018"/>
<rate price="" currency="$" rate=".015"/>
</insurance>

You can modify the previous stored procedure to calculate insurance rates.


Listing 6. Stored procedure that also calculates insurance for each item
CREATE PROCEDURE customerItemsWithInsurance(IN custid varchar(12), rate XML)
DYNAMIC RESULT SETS 1
LANGUAGE SQL
BEGIN
DECLARE c_cur CURSOR WITH RETURN FOR

values(xmlquery('

for $Customer in db2-fn:xmlcolumn( "CUSTOMER_TABLE.CUSTXML")/Customer

let $items:=(
<Items>{
for $Customer0 in db2-fn:xmlcolumn("PURCHASE_TABLE.ITEMXML")/Customer

let $insurance:=<insurance currency="{($rate//rate[@price=""]/@currency)}">
{(
if($Customer0/Item/@price > 500) then (
$Customer0/Item/@price * $rate//rate[@price=""]/@rate
)
else (

if($Customer0/Item/@price > 100) then (
$Customer0/Item/@price * $rate//rate[@price="500"]/@rate
)
else (
$Customer0/Item/@price * $rate//rate[@price="100"]/@rate
)

)
)}</insurance>

where $Customer0/@customerid= $Customer/@customerid

return
transform
copy $item:=$Customer0/Item
modify( do insert $insurance as last into $item)
return $item
}</Items>
)

where $Customer/@customerid= $customerid

return
transform
copy $cust:=$Customer
modify(do insert $items as last into $cust)
return $cust

' passing custid as "customerid", rate as "rate" ));

OPEN c_cur;
END

The call to the stored procedure takes in two runtime parameters, the customerid and the insurance XML.

call customerItemsWithInsurance(?,?)

It is obvious from the above example that if the data being manipulated in the database is in XML format, then the power of XQuery can be leveraged to implement more of the business logic than was previously possible using SQL alone. Also, it is clear that the XML that is being used in the query does not even need to exist in the database. Thus, XML data participating in a SQL/XML query can either be stored in the database in its pure (hierarchical) form, it can be generated by using SQL/XML function, or it can even be passed as a runtime parameter to the query. The distinction between the database and an application server is gradually being blurred.


Pros and cons

Like every new technology, there will be teething problems. Some are due to fact that the implementations are in their first version and others because of inertia to change from the true and tried methodologies that you are comfortable with.

  1. Performance though improving is still not at par with the relational data.
  2. XQuery is a new language and some of the SQL/XML functions have a syntax that takes getting used to.
  3. There is a lot of legacy data already in relational format.
  4. Most critical is the fact that this is a new way of creating business applications and data schemas, different from the current way of object-oriented-based applications and normalized relation schemas.
  5. There are not many tools that can debug and optimize these kinds of queries for better performance.

Pitted against these odds are the facts that the new model is more natural in the way it manages the data. The business data information is maintained and manipulated intact in both the application and database layers, and as you will see in Part 3, even in the client layer.

  • Even though the surrounding languages might be different (Java, XQuery, JavaScript, PHP) the language used to traverse the XML document is the same (XPath) in all the layers.
  • Although legacy data is relational, it can easily be queried and morphed to XML using some of the new SQL/XML function introduced in Viper II . Looking at the example from Part 1, "Case II all data stored in the database as relational." The query can be simplified using the new XMLROW function introduced in Viper 2.
    Select XMLROW (customerid, firstname, lastname OPTION as attributes ROW Customer)
    from customer_table where customerid=?
    

    You can create joins between relational and XML data also. In this example scenario, if you had a third table containing product description of the purchased items and this was a relational table, then you could get the product description for each item purchased by doing a join using the item ID.



    Figure 5. Joining relational and XML columns
    Joining relational and XML columns

    Select details, weight from SQLPRODUCT, ITEM_TABLE
    where xmlexists ('$itemxml/item[@itemid=$pid]'
    passing ITEM_TABLE.ITEMXML AS "itemxml", SQLPRODUCT.PID AS "pid" )
    

    In DB2 9, you are able to pass runtime parameters to the XQuery embedded in the SQL statement using the passing clause but you could not do the same for SQL embedded inside an XQuery. In Viper 2, this limitation has been removed and now you are able to pass a runtime variable to a relational query embedded inside the XQuery.



    Listing 7. Pass the runtime variable to SQL embedded inside an XQuery
    			
    values(xmlquery('
    
    for $Customer0 in db2-fn:xmlcolumn("PURCHASE_TABLE.ITEMXML")/Customer
    where $Customer0/@customerid= $custid
    return (
    $Customer0/Item,
    db2-fn:sqlquery(
    ''select xmlrow(details, description, weight option ROW "description")
    from sqlproduct where pid= parameter(1)'', $Customer0/Item/@ID))
    
    ' passing cast( ? AS varchar(255) ) as "custid" ))
    

    Thus, even if some of the data is in relational tables and some is in XML you can now make dynamic joins between the XML and relational data from inside either the SQL query or the XQuery or both.

  • Even performance might not be a big issue in some cases since:
    • You are able to create XPath expression-based indexes on the XML documents stored in the database.
      create index custfname on customer_table(info) generate key
      using xmlpattern '/Customer/@firstname' as sql varchar(64)
          

    • The numbers of joins required are reduced since the databases schemas are simpler.
    • I/O may be reduced since now you can massage the data inside the query before you send it to the application.
    • You can always extract out key information from an XML document to relational columns using SQL/XML functions like XMLTable and create relational indexes on them.
    • You can create text search indexes on the XML document.

Conclusion

XML is here to stay. Most industries and government organizations are standardizing their XML schemas and are insisting on dealing with electronic documents that conform to these schemas. Since B2B data exchanged over the wire is now in XML, why not store that data as is (pureXML) in the database? Once you store the data as XML, you can index, query, validate, manipulate, transform, and update it using XQuery and standard SQL/XML. As you push more application logic into the query, your database becomes an active participant in the service-oriented architecture (SOA) world by publishing its stored procedures as Web services and feeds.

"The old order changeth, yielding place to new." Morte d'Arthur


Resources

Learn

Get products and technologies

Discuss

About the author

Photo: Hardeep Singh

Hardeep Singh is a member of the advanced technologies group. He is the architect for DB2 XML tooling and XML migration. He has over 23 years of industry experience.

Comments (Undergoing maintenance)



Trademarks  |  My developerWorks terms and conditions

Help: Update or add to My dW interests

What's this?

This little timesaver lets you update your My developerWorks profile with just one click! The general subject of this content (AIX and UNIX, Information Management, Lotus, Rational, Tivoli, WebSphere, Java, Linux, Open source, SOA and Web services, Web development, or XML) will be added to the interests section of your profile, if it's not there already. You only need to be logged in to My developerWorks.

And what's the point of adding your interests to your profile? That's how you find other users with the same interests as yours, and see what they're reading and contributing to the community. Your interests also help us recommend relevant developerWorks content to you.

View your My developerWorks profile

Return from help

Help: Remove from My dW interests

What's this?

Removing this interest does not alter your profile, but rather removes this piece of content from a list of all content for which you've indicated interest. In a future enhancement to My developerWorks, you'll be able to see a record of that content.

View your My developerWorks profile

Return from help

static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management, XML, SOA and Web services, Architecture
ArticleID=244162
ArticleTitle=Program with XML for DB2, Part 2: Leverage database support for XML in your application architecture
publish-date=08022007
author1-email=hardeep@us.ibm.com
author1-email-cc=

My developerWorks community

Tags

Help
Use the search field to find all types of content in My developerWorks with that tag.

Use the slider bar to see more or fewer tags.

Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere).

My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Use the search field to find all types of content in My developerWorks with that tag. Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere). My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Special offers