Programming with XML for DB2, Part 1: Understand the XML data model

A primary goal of XML is to make the application development process simple, cheap, portable, and of high quality. XML programming is bringing about the same kind of radical shift in the application development paradigm in this decade as object methodologies did in the last decade. In the first of a series on programming with XML for IBM® DB2® for Linux® UNIX®, and Windows®, you'll learn the basics of the XML data model and the advantages it brings to your programming environment over a pure object model.

Share:

Hardeep Singh (hardeep@us.ibm.com), Architect Advanced Technologies, IBM, Software Group

Hardeep Singh photoHardeep Singh is a member of the advanced technologies group. He is the architect for DB2 XML tooling and XML migration. He has over 23 years of industry experience.



19 July 2007

Also available in Chinese Russian Vietnamese

Introduction

Some of the design goals for XML as stated in the w3 recommendations refer to the application development aspect of the language:

  • "XML shall support a wide variety of applications."
  • "It shall be easy to write programs which process XML documents."

.

While a lot of focus has been given to the other goals around readability, serialization, and transportation, the application development goal has not created the same amount of buzz.

This article is the first of a series of articles showcasing the impact that XML has on application development at three levels:

  • Part 1 presents the case for using XML in application development for the purpose of making the application development process simple, cheap, portable, and of high quality. XML programming will bring about the same kind of radical shift in an application development paradigm in this decade as object methodologies did in the last decade.
  • Part 2 concentrates on the role of the database. Here the focus is both on both DB2 9 (origianlly codenamed Viper), and Viper 2 functionality. You'll learn:
    • How the new XML storage and query environment plays into the XML data model of the application tier
    • How, once you adapt to the new XML-based application development architecture, your database schemas become much simpler and more natural
    • How querying the XML data in the database is no different than querying the data in the application
    • Finally, how to marry the relational data with the XML data to get best of both the worlds.
  • Part 3 focuses on the client, bringing into picture the XML technologies used in the Web browsers and Web server: Ajax, XSLT, SVG, Comet, feeds, and mashups. You'll learn how feeds and Web services are generated in the database, queried and combined in the application layer, and then presented in the client browsers.
  • The fourth article brings together all these technologies and shows a real life working example.

XML data model basics

Note:

If your data is already in an XML format, the rest of the process is natural and extremely simple. If your data is in relational format, you can still use this methodology, but the process requires bidirectional mapping of relational to XML. This step can be achieved using SQL/XML functions for publishing and shredding. Most relational databases support these function besides other mapping technologies. Mapping relational to hierarchical (XML) might seem unnecessary to many developers just for the sake of using XML as a data model in their application development. But as developers, we do it all the time when we map relation to data objects. The advantages of using the XML data model might be compelling enough for many developers to consider using it in their application development, even if there is no need for XML data in their business model.

Traditionally, XML has been used to define metadata for business documents. The document object model (DOM) has been used to manipulate this metadata in an application. If we look at the DOM, we see that it provides an object interface to the hierarchical XML data structure with the DOM API being used to manipulate this hierarchy. In other words, the DOM can be used as an object wrapper to manipulate any data structure that can be represented using XML.

In the XML data model, you define many of your application data objects as XML. Since XML is hierarchical, it is easy to capture the relationships between the different data object in a natural, human readable format.

Once the XML data model is defined, it can be instantiated in the application using a DOM parser. To isolate the application code from the DOM APIs used to navigate and manipulate the XML model, you can create a wrapper around the DOM and XPath implementations. This wrapper also makes your code more portable.

I have attached an example Java wrapper with this article that you can use as is, or as a template for your own wrapper. The application's business logic directly manipulates the XML model using the wrapper APIs. The modified XML data can be easily serialized and passed between different object or in a multi-tiered environment (SOA) between different tiers.


XML data model versus the data object model

Most applications consist of business objects manipulating hierarchies of data objects. The data objects are generally thin wrappers around the business data. Their main reason for existing is to expose the encapsulated data to the business objects in a controlled fashion. Another benefit of having object wrappers is that they can present data stored in relational tables in a natural object hierarchy that captures the relationship between the data. A major part of the coding effort is dedicated to creating these object wrappers around the applications business data.

Since XML inherently maintains the relationship between data structures, the need to create a separate object hierarchy to capture the relationship between the individual data structures is irrelevant. Besides, XML already has a standard object model called the Document Object Model (DOM). Implementations of this model handle construction, modification, and serialization of the XML data. With a judicious use of the XPath language in conjunction with the DOM APIs, it is a trivial task to load, modify, and save XML data in a business application.


A tangible example

In order to better understand the differences in the two models, let's look at how each model impacts an application design and implementation.

Using sample code, I'lll illustrate the two approaches using a simple scenario that deals with customer and order information.

Data object model

In the data object model approach, you need to first create the wrapper objects to encapsulate the customer and order data, as shown in Listing 1, Listing 2, and Listing 3.

Listing 1. Create the customer
public class Customer
{
int customerid;
String firstname;
String lastname;
Items itemspurchased ;

Public Customer (int custid, Connection conn)
{
Statement dbstmt= conn.createStatement();
ResultSet dbResult = dbstmt.executeQuery("select fname, lname from
customer_table where customerid=custid");
customerid=custid;
SetFirstName(dbResult.getString(1));
SetLastName(dbResult.getString(2));
}
public String GetFirstName {return firstname;}
public Void SetFirstName (fname) {firstname=fname;}
public String GetLastName {return lastname ;}
public Void SetLastName (lname) {lastname=lname;}
public Items GetItemsList {return itemspurchased; }
public SetItemsList (list) { itemspurchased =list;}
}
Listing 2. Create the items class
public class Items
{
Hashtable list=new Hashtable();

Public Items(int custid,Connection conn)
{
Statement dbstmt= conn.createStatement();
ResultSet dbResult = dbstmt.executeQuery("select itemid, description,
price, date from purchase_table where customerid=custid")
While (dbResult.rs.next ())
{
tempitem = new Item();
tempitem.SetID(dbResult. getString(1));
tempitem. SetDescription (dbResult. getString(2));
tempitem. Setprice (dbResult. getFloat(3));
tempitem. SetpurchaseDate (dbResult. getString(4));
Additem (tempitem);
}
}

public void AddItem (item oneitem) {list.put(oneitem.GetID(),oneitem);}
public Item GetItem (ItemID) {return list.get(String itemID);}
public Hashtable GetItems(){return list;}
public Items FindItemByPrice (flaot min, float max)
{
Items retList=new Items();
for (Enumeration e=list.elements () ; e.hasMoreElements() ; )
{
item tmpItem=(item)e.nextElement();
float price= tmpItem .GetPrice();
if(price >= min && price <=max)
{
retList.AddItem(tmpItem);
}
}
}
public Items FindItemByDate (purchaseDate) { }
}
Listing 3. Create the item definition
public class Item
{
String id;
String description;
String purchaseDate;
Float price;

Public void SetID (String ItemID) {id= ItemID;}
Public void SetDescription (String desc) { description = desc;}
Public void SetpurchaseDate (String pDate) { purchaseDate = pDate ;}
Public void Setprice (float pprice { price = pprice ;}
Public String GetID (){return id;}
Public String GetDescription(){return description;}
Public float GetPrice(){return price;}
}

These data object can now be used in the application to manage the underlying data.

Listing 4. Manipulating the data objects in the application
Customer customer = new Customer (custid,dbConnection)
customer.SetItemList (new Items(custid , dbConnection)) ;
Items list=customer.GetItemsList(). FindItemByPrice(15.0,25.50);
for (Enumeration e=list.elements () ; e.hasMoreElements() ; )
{
System.out.println(((item)e.nextElement()).GetDescription());
}

In the above example we find that the code for the data objects is exponentially more than the code needed for the business logic. Also since the wrapper objects hide the relationships between the underlying business data, it is most essential that the wrapper object APIs be well documented for the application developer to understand how to use them properly.

Simple navigation between the object hierarchies is inherent in the data object mode,l but advanced search and navigation capabilities must be implemented for each search criteria (for example, FindItemByPrice).

XML data model

Since the main reason to have the wrapper objects was to encapsulate business data, they can be replaced with an XML data model.

Listing 5. XML data model
<Customer customerid ="" firstname="" lastname="" >
<Items>
<Item ID="" description="" purchaseDate="" price="" />
</Items>
</Customer>

Now if the data in the database is already stored in XML with:

  • Customer data stored as <Customer customerid ="" firstname="" lastname="" />
  • Item data stored as <Item ID="" description="" purchaseDate="" price="" />

Then for any given customer, all we need to do is to get the customer XML and insert in it the list of queried items.

Let's rewrite the application code to use the XML model to hold the customer and items information. To create and manipulate an instance of this XML data model we will use the DOM wrapper class called XMLParse that is attached in the Downloads section.

Case 1 -- Data stored in the database as XML

Listing 6. Rewriting the application to use the XML model
1. Statement dbstmt= conn.createStatement();
2. ResultSet dbResult = dbstmt.executeQuery("select custXML from
customer_table where customerid=custid");

3. XMLParse customerXML = new XMLParse(dbResult. getString(1));
4. customerXML.appendNode("/Customer", customerXML.createNode ("<Items/>"))

5. dbResult = dbstmt.executeQuery("select itemXML from purchase_table
where customerid=custid");
6. While (dbResult.rs.next ()) {
7. Node itemnode= customerXML.createNode (dbResult. getString(1));
8. customerXML.appendNode(itemnode ,"/Customer/Items",false);
}
9. customerXML.find("/Customer/Items/item[@price>15.0 and @price <25.5]",true);
10. for(int i=0;i < customerXML.currentFind.getLength();i++) {
11. System.out.println(customerXML.getValue("@description",i));
}

The first query (line 2) returns XML data in the custXML column for the given customer. This XML string is passed into the constructor of the DOM wrapper (line 3) which in turn uses the XML parser to instantiate an object hierarchy representing the XML data

Note : Since the customer XML does not have any items element in it (as expected by the XML schema we had defined in the model) we create a new element items (line 4) and append it as a child in the Customer element.

The second query (line 5) result retrieves a list of items (XML format) purchased by the customer from the database. Each item in the list (line 7) is appended (line 8) to the DOM objects hierarchy under the path Customer/items.

Finally, the DOM object hierarchy is searched using XPath for all items in the given price range (line 9), and the description of each searched item (line 10) is printed out.

Case 2 -- All data stored in the database as relational

Since the data is not stored as XML, we will need to do the transformation inside the query using SQL/XML publishing functions.

Listing 7. Transforming using SQL/XML publishing functions
1. Statement dbstmt= conn.createStatement();
2. ResultSet dbResult = dbstmt.executeQuery("select xmlelement( name \"Customer\" ,
xmlattributes(customerid as \"customerid\" ),
xmlattributes(fname as \"firstname\" ),
xmlattributes(lname as \"lastname\" )
) from customer_table where customerid=custid");

3. XMLParse customerXML = new XMLParse(dbResult. getString(1));

5. dbResult = dbstmt.executeQuery("select xmlelement( name \"items\" ,
xmlelement( name \"item\" ,
xmlattributes(itemid as \"id\" ),
xmlattributes(description as \"description\" ),
xmlattributes(price as \"price\" ),
xmlattributes(date as \"purchaseDate\" )
)
) from purchase_table where customerid=custid");
6. if (dbResult.rs.next ()) {
7. Node itemsnode= customerXML.createNode (dbResult. getString(1));
8. customerXML.appendNode(itemsnode ,"/Customer",false);
}

9. customerXML.find("/Customer/Items/item[@price>15.0 and @price <25.5]",true);
10. for(int i=0;i < customerXML.currentFind.getLength();i++) {
11. System.out.println(customerXML.getValue("@description",i));
}

So even though we did not have XML in the database, we used SQL to create an XML view to the relational data for our application. Also, we added the outer items element at the same time as we are generating the items XML in the query. Now all we need to do is add the items XML into the customer XML. The rest of the application remains the same.


Benefits of using an XML model over a pure object model

Data object wrappers constitute a major part of an application code, in effect shifting a lot of the focus from the business logic to managing the data objects. Besides, this extra code translates to:

  • Extra cost
  • More bugs
  • Longer application development cycle
  • Rigid and less portable code
  • Necessity to modify or regenerate the object hierarchy when there are any changes in the data schema
  • Code harder to maintain
  • No inbuilt validation of data
  • More documentation required to explain the wrapper objects
  • Fairly complex logic implementation required for capabilities like advanced search and navigation in the object hierarchy
  • Handling of data serialization by each business object
  • Application tied down to the tool if tools are being used to do the mapping

Using the XML programming methodology, the whole hierarchy of wrapper objects can be eliminated, leaving the programmer to focus on the business logic rather than on the business data structure. XML brings the following programming advantages:

  • Reduced code translates to better quality, lower costs and more flexibility.
  • It encourages RAD development.
  • Enhanced search and navigation capabilities are already built into the XPath parser.
  • Constraint checks and schema validation are built into the XML model.
  • Persistence is built into the model. At any time the XML data hierarchy can be flushed to a file, string, or stream.
  • No extra tools are required.
  • This methodology exposes the relationships and data hierarchy to the business logic. In the business object code, it is very difficult to understand the format of the business data structure that is being manipulated (that is, the data model is hidden from the business code.). In the relational world this is necessary; in the XML world it might be a drawback.
  • Business logic code is easy to read because the XPath describes the exact nature of the data and its relationship to the business structure.

Issues and solutions in adapting to the XML model

Relational data must to be mapped to XML if it is not stored as such. The mapping process is cumbersome, even though most relational database vendors provide tools for it. However, with the introduction of pure XML capabilities in databases servers such as DB2 and Microsoft® SQL Server, the need to map and shred XML data to relational tables for storage is no longer necessary. Using XQuery and XML indexes, this stored XML data can now be searched and retrieved intact into the application, in the same way that you would retrieve a large character object from the database.

For data stored as pure XML in a database, there is a need to understand the use of SQL/XML functions and xQuery to query the XML. By initially focusing on simple XPath searches rather than using complex XQueries, the pain of switching to XML queries can be reduced.

The learning curve for understanding and becoming proficient in using the DOM APIs and their implementations can be an issue, as well as becoming proficient in navigating and searching the XML hierarchy using XPath. Use the helper class attached to this article to reduce the need to directly call DOM APIs. The helper class encapsulates the DOM APIs and exposes more natural APIs that can be invoked from the application code. It has all the necessary functionality that is required to instantiate and serialize an XML model and to search and modify data or metadata in the XML instance. The wrapper class also handles XSL transformations, namespaces and schema validation if needed.

Directly embedding DOM API calls in the application's business logic is inefficient since any changes in the XML schema would require extensive changes to application code. Many of the API calls are used to navigate the hierarchy; this reduces the readability of the code. Data object being navigated or modified are not as obvious as they would be with a user-defined object wrapper. The wrapper class eliminates the need to embed DOM API calls in the business logic. Since the wrapper class uses XPath to navigate the DOM, any modifications to the schema affecting the application code would only require changes to the affected XPath string in API calls to the wrapper from the application. Also since XPath indicates the location of the XML node (that is being manipulated) in the hierarchy the readability of the application code is very high.


Conclusion

Business applications predominantly focus on creating, manipulating, storing and presenting business data. Data object wrapping is put around the business data so as to make it easier for the business logic to handle this. These data object wrappers are expensive to create and maintain and shift the focus of the application from the domain of business logic to data handling logic.

Using the XML data model, the whole hierarchy of data object wrappers can be eliminated, leaving the programmer to focus on the business logic rather than on the business data management. By using a DOM wrapper class, the application code is isolated from the DOM API. Using XPath for navigation makes the application code more understandable by illustrating the relationships in the business data being manipulated.

Ideally the data should be persisted as pure XML in the database, but even if the data is stored in relational tables, it can still make sense in many case to first transform it to XML for manipulating in the application.

If data structures wrapped inside an object hierarchy can be formatted using XML and if the main purpose of the object hierarchy is to manipulate and expose these data structures to the business logic, then a DOM can replace the wrapper object hierarchy.

In part 2 of this series, you'll learn to adapt the XML application architecture to your DB2 applications.


Download

DescriptionNameSize
DOM and XPath wrapper class for javaXMLParse.java30KB

Resources

Learn

Get products and technologies

  • Download a free trial version of DB2 Enterprise 9.
  • Now you can use DB2 for free. Download DB2 Express-C, a no-charge version of DB2 Express Edition for the community that offers the same core data features as DB2 Express Edtion and provides a solid base to build and deploy applications.

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into Information management on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management, XML
ArticleID=241223
ArticleTitle=Programming with XML for DB2, Part 1: Understand the XML data model
publish-date=07192007