Handle pureXML data in Java applications with pureQuery

Discover three different approaches to handle pureXML data in your pureQuery applications

IBM® pureQuery and IBM DB2® pureXML™ are revolutionary database technologies in their fields. pureQuery is a high-performance Java™ data access platform focused on simplifying the tasks of developing and managing applications that access data from a database. pureXML is the native XML data management technology introduced in DB2 9. It consists of a hierarchical storage technology, XML querying languages (XQuery and SQL/XML), XML indexing technology, and other XML-related features. This article brings them together, showing how you can develop pureQuery applications that handle pureXML data so you can get the best performance and manageability from your DB2 application.

Vitor Rodrigues (vrodrig@us.ibm.com), Software Engineer, IBM

Vitor Rodrigues is a software developer on the IBM Data Studio Developer team and works at the Silicon Valley Lab. He graduated from the University of Minho, Portugal, in Computer Science and Systems Engineering. Vitor joined IBM in 2005 as an intern working on DB2 Everyplace and DB2 9 pureXML. Prior to joining the Data Studio Developer team, he was a member of the Technical Enablement team for DB2 pureXML and IBM Data Studio, working out of the IBM Toronto and Silicon Valley labs.



08 January 2009


Introduction


Developing Java applications that persist information into a database is a typical scenario for Java application development — so typical that it has influenced the creation of new APIs, like the JDBC (Java Database Connectivity) API for Java database access; design patterns, like the DAO (Data Access Object) pattern; and a multitude of Java persistence technologies, like JDO (Java Data Objects), EJB (Enterprise JavaBeans), JPA (Java Persistence API), Hibernate, and EclipseLink, to name a few.

IBM pureQuery provides a Java data access technology that balances ease-of-use features, which simplify Java development, with enough transparency into the underlying data access (SQL) to enable those applications to perform optimally. For more about pureQuery, see Resources.

These technologies focus almost exclusively on relational database systems through ORM (Object-Relational Mapping) support. As a result of this focus on relational systems, non-relational database systems, like ones supporting XML storage, lack this type of end-to-end support from Java persistence technologies.

An example of a system that supports XML storage is the pureXML technology in DB2 9, a native XML data management technology that provides hierarchical storage of XML documents (its natural form), XML query languages, and XML indexing capabilities, among other features, integrated with the relational capabilities DB2 has been known for.

This article shows you how pureQuery's data access API can be easily extended to support both querying of XML documents stored in an XML column of a DB2 pureXML database and mapping of XML result sets into collections of Java objects. Combining these two technologies brings pureQuery's productivity and runtime performance benefits to the XML data management domain.

Let's consider the following three approaches for dealing with XML data from DB2 in Java applications using pureQuery as the data access API:

  • Give control to the SQL layer
    With this approach, use the SQL layer to transform between XML and relational format so that the data can be used by the existing facilities provided by pureQuery.
  • Give control to the data access API
    With this approach, use an XML mapping framework (this article uses the mapping libraries of J2SE V6) to map between XML documents and Java objects, integrated into the pureQuery API through custom handlers.
  • Give control to the application layer
    With this approach, implement your own mapping logic in the Java beans that represent your data in the business logic.

With each approach, see both how to fetch pureXML data from the database into Java beans and how to persist Java beans into the database as pureXML data.


Give control to the SQL layer

In this approach, the XML data is transformed using the SQL layer. SQL/XML statements are used to convert XML data into relational data, and XQuery transform expressions are used to perform XML sub-document updates.

As Figure 1 illustrates, XML data is mapped to relational using the SQL layer, and only relational data flows between the database, data access API, and the Java beans.

Figure 1. Giving control to the SQL layer
Giving control to the SQL layer

Fetching data — Using XMLTABLE to expose the XML data as relational

For fetching data from pureXML documents into Java beans, this approach is very likely the one that incurs the least development effort, as it uses several pureQuery tools in Data Studio Developer.

You may choose to take advantage of features available in DB2 and in pureQuery, and glue them together. DB2 provides an extensive array of functionality to manipulate XML documents and expose them as relational data, which can be achieved with simple SQL/XML statements. Since pureQuery provides some basic object-relational mapping, consuming the result set of the SQL/XML query is a straightforward task.

So, let's get some work done! Take a look at the three sample XML documents in Listing 1, each of which represents an employee:

Listing 1. Example XML documents
<employee id="901">
     <name>
          <first>John</first>
          <last>Doe</last>
     </name>
     <office>344</office>
     <salary currency="USD">55000</salary>
</employee>


<employee id="902">
     <name>
          <first>Peter</first>
          <last>Pan</last>
     </name>
     <office>216</office>
     <phone>905-416-5004</phone>
</employee>


<employee id="903">
     <name>
          <first>Mary</first>
          <last>Jones</last>
     </name>
     <office>415</office>
     <phone>905-403-6112</phone>
     <phone>647-504-4546</phone>
     <salary currency="USD">64000</salary>
</employee>

The goal is to implement an API that queries the database and returns a list of Employee objects, representing each of the employees in the XML documents stored in the employee table.

These three documents can be stored in DB2 using a column of type XML. Using the DDL shown in Listing 2, you can create a new table named employee, containing only two columns: id of type INTEGER and doc of type XML.

Listing 2. DDL statement to create EMPLOYEE table
create table employee (id int not null primary key, doc XML);

To insert the three sample documents into the employee table, issue the following SQL statements in Listing 3:

Listing 3. Inserting sample documents into table employee.
insert into employee values(901, '<employee id="901"><name><first>John</first>
     <last>Doe</last></name><office>344</office>
     <salary currency="USD">55000</salary></employee>');
insert into employee values(902, '<employee id="902"><name><first>Peter</first>
     <last>Pan</last></name><office>216</office>
     <phone>905-416-5004</phone></employee>');
insert into employee values(903, '<employee id="903"><name><first>Mary</first>
     <last>Jones</last></name><office>415</office>
     <phone>905-403-6112</phone><phone>647-504-4546</phone>
     <salary currency="USD">64000</salary></employee>');

After inserting the documents into the table, you can use the XMLTABLE function, shown in Listing 4, to expose the documents in relational format so that they can be consumed by pureQuery:

Listing 4. XMLTABLE function to extract values from XML document
SELECT id, X.* FROM employee,
  XMLTABLE ('$d/employee' passing doc as "d" 
   COLUMNS 
      firstname		VARCHAR(5) 	PATH 'name/first',
      lastname		VARCHAR(5) 	PATH 'name/last',
      office		INTEGER 	PATH 'office') AS X

You can either call the function directly, or create a view based on this statement and then query the view. If you need some extra filtering, you should go with the view, or take a look at the XMLEXISTS() predicate from SQL/XML.
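
For example, a view over the XMLTABLE statement might look like the following sketch (the view and its name, employee_rel, are illustrative and not part of the article's download):

CREATE VIEW employee_rel (id, firstname, lastname, office) AS
  SELECT id, X.* FROM employee,
    XMLTABLE ('$d/employee' passing doc as "d"
     COLUMNS
        firstname   VARCHAR(20)   PATH 'name/first',
        lastname    VARCHAR(20)   PATH 'name/last',
        office      INTEGER       PATH 'office') AS X;

-- Queries can then filter the view like a regular relational table:
SELECT * FROM employee_rel WHERE lastname = 'Doe';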

Executing the statement in Listing 4 returns the following result:

Listing 5. DB2 command line output for XMLTABLE query
ID          FIRSTNAME LASTNAME OFFICE     
----------- --------- -------- -----------
        901 John      Doe              344
        902 Peter     Pan              216
        903 Mary      Jones            415

3 record(s) selected.

Now that you have the data in relational format, it's time to work on the pureQuery code. If you are new to pureQuery, there are three ways that you can implement object-relational mapping: top-down, bottom-up, and meet-in-the-middle. You can read more about these approaches in the article "Increase productivity in Java database development with new IBM pureQuery tools, Part 1: Overview of pureQuery tools" (developerWorks, September 2007).

Since we already have the SQL/XML statement created, this article uses the bottom-up approach, in which the SQL is used to generate the Employee Java bean.

You first create the DAO (Data Access Object), EmployeeDAO, using IBM Data Studio Developer and add the data access methods to it. Let's start with implementing a method that retrieves all the employees using the statement from Listing 4. To do this, use the Java code in Listing 6:

Listing 6. getEmployees method from the Employee data access object
public static List getEmployees(Data data){
     return data.queryList("SELECT id, X.* FROM employee, " +
          " XMLTABLE ('$d/employee' passing doc as \"d\" " + 
          " COLUMNS " + 
          " firstname          VARCHAR(20)     PATH 'name/first', " +
          " lastname           VARCHAR(20)     PATH 'name/last', " +
          " office             INTEGER         PATH 'office') AS X");
}

Ok, that's it! Your work is done! Well, not really. You still have a couple of mouse clicks left to take advantage of the pureQuery tooling, but all the hand-coding work is done.

While this example is created using the pureQuery inline style, the same can be achieved using the pureQuery annotated-method style. (This article later converts this same example to the annotated-method style to take advantage of static SQL in pureQuery.) You can explore the pureQuery inline and annotated-method styles further in related developerWorks articles (see Resources).

Using a bottom-up approach from the SQL statement in the Java class, you can instruct Data Studio Developer to generate a Java bean with a structure matching the result set of your SQL. To do that, just right-click on your statement text, and select pureQuery > Generate pureQuery code, as shown in Figure 2:

Figure 2. Generating pureQuery code for an SQL statement
Generating pureQuery code for an SQL statement

In the first screen of the wizard (seen in Figure 3), select the option Generate bean for result set, then select a package and a class name, and deselect the option Generate annotated-method interface for SQL statement:

Figure 3. Specifying the bean properties
Specifying the bean properties

Click on Next, and make sure you uncheck all the check boxes (they provide you with useful functionality like test classes and sample applications, but you just need the bean for this example), then select Finish.

A new Java bean has been created and added to the package pdq.purexml.approach1:

Figure 4. Java files generated by pureQuery
Java files generated by pureQuery

Note the package pureQuery.example in Figure 4, which was created by Data Studio Developer and provides some utility methods to give developers a fast start with pureQuery. This article uses a couple of those methods in the sample main application.

Data Studio Developer pureQuery tooling generated the Java bean for you, and now you can start using it. Refactor your DAO, and change the getEmployees() method to explicitly return a list of Employee objects, instead of the generic type List.

Note that there are only two changes you need to make. One change is to the method signature, and the other to the pureQuery API call. You are telling pureQuery that the result set of your statement should be converted to objects of the class Employee. You pass this information to the API by passing the Employee class as an extra parameter. You can see the end result of this change in Listing 7:

Listing 7. getEmployees method, now returning a list of objects from type Employee
public static List<Employee> getEmployees(Data data){
     return data.queryList("SELECT id, X.* FROM employee, " +
          " XMLTABLE ('$d/employee' passing doc as \"d\" " + 
          " COLUMNS " + 
          " firstname          VARCHAR(20)          PATH 'name/first', " +
          " lastname           VARCHAR(20)          PATH 'name/last', " +
          " office             INTEGER              PATH 'office') AS X",
          Employee.class);
}

To conclude this sample, let's create a simple main application to test our DAO that transforms XML documents into Java objects.

In Figure 5 you can see the sample main application, together with the console output of executing the application:

Figure 5. Sample application and output
Sample application and output

This is it! With just a few basic steps you can get your XML data from the database tables into your business logic!

Persisting data — Using XQuery transform expressions to perform sub-document updates

Besides full support of XQuery and XPath languages for querying XML data, DB2 also provides support for updating XML documents using XQuery transform expressions. To learn how to use XQuery transform to update XML documents in DB2, see the article "Update XML in DB2 9.5" (developerWorks, October 2007).

Let's consider the example described in the previous section, using the XMLTABLE statement. In that example, you are only fetching three node values from the existing XML documents, which may contain a larger number of XML nodes in addition to the ones you're extracting. Because of this, a full document update is not recommended, as it would overwrite the existing document with a new document, possibly causing a loss of information. For scenarios like this, sub-document update is the best approach to update existing XML documents.

This example shows you how to use XQuery transform expressions to update three nodes in the existing XML documents.

A new updateEmployee method is added to the data access object class to update a single employee in the database:

Listing 8. Using XQuery transform expression to do a sub-document update
public static void updateEmployee(Data data, Employee emp){
     String sql = "update employee set doc = xmlquery(" +
          "'copy $new := $DOC " +
          "modify (do replace value of $new/employee/name/first with $firstname," +
          "do replace value of $new/employee/name/last with $lastname," +
          "do replace value of $new/employee/office with $office)" +
          "return $new' " +
          "passing cast(:firstname as varchar(20)) as \"firstname\", " +
          "cast(:lastname as varchar(20)) as \"lastname\"," +
          "cast(:office as INTEGER) as \"office\"" +
          ") where id = :id";
     data.update(sql, emp);
}

XQuery statements in DB2 support parameter markers that are passed to XQuery by using the passing clause. In Listing 8, you are passing the values of firstname, lastname, and office from the Employee bean into the transform statement. Since you are using named parameters, you can simply pass the Java bean to the pureQuery API call and don't need to be concerned about setting the parameters one by one, as you would have to do in a typical JDBC API call. pureQuery extracts the parameter values from the bean and passes them to DB2.

Note that the XQuery transform expression used is completely independent of the XMLTABLE statement and that the sub-document update is done in place on the document stored in the XML column of the employee table. There is no need for any serialization or parsing of XML documents during the update, since the change is applied directly to the stored document.

You can now add some extra code to the sample application to change the last name of the first employee to 'Rodrigues' and save those changes back into the database, as seen in Figure 6:

Figure 6. Sample application, now also with updates
Sample application, now also with updates
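
Because Figure 6 is a screen capture, here is a minimal, hedged sketch of the update flow it describes; data is the pureQuery Data instance used by the earlier sample, and the setter name setLastname is assumed from the generated Employee bean:

// Fetch the employees, change one field, and write the change back.
List<Employee> employees = EmployeeDAO.getEmployees(data);  // data: pureQuery Data instance
Employee john = employees.get(0);                           // John Doe from Listing 1
john.setLastname("Rodrigues");                              // setter assumed from the generated bean
EmployeeDAO.updateEmployee(data, john);                     // sub-document update from Listing 8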

By looking at the output in Figure 6, you can confirm that the update was indeed successful, and John's last name is now Rodrigues.

An additional advantage of using XQuery transform expressions to update the XML document in the database is that you can update a single field at a time, reducing the I/O between the application and the database. You could add a new method to the DAO object, say updateEmployeeOffice(), that updates only the office information for an employee instead of updating all the nodes previously fetched. If you combine this method with some logic that determines which variables have changed, you can minimize the amount of information transferred between your application and the database, ultimately improving the performance of both the application and the database server (due to the reduced workload on the server).
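
A hedged sketch of such a method follows; it is illustrative, not part of the article's download, and simply reuses the transform pattern from Listing 8 for a single node:

// Hypothetical updateEmployeeOffice(): replaces only the <office> node value.
public static void updateEmployeeOffice(Data data, Employee emp){
     String sql = "update employee set doc = xmlquery(" +
          "'copy $new := $DOC " +
          "modify do replace value of $new/employee/office with $office " +
          "return $new' " +
          "passing cast(:office as INTEGER) as \"office\"" +
          ") where id = :id";
     data.update(sql, emp);
}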

Regardless of the approach used to fetch the XML data from the database into Java beans, XQuery transform expressions can always be used to do sub-document updates in the database. Even if you retrieved the complete document from the database, you can still use sub-document updates to change isolated node values in the stored document.


Give control to the data access API

With this approach, you give the control to the data access API. In the current context, that means letting the pureQuery API implement the mapping from XML to Java and vice versa, as seen in Figure 7:

Figure 7. Giving control to the data access API
Giving control to the data access API

pureQuery does not provide an embedded mapping framework, but it provides plug-in mechanisms that make it very easy to integrate existing mapping frameworks into regular pureQuery API calls. These plug-in mechanisms are called RowHandler and ResultHandler, and they support custom handlers for data rows and result sets, respectively. Custom handlers can be passed to most pureQuery API methods, giving developers extra power in customizing data handling.

Fetching data — Using a mapping framework

Some Java frameworks provide XML-Java mapping. Examples of popular frameworks are Castor, JAXB, JiBX, and XMLBeans. (For more information on these frameworks, see Resources.) J2SE V6 also provides XML-Java mapping libraries, based on JAXB. This article uses these libraries, since they are part of the J2SE V6 installation and require no extra setup.

The mapping libraries in J2SE V6 let you define mappings that can be used to unmarshal XML documents into Java objects and marshal Java objects into XML documents. This mapping is implemented by adding annotations to your Java bean, declaring whether each variable in the Java class is an attribute or an element in the XML document.
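
As a minimal illustration (this bean is hand-written for this paragraph, not the one generated later from emp.xsd), the annotations from the javax.xml.bind.annotation package mark a field as an XML attribute or element:

import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlAttribute;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;

@XmlRootElement(name = "employee")
@XmlAccessorType(XmlAccessType.FIELD)
public class SimpleEmployee {
     @XmlAttribute
     protected String id;       // maps to the id="..." attribute of <employee>
     @XmlElement
     protected String office;   // maps to the <office> child element
}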

Creating these Java beans can be cumbersome for complex XML structures, so J2SE V6 provides some tooling that, given an XML schema, generates all the Java beans with the corresponding annotations required for the mapping process.

In the downloadable project provided in this article, together with the XML files in the folder data, you can find the XML schema used to validate these documents. The schema file is emp.xsd.

The tooling that can generate the annotated Java beans is called "xjc" and can be found in the bin folder of your J2SE V6 installation. Some of the parameters accepted by xjc are:

  • -d source_folder – the folder where the .java files will be created
  • -p package_name – the package in which to create the new Java files
  • my_schema.xsd – the XSD schema file to generate the beans from

To generate the Java beans for this example, you can run xjc from your workspace with the following command:

xjc -d src -p pdq.purexml.approach2 data/emp.xsd

The console output from running the generator is shown in Figure 8:

Figure 8. Generating Java classes from an XSD file
Generating Java classes from an XSD file

As you can see in Figure 8, two new files were created in package pdq.purexml.approach2: Employee.java and ObjectFactory.java. The first one is the Java bean to represent your employee information, while the second one is a factory helper object to create the Employee objects from XML documents. Figure 9 shows the generated files in the package explorer:

Figure 9. Java files generated by tooling
Java files generated by tooling

When you look at the Employee.java file, you see some annotations. These are the annotations used by the J2SE V6 mapping libraries to implement the Java-XML mapping:

Listing 9. Employee bean generated by tooling
...
@XmlAccessorType(XmlAccessType.FIELD)
@XmlType(name = "", propOrder = {
    "name",
    "office",
    "phone",
    "salary"
})
@XmlRootElement(name = "employee")
public class Employee {

    protected Name name;
    protected String office;
    protected List<Phone> phone;
    protected Salary salary;
    @XmlAttribute(required = true)
    protected String id;
...

Note that each element or attribute is converted to a Java type. If the XML element type is xs:simpleType, then it will be converted to a native Java type. Otherwise, an inner class will be created to store that element's contents (see name, phone and salary in Listing 9).
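
For example, the generated type for the name element looks roughly like the following sketch (the exact output depends on your xjc version):

// Hedged sketch of the xjc-generated Name type; getters and setters are generated for each field.
@XmlAccessorType(XmlAccessType.FIELD)
@XmlType(name = "", propOrder = { "first", "last" })
public static class Name {
     protected String first;
     protected String last;
     public String getFirst() { return first; }
     public void setFirst(String value) { this.first = value; }
     public String getLast() { return last; }
     public void setLast(String value) { this.last = value; }
}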

You now have your Java bean created, and the next step is to implement the pureQuery code that will fetch the XML data from your pureXML database and return an Employee object for each XML document fetched from the database.

You can do this by creating a custom RowHandler that will be called JAXBHandler. This handler can then be passed to several pureQuery API calls, including queryList, used in previous examples.

Listing 10. JAXBHandler custom row handler
public class JAXBHandler implements RowHandler {

     public Employee handle(ResultSet rs, Object arg1) throws SQLException {
          Employee emp = new Employee();
          JAXBContext jContext;
          try {
               // Create the JAXB context for the package containing the generated beans
               jContext = JAXBContext.newInstance("pdq.purexml.approach2");
               Unmarshaller unmarshaller = jContext.createUnmarshaller();
               // Column 2 holds the XML document; unmarshal it into an Employee object
               emp = (Employee)unmarshaller.unmarshal(rs.getBinaryStream(2));
          } catch (JAXBException e) {
               e.printStackTrace();
          }
          // Column 1 holds the relational id; copy it into the bean
          emp.setId(rs.getString(1));
          return emp;
     }
}

The JAXBHandler, declared in Listing 10, is applied to each row in the result set of the SQL query. For each one of those rows, it will execute the method handle. In JAXBHandler, this method uses an Unmarshaller object to convert the XML document fetched from the database into a Java object of type Employee. As you may remember from the table declaration in Listing 2, the XML document is in the second column of the table, so you can call the ResultSet method getBinaryStream(2) to specify that you want to fetch the second column of your result set as a binary stream that you will pass to the Unmarshaller. The Unmarshaller receives that binary stream and parses it, creating the corresponding Employee object.

After creating the JAXBHandler row handler, you can simply use it in your pureQuery API calls, such as the example in Listing 11:

Listing 11. getEmployees method for the Employee data access object
public static List<Employee> getEmployees(Data data){
     return data.queryList("select * from employee",new JAXBHandler());
}

The example in Listing 11 is calling the pureQuery method queryList() with the following parameters:

  • An SQL select statement that will be executed to retrieve several rows from the database
  • A RowHandler that will be used to transform each one of the returned rows into a Java object

The advantage of using a row handler to map from XML to Java beans is that the pureQuery API call to fetch data from the database is very similar to the one used when working with relational tables. You can simply do a "full select" SQL query on your table and get a list of all the employees that exist in that table. The fact that the source data was in XML is almost unnoticeable at the application level, since the mapping is seamlessly integrated into the data access API, pureQuery.

Running a very simplistic sample application, you can see the JAXBHandler in action, mapping data from a pureXML column into an Employee bean. The application code and the output are in Figure 10:

Figure 10. Sample application for mapping using JAXBHandler
Sample application for mapping using JAXBHandler
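
Because Figure 10 is a screen capture, here is a rough, hedged equivalent of that sample; the nested getName().getFirst() and getName().getLast() accessors are assumed from the xjc-generated bean:

// List all employees through the JAXBHandler-based DAO from Listing 11.
List<Employee> employees = EmployeeDAO.getEmployees(data);  // data: pureQuery Data instance
for (Employee e : employees) {
     System.out.println(e.getId() + ": "
          + e.getName().getFirst() + " " + e.getName().getLast());
}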

Persisting data — Using a mapping framework

Just as the previous section used an existing Java-XML mapping framework to map from XML documents to Java beans, you can use the same framework to do the reverse mapping: from Java beans to XML documents.

In addition to unmarshalling methods provided to convert XML documents into Java beans, J2SE V6 also provides marshalling facilities that can be used to convert Java beans into XML documents.

These facilities come in very handy when doing full document updates of your XML data. If most or all of the fields in your Java object have changed since you fetched the data from the database, it is simpler to do a full document replacement than to issue several separate update statements. In such a case, using a marshaller greatly simplifies the task.

Listing 12 shows the method updateEmployee that, given an employee, updates its record in the database:

Listing 12. Persisting a Java object as XML by using J2SE V6 to do the marshalling
public static void updateEmployee(Data data, Employee emp){
     try {
          JAXBContext jContext = JAXBContext.newInstance("pdq.purexml.approach2");
          Marshaller marshaller = jContext.createMarshaller();
          StringWriter sw = new StringWriter();
          marshaller.marshal(emp, sw);	
          data.update("update employee set doc = ? where id = ?", 
                         sw.toString(), emp.getId());
     } catch (JAXBException e) {
          e.printStackTrace();
     }
}

The steps used to convert your Java object into textual XML representation are similar to the ones used to convert the original XML documents into Java objects:

  1. Get the JAXB context for your application.
  2. Create a marshaller.
  3. Marshal the Java object, producing an XML representation of that same object.

After you have the object in XML format, you can issue a regular SQL update statement to update the record in the table EMPLOYEE.

The code in this example can be easily tested with a simple sample application, such as the one shown in Figure 11:

Figure 11. Sample application using JAXBHandler to demo both persisting and fetching data
Sample application using JAXBHandler to demo both persisting and fetching data

Using a mapping framework to convert between XML and Java certainly makes the development task easier, but you should be aware of the extra serialization and parsing going on: the XML is serialized by DB2 when you fetch the data and then parsed by the J2SE V6 mapping libraries to create the Employee objects.


Give control to the application layer

With this approach, the mapping between pureXML data and Java beans is done in the application layer.

Figure 12. Giving control to the application layer
Giving control to the application layer

The mapping is implemented by the Java beans, populating the class variables from the XML document and recreating XML documents from the class variables.

As Figure 12 shows, XML data flows from the pureXML database into the application layer, where it is handled by the Java beans.

Fetching data — Implementing the mapping in the Java bean

In this example, the mapping is done in the bean itself. The pureQuery API is still used to provide the data access layer, but it returns the data rows "as is" to the application, in the form of an Employee object with two variables: id and doc.

You can implement your own XML-to-Java mapping as added functionality in the Employee bean, rather than creating a complex mapping framework. In a real-world scenario, this mapping would ideally live in a utility library so that it could be reused by multiple components of your application.

The mapping can be implemented by creating your own getter and setter methods. To represent the data rows as Java beans, pureQuery calls the bean's set methods to populate the bean variables when fetching data from the database. Usually, these setter methods are generated automatically by pureQuery. In this approach, you can create the methods yourself to implement the mapping from XML to other fields in the Java bean.

The example Employee bean for this approach contains the fields id and doc to store the columns of the database table, plus some additional fields that will be populated by mapping code. These additional fields, seen in Listing 13, are: firstname, lastname, and office.

Listing 13. Employee bean for in-the-bean mapping
public class Employee {

     // Class variables generated by pureQuery code generator
     @Id
     protected int id;
     protected String doc;

     // Additional class variables
     protected String firstname;
     protected String lastname;
     protected int office;

     ...

These extra fields are populated when the method setDoc() is called by pureQuery in order to populate the doc variable. As Listing 14 shows, when the setDoc method is called, the Employee bean not only assigns the parameter value to the doc variable, but it also populates the additional class variables, by calling the method populateVariables():

Listing 14. setDoc method for Employee bean
public void setDoc(String doc) {
     this.doc = doc;
     populateVariables();
}

The code in Listing 15 implements the mapping between the XML document and the class variables. This article uses the JAXP (Java API for XML Processing) libraries to parse the XML document and evaluate some XPath expressions. (See Resources for more information.) Unfortunately, there is no method yet to pass the XML binary document from DB2 to a Java Document object, so the XML documents need to be serialized by DB2 and parsed again by the Java libraries.

Keep this in mind when deciding which approach is the best to use for fetching data from the database.

Take a look at the code in Listing 15 to see how to create an instance of Document with the contents of the XML document retrieved from the database and how to apply XPath expressions to extract the node values that will be used to populate firstname, lastname, and office class variables.

You may have noticed that the XPath expressions look very similar to the ones used in the XMLTABLE view from the first example. This step is in fact similar, with the only difference being that it is done on the client side instead of in the database engine, and thus uses different APIs. However, you can see that the mapping logic is the same: you extract node values that are set as class variables.

Listing 15. Populating other class variables with the contents from doc
private void populateVariables(){
     try {
          DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
          Document document = builder.parse(new InputSource(new StringReader(this.doc)));
          XPath xpath = XPathFactory.newInstance().newXPath();
          String firstname_expression = "/employee/name/first";
          String lastname_expression = "/employee/name/last";
          String office_expression = "/employee/office";
          String fname = (String) xpath.evaluate(
                    firstname_expression, document, XPathConstants.STRING);
          String lname = (String) xpath.evaluate(
                    lastname_expression, document, XPathConstants.STRING);
          int office = Integer.parseInt((String) xpath.evaluate(
                    office_expression, document, XPathConstants.STRING));
          setFirstname(fname);
          setLastname(lname);
          setOffice(office);
     } catch (Exception e) {
          e.printStackTrace();
     }
}

In the EmployeeDAO for this approach, in order to retrieve all the employee records as Employee objects, you just do a full select and tell pureQuery to create Employee objects with the returned data, as shown in Listing 16:

Listing 16. getEmployees method returning Employee beans
public static List<Employee> getEmployees(Data data){
     return data.queryList("select * from employee", Employee.class);
}

When pureQuery creates and populates Employee objects, as directed by the API call in Listing 16, the Employee class method setDoc is executed to populate the doc variable, and that call triggers the population of the additional class variables.

The sample application to demonstrate the execution of this code is similar to the previous samples, as seen in Figure 13:

Figure 13. Sample application using in-the-bean mapping
Sample application using in-the-bean mapping

Note the "Console" window output in Figure 13, and how the fields for name and office have been correctly populated by the in-the-bean mapping code.

Persisting data — Implementing the mapping in the Java bean

The same in-the-bean mapping approach can be used when persisting Java objects into XML documents stored using DB2 pureXML. While the previous example used custom set methods, in this step you create your own get methods that produce an XML document from the contents of the Java bean. The generated XML document can then be fed into the pureQuery API update statement in Listing 19.

To keep the example simple, I wrote some basic code that creates a textual XML representation of an Employee bean. Listing 17 shows the piece of code used to convert the Java beans into XML documents.

Listing 17. Simplistic mapping from Employee bean to XML document
private void recreateXMLDocument(){
     //create XML textual representation
     StringBuffer sb = new StringBuffer();
     sb.append("<employee>");
     sb.append("<name>");
     sb.append("<first>").append(this.getFirstname()).append("</first>");
     sb.append("<last>").append(this.getLastname()).append("</last>");
     sb.append("</name>");
     sb.append("<office>").append(this.getOffice()).append("</office>");
     sb.append("</employee>");
     this.setDoc(sb.toString());
}

The method recreateXMLDocument from Listing 17 will be executed every time the method getDoc from the Employee bean is called, as seen in the body of getDoc in Listing 18:

Listing 18. Triggering the generation of up-to-date XML document on getDoc
public String getDoc() {
     //reconstruct doc from contents of other class variables
     recreateXMLDocument();
     return doc;
}

Before getDoc actually returns the contents of the doc variable, recreateXMLDocument is executed and doc is populated with the new XML document built from the values of the other class variables.

Listing 19. Update method and statement for the in-the-bean mapping Employee data access object
public static void updateEmployee(Data data, Employee emp) {
     data.update("update employee set doc = :doc where id = :id", emp);
}

When pureQuery sets the parameter values for the update statement in Listing 19, it calls the getDoc method from Listing 18 to obtain the value for the :doc parameter. As a result, the doc column in the employee table is updated with the XML representation of the emp object passed as an argument to the updateEmployee() method.

Once again, let's modify the sample application to showcase both persisting and fetching XML to and from Java objects.

Figure 14. Modified sample application for in-the-bean mapping.
Modified sample application for in-the-bean mapping

By looking at the "Console" window in Figure 14, you can confirm that the mapping code is working properly, as the change to the employee last name is correctly persisted to the database.


Performance, performance, performance!

The ultimate goal of every application developer is to create an application with the best performance possible. As this article touches on two different technologies, pureXML and pureQuery, this section provides some performance tips for each one of them.

Boosting your database performance with pureXML

When it comes to databases, indexes are crucial to achieving good performance, which you probably already know. What you might not know yet is that DB2 pureXML provides an advanced XML indexing mechanism that can be used to speed up both XQuery and SQL/XML queries. XML indexes are stored in the same index pages as relational indexes, providing access times similar to those of relational indexes. Before diving into some examples of indexes that can and should be used in the application developed in this article, see the articles on pureXML performance and indexing listed in Resources.

Let's assume that you want to select all the employees with ids higher than 250 and last name "Doe". Since the employee id is stored in the id column, this filter is easy to implement by adding a range comparison in the where clause of the SQL statement. The employee's last name, however, is stored in one of the element nodes in the XML document stored in the doc column. With DB2 pureXML, filtering on the employee's last name is no big deal at all. DB2 provides the function XMLEXISTS(), part of the SQL/XML standard, that can be used to filter on XML columns. Better yet, using XMLEXISTS() allows you to take advantage of indexes in the XML column. Assuming that you want to do a full select on the table when your conditions are met, the SQL statement would look like this:

Listing 20. SQL statement that can benefit from indexes on both the relational and XML columns
SELECT * FROM EMPLOYEE WHERE ID > 250 AND
     XMLEXISTS('$DOC/employee[name/last = "Doe"]')

Since the column id is the primary column for the table, there is already a relational index created on it. To improve your query performance even more, you can also create an index on the XML column that will make searches on the employee's last name evaluate much faster. You can create an XML index using the regular SQL create index statement, adding some XML specific clauses, as shown in Listing 21:

Listing 21. CREATE INDEX statement that creates an index on the last name node value for the XML document
CREATE INDEX LNAME_IDX ON DB2ADMIN.EMPLOYEE(DOC)
     GENERATE KEY USING XMLPATTERN '/employee/name/last'
     AS SQL VARCHAR(25)

You will be glad to know that the index you just created can be used in any of the approaches described in this article to fetch data from the database into the business logic. Just create the indexes and add the where clause to your SQL select statements, and you are good to go.
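
For instance, a hedged variation of the Approach 1 getEmployees() method that adds the filters from Listing 20, so that both the relational index on id and the new XML index can be used, might look like this (illustrative only, not part of the download):

// Same XMLTABLE query as Listing 7, restricted with the predicates from Listing 20.
public static List<Employee> getFilteredEmployees(Data data){
     return data.queryList("SELECT id, X.* FROM employee, " +
          " XMLTABLE ('$d/employee' passing doc as \"d\" " +
          " COLUMNS " +
          " firstname          VARCHAR(20)          PATH 'name/first', " +
          " lastname           VARCHAR(20)          PATH 'name/last', " +
          " office             INTEGER              PATH 'office') AS X " +
          " WHERE id > 250 AND " +
          " XMLEXISTS('$DOC/employee[name/last = \"Doe\"]')",
          Employee.class);
}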

For more tips on how to improve the performance of your pureXML storage, you should take a look at the article "15 best practices for pureXML performance in DB2" (developerWorks, January 2008).

Improving the performance of your Java application with pureQuery

You've seen a few tips on how to improve the performance of accessing pureXML data in your database. Now let's focus on the application performance.

pureQuery is all about improving the data access mechanism of your Java applications. There are a lot of important features provided by pureQuery, like superior and predictable performance based on the usage of static SQL, problem determination, optimization of existing Java data access applications, development tooling, and more.

You've already seen a few of the pureQuery tooling and API features, and how they provide a considerable improvement in developer productivity.

This section focuses on increasing the performance of the data access in Java applications by deploying and running the example application as static SQL. pureQuery lets you switch between dynamic and static SQL at runtime, so developers can take advantage of dynamic SQL for quick and easy development, and then switch to static SQL to take advantage of the improved and predictable performance it offers.

Before you go any further, if you are not familiar with static SQL in DB2, you should read the introductory articles on static SQL listed in Resources.

There are two ways that you can enable static SQL for your application using pureQuery:

  • Define your data access layer in pureQuery annotated interfaces, either using Java annotations or an XML mapping file (which uses the same format as the JPA XML mapping file).
  • Use the pureQuery client optimization capability. With client optimization, you can enable existing JDBC dynamic applications to use static SQL through pureQuery, with no change to the application code.

Let's use a pureQuery annotated interface to bind an existing data access layer as static SQL in the database. This example converts the existing EmployeeDAO from Approach 1, but similar annotated interfaces can be created for the other two approaches.

If you are interested in using the pureQuery client optimization feature, see the tutorial "Optimize your existing JDBC applications using pureQuery" (developerWorks, August 2008).

The pureQuery API introduces some annotations that can be used together with method declarations in Java interfaces to associate an SQL statement with that method. This representation allows the pureQuery tooling to bind this data access layer as static SQL in the database and to generate an implementation of this interface that will be able to run as either static or dynamic SQL at runtime. These annotations are @Call, @Select, and @Update, and their names reflect the type of SQL statements being executed.

It is straightforward to migrate the existing EmployeeDAO from Approach 1 into a pureQuery annotated method interface. You just need to copy the method signatures from the EmployeeDAO class and move the SQL statements from inside each method to the pureQuery annotation that precedes the Java method declaration:

Listing 22. Converting EmployeeDAO into an annotated method interface.
public interface EmployeeData extends Data {

@Select(sql = "SELECT id, X.* FROM employee, "
     + " XMLTABLE ('$d/employee' passing doc as \"d\" " + " COLUMNS "
     + " firstname     VARCHAR(20)     PATH 'name/first', "
     + " lastname      VARCHAR(20)     PATH 'name/last', "
     + " office        INTEGER         PATH 'office') AS X")
public List<Employee> getEmployees();

@Update(sql = "update employee set doc = xmlquery("
     + "'copy $new := $DOC "
     + "modify (     do replace value of $new/employee/name/first with $firstname,"
     + "             do replace value of $new/employee/name/last with $lastname,"
     + "             do replace value of $new/employee/office with $office)"
     + "return $new' "
     + "passing      cast(:firstname as varchar(20)) as \"firstname\", "
     + "             cast(:lastname as varchar(20)) as \"lastname\","
     + "             cast(:office as INTEGER) as \"office\"" + ") where id = :id")
public void updateEmployee(Employee emp);
}

As Listing 22 shows, each method's body has been removed, and the SQL statements have been moved to the sql parameter of the @Select and @Update annotations. Data Studio Developer includes pureQuery tooling that runs every time your Java project is built and checks for the presence of annotated method interfaces. If such interfaces are found, a pureQuery code generator creates an implementation class for each of them, as shown in Figure 15:

Figure 15. Annotated style interface and automatically generated implementation class
Annotated style interface and automatically generated implementation class

The generated class implements the data access defined in the interface by issuing the declared SQL statements and implementing the mappings between the SQL result set and the method return type, and between the method parameter(s) and the SQL statement parameter markers, if any.

To bind this data access layer as static SQL into the database, right-click on the file EmployeeData.java from the Package Explorer, and select the option pureQuery > Bind…, as illustrated in Figure 16:

Figure 16. Binding an annotated method interface.
Binding an annotated method interface

When the Bind process is executed, a message is displayed in the Data Studio Developer console window indicating whether the bind succeeded or failed for each bound interface, as shown in Figure 17:

Figure 17. Result from bind operation
Result from bind operation

If a project contains multiple annotated method interfaces, you can bind them one at a time, or you can bind them all together: select the project instead of an interface, right-click, and select the option pureQuery > Bind pureQuery Application.

To run the application in static mode, instead of dynamic, you must set a runtime flag to tell pureQuery to do so. This flag is named "pdq.executionMode" and can be passed to the SampleUtil class when requesting an implementation of the EmployeeData interface.

Figure 18. Executing the sample application using static SQL
Executing the sample application using static SQL

The code in Figure 18 sets the runtime property pdq.executionMode to the value STATIC and passes it to the SampleUtil.getData method, which creates an instance of the annotated method interface declared in Listing 22. The rest of the application is the same as in the previous example, except that when you run it, all the database calls use the pre-compiled access paths for the bound SQL statements.
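
Because Figure 18 is a screen capture, the following is a hedged reconstruction of the code it describes; SampleUtil is the helper generated into the pureQuery.example package, and its exact getData signature may differ slightly in your generated project:

// Request a STATIC-mode implementation of the annotated method interface.
Properties props = new Properties();                        // java.util.Properties
props.setProperty("pdq.executionMode", "STATIC");
EmployeeData data = SampleUtil.getData(EmployeeData.class, props);  // signature assumed from the generated helper
List<Employee> employees = data.getEmployees();             // runs as static SQL after the bind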

In addition to taking advantage of static SQL in pureQuery, you should also be aware of several pureQuery best practices. Refer to the article "pureQuery Best Practices" (developerWorks, August 2008) before you start developing your pureQuery application. These best practices are very simple to use, and can provide you with considerable gains in performance and memory usage of your application.


Conclusion

This article described three different approaches to combining pureQuery and pureXML in the same solution. There may be other ways of achieving the same goal, but the ones proposed in this article are the most common ways to bring these two technologies together.

Deciding which approach to use depends on several factors. The following tables list some pros and cons of each approach, and may help you when you have to decide:

Table 1. Fetching data from the database
Give control to the SQL layer (using the XMLTABLE function)
  Pros:
  • It takes advantage of the existing pureQuery mapping facilities that implement object-table mapping, reducing development effort.
  • Only the required fields are fetched, reducing I/O.
  Cons:
  • The SQL/XML statement can become long and complex as the number of required fields grows.
  • A change in requirements means you have to update both the SQL/XML statement and the bean.

Give control to the data access API (using the J2SE V6 mapping libraries)
  Pros:
  • The application and SQL layers are oblivious to the mapping. A change in requirements will not affect those layers.
  • Tooling is provided to streamline the adoption of a change in requirements by generating new beans and mapping code.
  • Less developer effort than the other two options. SQL statements are simpler, and the mapping is done by the framework.
  Cons:
  • The Java object hierarchy can become overly complex for large XSD files.
  • Fetches the full document, maximizing I/O. This can lead to runtime performance concerns.

Give control to the application (using custom in-the-bean mapping)
  Pros:
  • If you need to fetch more or fewer fields, only the in-the-bean mapping needs to change. The data access API and SQL layer are oblivious to this change.
  • Easy to plug in multiple beans with different internal mappings for the same SQL statement, enabling you to create a bean to fulfill a requirement or group of requirements.
  Cons:
  • Implementation of the mapping requires extra effort.
  • Fetches the full document, maximizing I/O. This can lead to runtime performance concerns.

Table 2. Persisting the data into the database
Give control to the SQL layer (using XQuery transform expressions)
  Pros:
  • Updates only the required fields, minimizing I/O.
  • Can update several node values in a single statement.
  Cons:
  • Updating a large number of nodes requires a complex XQuery statement.
  • Identifying which node to update in a sequence may involve complex logic.

Give control to the data access API (using the J2SE V6 mapping libraries)
  Pros:
  • Marshalling to XML is done automatically.
  • A change in requirements is accommodated by running the tools to generate new beans and mapping code.
  Cons:
  • Full document update, which maximizes I/O.

Give control to the application (using custom in-the-bean mapping)
  Pros:
  • The data access layer is oblivious to the mapping.
  • Can add logic to avoid recreating the XML document if no changes have been made.
  Cons:
  • Implementation of the serialization requires extra effort from the developer.
  • If all fields were not extracted from the original XML, the mapping needs to be carefully designed to guarantee that no data fetched from the database is lost upon update.
  • Full document update, which maximizes I/O.

After just a few years in the market, DB2 pureXML support has been a leading technology in the semi-structured data management world, used by a variety of customers to power their mission-critical systems. Learn more about DB2 pureXML success stories by visiting the DB2 pureXML Web site (see Resources).

In less than a year since its debut, pureQuery has proven to be essential in optimizing Java data access to both distributed and mainframe DB2 servers, mostly due to its seamless support of static SQL and enforcement of JDBC best practices.

The goal of this article was to bring these two technologies together and enable you to use them in your environment, to further boost your solution's performance, security, manageability, and problem determination, giving you a greater return on investment.

Two of the most frequent questions from our customers are: How do pureXML and pureQuery fit together? And how can I use pureXML with pureQuery? This article attempted to answer these questions, explaining the differences between these two technologies as well as how they can be integrated.


About the sample code

In the Downloads section, you can find the Data Studio Developer project (it is a Java project) used to develop all the code examples used in this article.

The .zip file also contains the sample XML documents used, as well as the XML schema file. There is also an SQL script to create the Employee table.

The .zip file does not include the DB2 JDBC drivers and pureQuery .jar files. If you have pureQuery support in your Data Studio Developer installation, they will be added automatically once you add pureQuery support to your project (to do so, right-click on the project, and select pureQuery > Add pureQuery support).

To run the examples in the "Give control to the data access API" and "Give control to the application layer" approaches, you should install J2SE V6 on your system and use it as the project's Java runtime environment.


Acknowledgement

For their reviews of this article, I would like to thank Bryan Patterson and Cindy Saracco from the DB2 pureXML Enablement team, Kathy Zeidenstein from the Data Studio Enablement team, and Vladimir Blackvanski from InferData.


Download

Description: Data Studio project with sample code
File name: pdq_xml_dsd_project.zip
Size: 62KB

Resources

Learn

Get products and technologies

  • Build your next development project with IBM trial software, available for download directly from developerWorks.
