Programming XML across the multiple tiers: Use XML in the middle tier for performance, fidelity, and development ease

Develop an all-XML solution using JDBC 4.0, SQLXML, and the WebSphere Server XML Feature Pack

In this article, explore a natural and performant approach to working with XML data in the database and the middle tier. The sample Web application combines XML data across an XML database and Atom services to explain the approach. You will build such an application using an XML database, JDBC 4.0 support for SQLXML, and the IBM® WebSphere® Application Server V7.0 Feature Pack for XML.

Share:

Andrew Spyker, STSM, WebSphere XML, SOA, Performance Architect, IBM

Photo of Andrew SpykerAs a Senior Technical Staff Member (STSM) in the WebSphere Application Server development teams, Andrew Spyker focuses on three major areas. Using his five years of experience leading the WebSphere Application Server performance team, he advises the performance team. As an SOA runtime architect, he owns driving consistency across the WebSphere SOA runtimes of which he mainly focuses on benchmarking strategy, performance, and XML consistency. Finally, with most of his time, he owns the creation and driving of the XML strategy of the WebSphere portfolio. Most recently he acted as chief architect for the WebSphere Application Server XML Feature Pack.



Bert Van Der Linden, DB2 XML Architect, IBM

Photo of Bert Van Der LindenBert Van Der Linden joined IBM in 2001 to architect pureXML in DB2 which culminated in the release of DB2 9 in 2006. In parallel, he engaged with many customers and partners to evangelize XML in the database and continues to work intensely with customers. Bert came to IBM from a startup company, Propel, where he led the design and implementation of the distributed and fault-tolerant middleware which hosted a scalable e-commerce application. Before that Bert worked for many years at Tandem Computers on the NonStop SQL, a database that ran many critical applications in the financial industry.



05 October 2010 (First published 30 March 2010)

Also available in Chinese Russian Japanese Portuguese Spanish

31 Mar 2010 - Added links to three videos at beginning of Resources.

04 Oct 2010 - Added Part 2 of series at beginning of Resources.

An example application: The blog checker with database integration

Consider the following Web application.

Figure 1. Blog checker with database integration
Diagram of blog checker with database integration

This Web application works with blog data exposed over blog Web services. The data, containing information about all blogs owned by a blogger and all comments on these blogs, is returned in Atom XML form (see Listing 1 for an example). The Web application allows a blogger to quickly query the comments across his blogs and delete comments that contain questionable content. The Web application displays the data in XHTML Web pages and forms. Given that the data is in XML format at the source and given that the browser’s data is also XML (HTML or XHTML), it is natural to work with the XML data natively.

Listing 1. Atom XML comment feed example
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title type="text">WebSphere Community Blog</title>
  ...
  <entry>
    <id>tag:blogger.com,1999:blog-1417695962027703953.post-6498982274841848264</id>
    <published>2009-10-17T13:06:00.000-05:00</published>
    <updated>2009-10-17T13:06:00.000-05:00</updated>
    <atom:title xmlns="" xmlns:atom="http://www.w3.org/2005/Atom" type="text">
      Questionable spamming comment title
    </atom:title>
    <atom:content xmlns="" xmlns:atom="http://www.w3.org/2005/Atom" type="html">
      Questionable spamming comment content
    </atom:content>
    ...
    <atom:author xmlns="" xmlns:atom="http://www.w3.org/2005/Atom">
      <atom:name>Joe Smith</atom:name>
      <atom:uri>http://joe.uri.com</atom:uri>
      <atom:email>jsmith@email.com</atom:email>
    </atom:author>
  </entry>
  ...
</feed>

Over time, the blogger might notice that the questionable comments originate from the same users or from the same domain name (as seen in the atom:author element in Listing 1). At this point, the application is updated to allow the blogger, when deleting a comment, to flag a user or domain as a spammer. This information isn’t stored in the original comment Web service, so it needs to be persisted to a data store. As the data is already in XML format, it’s natural to persist the data to an XML database. This allows future usage of the application to report statistics on specific spammers and make recommendations on which comments to delete.

What is the best way to implement the application?

In the past, before databases supported XML natively, there were two typical ways to work with XML data. First, you could serialize the data to a string and then store it in the database as a character large object (CLOB). This approach has performance issues and doesn’t allow the data to be queried inside of the database as XML.

Frequently used acronyms

  • API: Application program interface
  • DOM: Document Object Model
  • HTML: Hypertext Markup Language
  • HTTP: Hypertext Transfer Protocol
  • JAXP: Java for XML Processing
  • JDBC: Java Database Connectivity
  • SAX: Simple API for XML
  • SOA: Service-oriented architecture
  • SQL: Structured Query Language
  • URI: Uniform Resource Identifier
  • URL: Uniform Resource Locator
  • W3C: World Wide Web Consortium
  • WSDL: Web Services Description Language
  • XHTML: Extensible Hypertext Markup Language
  • XML: Extensible Markup Language
  • XSLT: Extensible Stylesheet Language Transformations

Second, you could shred the data to map to relational tables that roughly represent the structure of the XML data. This approach has fidelity issues due to the mismatch of how relational data and XML data are represented, requires the user to maintain code for the mapping, and again does not allow for native XML query. Furthermore, as new requirements drive changes in the XML data schema, changing the relational mapping is typically too painful.

Given the growth of XML in the enterprise, databases have now started to store not only relational, but also XML data. For XML, the database stores the data natively in XML columns. Some databases that support XML columns are Apache Derby, IBM DB2®, Oracle Database, and Microsoft® SQLServer.

Also, getting data from the middle tier to the database was previously challenging. Before JDBC 4.0, the only options were to use the String or CLOB data types. As noted before, these types cause performance problems as you need to serialize the data. Also, they typically required non-standard SQL extensions to understand how to parse the data into a XML column. JDBC 4.0 introduced standardized support for the SQLXML type which allows for XML to be read from and written to the database in XML form. JDBC 4.0 allows the XML data to be accessed through strings, reader and writer streams, or JAXP sources and results. This support in JDBC 4.0 means that XML can natively flow across the middle tier and the database without unnecessary mapping or performance overhead in both the database and the middle tier.

A simple example of using JDBC 4.0 to get to XML data

Let's walk through an example using the JDBC 4.0 support. Reading XML data in JDBC 4.0 is similar to working with other data types.

The basic steps for reading XML data are:

  1. Create a prepared statement.
  2. Execute the prepared statement obtaining a result set.
  3. Get an SQLXML object from the result set.
  4. Read the SQLXML object through one of the supported get methods.
  5. Free the SQLXML object.

Here is a very basic example:

Listing 2. Reading XML data with JDBC 4.0
PreparedStatement ps =
        dbConnection.prepareStatement("SELECT somexmlcolumn FROM somexmltable");

ResultSet result = ps.executeQuery();

result.next();

SQLXML xml = result.getSQLXML("somexmlcolumn");

StreamSource source = xml.getSource(StreamSource.class);

// Read from the stream source

xml.free();

In this example, we read the source through a JAXP source object, specifically the StreamSource. The source object allows any APIs that understand JAXP sources to read the data. Using the StreamSource source allows you to use whatever in-memory representation you want. Typically, using streams is most efficient as compared with sources like DOM or SAX that incur object and API call overhead.

Similarly, the steps to write XML data through JDBC 4.0 are:

  1. Create a prepared statement.
  2. Create an SQLXML object from the connection.
  3. Get access to the SQLXML object through one of the supported set methods.
  4. Set the SQLXML object into the statement parameters.
  5. Write to the access method of the SQLXML object.
  6. Execute the statement.
  7. Free the SQLXML object.

Here is a very basic example:

Listing 3. Writing XML data with JDBC 4.0
PreparedStatement ps = dbConnection.prepareStatement(
        "UPDATE somexmltable SET somexmlcolumn = ?");

SQLXML xml = dbConnection.createSQLXML();

StreamResult result = new StreamResult(xml.setBinaryStream());

ps.setSQLXML(1, xml);

// Write to the stream result

ps.executeUpdate();

xml.free();

The middle tier

As described above, the value of using an XML database (as opposed to CLOBs or shredding into relational data) is increased performance, XML fidelity, and simplified development. Performance is better as the data is kept in a single format instead of copying to different data models which in the worse case requires extra serialization and parsing. XML fidelity means that the original unchanged version of the XML data is retained in the database instead of approximately reconstructed from relational tables that are used to hold the XML data. Development is simplified by the fact that no mapping code from relational to XML is required.

Now let's consider the middle tier. By using XML natively in the middle tier, those same advantages are achieved in the middle tier and more importantly across both the middle tier and the database. Also, as the middle tier connects to more sources than the XML database, the value can be extended across other XML data sources, such as Web service feeds.

End-to-end performance is improved as there are no copies of the data made between the database (or other XML data source) and the middle tier. Performance at each tier is also improved by the fact that each tier can create the highest performance representation of the XML in memory while still offering W3C standard programming models on top of the data representation.

Fidelity is also maintained. In this simple example, XML fidelity might be a minor issue. However, when you consider the XML documents being exchanged might be something as complex as a tax form or as critical as financial reporting you can see the importance of ensuring that the XML data stays in its native form across all processing frameworks.

Other frameworks

Note that other persistence frameworks build on top of JDBC that support some form of XML data in the database—typically by mapping it to an object representation. Historically these frameworks had the issues with performance and mapping to relational tables as described above in What is the best way to implement the application?. Even after these issues were fixed by XML columns and JDBC 4.0 SQLXML, these frameworks, when working against XML data natively, still have challenges. First, these frameworks typically map to an object representation which isn’t necessary in cases where a user wants to work with the XML data natively. Second, they do not allow XML centric navigation, transformation, or query once the data is mapped to objects.

Finally, simplicity is achieved by using a single data model across a consistent set of XML programming models. While it might be easier for a Java™ programmer in the middle tier to map the objects to DOM or JAXB for simple scenarios, these scenarios wouldn’t be simple on the previously mentioned tax form documents. Also, the user would need to learn not only JDBC and XML, but also other XML programming models and query models. By working with the XML data under collections and results, the programmer only has to know the XML data model and the W3C standards for navigating, transforming, and querying the XML data. Given that XPath and XQuery are already used within the XML database for navigation and query, these skills are likely already known.

This simplicity is most easily achieved by using a framework that handles much of the low-level work of retrieving data from the data source, manipulating it, and putting things back into the database—without the overhead of making copies. To do this in our sample application, we will use the IBM WebSphere Application Server V7.0 Feature Pack for XML. Given that the XML Feature Pack can handle the XML in its original form from the database, there is no unnecessary performance overhead. Also, given that the data stays in XML format, the XML Feature Pack can easily navigate, transform, or query the data.

The IBM WebSphere Application Server V7.0 Feature Pack for XML

The IBM WebSphere Application Server V7.0 Feature Pack for XML introduces support for native XML navigation, transformation, and query using the W3C standard XPath 2.0, XSLT 2.0, and XQuery 1.0 programming models in the middle tier. Given the popularity of XML data in the database tier, we want to apply the above non-product specific JDBC 4.0 approach to the XML Feature Pack with an XML database such as IBM DB2 with pureXML®, Apache Derby, or the Oracle Database in the following basic form:

Figure 2. Simple topology diagram of XML Feature Pack and XML database
Simple topology diagram of XML Feature Pack, a network, and XML database

You'll see how an XML database, JDBC 4.0 support for SQLXML, and the XML Feature Pack support XML data in the database and the middle tier.

The scenario is available as a sample with source code in the XML Feature Pack to enable you to follow along and experiment within your own applications. You can find links to download the XML Feature Pack and Derby as well as DB2 Express in Resources.

The combination of native XML support in the database, JDBC 4.0, and the XML Feature Pack enable a straight forward and performant architecture.

The implementation of the sample application

Let's see how you can implement this application using the XML Feature Pack, JDBC 4.0, and an XML database. First, you retrieve the bad comments from the blog using the blog Web service which returns an Atom feed (see Figure 1). As you process those bad comments, you will check historical information from the database to see if the comment came from previous bad bloggers (spammers). All along the way, the data remains in XML. Let's begin by retrieving the information about blog comments from the Atom feed.

Within the XML Feature Pack, the data from the Atom feed is loaded through an HTTP connection to the blogger Web service and used as the input document to an XQuery program. The XQuery program is executed on the XML Feature Pack runtime invoked using the XML Feature Pack Java API.

You retrieve the questionable comments from the Atom feed with the following XPath statement in the XQuery program:

Listing 4. XPath statement
declare variable $comments := (
  /atom:feed/atom:entry[atom:author/atom:name = 'Anonymous'] |
  /atom:feed/atom:entry[matches(atom:content, $my:vulgarwords, 'i')])
  [atom:published > current-dateTime() - $my:monthsAgo];

Using XPath, you look for all comments within a user-defined time window that are posted by Anonymous users or that contain vulgar words. You store all of the entries that match these criteria into the variable comments.

Now, in XQuery you’d like to see if the spammer is listed in the database as having spammed before (where $i is a subquery of the above comments).

Listing 5. XQuery statement
let $spammedbefore := local:hasEmailHasSpammedBefore($i/atom:author/atom:email/text())

Looking at the definition of this function, you’ll see how to load XML data from the database.

Listing 6. XQuery function
declare function local:hasEmailHasSpammedBefore($emailaddress) as xs:boolean {

let $domainName := substring-after($emailaddress, '@')

return
  if ($domainName = '') then
    false()
  else
    let $jdbcURI := concat('jdbc://getAuthorsWhoHaveSpammedFromDomain?', $domainName)
    let $domainSpammers := collection($jdbcURI)
    return
      not(empty($domainSpammers/spammers/spammer/email[. eq $emailaddress]))
};

We get the domain name from the e-mail address and then concatenate the value with a named query jdbc://getAuthorsWhoHaveSpammedFromDomain, resulting in an XPath 2.0 collection URI of jdbc://getAuthorsWhoHaveSpammedFromDomain?DOMAINNAME.

XPath 2.0 collections are an easy way to integrate sequences of XML data that do not come from the main input document of an XQuery or XSLT program. The implementation of the collection function is implementation defined and dependent, which means each XPath 2.0 runtime can provide their own default collection resolvers or allow users to extend the runtime to provide collection implementations dynamically. The XML Feature Pack runtime allows users to provide their own collection implementations by way of the XCollectionResolver interface.

In this sample, we implemented an XCollectionResolver which handles all collections that start with a jdbc:// URI scheme and resolves the rest of the URI against a set of previously defined named queries. This collection resolver looks up the named query as provided by the user, attaches the position parameters, and executes JDBC statements against the database.

In Listing 7, you can see where we instantiate this resolver after defining some basic named queries. We also pass a database connection to the resolver, so the resolver can work against any JDBC connection.

Listing 7. Resolver configuration
dbStatementsSupportsSQLXML = new HashMap<String, String>();

dbStatementsSupportsSQLXML.put("getAuthorsWhoHaveSpammedFromDomain",
        "SELECT CONTACTS from SPAMMERS where DOMAINNAME = ?");

dbStatementsSupportsSQLXML.put("updateAuthorsWhoHaveSpammedByDomain",
        "UPDATE SPAMMERS SET CONTACTS = ? WHERE DOMAINNAME = ?");

dbStatementsSupportsSQLXML.put("insertAuthorsWhoHaveSpammedByDomain",
        "INSERT INTO SPAMMERS (CONTACTS, DOMAINNAME) VALUES (?, ?)");

Connection conn = getDatabaseConnection();

JDBCCollectionResolver inputResolver =
  new JDBCCollectionResolver(conn, dbStatementsSupportsSQLXML);

In Listing 8, you can see parts of the resolver implementation. For the full implementation look at the XML Feature Pack sample source code. This resolver is basically the same as the simpler sample shown earlier of reading XML data through JDBC 4.0.

The resolver, in order to be more re-usable, adds the following:

  1. The resolver looks for externally supplied named queries, instead of directly hard coding the SQL statements.
  2. The resolver looks at the metadata type on the returned row and column, ensuring it only reads XML columns with the SQLXML type.
  3. Finally, the resolver uses two other constructs of the XML Feature Pack API (XSequenceCursor and XItemView) to construct a sequence of the XML data returned from the JDBC query.
Listing 8. Resolver implementation
public XSequenceCursor getCollection(String uri, String base) {
        // look up query from query collection provided from Listing 7
        String query = lookupNamedQuery(uri);
        PreparedStatement p = dbConnection.prepareStatement(query);
        ResultSet rs = p.executeQuery();
        ...

        // Loop through the result returned from the query
        ResultSetMetaData metadata = rs.getMetaData();
        int colType = metadata.getColumnType(jj+1);
        if (colType = Types.SQLXML) {
                SQLXML sqlx = rs.getSQLXML(...);
                StreamSource source = sqlx.getSource(StreamSource.class);
                XItemView item = itemFactory.item(source);
                sqlx.free();
        }

        // Use the XML Feature Pack API to create a sequence from the returned XML data
        ...
        XItemView itemView[] = items.toArray(new XItemView[0]);
        XSequenceCursor sequence = itemFact.sequence(itemView);

        return sequence;
}

At this point, we have a fairly general collection resolver to be used in any XQuery program that accepts any number of inputs and returns collections of XML data to be processed through XPath 2.0, XSLT 2.0, or XQuery 1.0. Here are two more examples. In the first, we create a list of all spammers using the name element; in the second, we return the number of spammers with more than ten posts.

Listing 9. More XQuery examples
Java:
  dbStatementsSupportsSQLXML.put("getAllSpammers", "SELECT CONTACTS from SPAMMERS");
XQuery:
  let $allSpammers := collection('jdbc://getAllSpammers')
  return
    for $i in $allspammers
    let $first := $i/name/first
    order by $i/name/last
    return
      <name>
        <first>{ $first }</first>
        <last>{ $i/name/last }</last>
      </name>

Java:
  dbStatementsSupportsSQLXML.put("getAllSpamAuthorsWhereSpamCountGreaterThan",
    "SELECT CONTACTS from SPAMMERS where COUNT > ?");
XQuery:
  let $minCount := 10
  let $allSpammers := collection(concat(
    'jdbc:// getAllSpamAuthorsWhereSpamCountGreaterThan?',
    $minCount)
  )
  return
    count($allSpammers)
}

Also, given that the collection resolver is part of XPath 2.0, it is equally as useful to XSLT 2.0 as it is to XQuery 1.0. Here is a simple XSLT 2.0 example using the collection resolver:

Listing 10. More XSLT examples
<xsl:variable name="allSpammers" select="collection('jdbc://getAllSpammers')"/>

<xsl:template match="/">
        <p>The current spammer database contains the following domains and spammers.</p>
        <xsl:for-each select="$allSpammers">

        <table>
        <tr>
                <th>Name</th><th>Email</th>
        </tr>
        <xsl:for-each select="$allSpammers">
        <tr>
                <td><xsl:value-of select="name"/></td>
                <td><xsl:value-of select="email"/></td>
        </tr>
        </xsl:for-each>
        </table>
</xsl:template>

Similarly, writing data to an XML database is possible. XSLT 2.0 allows for results to be written to multiple documents as identified by a URI using the xsl:result-document instruction. As before, the resolution of this URI is runtime implementation dependent. In order to allow users to specify where results are written, the XML Feature Pack provides the XResultsResolver interface. In this sample, we implemented a similar JDBC results resolver that uses named statements and positional parameters with the XML parameter denoted a sentinel value of —XML—.

This allows writing data to the database as in Listing 11:

Listing 11. XSLT result-document
<!--  Is this an insert or an update -->

<xsl:variable name="insert" select="count($spammersByDomain/spammers/spammer) eq 0"/>

<!-- Create the insert statement named query -->

<xsl:variable name="insertJdbcURI"
        select="concat('jdbc://insertAuthorsWhoHaveSpammedByDomain?--XML--&', $domain)"/>

<!-- Create the update statement named query -->

<xsl:variable name="updateJdbcURI"
        select="concat('jdbc://updateAuthorsWhoHaveSpammedByDomain?--XML--&', $domain)"/>

<!—
If insert, insert the xmldoc into the database.
Otherwise, update the database with the xmldoc.
 -->

<xsl:template match="/">
        <xsl:when test="$insert">
                <xsl:result-document href="{$insertJdbcURI}" method="xml" indent="yes"> 
                        <xsl:copy-of select="$xmldoc"/>
                </xsl:result-document>
        </xsl:when>
        <xsl:otherwise>
                <xsl:result-document href="{$updateJdbcURI}" method="xml" indent="yes">
                        <xsl:copy-of select="$xmldoc"/>
                </xsl:result-document>
        </xsl:otherwise>
</xsl:template>

When you execute this in the case where the spammer is new (no spammer exists for a particular domain), the insert path of the xsl:when instruction will execute. In that case, the XML runtime will see a request to write the contents of the variable xmldoc into a result URI of:

Listing 12. Result URI
jdbc://insertAuthorsWhoHaveSpammedByDomain?--XML--&domain.com

The sample’s result resolver will then resolve this statement to a named query which we predefined as:

Listing 13. Named query
INSERT INTO SPAMMERS (CONTACTS, DOMAINNAME) VALUES (?, ?)

The sample’s result resolver will then attach the XML contents of the variable xmldoc to the first positional parameter and domain.com into the second positional parameter.

As with reading XML data with collections, writing XML data with results is not unique to one of the XML languages. In the previous example of writing, XSLT was used for the example (see Listing11). With the same results resolver approach, you can direct an XQuery 1.0 program to write output to an XML database.

A note about using a generic JDBC Resolver versus other approaches

The approach described here, defining JDBC input and output collection and result resolvers in the XML Feature Pack that are implemented by JDBC 4.0, is only one way to implement a natural and performant integration of the XML Feature Pack and an XML database.

Alternatively, you might implement the input (collection resolver) as a user defined XPath 2.0 extension function that was more specific like:

Listing 14. Less general XQuery function
declare function local:hasEmailHasSpammedBefore($emailaddress) as xs:boolean {

let $domainName := substring-after($emailaddress, '@')

return
  if ($domainName = '') then
    false()
  else
    let $domainSpammers := my:getAuthorsWhoHaveSpammerFromDomain($domainName)
      return
        not(empty($domainSpammers/spammers/spammer/email[. eq $emailaddress]))
};

This approach has the benefit that the input and output are more tightly defined. This approach has the drawbacks of having to write an extension function for every query (versus one for all possible queries) and only works for input (writing an extension function for output isn’t natural and possible in all cases). Using the more general collection and results resolver approach handles most cases in a natural and consistent way.

It is worth noting that the collection resolver and the result resolver discussed are part of the XML Feature Pack sample code. Therefore the code presented is a sample. In order to use this code in a production application, we would expect you to extend the sample for your application’s needs.

This article purposefully simplified the XML programs in order to have an easy-to-understand article. As a reminder the sample can be run by using the XML Feature Pack samples and all source code (in full) is included.

Summary

As described in the historical overview, the value of using an XML database (as opposed to CLOBs or shredding into relational data) is increased performance, XML fidelity, and simplified development. Performance is better as the data is kept in a single format instead of copying to different data models which in the worse case requires extra serialization and parsing. XML fidelity means that the original unchanged version of the XML data is retained in the database instead of approximately reconstructed from relational tables that are used to hold the XML data. Development is simplified by the fact that no mapping code from relational data to XML data is required.

By using XML natively in the middle tier (using the XML Feature Pack), you achieve those same advantages in the middle tier and more importantly across both the middle tier and the database. Also, as the middle tier connects to more sources than the XML database, you can extend the value across other XML data sources, such as Web service feeds.

Resources

Learn

Get products and technologies

Discuss

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into XML on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=XML, Information Management, WebSphere
ArticleID=478069
ArticleTitle=Programming XML across the multiple tiers: Use XML in the middle tier for performance, fidelity, and development ease
publish-date=10052010