31 Mar 2010 - Added links to three videos at beginning of Resources.
04 Oct 2010 - Added Part 2 of series at beginning of Resources.
An example application: The blog checker with database integration
Consider the following Web application.
Figure 1. Blog checker with database integration
This Web application works with blog data exposed over blog Web services. The data, containing information about all blogs owned by a blogger and all comments on these blogs, is returned in Atom XML form (see Listing 1 for an example). The Web application allows a blogger to quickly query the comments across his blogs and delete comments that contain questionable content. The Web application displays the data in XHTML Web pages and forms. Given that the data is in XML format at the source and given that the browser’s data is also XML (HTML or XHTML), it is natural to work with the XML data natively.
Listing 1. Atom XML comment feed example
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<title type="text">WebSphere Community Blog</title>
...
<entry>
<id>tag:blogger.com,1999:blog-1417695962027703953.post-6498982274841848264</id>
<published>2009-10-17T13:06:00.000-05:00</published>
<updated>2009-10-17T13:06:00.000-05:00</updated>
<atom:title xmlns="" xmlns:atom="http://www.w3.org/2005/Atom" type="text">
Questionable spamming comment title
</atom:title>
<atom:content xmlns="" xmlns:atom="http://www.w3.org/2005/Atom" type="html">
Questionable spamming comment content
</atom:content>
...
<atom:author xmlns="" xmlns:atom="http://www.w3.org/2005/Atom">
<atom:name>Joe Smith</atom:name>
<atom:uri>http://joe.uri.com</atom:uri>
<atom:email>jsmith@email.com</atom:email>
</atom:author>
</entry>
...
</feed> |
Over time, the blogger might notice that the questionable comments originate from the same users or from the same domain name (as seen in the atom:author element in Listing 1).
At this point, the application is updated to allow the blogger, when deleting a comment, to flag a user or domain as a spammer. This information isn’t stored in the original comment Web service, so it needs to be persisted to a data store. As the data is already in XML format, it’s natural to persist the data to an XML database. This allows future usage of the application to report statistics on specific spammers and make recommendations on which comments to delete.
What is the best way to implement the application?
In the past, before databases supported XML natively, there were two typical ways to work with XML data. First, you could serialize the data to a string and then store it in the database as a character large object (CLOB). This approach has performance issues and doesn’t allow the data to be queried inside of the database as XML.
Second, you could shred the data to map to relational tables that roughly represent the structure of the XML data. This approach has fidelity issues due to the mismatch of how relational data and XML data are represented, requires the user to maintain code for the mapping, and again does not allow for native XML query. Furthermore, as new requirements drive changes in the XML data schema, changing the relational mapping is typically too painful.
Given the growth of XML in the enterprise, databases have now started to store not only relational, but also XML data. For XML, the database stores the data natively in XML columns. Some databases that support XML columns are Apache Derby, IBM DB2®, Oracle Database, and Microsoft® SQLServer.
Also, getting data from the middle tier to the database was previously challenging. Before JDBC 4.0, the only options were to use the String or CLOB data types. As noted before, these types cause performance problems as you need to serialize the data. Also, they typically required non-standard SQL extensions to understand how to parse the data into a XML column. JDBC 4.0 introduced standardized support for the SQLXML type which allows for XML to be read from and written to the database in XML form. JDBC 4.0 allows the XML data to be accessed through strings, reader and writer streams, or JAXP sources and results. This support in JDBC 4.0 means that XML can natively flow across the middle tier and the database without unnecessary mapping or performance overhead in both the database and the middle tier.
A simple example of using JDBC 4.0 to get to XML data
Let's walk through an example using the JDBC 4.0 support. Reading XML data in JDBC 4.0 is similar to working with other data types.
The basic steps for reading XML data are:
- Create a prepared statement.
- Execute the prepared statement obtaining a result set.
- Get an SQLXML object from the result set.
- Read the SQLXML object through one of the supported get methods.
- Free the SQLXML object.
Here is a very basic example:
Listing 2. Reading XML data with JDBC 4.0
PreparedStatement ps =
dbConnection.prepareStatement("SELECT somexmlcolumn FROM somexmltable");
ResultSet result = ps.executeQuery();
result.next();
SQLXML xml = result.getSQLXML("somexmlcolumn");
StreamSource source = xml.getSource(StreamSource.class);
// Read from the stream source
xml.free(); |
In this example, we read the source through a JAXP source object, specifically the
StreamSource. The source object allows any APIs that understand JAXP sources to read the data. Using the StreamSource source allows you to use whatever in-memory representation you want. Typically, using streams is most efficient as compared with sources like DOM or SAX that incur object and API call overhead.
Similarly, the steps to write XML data through JDBC 4.0 are:
- Create a prepared statement.
- Create an SQLXML object from the connection.
- Get access to the SQLXML object through one of the supported set methods.
- Set the SQLXML object into the statement parameters.
- Write to the access method of the SQLXML object.
- Execute the statement.
- Free the SQLXML object.
Here is a very basic example:
Listing 3. Writing XML data with JDBC 4.0
PreparedStatement ps = dbConnection.prepareStatement(
"UPDATE somexmltable SET somexmlcolumn = ?");
SQLXML xml = dbConnection.createSQLXML();
StreamResult result = new StreamResult(xml.setBinaryStream());
ps.setSQLXML(1, xml);
// Write to the stream result
ps.executeUpdate();
xml.free(); |
As described above, the value of using an XML database (as opposed to CLOBs or shredding into relational data) is increased performance, XML fidelity, and simplified development. Performance is better as the data is kept in a single format instead of copying to different data models which in the worse case requires extra serialization and parsing. XML fidelity means that the original unchanged version of the XML data is retained in the database instead of approximately reconstructed from relational tables that are used to hold the XML data. Development is simplified by the fact that no mapping code from relational to XML is required.
Now let's consider the middle tier. By using XML natively in the middle tier, those same advantages are achieved in the middle tier and more importantly across both the middle tier and the database. Also, as the middle tier connects to more sources than the XML database, the value can be extended across other XML data sources, such as Web service feeds.
End-to-end performance is improved as there are no copies of the data made between the database (or other XML data source) and the middle tier. Performance at each tier is also improved by the fact that each tier can create the highest performance representation of the XML in memory while still offering W3C standard programming models on top of the data representation.
Fidelity is also maintained. In this simple example, XML fidelity might be a minor issue. However, when you consider the XML documents being exchanged might be something as complex as a tax form or as critical as financial reporting you can see the importance of ensuring that the XML data stays in its native form across all processing frameworks.
Finally, simplicity is achieved by using a single data model across a consistent set of XML programming models. While it might be easier for a Java™ programmer in the middle tier to map the objects to DOM or JAXB for simple scenarios, these scenarios wouldn’t be simple on the previously mentioned tax form documents. Also, the user would need to learn not only JDBC and XML, but also other XML programming models and query models. By working with the XML data under collections and results, the programmer only has to know the XML data model and the W3C standards for navigating, transforming, and querying the XML data. Given that XPath and XQuery are already used within the XML database for navigation and query, these skills are likely already known.
This simplicity is most easily achieved by using a framework that handles much of the low-level work of retrieving data from the data source, manipulating it, and putting things back into the database—without the overhead of making copies. To do this in our sample application, we will use the IBM WebSphere Application Server V7.0 Feature Pack for XML. Given that the XML Feature Pack can handle the XML in its original form from the database, there is no unnecessary performance overhead. Also, given that the data stays in XML format, the XML Feature Pack can easily navigate, transform, or query the data.
The IBM WebSphere Application Server V7.0 Feature Pack for XML
The IBM WebSphere Application Server V7.0 Feature Pack for XML introduces support for native XML navigation, transformation, and query using the W3C standard XPath 2.0, XSLT 2.0, and XQuery 1.0 programming models in the middle tier. Given the popularity of XML data in the database tier, we want to apply the above non-product specific JDBC 4.0 approach to the XML Feature Pack with an XML database such as IBM DB2 with pureXML®, Apache Derby, or the Oracle Database in the following basic form:
Figure 2. Simple topology diagram of XML Feature Pack and XML database
You'll see how an XML database, JDBC 4.0 support for SQLXML, and the XML Feature Pack support XML data in the database and the middle tier.
The scenario is available as a sample with source code in the XML Feature Pack to enable you to follow along and experiment within your own applications. You can find links to download the XML Feature Pack and Derby as well as DB2 Express in Resources.
The combination of native XML support in the database, JDBC 4.0, and the XML Feature Pack enable a straight forward and performant architecture.
The implementation of the sample application
Let's see how you can implement this application using the XML Feature Pack, JDBC 4.0, and an XML database. First, you retrieve the bad comments from the blog using the blog Web service which returns an Atom feed (see Figure 1). As you process those bad comments, you will check historical information from the database to see if the comment came from previous bad bloggers (spammers). All along the way, the data remains in XML. Let's begin by retrieving the information about blog comments from the Atom feed.
Within the XML Feature Pack, the data from the Atom feed is loaded through an HTTP connection to the blogger Web service and used as the input document to an XQuery program. The XQuery program is executed on the XML Feature Pack runtime invoked using the XML Feature Pack Java API.
You retrieve the questionable comments from the Atom feed with the following XPath statement in the XQuery program:
Listing 4. XPath statement
declare variable $comments := ( /atom:feed/atom:entry[atom:author/atom:name = 'Anonymous'] | /atom:feed/atom:entry[matches(atom:content, $my:vulgarwords, 'i')]) [atom:published > current-dateTime() - $my:monthsAgo]; |
Using XPath, you look
for all comments within a user-defined time window that are posted by Anonymous users
or that contain vulgar words. You store all of the entries that match these criteria into the variable comments.
Now, in XQuery you’d like to see if the spammer is listed in the database as having spammed before (where $i is a subquery of the above comments).
Listing 5. XQuery statement
let $spammedbefore := local:hasEmailHasSpammedBefore($i/atom:author/atom:email/text()) |
Looking at the definition of this function, you’ll see how to load XML data from the database.
Listing 6. XQuery function
declare function local:hasEmailHasSpammedBefore($emailaddress) as xs:boolean {
let $domainName := substring-after($emailaddress, '@')
return
if ($domainName = '') then
false()
else
let $jdbcURI := concat('jdbc://getAuthorsWhoHaveSpammedFromDomain?', $domainName)
let $domainSpammers := collection($jdbcURI)
return
not(empty($domainSpammers/spammers/spammer/email[. eq $emailaddress]))
}; |
We get the domain name from the e-mail address and then concatenate the value with a named query jdbc://getAuthorsWhoHaveSpammedFromDomain, resulting in an XPath 2.0 collection URI of jdbc://getAuthorsWhoHaveSpammedFromDomain?DOMAINNAME.
XPath 2.0 collections are an easy way to integrate sequences of XML data that do not come from the main input document of an XQuery or XSLT program. The implementation of the collection function is implementation defined and dependent, which means each XPath 2.0 runtime can provide their own default collection resolvers or allow users to extend the runtime to provide collection implementations dynamically. The XML Feature Pack runtime allows users to provide their own collection implementations by way of the XCollectionResolver interface.
In this sample, we implemented an XCollectionResolver which handles all collections that start with a jdbc:// URI scheme and resolves the rest of the URI against a set of previously defined named queries. This collection resolver looks up the named query as provided by the user, attaches the position parameters, and executes JDBC statements against the database.
In Listing 7, you can see where we instantiate this resolver after defining some basic named queries. We also pass a database connection to the resolver, so the resolver can work against any JDBC connection.
Listing 7. Resolver configuration
dbStatementsSupportsSQLXML = new HashMap<String, String>();
dbStatementsSupportsSQLXML.put("getAuthorsWhoHaveSpammedFromDomain",
"SELECT CONTACTS from SPAMMERS where DOMAINNAME = ?");
dbStatementsSupportsSQLXML.put("updateAuthorsWhoHaveSpammedByDomain",
"UPDATE SPAMMERS SET CONTACTS = ? WHERE DOMAINNAME = ?");
dbStatementsSupportsSQLXML.put("insertAuthorsWhoHaveSpammedByDomain",
"INSERT INTO SPAMMERS (CONTACTS, DOMAINNAME) VALUES (?, ?)");
Connection conn = getDatabaseConnection();
JDBCCollectionResolver inputResolver =
new JDBCCollectionResolver(conn, dbStatementsSupportsSQLXML); |
In Listing 8, you can see parts of the resolver implementation. For the full implementation look at the XML Feature Pack sample source code. This resolver is basically the same as the simpler sample shown earlier of reading XML data through JDBC 4.0.
The resolver, in order to be more re-usable, adds the following:
- The resolver looks for externally supplied named queries, instead of directly hard coding the SQL statements.
- The resolver looks at the metadata type on the returned row and column, ensuring it only reads XML columns with the SQLXML type.
- Finally, the resolver uses two other constructs of the XML Feature Pack API (
XSequenceCursorandXItemView) to construct a sequence of the XML data returned from the JDBC query.
Listing 8. Resolver implementation
public XSequenceCursor getCollection(String uri, String base) {
// look up query from query collection provided from Listing 7
String query = lookupNamedQuery(uri);
PreparedStatement p = dbConnection.prepareStatement(query);
ResultSet rs = p.executeQuery();
...
// Loop through the result returned from the query
ResultSetMetaData metadata = rs.getMetaData();
int colType = metadata.getColumnType(jj+1);
if (colType = Types.SQLXML) {
SQLXML sqlx = rs.getSQLXML(...);
StreamSource source = sqlx.getSource(StreamSource.class);
XItemView item = itemFactory.item(source);
sqlx.free();
}
// Use the XML Feature Pack API to create a sequence from the returned XML data
...
XItemView itemView[] = items.toArray(new XItemView[0]);
XSequenceCursor sequence = itemFact.sequence(itemView);
return sequence;
} |
At this point, we have a fairly general collection resolver to be used in any XQuery program that accepts any number of inputs and returns collections of XML data to be processed through XPath 2.0, XSLT 2.0, or XQuery 1.0. Here are two more examples. In the first, we create a list of all spammers using the name element; in the second, we return the number of spammers with more than ten posts.
Listing 9. More XQuery examples
Java:
dbStatementsSupportsSQLXML.put("getAllSpammers", "SELECT CONTACTS from SPAMMERS");
XQuery:
let $allSpammers := collection('jdbc://getAllSpammers')
return
for $i in $allspammers
let $first := $i/name/first
order by $i/name/last
return
<name>
<first>{ $first }</first>
<last>{ $i/name/last }</last>
</name>
Java:
dbStatementsSupportsSQLXML.put("getAllSpamAuthorsWhereSpamCountGreaterThan",
"SELECT CONTACTS from SPAMMERS where COUNT > ?");
XQuery:
let $minCount := 10
let $allSpammers := collection(concat(
'jdbc:// getAllSpamAuthorsWhereSpamCountGreaterThan?',
$minCount)
)
return
count($allSpammers)
} |
Also, given that the collection resolver is part of XPath 2.0, it is equally as useful to XSLT 2.0 as it is to XQuery 1.0. Here is a simple XSLT 2.0 example using the collection resolver:
Listing 10. More XSLT examples
<xsl:variable name="allSpammers" select="collection('jdbc://getAllSpammers')"/>
<xsl:template match="/">
<p>The current spammer database contains the following domains and spammers.</p>
<xsl:for-each select="$allSpammers">
<table>
<tr>
<th>Name</th><th>Email</th>
</tr>
<xsl:for-each select="$allSpammers">
<tr>
<td><xsl:value-of select="name"/></td>
<td><xsl:value-of select="email"/></td>
</tr>
</xsl:for-each>
</table>
</xsl:template> |
Similarly, writing data to an XML database is possible. XSLT 2.0 allows for results to be written to multiple documents as identified by a URI using the xsl:result-document instruction. As before, the resolution of this URI is runtime implementation dependent. In order to allow users to specify where results are written, the XML Feature Pack provides the XResultsResolver interface. In this sample, we implemented a similar JDBC results resolver that uses named statements and positional parameters with the XML parameter denoted a sentinel value of —XML—.
This allows writing data to the database as in Listing 11:
Listing 11. XSLT result-document
<!-- Is this an insert or an update -->
<xsl:variable name="insert" select="count($spammersByDomain/spammers/spammer) eq 0"/>
<!-- Create the insert statement named query -->
<xsl:variable name="insertJdbcURI"
select="concat('jdbc://insertAuthorsWhoHaveSpammedByDomain?--XML--&', $domain)"/>
<!-- Create the update statement named query -->
<xsl:variable name="updateJdbcURI"
select="concat('jdbc://updateAuthorsWhoHaveSpammedByDomain?--XML--&', $domain)"/>
<!—
If insert, insert the xmldoc into the database.
Otherwise, update the database with the xmldoc.
-->
<xsl:template match="/">
<xsl:when test="$insert">
<xsl:result-document href="{$insertJdbcURI}" method="xml" indent="yes">
<xsl:copy-of select="$xmldoc"/>
</xsl:result-document>
</xsl:when>
<xsl:otherwise>
<xsl:result-document href="{$updateJdbcURI}" method="xml" indent="yes">
<xsl:copy-of select="$xmldoc"/>
</xsl:result-document>
</xsl:otherwise>
</xsl:template> |
When you execute this in the case where the spammer is new (no spammer exists for a particular domain), the insert path of the xsl:when instruction will execute. In that case, the XML runtime will see a request to write the contents of the variable xmldoc into a result URI of:
Listing 12. Result URI
jdbc://insertAuthorsWhoHaveSpammedByDomain?--XML--&domain.com |
The sample’s result resolver will then resolve this statement to a named query which we predefined as:
Listing 13. Named query
INSERT INTO SPAMMERS (CONTACTS, DOMAINNAME) VALUES (?, ?) |
The sample’s result resolver will then attach the XML contents of the variable xmldoc to the first positional parameter and domain.com into the second positional parameter.
As with reading XML data with collections, writing XML data with results is not unique to one of the XML languages. In the previous example of writing, XSLT was used for the example (see Listing11). With the same results resolver approach, you can direct an XQuery 1.0 program to write output to an XML database.
A note about using a generic JDBC Resolver versus other approaches
The approach described here, defining JDBC input and output collection and result resolvers in the XML Feature Pack that are implemented by JDBC 4.0, is only one way to implement a natural and performant integration of the XML Feature Pack and an XML database.
Alternatively, you might implement the input (collection resolver) as a user defined XPath 2.0 extension function that was more specific like:
Listing 14. Less general XQuery function
declare function local:hasEmailHasSpammedBefore($emailaddress) as xs:boolean {
let $domainName := substring-after($emailaddress, '@')
return
if ($domainName = '') then
false()
else
let $domainSpammers := my:getAuthorsWhoHaveSpammerFromDomain($domainName)
return
not(empty($domainSpammers/spammers/spammer/email[. eq $emailaddress]))
}; |
This approach has the benefit that the input and output are more tightly defined. This approach has the drawbacks of having to write an extension function for every query (versus one for all possible queries) and only works for input (writing an extension function for output isn’t natural and possible in all cases). Using the more general collection and results resolver approach handles most cases in a natural and consistent way.
It is worth noting that the collection resolver and the result resolver discussed are part of the XML Feature Pack sample code. Therefore the code presented is a sample. In order to use this code in a production application, we would expect you to extend the sample for your application’s needs.
This article purposefully simplified the XML programs in order to have an easy-to-understand article. As a reminder the sample can be run by using the XML Feature Pack samples and all source code (in full) is included.
As described in the historical overview, the value of using an XML database (as opposed to CLOBs or shredding into relational data) is increased performance, XML fidelity, and simplified development. Performance is better as the data is kept in a single format instead of copying to different data models which in the worse case requires extra serialization and parsing. XML fidelity means that the original unchanged version of the XML data is retained in the database instead of approximately reconstructed from relational tables that are used to hold the XML data. Development is simplified by the fact that no mapping code from relational data to XML data is required.
By using XML natively in the middle tier (using the XML Feature Pack), you achieve those same advantages in the middle tier and more importantly across both the middle tier and the database. Also, as the middle tier connects to more sources than the XML database, you can extend the value across other XML data sources, such as Web service feeds.
Learn
- Programming XML Across the Multiple Tiers, Part 2: Write efficient Java
EE applications that exploit an XML database server: Optimize an all-XML solution with JDBC 4.0,
SQLXML, the WebSphere XML Feature Pack, and DB2 pureXML (Andrew Spyker, Cynthia M. Saracco, Bert Van Der Linden, and Guogen Zhang, developerworks, October 2010): Dive in and work with transient and persistent XML in a server-side Java application in Part two of the series. Using practical examples and sample code, you'll see how to use XML indexing and query filtering capabilities in an XML database to process large amounts of XML data.
- Video: Getting started with the WebSphere Application Server Feature Pack for XML: Install the sample application described in this article and get started with the WebSphere Application Server Feature Pack for XML.
- Video: XML Feature Pack and Sample for XML Database Integration Setup (With DB2 pureXML) Part 1 of 2: Configure WebSphere Application Server and DB2 in order to run the blog checker sample application described in this article.
- Video: XML Feature Pack and Sample for XML Database Integration Setup (With Apache Derby) Part 2 of 2: Configure WebSphere Application Server and Apache Derby in order to run the blog checker sample application described in this article.
- WebSphere Application Server V7.0 Feature Pack for XML: Learn more about simplified development of XML-based and document-centric applications.
- WebSphere Application Server V7.0 Feature Pack for XML Information Center: Find complete product documentation.
- Documentation for sample application: Get installation documentation, source code, and instructions to help you to use the sample application included in the feature pack.
- The SQLXML type: Refer to the complete class documentation for details.
- XPath 2.0 collection: Find a description of the collection function at the W3C Web site.
- WebSphere Community Blog: Check out this list of links for the WebSphere Application Server Feature Pack for XML.
- JDBC 4.0 SQLXML Interface: See the documentation.
- XML area on developerWorks: Get the resources you need to advance your skills in the XML arena.
- IBM XML certification: Find out how you can become an IBM-Certified Developer in XML and related technologies.
- XML technical library: See the developerWorks XML Zone for a wide range of technical articles and tips, tutorials, standards, and IBM Redbooks.
- developerWorks technical events and webcasts: Stay current with technology in these sessions.
- developerWorks on Twitter: Join today to follow developerWorks tweets.
- developerWorks
podcasts: Listen to interesting interviews and discussions for software developers.
Get products and technologies
- Download the software to build the sample Web application for this article:
- WebSphere Application Server V7.0 Feature Pack for XML: Updated XML programming model standards that improve developer productivity and address new application development scenarios.
- DB2 Express-C: A flexible full-function relational and XML data server
- Apache Derby: An open source relational database implemented entirely in Java technology
- IBM product evaluation versions: Download or explore the online trials in the IBM SOA Sandbox and get your hands on application development tools and middleware products from DB2®, Lotus®, Rational®, Tivoli®, and WebSphere®.
Discuss
- XML zone discussion forums: Participate in any of several XML-related discussions.
- developerWorks blogs: Check out these blogs and get involved.

As a Senior Technical Staff Member (STSM) in the WebSphere Application Server development teams, Andrew Spyker focuses on three major areas. Using his five years of experience leading the WebSphere Application Server performance team, he advises the performance team. As an SOA runtime architect, he owns driving consistency across the WebSphere SOA runtimes of which he mainly focuses on benchmarking strategy, performance, and XML consistency. Finally, with most of his time, he owns the creation and driving of the XML strategy of the WebSphere portfolio. Most recently he acted as chief architect for the WebSphere Application Server XML Feature Pack.

Bert Van Der Linden joined IBM in 2001 to architect pureXML in DB2 which culminated in the release of DB2 9 in 2006. In parallel, he engaged with many customers and partners to evangelize XML in the database and continues to work intensely with customers. Bert came to IBM from a startup company, Propel, where he led the design and implementation of the distributed and fault-tolerant middleware which hosted a scalable e-commerce application. Before that Bert worked for many years at Tandem Computers on the NonStop SQL, a database that ran many critical applications in the financial industry.




