In my previous post I discussed the native XML capabilities provided by the IBM WebSphere Application Server V7.0 Feature Pack for XML. This Feature Pack allows you to manipulate XML in your Java application directly with XQuery, XPath, or XSLT.
If you use the WebSphere XML Feature Pack together with DB2 pureXML, one of the obvious questions is: when should I use XPath or XQuery in the application server and when should I push these operations to DB2 and run XQuery or SQL/XML in the database layer?
The best split between XML processing in one tier vs. the other depends on the exact nature of your application and may vary on a case-by-case basis. Still, here are some general considerations and examples.
Example 1: Assume your application needs to retrieve an XML document from the database based on one or more XML search conditions, but the decision about which information to extract from the document can only be made at runtime, after the document has been retrieved. In this situation it is typically best to issue an XQuery or SQL/XML query to DB2 to retrieve the full document, exploiting DB2’s XML indexes to evaluate the search conditions efficiently. The extraction of specific elements is then performed with XPath or XQuery in the application server as part of the application’s business logic.
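As a rough illustration of this split, here is a minimal Java sketch. The table `customer`, the XML column `info`, and all element names are assumptions made up for this example. The standard JDK `javax.xml.xpath` API stands in for the XML Feature Pack API, so treat it as an outline of the pattern rather than Feature Pack code.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class RetrieveThenExtract {

    // Hypothetical SQL/XML statement: DB2 evaluates the XML search condition
    // and returns the full document from the assumed "info" XML column.
    static final String FIND_BY_CITY =
          "SELECT info FROM customer "
        + "WHERE XMLEXISTS('$i/customerinfo/addr[city = $c]' "
        + "PASSING info AS \"i\", CAST(? AS VARCHAR(30)) AS \"c\")";

    // Application-tier step: the path is only known at runtime,
    // after the document has been retrieved.
    static String extract(String xmlDocument, String runtimePath) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return xpath.evaluate(runtimePath,
                    new InputSource(new StringReader(xmlDocument)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Cannot evaluate " + runtimePath, e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for a document that FIND_BY_CITY would return.
        String doc = "<customerinfo status=\"premium\"><name>Ann</name>"
                   + "<addr><city>Toronto</city></addr>"
                   + "<phone type=\"work\">555-0100</phone></customerinfo>";

        // Business logic inspects the document, then decides what to extract.
        String status = extract(doc, "/customerinfo/@status");
        String path = status.equals("premium")
                ? "/customerinfo/phone[@type='work']"
                : "/customerinfo/name";
        System.out.println(extract(doc, path));
    }
}
```

The search condition travels to DB2 as a parameter marker, while the extraction path never leaves the application server.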
Example 2: As a variation of Example 1, consider a hybrid database schema where some data is in relational columns and some in XML columns. Assume you need to retrieve XML documents based on relational and XML predicates, possibly involving joins between the table that holds the XML and other fully relational tables. It is typically recommended to use a single database query that combines the relational and XML predicates, so that as little data as possible is retrieved from the database and sent to the application. The extraction of XML elements would still happen in the application server, especially if it depends on post-retrieval application logic.
Example 3: The application needs to retrieve specific elements from one or multiple XML documents, and these elements are fixed and known before the database query is submitted. In this case it is typically best to code the search and the extraction in a single XQuery or SQL/XML statement for DB2. Again, the general idea is to reduce data early and not ship more data around than necessary.
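A single statement of this kind could look roughly like the following sketch. The `customer` table, its `info` XML column, and the element names are again hypothetical. `XMLEXISTS` carries the search condition and `XMLTABLE` performs the extraction, so only the phone numbers are sent to the application.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class SearchAndExtractInDb2 {

    // Search (XMLEXISTS) and extraction (XMLTABLE) in one SQL/XML statement.
    // Table, column, and element names are assumptions for illustration.
    static final String PHONES_BY_CITY =
          "SELECT t.phone "
        + "FROM customer c, "
        + "XMLTABLE('$i/customerinfo/phone' PASSING c.info AS \"i\" "
        + "COLUMNS phone VARCHAR(20) PATH '.') AS t "
        + "WHERE XMLEXISTS('$i/customerinfo/addr[city = $c]' "
        + "PASSING c.info AS \"i\", CAST(? AS VARCHAR(30)) AS \"c\")";

    // Returns one string per matching phone element; no full document
    // is ever transferred to the application server.
    static List<String> phonesByCity(Connection con, String city) throws SQLException {
        List<String> phones = new ArrayList<>();
        try (PreparedStatement ps = con.prepareStatement(PHONES_BY_CITY)) {
            ps.setString(1, city);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    phones.add(rs.getString(1));
                }
            }
        }
        return phones;
    }
}
```

Compared with retrieving full documents, the result set here is already reduced to the values the application needs.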
Example 4: The application receives an XML document from a web service and needs to examine it with an XPath query before it can decide whether the document should be inserted into the database or passed to another service. Typically, this operation should be implemented in the application server rather than calling out to the database when that call might not be needed.
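A routing decision like this can be sketched in a few lines of Java. The `order` document shape and the rule "store only completed orders" are invented for this example, and the JDK `javax.xml.xpath` API is used here in place of the XML Feature Pack API.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class IncomingDocumentRouter {

    enum Route { INSERT_INTO_DB2, FORWARD_TO_SERVICE }

    // Hypothetical business rule: only completed orders are persisted.
    static final String SHOULD_STORE = "/order/status = 'complete'";

    // Evaluates the rule in the application tier; no database call is made.
    static Route decide(String incomingXml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            Boolean store = (Boolean) xpath.evaluate(SHOULD_STORE,
                    new InputSource(new StringReader(incomingXml)),
                    XPathConstants.BOOLEAN);
            return store ? Route.INSERT_INTO_DB2 : Route.FORWARD_TO_SERVICE;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Not a readable order document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(decide("<order><status>complete</status></order>"));
        System.out.println(decide("<order><status>pending</status></order>"));
    }
}
```

Only documents routed to `INSERT_INTO_DB2` would ever cause a round trip to the database.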
Example 5: The application receives a transient XML document from a web service and, based on specific element values in that document, needs to retrieve related documents from the DB2 database. Although you could pass the full input document as a parameter to a SQL/XML database query and perform the entire processing in DB2, it can be easier to first extract the relevant values from the input document in the application layer and then send a simpler and more targeted query to DB2, with potentially better end-to-end performance.
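The following sketch shows one way this could look, under invented names: a `request` input document with a `customer/@id` attribute, and an `orders` table with an `orderdoc` XML column. The value is extracted with the JDK XPath API (standing in for the XML Feature Pack) and then passed to DB2 as an ordinary parameter marker.

```java
import java.io.StringReader;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class TargetedLookup {

    // Hypothetical targeted query: only a short string is sent to DB2,
    // not the whole transient input document.
    static final String ORDERS_FOR_CUSTOMER =
          "SELECT orderdoc FROM orders "
        + "WHERE XMLEXISTS('$d/order/customer[@id = $cid]' "
        + "PASSING orderdoc AS \"d\", CAST(? AS VARCHAR(12)) AS \"cid\")";

    // Step 1 (application tier): pull the lookup key out of the input document.
    static String customerId(String incomingXml) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(
                    "/request/customer/@id",
                    new InputSource(new StringReader(incomingXml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Not a readable request document", e);
        }
    }

    // Step 2 (database tier): run the simpler query with the extracted value.
    static List<String> relatedOrders(Connection con, String incomingXml)
            throws SQLException {
        List<String> docs = new ArrayList<>();
        try (PreparedStatement ps = con.prepareStatement(ORDERS_FOR_CUSTOMER)) {
            ps.setString(1, customerId(incomingXml));
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    docs.add(rs.getString(1)); // serialized XML document
                }
            }
        }
        return docs;
    }
}
```

For example, `customerId("<request><customer id=\"C042\"/></request>")` yields the key `C042`, which is all that DB2 needs to see.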
Example 6: The application needs to retrieve XML documents from the database and apply logic that has already been coded in XQuery modules and XQuery user-defined functions. Such modules are an optional feature of the XQuery language standard. Modules are supported in the WebSphere XML Feature Pack but not in DB2. The natural choice is to retrieve the full documents from DB2 (similar to Examples 1 and 2) and then use the additional XQuery capabilities of the WebSphere XML Feature Pack on these documents.
These examples are certainly not an exhaustive list of potential scenarios. Also, they are not meant to be hard rules, but just illustrations of the trade-offs and considerations that can be involved in the decision process when you split XML processing between the application server and the database.
For concrete coding examples with the WebSphere XML Feature Pack and DB2 pureXML, see the following articles:
“Programming XML across the multiple tiers: Use XML in the middle tier for performance, fidelity, and development ease” – http://www.ibm.com/developerworks/library/x-xmlfeat1/
“Programming XML across the multiple tiers, Part 2: Write efficient Java EE applications that exploit an XML database server” – http://www.ibm.com/developerworks/xml/library/x-xmlfeat2/