The XML Database Blog
I have been asked many times: “What is faster, XML or Relational?”. Of course, this question oversimplifies a complex issue, and so the only valid answer is “It depends!”. Sometimes people ask the same question in a slightly different way: “If I have a relational table and convert each row into a small XML document and store these documents in a separate table with 1 XML column, what’s the performance difference between the two tables (for inserts/updates/queries)?”. But, in most cases such a conversion is not recommended and this type of comparison is, again, too simplistic.
Let’s say you want store, index, and query 1 million addresses and each address has a first name, last name, street, city, state, and zip code. That’s a simple and fixed structure and it’s the same for all records. It’s a perfect fit for a relational database and can be stored in a single table. Relational databases have been optimized for decades to handle such fixed records very efficiently. However, if the application needs to convert the address data to XML format anyway, it can often be faster to store the data permanently in XML and avoid the repeated conversion from relational format to XML format in the application.
Now consider a scenario where the business objects of interest are a lot more complex and variable than simple addresses. For example, derivative trades in the financial industry are modeled in XML with an XML Schema called FpML (financial products markup language). It defines more than 7000 fields (many are optional) with hundreds of 1-to-many relationships between them. Designing a relational schema to represent such objects is very hard and leads to hundreds of tables. The process of inserting (normalizing) and reading (denormalizing) a single object into such a relational schema can easily be 10x or 100x slower than inserting and reading a corresponding XML document in a native XML column (e.g. in DB2 pureXML).
So, any performance comparison of XML versus relational depends heavily on which data you choose for the comparison, and what type of operations you measure.
A large DB2 customer recently compared XML to relational performance because their business objects are currently mapped to 12 relational tables. Their application executes at least 12 SQL statements to retrieve all the relational data that comprises one of the logical business objects. Then the application reassembles the original business object. An alternative is to store these business objects as XML so that each object is stored as a single document. Instead of 12 tables, only 1 table with 1 XML column is then needed. In multi-user tests for data retrieval the company found that the XML-based solution allows them to retrieve objects with 55% higher throughput than the existing relational SQL-based solution. The reasons for the performance benefit include fewer JDBC interactions between application and database as well as fewer distinct pages that need to be read when one logical business object is represented as one XML document - and not scattered over 12 tables! These tests were later repeated and verified at an IBM lab.
Another important consideration in the question of XML versus relational is the data format in which the data is produced and consumed outside the database. If the data is produced and/or consumed in XML format anyway, it is often better to also store the data as XML in the database.
So, the question “What is faster, XML or Relational?” is somewhat like asking “What is faster, a truck or a ship?”, because XML and relational are meant for different purposes, and either one can outperform the other depending on which use case you look at. And there are also use cases (with high schema complexity and schema variability over time) that cannot reasonably be implemented in a relational data model. (If you need to go to Hawaii, the boat always beats the truck!).
The beauty of a hybrid database system such as DB2 is that you can use both native XML and relational capabilitues side by side, in a tightly integrated manner. Some data is better represented in relational tables, other data is better represented in XML, and you can manage both in the same database or even the same table, and with the same APIs and utilities.
(This was one of the Top 5 most popular posts in 2010 on the nativexmldatabase.com blog.)
MatthiasNicola 120000E28R Tags:  expressions sql satisfies quantified xquery every db2 10,397 Views
If you are familiar with SQL you have probably seen quantified SQL predicates that compare a single value with a set of values. For example, the following SQL query returns all rows from table1 where col1 is greater than all values in col2 of table2.
And if you replace the keyword ALL with the keyword SOME, then the query above returns all rows from table1 where col1 is greater than at least one of the values in col2 of table2.
These comparisons are also known as universal quantification (“ALL”) and existential quantification (“SOME”), respectively.
Similar quantified comparisons are possible in XQuery too, but the syntax is slightly different. In XQuery you have the keywords “every … satisfies ….” for universal quantification and “some … satisfies …” for existential quantification. Let’s use the following table and two simple sample documents to explore how they work:
create table customer(doc XML);
insert into customer values(‘
insert into customer values(‘
Now assume that we want to find customers where all their phone numbers are of type “cell”. We can code a corresponding query in XQuery or in SQL/XML notation as follows:
Both queries return just the first of the two customer documents. The second document is not selected because not all of its phone elements have a @type attribute with the value “cell”.
The quantified expression every $p in $c/phone satisfies $p/@type = “cell” means that $p iterates over all the items in the sequence $c/phone, and for each such $p the predicates $p/@type = “cell” must be true.
If we want to return all documents where at least one of the phone number is of type “cell” (existential quantification!), then we can simply replace the keyword “every” with the keyword “some”, such as in the following query:
This query returns both our sample documents, because both documents contain at least one phone element where the @type attribute has the value “cell”. Since XPath uses existential quantification by default, you can often omit the “some … satisfies …” and obtain the same result with an ordinary path expression, like this:
For more information, use the following resources.
DB2 Information Center – Quantified Expressions:
XQuery specification of quantified expressions:
MatthiasNicola 120000E28R Tags:  sql/xml db2 line newline xmlelement break construction xml 6,773 Views
Generating XML from relational data is not difficult. The SQL standard contains a set of XML construction functions that take column names or other SQL expressions as input and return XML. The most prominent of these XML construction functions are:
Here is a simple example:
create table mydata(c1 int, c2 varchar(25));
select xmlelement(name “myroot”,
<myroot><col1>1</col1><col2>This is a test</col2></myroot>
1 record(s) selected.
The XML is typically returned in textual (serialized) format, as a single line without any line breaks. That’s perfectly fine, except that line breaks can sometimes be useful to make the XML easier to read for the human eye.
There are several ways in which you add add newline characters to the XML to make it more readable.
If the XML that you construct is less then 32KB in size, then you can use the SQL function REPLACE to replace every occurrence of ‘><’ with the hex string 3E0A3C. The hex character 3E is the same as ‘>’, the hex character 3C is the same as ‘<’, and the hex character 0A is a line feed in between them. Before we can apply the REPLACE function to the constructed XML, we need to explicitly convert the XML to data type VARCHAR, which can be done with the function XMLSERIALIZE.
The following query is extended from the previous example by adding the functions XMLSERIALIZE and REPLACE:
1 record(s) selected.
The replace function takes three parameters:
However, this approach works only for smallish XML, up to 32KB.
If you generate larger XML, you have two options. First, you can perform the replace operation at the client that receives the XML. Second, you can explicitly construct XML text nodes that contain newline characters where you want them. For example:
select xmlelement(name “myroot”,
1 record(s) selected.
select xmlelement(name “myroot”,
1 record(s) selected.
If you want a line break after every XML element then this approach can be tedious and makes the SQL statement quite a bit longer. Then again, the line breaks are really only needed for human readability, not for any client application that consumes the generated XML.
If you are or will be using XML in a relational database, chances are that you will be using SQL/XML to query XML and relational data in an integrated manner. Most XML applications are not black or white, meaning they do not use XML exclusively without any relationship to any current or existing relational data. More often than not, XML documents somehow relate to some structured data for which the relational data model might be the better choice. This makes the combined management of XML documents and relational data in a single database -often even in a single table!- so appealing.
One of the most popular and most versatile functions in the SQL/XML language standard is the XMLTABLE function. Let's look at a simple example.
First, let's a create a simple database table that contains a mix of relational data and XML:
CREATE TABLE products(id INTEGER, name VARCHAR(20), details XML);
Now assume we want to retrieve the id, name, category, color, and size for some or all of the products. We can issue a SQL/XML query such as the following:
SELECT id, name, T.*
The XMLTABLE function in the "from" clause is a table function that produces zero, one, or multiple rows for each XML document. In this example it produces one row for each XML document in the "product" table. Implicitly there is an inner join between the XMLTABLE function and the "product" table.
Now what happens if there is a row whose XML column is NULL? Let's insert such a row and repeat the previous query:
INSERT INTO products VALUES(3, 'Ice Scraper', NULL);
You may be surprised that the ID and NAME of the newly inserted product do not show up in this query result. A corresponding row is missing from the result set, because for this row there is no XML document on the XML-side of the inner join.
What we need in this case is a left outer join, so that the relational columns from the base table are returned even if the XML column (the other side of the implicit join) is NULL:
SELECT id, name, T.*
The join condition of this left outer join is trivial: 1=1, which always evaluates to true. It means that we want this left outer join to be applied to every row in this table.
The XMLTABLE function is part of the SQL standard. DB2 supports it, Oracle does too, but as far as I know the XMLTABLE function is not available in SQL Server 2008.
(This was one of the Top 5 most popular posts in 2010 on the nativexmldatabase.com
Relational databases have supported SQL stored procedures for a log time. The benefits of SQL procedures are well-known and include:
With the introduction of XML support in relational databases, several database products (such as DB2) allow you to access and manipulate not only relational data but also XML in stored procedures. This is made possible through:
(a) the introduction of the XML data type in the SQL type system, and
(b) several XML functions in the SQL language such as XMLQUERY, XMLEXISTS, and XMLTABLE that can contain XPath or XQuery expressions.
The picture below provides an example of some of the basic capabilities for XML handling stored procedures:
The examples above are certainly not an exhaustive list of what you can do with XML in stored procedure. For example, while the stored procedure above only operates on a single document that is passed in via the XML parameter, stored procedures can also use XQuery or SQL/XML to access sets of XML documents that are stored in XML columns.
Stored procedures can also compose, prepare, and execute XQuery or SQL/XML statements dynamically. You can define cursors over the result sequence of an XQuery and process each returned item one by one in a loop. You can also construct, split, shred, modify, merge, compare, or validate XML documents in stored procedures.
Of course, all the error and exception handling that is available for stored procedures can also be used with any XML operations. In summary, you can code sophisticated XML processing logic (or hybrid XML/relational processing logic) with the familiar concepts of SQL stored procesures.
Many more examples of XML processing in DB2 Stored Procedures can be found in Chapter 18 of the DB2 pureXML Cookbook.
MatthiasNicola 120000E28R Tags:  web http service purexml rest soap xml udf db2 https 5,440 Views
Undeniably, XML is the message format of choice for many service-oriented architectures and application integration efforts. Also, many SOA and web service implementaions use REST as the protocol for accessing URL-addressable resources and services. REST stands for Representational State Transfer and is built on top of HTTP, which acts as the underlying transport layer.
With a new set user-defined functions (UDFs) in DB2, it has become very simple to issue REST requests directly from your SQL statements or DB2 stored procedures. This enables DB2 to easily interact with REST-based services and integrate information from the web or URL-based resources into the database.
The new REST UDFs allow you to receive and provide information in binary format (BLOB) or textual format (CLOB), which includes but is not limited to XML. The basic REST UDFs are scalar functions that perform the simple HTTP operations GET, POST, PUT, and DELETE. These UDFs are:
Each of these functions take a URL and, optionally, an HTTP header as input paramaters. Additionally, the POST and PUT functions have a third parameter for the BLOB or CLOB data that you want to upload.
The REST UDFs are implemented in Java and support HTTP as well as HTTPS with SSL encryption.
When you use these UDFs in SQL statements to receive information, you can:
In particular, if the information is in XML format you can apply any DB2 pureXML functions to it, such as:
If you are interested and want to try this for yourself, read the following article:
This article describes the REST functions for DB2 in more detail and presents several concrete usage examples. The UDFs themselves are available for download at the bottom of this article.
Enjoy your REST !
MatthiasNicola 120000E28R Tags:  xml sql pathtable document recursive paths recursion xquery 4,887 Views
A common question is how to obtain a list of all the elements and attributes that occur in an XML document. Producing such a list is what I call “XML profiling” and in a previous blog post I have discussed several SQL/XML queries that can do this.
An extension of this question is how to get the paths of all the elements and attributes in a document. This seemingly simple task is -unfortunately- not nearly as simple as one would think! XPath and XQuery do not have a function that takes a given element or attribute as input and returns the full path to that node.
The solution is to write a query that traverses the XML document level by level to collect the element names at every level and concatenate them appropriately to construct the paths for every elements and attributes at every level.
There are many ways in which this can be done. You can use XQuery or SQL/XML and you can choose whether to use recursion or not. Let’s look at a few examples.
First, let’s create a simple table with a small document that we can use in the examples:
create table mytable(xmldoc XML);
A first and straightforward solution is to start at the root of the document, then at the first level of child nodes, and then at the children of each these child nodes, and so on. For each element or attribute we construct the path by concatenating the path from the parent with the name of the element or attribute. We do this for all nodes at a given level in the tree and then move to the next level of the document.:
The obvious shortcoming of this query is that it assumes a maximum of 5 levels in the document. If your documents are deeper than this, you can easily extend the query so that it goes down to 10 or 20 levels, whatever you need. That’s maybe not very elegant, but it works if you can define an upper bound on the depths of your XML documents, which is usually possible.
You probably notice that the path Message/Person/id should actually be Message/Person/@id because “id” is an XML attribute. The query can enhanced to take care of such details. In the last two sample queries of my XML profiling post you have seen how to use the
If you prefer a more elegant solution that does not require any assumption about the maximum depths of the XML documents, then you need to code a recursive query, either in XQuery or in SQL/XML. Let’s try SQL/XML for a change.
You may already be familiar with how recursive SQL works. If not, you can look at several existing examples. The basic idea is to use a WITH clause, also called “common table expression”, that contains a UNION ALL between the start of the processing and a recursive reference back to the common table expression itself. The following augments this approach with the XMLTABLE function that extracts nodes and node names from the XML:
WITH pathstable (name, node, xpath) AS (
If you want to list the element and attribute values for each path, then you can easily modify this query as follows:
WITH pathstable (name, node, xpath, value) AS (
A few things to note:
MatthiasNicola 120000E28R Tags:  xml normalization schema normal forms 3nf relational 2 Comments 4,882 Views
Normalization is a design methodology for relational database schemas and aims to minimize data redundancy and avoid data anomalies, such as update anomalies. The consequence of normalization is that business records (such as a purchase order, an insurance claim, a financial transaction, etc.) are split into pieces that are scattered over potentially many relational tables.
In addition to its benefits, normalization also introduces several drawbacks:
These issues raise the question whether normalization should be applied as categorically as some people believe. Indeed, there are several reasons for reconsidering normalization, such as:
Today, business records are often created and exchanged in a digital format, and this format is often XML. XML is a non-normalized data format that can provide several benefits:
When business records already exist in XML format outside the database anyway, then it is usually best to also store them as XML and not to convert into a normalized relational schema.
My colleague Susan Malaika and I have collected our thoughts and observations on normalization in a 2-part article titled “Data Normalization Reconsidered“. It has recently been published on developerWorks and can be found here:
Part 2: http://www.ibm.com/developerworks/data/library/techarticle/dm-1201normalizationpart2/index.html
Since XML has become so pervasive in enterprise computing and service-oriented architectures, it is a given that XML capabilities are required in all parts of an IT environment. Closely related to the area of databases is ETL - to extract, transform, and load information, e.g. to populate data warehouses or to integrate and connect disparate systems.
The good news is that version 8.5 of IBM InfoSphere DataStage has greatly enhanced XML capabilities. I first saw the new XML features in DataStage 8.5 in demo at the Information On Demand conference
last October. And I was very impressed because the XML support goes far
beyond the half-hearted XML handling that many tools offer.
example, it's easy to import and work with XML Schemas in DataStage
8.5. Many industry standard XML Schemas that are used in the financial
sector, health care, insurance, government, retail, etc. are quite
complex and consist of dozens or even hundreds of XSD files that
comprise a single XML Schema. Examples include FpML, FIXML, HL7, IRS1120, OAGIS, and many others.
You might receive such a schema as a ZIP file that contains multiple folders with XSD files. DataStage 8.5 can simply read the entire ZIP file, which saves you the tedious job of importing all the XSD files separately or dealing with their import and include relationships.
the XML Schema is imported, DataStage understands the structure of your
XML document and allows you define the transformations that you need. The XML transformation capabilities certainly include some of the intuitive things. For example, you can... (read more)
If you think that mainframe computers are old dinosaurs that only run ancient COBOL code – think again! Now mainframes also run XQuery!
I have put the word “only” in quotes because for many applications XPath and SQL/XML are fully sufficient. When you combine XPath expressions with SQL/XML functions such as XMLTABLE plus other SQL language constructs you can write very powerful XML queries and accomplish many of the same things that you can do with XQuery.
DB2 10 for z/OS has added a variety of new XML features such as node-level XML updates, XML-type parameters and variables in stored procedures and user-defined functions, and enhancements for XML Schemas and XML indexes.
With APARs PM47617 and PM47618, DB2 for z/OS now also supports XQuery within the SQL functions XMLTABLE, XMLQUERY, XMLEXISTS, and XMLMODIFY.
So what are the additional capabilities and benefits that XQuery provides? Examples include:
To see sample queries for each of the above, continue reading here: