On this last day of year 2008, I'd like to share with you on how easy it is to generate XML from relational data using SQL in DB2 V8 and DB2 9 for z/OS.
To start with, I'd like to show a simple example of SQL functions that generates the following XML:
<name> <firstname>John</firstname> <lastname>Doe</lastname></name>
The SQL functions to construct this XML is the following:
XMLELEMENT(NAME "name", XMLELEMENT(NAME "firstname", 'John'), XMLELEMENT(NAME "lastname", 'Doe') )
As you can see, the SQL functions nest exactly corresponding to the XML data nesting structure. These functions are standard SQL XML constructors or sometimes called XML publishing functions. Notice that the "NAME" keyword is required by the standard to indicate that an SQL ID ("name") is used as a name for the element, as opposed to using the value of a column ("name") in other SQL functions. Double quotes are necessary to keep the name case-sensitive.
In the next example, I'd like to show how you can group a set of elements into one XML value and nest under an element using XMLAGG. This query assumes you have an employee table:
+---------+--------------+------------+--------+ | dept | firstname | lastname | ... | +---------+--------------+------------+--------+ Shipping White Smith ... Shipping Paul Taylor ... Shipping John Doe ... ...
And you want to group employees under a department element like the following:
<Department name="Shipping"> <Emp> <firstname>John</firstname> <lastname>Doe</lastname> </Emp> <Emp> <firstname>White</firstname> <lastname>Smith</lastname> </Emp> <Emp> <firstname>Paul</firstname> <lastname>Taylor</lastname> </Emp></Department>...
The query to generate this XML data is as follows:
SELECT XML2CLOB( XMLELEMENT(NAME "Department", XMLATTRIBUTES (e.dept AS "name" ), XMLAGG(XMLELEMENT(NAME "Emp", XMLELEMENT(NAME "firstname", e.firstname), XMLELEMENT(NAME "lastname", e.lastname) ) ORDER BY e.lastname ) ) ) AS "dept_list"FROM employees e GROUP BY dept;
This is what people typically call "de-normalization" (of relational data). Notice that XMLAGG takes ORDER BY clause to specify the order of elements within a group. In this example, it's to order the elements based on the lastname. This query should work on both V8 and V9. In DB2 9, additional constructors are provided to make them complete. V8 queries continue to work in DB2 9.
There are many advantages of using SQL to generate XML from relational data. Here are a few of them:
- It is easy to extend existing COBOL or PL/I applications to generate XML by using SQL. It works for non-Unicode data.
- It is much easier to generate XML than using DOM in programming languages like Java and C/C++, enabling higher productivity.
- It makes applications light-weight, with high-performance. It delivers the XML value in one chunk, instead of piece-by-piece.
What is a QName?
Before listing some additional resources, I'd like to trickle in some terminology related to XML for beginners (please skip if you already know). An element or attribute has a name, officially called a QName, for Qualified Name, such as "hr:name", where "hr" is a prefix and "name" is a local name. Both prefix and local name are NCNames, for No-Colon Names (names without a colon in them). The prefix is for the namespace, and needs to be declared with a namespace declaration like the following:
<hr:name xmlns:hr="http://www.example.com/humanresource"> ... </hr:name>
Where "xmlns" signifies a namespace declaration, and "hr" is the namespace prefix being declared. The value "http://www.example.com/humanresource" is a namespace name or namespace URI. The reasons that XML namespace is designed this way are the following:
- To avoid conflict (the purpose of a namespace), it has to be some unique identifiers across the Web. That will be the URI (Uniform Resource Identifier).
- A URI cannot be used directly in an element name or attribute name because it does not follow the syntax rules for element/attribute names in general.
- A prefix is used instead to make it follow the name syntax rules, and it is shorter usually.
An element or attribute is said to be in the scope of a namespace if it is between the start tag (inclusive) containing the namespace declaration and the corresponding end tag. And an element has a set of in-scope namespaces. A namespace declaration is said to be superfluous if the namespace being declared is already in the scope.
An NCName becomes a QName if the prefix is empty, that is, either there is no namespace (No Namespace is used), or a default namespace is used (a namespace is bound to the empty prefix). A colon is not used in these cases. The following example illustrates a default namespace declaration:
<name xmlns="http://www.example.com/humanresource"> ... </name>
For more SQL/XML constructor details and examples on generating XML from relational data, please look at this "old" presentation of mine from IM Tech Conference '05: Generating XML from Relational Data Using SQL with Ease. Some of the examples are included in the XMLQuickDemo 7XMLGEN1.CLP and 8XMLGEN2.CLP. I've also made XML4TABLE package public. This package contains two SQL PL stored procedures for DB2 9 for z/OS: XML4TABLE and XML4ENTITY1, which can generate XML for an entire table or a given entity from multiple RI-related tables. Read the readme.txt file in the package for details. Let me know if you have any feedback.
Also, check IBM Data Studio and Rational Data Architect for tool support in generating SQL/XML queries to generate XML from relational data. And don't forget to check SQL Reference and XML Guide, and other online resources.
I wish you a very happy and prosperous new year 2009!
-Guogen (Gene) Zhang