The XML Database Blog
MatthiasNicola 120000E28R Etiquetas:  database relational xml_database xml performance 12.873 vistas
I have been asked many times: “What is faster, XML or Relational?”. Of course, this question oversimplifies a complex issue, and so the only valid answer is “It depends!”. Sometimes people ask the same question in a slightly different way: “If I have a relational table and convert each row into a small XML document and store these documents in a separate table with 1 XML column, what’s the performance difference between the two tables (for inserts/updates/queries)?”. But, in most cases such a conversion is not recommended and this type of comparison is, again, too simplistic.
Let’s say you want store, index, and query 1 million addresses and each address has a first name, last name, street, city, state, and zip code. That’s a simple and fixed structure and it’s the same for all records. It’s a perfect fit for a relational database and can be stored in a single table. Relational databases have been optimized for decades to handle such fixed records very efficiently. However, if the application needs to convert the address data to XML format anyway, it can often be faster to store the data permanently in XML and avoid the repeated conversion from relational format to XML format in the application.
Now consider a scenario where the business objects of interest are a lot more complex and variable than simple addresses. For example, derivative trades in the financial industry are modeled in XML with an XML Schema called FpML (financial products markup language). It defines more than 7000 fields (many are optional) with hundreds of 1-to-many relationships between them. Designing a relational schema to represent such objects is very hard and leads to hundreds of tables. The process of inserting (normalizing) and reading (denormalizing) a single object into such a relational schema can easily be 10x or 100x slower than inserting and reading a corresponding XML document in a native XML column (e.g. in DB2 pureXML).
So, any performance comparison of XML versus relational depends heavily on which data you choose for the comparison, and what type of operations you measure.
A large DB2 customer recently compared XML to relational performance because their business objects are currently mapped to 12 relational tables. Their application executes at least 12 SQL statements to retrieve all the relational data that comprises one of the logical business objects. Then the application reassembles the original business object. An alternative is to store these business objects as XML so that each object is stored as a single document. Instead of 12 tables, only 1 table with 1 XML column is then needed. In multi-user tests for data retrieval the company found that the XML-based solution allows them to retrieve objects with 55% higher throughput than the existing relational SQL-based solution. The reasons for the performance benefit include fewer JDBC interactions between application and database as well as fewer distinct pages that need to be read when one logical business object is represented as one XML document - and not scattered over 12 tables! These tests were later repeated and verified at an IBM lab.
Another important consideration in the question of XML versus relational is the data format in which the data is produced and consumed outside the database. If the data is produced and/or consumed in XML format anyway, it is often better to also store the data as XML in the database.
So, the question “What is faster, XML or Relational?” is somewhat like asking “What is faster, a truck or a ship?”, because XML and relational are meant for different purposes, and either one can outperform the other depending on which use case you look at. And there are also use cases (with high schema complexity and schema variability over time) that cannot reasonably be implemented in a relational data model. (If you need to go to Hawaii, the boat always beats the truck!).
The beauty of a hybrid database system such as DB2 is that you can use both native XML and relational capabilitues side by side, in a tightly integrated manner. Some data is better represented in relational tables, other data is better represented in XML, and you can manage both in the same database or even the same table, and with the same APIs and utilities.
(This was one of the Top 5 most popular posts in 2010 on the nativexmldatabase.com blog.)
MatthiasNicola 120000E28R Etiquetas:  expressions sql satisfies xquery quantified every db2 12.419 vistas
If you are familiar with SQL you have probably seen quantified SQL predicates that compare a single value with a set of values. For example, the following SQL query returns all rows from table1 where col1 is greater than all values in col2 of table2.
And if you replace the keyword ALL with the keyword SOME, then the query above returns all rows from table1 where col1 is greater than at least one of the values in col2 of table2.
These comparisons are also known as universal quantification (“ALL”) and existential quantification (“SOME”), respectively.
Similar quantified comparisons are possible in XQuery too, but the syntax is slightly different. In XQuery you have the keywords “every … satisfies ….” for universal quantification and “some … satisfies …” for existential quantification. Let’s use the following table and two simple sample documents to explore how they work:
create table customer(doc XML);
insert into customer values(‘
insert into customer values(‘
Now assume that we want to find customers where all their phone numbers are of type “cell”. We can code a corresponding query in XQuery or in SQL/XML notation as follows:
Both queries return just the first of the two customer documents. The second document is not selected because not all of its phone elements have a @type attribute with the value “cell”.
The quantified expression every $p in $c/phone satisfies $p/@type = “cell” means that $p iterates over all the items in the sequence $c/phone, and for each such $p the predicates $p/@type = “cell” must be true.
If we want to return all documents where at least one of the phone number is of type “cell” (existential quantification!), then we can simply replace the keyword “every” with the keyword “some”, such as in the following query:
This query returns both our sample documents, because both documents contain at least one phone element where the @type attribute has the value “cell”. Since XPath uses existential quantification by default, you can often omit the “some … satisfies …” and obtain the same result with an ordinary path expression, like this:
For more information, use the following resources.
DB2 Information Center – Quantified Expressions:
XQuery specification of quantified expressions:
MatthiasNicola 120000E28R Etiquetas:  sql/xml db2 line newline xmlelement break construction xml 8.154 vistas
Generating XML from relational data is not difficult. The SQL standard contains a set of XML construction functions that take column names or other SQL expressions as input and return XML. The most prominent of these XML construction functions are:
Here is a simple example:
create table mydata(c1 int, c2 varchar(25));
select xmlelement(name “myroot”,
<myroot><col1>1</col1><col2>This is a test</col2></myroot>
1 record(s) selected.
The XML is typically returned in textual (serialized) format, as a single line without any line breaks. That’s perfectly fine, except that line breaks can sometimes be useful to make the XML easier to read for the human eye.
There are several ways in which you add add newline characters to the XML to make it more readable.
If the XML that you construct is less then 32KB in size, then you can use the SQL function REPLACE to replace every occurrence of ‘><’ with the hex string 3E0A3C. The hex character 3E is the same as ‘>’, the hex character 3C is the same as ‘<’, and the hex character 0A is a line feed in between them. Before we can apply the REPLACE function to the constructed XML, we need to explicitly convert the XML to data type VARCHAR, which can be done with the function XMLSERIALIZE.
The following query is extended from the previous example by adding the functions XMLSERIALIZE and REPLACE:
1 record(s) selected.
The replace function takes three parameters:
However, this approach works only for smallish XML, up to 32KB.
If you generate larger XML, you have two options. First, you can perform the replace operation at the client that receives the XML. Second, you can explicitly construct XML text nodes that contain newline characters where you want them. For example:
select xmlelement(name “myroot”,
1 record(s) selected.
select xmlelement(name “myroot”,
1 record(s) selected.
If you want a line break after every XML element then this approach can be tedious and makes the SQL statement quite a bit longer. Then again, the line breaks are really only needed for human readability, not for any client application that consumes the generated XML.
MatthiasNicola 120000E28R Etiquetas:  xml_database xmltable query outer_join xml sql/xml 7.885 vistas
If you are or will be using XML in a relational database, chances are that you will be using SQL/XML to query XML and relational data in an integrated manner. Most XML applications are not black or white, meaning they do not use XML exclusively without any relationship to any current or existing relational data. More often than not, XML documents somehow relate to some structured data for which the relational data model might be the better choice. This makes the combined management of XML documents and relational data in a single database -often even in a single table!- so appealing.
One of the most popular and most versatile functions in the SQL/XML language standard is the XMLTABLE function. Let's look at a simple example.
First, let's a create a simple database table that contains a mix of relational data and XML:
CREATE TABLE products(id INTEGER, name VARCHAR(20), details XML);
Now assume we want to retrieve the id, name, category, color, and size for some or all of the products. We can issue a SQL/XML query such as the following:
SELECT id, name, T.*
The XMLTABLE function in the "from" clause is a table function that produces zero, one, or multiple rows for each XML document. In this example it produces one row for each XML document in the "product" table. Implicitly there is an inner join between the XMLTABLE function and the "product" table.
Now what happens if there is a row whose XML column is NULL? Let's insert such a row and repeat the previous query:
INSERT INTO products VALUES(3, 'Ice Scraper', NULL);
You may be surprised that the ID and NAME of the newly inserted product do not show up in this query result. A corresponding row is missing from the result set, because for this row there is no XML document on the XML-side of the inner join.
What we need in this case is a left outer join, so that the relational columns from the base table are returned even if the XML column (the other side of the implicit join) is NULL:
SELECT id, name, T.*
The join condition of this left outer join is trivial: 1=1, which always evaluates to true. It means that we want this left outer join to be applied to every row in this table.
The XMLTABLE function is part of the SQL standard. DB2 supports it, Oracle does too, but as far as I know the XMLTABLE function is not available in SQL Server 2008.
(This was one of the Top 5 most popular posts in 2010 on the nativexmldatabase.com
MatthiasNicola 120000E28R Etiquetas:  xml sql pathtable document recursive paths recursion xquery 6.424 vistas
A common question is how to obtain a list of all the elements and attributes that occur in an XML document. Producing such a list is what I call “XML profiling” and in a previous blog post I have discussed several SQL/XML queries that can do this.
An extension of this question is how to get the paths of all the elements and attributes in a document. This seemingly simple task is -unfortunately- not nearly as simple as one would think! XPath and XQuery do not have a function that takes a given element or attribute as input and returns the full path to that node.
The solution is to write a query that traverses the XML document level by level to collect the element names at every level and concatenate them appropriately to construct the paths for every elements and attributes at every level.
There are many ways in which this can be done. You can use XQuery or SQL/XML and you can choose whether to use recursion or not. Let’s look at a few examples.
First, let’s create a simple table with a small document that we can use in the examples:
create table mytable(xmldoc XML);
A first and straightforward solution is to start at the root of the document, then at the first level of child nodes, and then at the children of each these child nodes, and so on. For each element or attribute we construct the path by concatenating the path from the parent with the name of the element or attribute. We do this for all nodes at a given level in the tree and then move to the next level of the document.:
The obvious shortcoming of this query is that it assumes a maximum of 5 levels in the document. If your documents are deeper than this, you can easily extend the query so that it goes down to 10 or 20 levels, whatever you need. That’s maybe not very elegant, but it works if you can define an upper bound on the depths of your XML documents, which is usually possible.
You probably notice that the path Message/Person/id should actually be Message/Person/@id because “id” is an XML attribute. The query can enhanced to take care of such details. In the last two sample queries of my XML profiling post you have seen how to use the
If you prefer a more elegant solution that does not require any assumption about the maximum depths of the XML documents, then you need to code a recursive query, either in XQuery or in SQL/XML. Let’s try SQL/XML for a change.
You may already be familiar with how recursive SQL works. If not, you can look at several existing examples. The basic idea is to use a WITH clause, also called “common table expression”, that contains a UNION ALL between the start of the processing and a recursive reference back to the common table expression itself. The following augments this approach with the XMLTABLE function that extracts nodes and node names from the XML:
WITH pathstable (name, node, xpath) AS (
If you want to list the element and attribute values for each path, then you can easily modify this query as follows:
WITH pathstable (name, node, xpath, value) AS (
A few things to note:
Relational databases have supported SQL stored procedures for a log time. The benefits of SQL procedures are well-known and include:
With the introduction of XML support in relational databases, several database products (such as DB2) allow you to access and manipulate not only relational data but also XML in stored procedures. This is made possible through:
(a) the introduction of the XML data type in the SQL type system, and
(b) several XML functions in the SQL language such as XMLQUERY, XMLEXISTS, and XMLTABLE that can contain XPath or XQuery expressions.
The picture below provides an example of some of the basic capabilities for XML handling stored procedures:
The examples above are certainly not an exhaustive list of what you can do with XML in stored procedure. For example, while the stored procedure above only operates on a single document that is passed in via the XML parameter, stored procedures can also use XQuery or SQL/XML to access sets of XML documents that are stored in XML columns.
Stored procedures can also compose, prepare, and execute XQuery or SQL/XML statements dynamically. You can define cursors over the result sequence of an XQuery and process each returned item one by one in a loop. You can also construct, split, shred, modify, merge, compare, or validate XML documents in stored procedures.
Of course, all the error and exception handling that is available for stored procedures can also be used with any XML operations. In summary, you can code sophisticated XML processing logic (or hybrid XML/relational processing logic) with the familiar concepts of SQL stored procesures.
Many more examples of XML processing in DB2 Stored Procedures can be found in Chapter 18 of the DB2 pureXML Cookbook.
MatthiasNicola 120000E28R Etiquetas:  http web service purexml rest soap xml udf db2 https 5.994 vistas
Undeniably, XML is the message format of choice for many service-oriented architectures and application integration efforts. Also, many SOA and web service implementaions use REST as the protocol for accessing URL-addressable resources and services. REST stands for Representational State Transfer and is built on top of HTTP, which acts as the underlying transport layer.
With a new set user-defined functions (UDFs) in DB2, it has become very simple to issue REST requests directly from your SQL statements or DB2 stored procedures. This enables DB2 to easily interact with REST-based services and integrate information from the web or URL-based resources into the database.
The new REST UDFs allow you to receive and provide information in binary format (BLOB) or textual format (CLOB), which includes but is not limited to XML. The basic REST UDFs are scalar functions that perform the simple HTTP operations GET, POST, PUT, and DELETE. These UDFs are:
Each of these functions take a URL and, optionally, an HTTP header as input paramaters. Additionally, the POST and PUT functions have a third parameter for the BLOB or CLOB data that you want to upload.
The REST UDFs are implemented in Java and support HTTP as well as HTTPS with SSL encryption.
When you use these UDFs in SQL statements to receive information, you can:
In particular, if the information is in XML format you can apply any DB2 pureXML functions to it, such as:
If you are interested and want to try this for yourself, read the following article:
This article describes the REST functions for DB2 in more detail and presents several concrete usage examples. The UDFs themselves are available for download at the bottom of this article.
Enjoy your REST !
MatthiasNicola 120000E28R Etiquetas:  xml normalization schema normal forms 3nf relational 2 Comentarios 5.304 vistas
Normalization is a design methodology for relational database schemas and aims to minimize data redundancy and avoid data anomalies, such as update anomalies. The consequence of normalization is that business records (such as a purchase order, an insurance claim, a financial transaction, etc.) are split into pieces that are scattered over potentially many relational tables.
In addition to its benefits, normalization also introduces several drawbacks:
These issues raise the question whether normalization should be applied as categorically as some people believe. Indeed, there are several reasons for reconsidering normalization, such as:
Today, business records are often created and exchanged in a digital format, and this format is often XML. XML is a non-normalized data format that can provide several benefits:
When business records already exist in XML format outside the database anyway, then it is usually best to also store them as XML and not to convert into a normalized relational schema.
My colleague Susan Malaika and I have collected our thoughts and observations on normalization in a 2-part article titled “Data Normalization Reconsidered“. It has recently been published on developerWorks and can be found here:
Part 2: http://www.ibm.com/developerworks/data/library/techarticle/dm-1201normalizationpart2/index.html
MatthiasNicola 120000E28R Etiquetas:  db2 sql/xml 10 mainframe for xquery xpath z/os 5.262 vistas
If you think that mainframe computers are old dinosaurs that only run ancient COBOL code – think again! Now mainframes also run XQuery!
I have put the word “only” in quotes because for many applications XPath and SQL/XML are fully sufficient. When you combine XPath expressions with SQL/XML functions such as XMLTABLE plus other SQL language constructs you can write very powerful XML queries and accomplish many of the same things that you can do with XQuery.
DB2 10 for z/OS has added a variety of new XML features such as node-level XML updates, XML-type parameters and variables in stored procedures and user-defined functions, and enhancements for XML Schemas and XML indexes.
With APARs PM47617 and PM47618, DB2 for z/OS now also supports XQuery within the SQL functions XMLTABLE, XMLQUERY, XMLEXISTS, and XMLMODIFY.
So what are the additional capabilities and benefits that XQuery provides? Examples include:
To see sample queries for each of the above, continue reading here:
The XMLTABLE function is part of the SQL standard. It can take XML as input, use XPath or XQuery expressions to extract values or pieces from the XML, and return the result in a relational row and column format.
As discussed in a previous blog post, the XMLTABLE function is often used in queries to read XML documents from an XML column in the database and return the extracted values as a relational result set.
But, the XMLTABLE function can also be used in INSERT and UPDATE statements. This allows an application to pass an XML document as a parameter to an INSERT or UPDATE statement that extractx selected values and uses them to insert or update relational rows in a table.
Using the XMLTABLE function in INSERT statements is quite common if requirements dictate that XML needs to be shredded to relational tables rather than stored natively in an XML column.
Here is a simple example that extracts values from a small XML document and inserts these values into the table “reltable”:
The PASSING clause assigns the input document to the variable “$doc” which is then the starting point for the extraction.
CREATE TABLE reltable(id int, col2 int, col3 varchar(20)); INSERT INTO reltable SELECT id, x, y FROM XMLTABLE ('$doc/mydata' PASSING xmlparse(document '<mydata id="1"> <elem1>555</elem1> <elem2>test</elem2> </mydata>') as "doc" COLUMNS id INTEGER PATH '@id', x INTEGER PATH 'elem1', y VARCHAR(20) PATH 'elem2' );
You don’t necessarily need to hardcode the XML input document in the INSERT statement. It can also be supplied via a parameter marker or host variable, as in this example:
INSERT INTO reltable SELECT id, x, y FROM XMLTABLE ('$doc/mydata' PASSING cast(? as XML) as "doc" COLUMNS id INTEGER PATH '@id', x INTEGER PATH 'elem1', y VARCHAR(20) PATH 'elem2' );
Ok, this was the easy part.
Now what if we want to use the XMLTABLE function in an UPDATE statement to replace the values in an existing row in “reltable” with new values from an incoming XML document? It might not be quite as obvious how to write such an UPDATE statement, but as it turns out it’s not very hard either!
Here is an UPDATE statement that extracts the values of the XML elements “elem1″ and “elem2″ to update the columns “col2″ and “col3″ in “reltable” for the row that has id = 1.
UPDATE reltable SET (col2, col3) = (SELECT x, y FROM XMLTABLE ('$doc/mydata' PASSING xmlparse(document '<mydata id="1"> <elem1>777</elem1> <elem2>abcd</elem2> </mydata>') as "doc" COLUMNS x INTEGER PATH 'elem1', y VARCHAR(20) PATH 'elem2' ) ) WHERE id = 1;
The UPDATE statement above uses a simple WHERE clause to select a specific row to be updated. But, what if we don’t know which target row the incoming XML document needs to be applied to? Ideally, we want to extract the @id attribute from the input document and update whichever row matches its value.
We might be tempted to simply extract the @id attribute in the same XMLTABLE function and add a join predicate to the subselect, like this:
-- this statement is not a good idea! UPDATE reltable r SET (r.col2, r.col3) = (SELECT T.x, T.y FROM XMLTABLE ('$doc/mydata' PASSING cast(? as XML) as "doc" COLUMNS id INTEGER PATH '@id', x INTEGER PATH 'elem1', y VARCHAR(20) PATH 'elem2' ) T WHERE r.id = T.id );
But, this statement would update all rows in the table (many of them with NULL values) because the WHERE clause only applies to the rows produced by the subselect, not to the rows in “reltable”.
One possible solution is to add a WHERE clause that extracts the @id attribute as needed to filter the rows in “reltable”:
-- this statement is better, but not optimal UPDATE reltable SET (col2, col3) = (SELECT x, y FROM XMLTABLE ('$doc/mydata' PASSING cast(? as XML) as "doc" COLUMNS x INTEGER PATH 'elem1', y VARCHAR(20) PATH 'elem2' ) ) WHERE id = XMLCAST( XMLQUERY('$doc/mydata/@id' PASSING cast(? as XML) as "doc") AS INTEGER);
Yes, this works, but it requires us to pass the XML document into the statement twice: once into the subselect in the SET clause, and once into the XMLQUERY function in the WHERE clause. That’s not very elegant and probably not ideal for performance either.
There might be multiple ways to improve the UPDATE statement above. One nice solution is to use a MERGE statement:
MERGE INTO reltable r USING (SELECT id, x, y FROM XMLTABLE ('$doc/mydata' PASSING cast(? as XML) as "doc" COLUMNS id INTEGER PATH '@id', x INTEGER PATH 'elem1', y VARCHAR(20) PATH 'elem2' )) t ON r.id = t.id WHEN MATCHED THEN UPDATE SET r.col2 = t.x, r.col3 = t.y;
The nice thing about the MERGE statement is that it can also handle multiple cases where the XMLTABLE function produces multiple rows whose keys may or may not already exist in the target table, so that UPDATE and/or INSERT operations need to be performed. Here is an example:
MERGE INTO reltable r USING (SELECT id, x, y FROM XMLTABLE ('$doc/root/mydata' PASSING xmlparse(document '<root> <mydata id="1"> <elem1>999</elem1> <elem2>xyz</elem2> </mydata> <mydata id="3"> <elem1>333</elem1> <elem2>test33</elem2> </mydata> </root>') as "doc" COLUMNS id INTEGER PATH '@id', x INTEGER PATH 'elem1', y VARCHAR(20) PATH 'elem2' )) t ON r.id = t.id WHEN MATCHED THEN UPDATE SET r.col2 = t.x, r.col3 = t.y WHEN NOT MATCHED THEN INSERT VALUES(t.id, t.x, t.y); SELECT * FROM reltable; ID COL2 COL3 ----------- ----------- -------------------- 1 999 xyz 3 333 test33 2 record(s) selected.
This blog post has provided some basic examples that might serve as a starting point to develop your own INSERT, UPDATE, and MERGE statements with the XMLTABLE function.
MatthiasNicola 120000E28R Etiquetas:  join xmltable flwor xml db2 xquery relational sql/xml. purexml 5.066 vistas
If you look at existing documentation and examples of the XMLTABLE function, you find that most examples use simple XPath expressions inside the XMLTABLE function and rarely more complex XQuery expressions such as FLWOR expressions. This is also true for my 2 articles on XMLTABLE:
XMLTABLE By Example, Part1 and Part2:
Why is that? Is it because XQuery expressions other than plain path expressions are never required in the XMLTABLE function? Well, let’s take a look at that question…
First, I do believe that many typical SQL/XML queries can indeed be coded with just regular XPath in the XMLTABLE function. I have seen that this is true in many real application deployments.
But, sometimes it can certainly be useful to embed more complex XQuery expressions than just XPath in an XMLTABLE function. For example, in my blog post “How to get a list of all elements and attributes” you saw that I used the XQuery comma operator to construct a sequence from two expressions. In that case the comma expression was //(*,@*) to get all elements and all attributes. In the same post you also saw the use of the XQuery if-then-else expression in the column definition of the XMLTABLE function.
Using FLWOR expression in the XMLTABLE function can be useful if you join and combine XML from two different XML columns. For example let’s consider the following two tables with te subsequent sample documents:
CREATE TABLE order(orderdetails XML);
The order table contains one XML document per order, and the product table one XML document per product. Each product has a @pid attribute as a unique identifier, and orders have /order/item/pid elements to specify which products have been ordered. Naturally, this allows us to perform a join between orders and products.
For example, the following query is a join to retrieve the product information for all the products that have been ordered in order 123:
This query returns one row (one XML document) for each product that is referenced in order 123.
Now, what if you want to combine data from an order document and the associated product documents, and return this information in one relational row per product? For example, assume you want rows that show the product ID and product weight (from the product documents) as well as the price and quantity of that product in the order document.
In that case you could use an XQuery FLWOR expression to perform the join and combine the product and order information in the XMLTABLE function, as shown in the following query.
Since the product and order tables appear in the FROM clause of the query, the FLWOR expression references their XML columns through the variables $DESCRIPTION and $ORDERDETAILS. The return clause of the FLWOR expression constructs XML fragments that combine the desired elements and attributes from each matching pair of product and order documents. The constructed <result> elements are then input to the COLUMNS clause where they are broken up into relational columns.
It is worthwhile noting that you can produce the same result set without the use of FLWOR expressions, as shown in the next query. This query uses two XMLTABLE function, one for the desired order information and one for the desired product information. The key trick is that the two XMLTABLE function are joined on the product ID using the predicate [@pid = $p]. Without this predicate the two XMLTABLE functions would produce a Cartesian product, which is not desired.
SELECT T2.prodid, T2.weight, T1.qty, T1.price
The join predicate between the two XMLTABLE functions can also be placed in the SQL WHERE clause:
SELECT T2.prodid, T2.weight, T1.qty, T1.priceHowever, this query applies the join predicate after the XMLTABLE functions have produced their rows, which can be slower than the previous query where the predicate was in the XMLTABLE function itself.
You can find many more examples of the XMLTABLE function, XML joins, and joins between XML and relational columns in Chapter 9 of the DB2 pureXML Cookbook.
Since XML has become so pervasive in enterprise computing and service-oriented architectures, it is a given that XML capabilities are required in all parts of an IT environment. Closely related to the area of databases is ETL - to extract, transform, and load information, e.g. to populate data warehouses or to integrate and connect disparate systems.
The good news is that version 8.5 of IBM InfoSphere DataStage has greatly enhanced XML capabilities. I first saw the new XML features in DataStage 8.5 in demo at the Information On Demand conference
last October. And I was very impressed because the XML support goes far
beyond the half-hearted XML handling that many tools offer.
example, it's easy to import and work with XML Schemas in DataStage
8.5. Many industry standard XML Schemas that are used in the financial
sector, health care, insurance, government, retail, etc. are quite
complex and consist of dozens or even hundreds of XSD files that
comprise a single XML Schema. Examples include FpML, FIXML, HL7, IRS1120, OAGIS, and many others.
You might receive such a schema as a ZIP file that contains multiple folders with XSD files. DataStage 8.5 can simply read the entire ZIP file, which saves you the tedious job of importing all the XSD files separately or dealing with their import and include relationships.
the XML Schema is imported, DataStage understands the structure of your
XML document and allows you define the transformations that you need. The XML transformation capabilities certainly include some of the intuitive things. For example, you can... (read more)
The SQL language standard includes XML as one of the SQL data types,
which allows you to have columns of type XML in tables and result sets.
The latest JDBC standard (JDBC 4.0) now includes an interface and data
type called SQLXML that correspond to the XML data type in the database
Of course you can use JDBC 3.0 to send and retrieve XML data to/from
a database as a String, Clob, BinaryStream, CharacterStream, etc. with
the traditional getter and setter methods. In addition to that, the new
SQLXML interface in JDBC 4.0 enables you to use a DOM or SAX
representation of your XML data when you send or receive it to/from the
database. This can be very helpful if your application already uses the
DOM or SAX API to manipulate XML.
The SQLXML interface in JDBC 4.0 works for both SQL/XML and XQuery. So even if you use XQuery without any SQL involved, you can use JDBC to conveniently retrieve and process the result sequence that is produced by an XQuery. And if you use SQL/XML to insert, update, or retrieve XML and relational data at the same time, then JDBC is also very well suited for your needs.
Let’s look at an example. First, be aware that JDBC 4.0 requires Java 6. To use JDBC 4.0 with DB2 you also need JCC4 (Java Common Client 4, db2jcc4.jar). Assume you have a target table such as the following:
create table mycustomer(info XML)
String sql = "INSERT INTO mycustomer(info) VALUES (?)";
PreparedStatement stmt = connection.prepareStatement(sql);
String sql = "INSERT INTO mycustomer(info) VALUES (?)";
PreparedStatement stmt = connection.prepareStatement(sql);
SQLXML sqlxml = con.createSQLXML();
// issue an XQuery or SQL query via JDBC as you normally would:
ResultSet resultSet = statement.executeQuery("SELECT info FROM mycustomer WHERE ....");
// retrieve an SQLXML object from the ResultSet:
SQLXML sqlxml = resultSet.getSQLXML(1); // 1 is the column index
// obtain DOM tree from SQLXML object:
DOMSource source = sqlxml.getSource(DOMSource.class);
// create document object from DOMSource:
Document document = (Document) source.getNode();
// now you can process the DOM tree as usual...
You can find more information and examples here:
What is an XML Schema? Some of you may already know this, others don’t. So before I’m going to share some more technical information about XML Schemas in subsequent blog posts, I better get some of the basics out of the way first.
When you process and manage information in XML format, you can choose to use an XML Schema with your XML documents. Roughly speaking, an XML Schema can be used to define what you want your XML documents to look like. For example, in an XML Schema you can define:
If you choose to create an XML Schema, it may define just some or all of the aspects listed above. The designer of the XML Schema can choose the degree to which the schema constraints the characteristics of the XML documents. For example, an XML Schema can be very loose and define only a few key features for your XML documents and allow for a lot of flexibility. Or it can be very strict to tightly control the XML data in every aspect. Or anything in between.
The use of an XML Schema is optional, i.e. an XML Schema is not required to store, index, query, or update XML documents. However, an XML Schema can be very useful to ensure that the XML documents that you receive or produce are compliant with certain structural rules that allow applications to process the XML. In other words, XML Schemas help you to enforce data quality.
If an document complies with a given XML Schema, then the document is said to be valid for this schema. A document might be valid for one schema but invalid for another schema. The process of testing an XML document for compliance with an XML Schema is called validation.
When an XML document is parsed by an XML parser, validation can be enabled as an optional part of the parsing process. Full validation of an XML document always requires XML parsing. For many documents and schemas, validation typically incurs only a small delta cost (in terms of CPU usage) on top of the cost of XML parsing.
What does an an XML Schema look like?
An XML Schema itself is an XML document! But, a very special document that needs to comply with very specific rules that are defined by -you guessed it!- another XML Schema, i.e. the schema for schemas.
Large XML schemas can consist of multiple schema documents that reference each other through import and include relationships. This allows you to compose an XML Schema out of smaller building blocks in a modular fashion.
I don’t want to go into the syntax details of XML Schemas here, but there are some useful resources available:
When and why should I use an XML Schema?
Simply put, if you want to ensure data quality and detect XML documents that do not comply with an expected format, use an XML Schema and validate each document!
However, what if XML documents pass through multiple components of your IT infrastructure, such as a message queue, an application server, an enterprise service bus, and the database system? If these components do not modify the XML but merely read & route it, examine whether all of these components need to validate each document. For example, if the application server has already validated a document before insertion into a DB2 database, does the document need to be validated again in DB2? Maybe not, if you trust the application layer. Maybe yes, if you don’t.
An XML Schema is also often used as a “contract” between two or more parties that exchange XML documents. With this contract the parties agree on a specific format and structure of the XML messages that they send and receive, to ensure seamless operation.
Practically every vertical industry has defined XML Schemas to standardize XML message formats for the data processing in their industry. A good overview is given by the introduction of this article:
“Getting started with Industry Formats and Services with pureXML”: http://www.ibm.com/developerworks/data/library/techarticle/dm-0705malaika/
How can I validate XML documents in DB2?
Simple. First, you register one or multiple XML Schemas in the DB2 database. This can be done with CLP commands, stored procedures, or through API calls in the JDBC or .NET interface to DB2. After a schema is registered in DB2, you can use it to validate XML documents in DB2, typically when you insert, load, or update XML documents. You can enforce a single XML Schema for all XML documents in an XML column, or you can allow multiple XML Schemas per column. A database administrator can force automatic validation upon document insert, or allow applications to choose one of the previously registered schema for validation whenever a document inserted.
And… validation can also be done in SQL statements?
Yup. The SQL standard defines a function called XMLVALIDATE, which can be used for document validation in INSERT statement, UPDATE statements, triggers, stored procedures, and even in queries.
Here is a simple example of an INSERT statement that adds a row to a customer customer table, which consists of an integer ID column and an XML column called “doc”:
INSERT INTO customer(id, doc)
The id and the document are provided by parameter markers “?”, and the XMLVALIDATE function that is wrapped around the second parameter ensures validation against the XML Schema that has been regoistered under the identifier db2admin.custxsd.
If the inserted document is not compliant with the XML Schema, the INSERT statement fails with an appropriate error message. Similarly, the XMLVALIDATE function can also be used in the right-hand side of the SET clause of an UPDATE statement that modifies or replaces an XML document.
Ok, so much for now. In my next blog post we’ll go into more detail.
MatthiasNicola 120000E28R Etiquetas:  xpath xquery_functions db2 xquery purexml sql_server 4.435 vistas
XQuery is a feature-rich language that allows you to code sophisticated queries against XML data. As you probably know, the XPath language is a subset of XQuery. XPath and XQuery share a common set of built-in functions that are defined by the W3C specification titled “XQuery 1.0 and XPath 2.0 Functions and Operators“. This spec describes dozens and dozens of functions that deal with strings, numbers, dates, timestamps, XML nodes and sequences, and so on.
The DB2 support for the XPath and XQuery functions is documented in the DB2 Information Center, and you might find corresponding information for other database systems on their respective web sites. However, other databases might not support as many of the XQuery functions as DB2. For example, DB2 9.7 for Linux, UNIX, and Windows supports 21 XQuery string functions while SQL Server 2008 seems to have just six. DB2 also supports over 30 XQuery date and time functions, SQL Server seems to support none of those.
Admittedly, some functions are more commonly used then others. XQuery functions such as “substring” or “count” are used quite often. In contrast, the XQuery functions “reverse” and “string-to-codepoints” tend to be used much less frequently.
But then there are some functions that you might not use every day, but when you do need them you really don’t want to miss them. For example, sometimes you may have to
These and other tasks are common enough. Yes, you can also use application code to perform such tasks, but that is much less convenient and less efficient than using a built-in function that’s readily available. Being aware of the available built-in functions can make your life easier when you develop XML applications. The functions listed above are available in DB2 pureXML, but these and many others don’t seem to be supported in SQL Server 2008.
(This was one of the Top 5 most popular posts in 2010 on the nativexmldatabase.com blog.)