 | Level: Intermediate Mayank Pradhan (mayankpr@us.ibm.com), Staff Software Engineer, IBM
13 Apr 2006 The annotated XML schema decomposition in IBM® DB2® Viper allows you to decompose XML documents into relational tables more efficiently than the DB2 XML Extender. While DB2 XML Extender also supports the ability to shred documents into relational schema, it only allows for a very limited function and is also constrained by the proprietary mapping format, DAD. Annotated XML schema decomposition, on the other hand, provides an XML schema-based flexible mapping language that provides granular control to the users over the entire process of decomposition. It is also much faster than XML Extender shredding. Learn about the benefits of migration to annotated XML schema decomposition at various levels, such as functionality, usability, and performance. Then follow the steps required for migration from XML Extender shredding to annotated XML schema decomposition. This article also provides a tool (Dad2AS) that can be used to convert the XML Extender RDB_Node DADs to the annotated XML schema.
Introduction to annotated XML schema decomposition
The annotated XML schema decomposition feature introduced in
DB2 Viper can be used to decompose XML documents, wholly or partially, into
relational tables. It uses annotations in XML schema as the mapping language to
map information in an XML document to relational tables. The new XML decomposition
feature uses XML Schema as the platform to obtain mapping information. Since,
XML Schema is an open standard, it only requires the addition of a few proprietary
features in the mapping language specification. This allows for a shorter
learning curve as opposed to an equivalent mapping specification in a completely
proprietary language. Note that annotations added to the XML Schema do not
participate in validation of corresponding XML documents. This allows users to
use the same XML schema for mapping and validation of XML documents.
The new XML decomposition feature requires that the
annotated XML schema be registered in the XML Schema Repository. XML Schema
Repository is also a new feature introduced in DB2 Viper. As the name suggests,
it is a repository for XML schemas that may consist of one or more XML Schema
documents. The scope of XML schemas registered in XSR is limited to the
database. Any XML schema that is registered with XSR can be used for two purposes:
- Validation of XML documents as they are inserted into XML type columns
- Validation of XML documents as they are being decomposed into relational tables
Any XML schema registered in XSR can be enabled to be used
with XML decomposition if the XML schema has at least one decomposition-related
annotation present. When an XML schema is enabled for decomposition, checks are
made to ensure the correctness of the annotations, compatibility of XML Schema
types to DB2 column type and existence of relational tables and columns specified
in annotations. If the annotated XML schema is deemed valid, mapping
information is extracted and stored in catalog tables in a ready-to-use binary format.
Finally, any XML document that conforms to the annotated XML
schema can now be decomposed into relational tables as per the specified
mapping.
The annotated XML schema mapping language is designed with
the aim to serve users who already have an existing relational schema with
several applications and now want
to consume data from XML documents into the existing relational schema. The structure of the XML documents, as expressed by an XML schema, could be
very different from the relational schema, as the users may have less or no control of the design of XML Schema. Consequently, the mapping language
is designed to provide greater flexibility and granular control over the entire
process of decomposition. There are 11 different mapping constructs (some of them
even have multiple enumerated values) to express mapping from elements and
attributes declared in XML Schema to table-column pairs in the relational
schema. Each of these constructs is designed to primarily serve a well-defined purpose. Most
of these constructs are a result of feature requests that came from XML Extender customers. Table 1 lists these constructs and their primary goal:
Table 1. Mapping constructs and their primary goals
|
SNo.
|
Annotation construct
|
Purpose
|
Analogous DAD construct
|
Brief description
| |
Mapping XML elements/attributes to table-column pairs
|
|
db2-xdb:rowSet
|
Can be used to specify the logical or physical container (a physical container can be a relational table) of a logically associated set of rows.
|
table
|
This is used to specify the table to which the element/attribute is to be mapped.
|
|
db2-xdb:column
|
Must be used to specify the column into which the information associated with element/attribute should be decomposed into.
|
column
|
This is used in conjunction with the abovetableconstruct in DAD to specify the table-column pair to which the element/attribute is mapped.
|
|
db2-xdb:defaultSQLSchema
|
Should be used to specify the SQL schema of the relational tables involved.
|
n/a
|
The tables need to be qualified either explicitly, for every usage, or they will be automatically qualified by the user invoking the shred stored procedure.
|
|
db2-xdb:rowSetMapping
|
Groups all associated annotation constructs that specify a single mapping to a rowSet-column pair. One or more db2-xdb:rowSetMapping constructs is allowed on the same element/attribute declaration that maps to a different rowset-column pair.
|
RDB_Node
|
Allows a table-column pair mapping to be grouped. Only one such mapping is permitted on any element/attribute node.
|
|
db2-xdb:table
|
This annotation enables users to map multiple items from various parts of the XML document to be decomposed into the same table-column pair as different rows.
|
n/a
|
Such an operation cannot be done using any of the DAD constructs.
| |
Conditional decomposition
|
|
db2-xdb:locationPath
|
Allows users to specify different targets (tables) based on the context in which the element/attribute appears. This permits users to annotate elements and attributes declared in reusable types and groups to be decomposed based on the context.
|
n/a
|
The locationPath is implicit in the DAD itself, as the entire XML hierarchy needs to be repeated using the element_node/attribute_node construct.
|
|
db2-xdb:condition
|
Allows users to filter the data to be inserted based on the content of element/attribute in the XML document.
|
condition
|
This construct is overloaded in DAD. It has the following 2 purposes, depending on where it is specified.
- When specified on the root element, it is used to specify the join condition. In other words, this construct basically allows the same element/attribute to be mapped to multiple table-column pairs. This has to be explicitly specified in the annotated XML schema using db2-xdb:rowSetMappingconstruct.
- When specified on a non-root element, it is used to filter data. This functionality is similar to annotated XML schema's db2-xdb:condition construct.
| |
Control on data to be decomposed
|
|
db2-xdb:contentHandling
|
Allows users to select the content of an element from the following choices:
-
stringValue:
the concatenation of all descendant text nodes
-
textValue
the concatenation of all child text nodes
-
serializeSubtree
the serialized string, in its markup form, of everything between this element's start and end tags, inclusive of the start and end tags of the element itself and comments and processing instruction
|
n/a
|
This cannot be done in the shredding framework of XML Extender.
| |
Transformation of data to be decomposed
|
|
db2-xdb:normalization
|
Enables users to specify the normalization behavior for the content of the XML element or attribute that is being mapped to a character target column before the content is inserted. The following choices are provided:
-
canonical:
The XML value is converted to its canonical form, according to its XML schema type, before being inserted into the target column.
-
original:
The original character data.
-
whitespaceStrip:
All leading and trailing whitespace removed, and consecutive whitespace is collapsed into a single whitespace character before being inserted into the target column.
|
n/a
|
XML Extender shred does not perform any kind of normalization before insertion into the specified table-column.
|
|
db2-xdb:expression
|
Allows users to specify a customized expression, the result of which is inserted into the table this element is mapped to.
|
n/a
|
User specified data transformation cannot be done during shredding with XML Extender framework. If such an operation is required, either the users have to transform their XML document, using XSL, before shredding it or write triggers to modify the data before it is inserted.
|
|
db2-xdb:truncate
|
Allows users to specify whether truncation is permitted when an XML value is inserted into a character target column.
|
n/a
|
This operation is also not permitted in XML Extender.
| |
Note: db2-xdb is the prefix used for all our annotations. Users can use any prefix they choose; however, the prefix must be bound to the namespace URI http://www.ibm.com/xmlns/prod/db2/xdb1,
otherwise the annotations will be ignored by decomposition registration and runtime engine.
|
XML Extender shredding framework does not support element/attributes to be in any namespace.
|
For convenience, all of the listed annotations can be specified as non-native attributes of the element/attribute declarations or as a child of xdb:annotation/xdb:appinfo hierarchy using the db2-xdb:rowSetMapping construct. For more details on each of these annotations, refer to the DB2 documentation.
Besides the various mapping constructs
introduced in annotated XML schema decomposition, it also boasts of
a new algorithm that can determine the one-to-one or one-to-many
relationships without the introduction of any explicit
(user-specified) mapping construct. The algorithm can determine the
relationship by looking at the maxOccurs property and the model
groups involved in the items mapped to the same rowSet. Unlike
other decomposition solutions, due to this algorithm, there are no
constraints that allow only sibling or child elements/attributes to
be mapped to the same table. As long as all the elements/attributes
involved in mapping to the same rowSet forms a legal one-to-one or
one-to-many relationship, the elements/attributes mapped to the same
table can be from any part of the XML schema. The new algorithm, by
way of detecting a one-to-many relationship, allows multi-valued
dependencies to be decomposed into relational tables. The algorithm
not only creates rows based on the cardinality of
elements/attributes involved, but also creates rows based on the type of the
model group.
The new XML decomposition also has a strong
type validator and type conversion engine. It will disallow, during
registration, elements/attributes from being decomposed into target
columns if the declared schema type is deemed incompatible with the
corresponding target columns type. However, it allows for type
conversion if possible. For instance an element/attribute of type
integer can be mapped to a column of type character.
Due to the new architecture and algorithm,
early performance comparisons indicate that the new annotated schema
XML decomposition is much faster than XML Extender
Decomposition.
Migration considerations
Following are the primary migration
considerations when migrating from XML Extender Decomposition to
annotated XML schema decomposition:
-
Migration of mapping document:
Users need to migrate their DAD to annotated XML schema in
order to use the new functions. This is perhaps the biggest part of
the migration. However, there is help in the form of a tool that is
described below. The tool will help users convert their XML
schemas to annotated XML schema documents based on the DAD.
-
Addition of new registration step:
Users now need to register the annotated XML schema in the
XML Schema Repository. This is analogous to the step of
enable_collection in XML Extender. But since this was an optional
step in XML Extender, some users might not have a collection at all.
However, when using annotated XML schema decomposition, it is mandatory to register the XML schema in the XML Schema Repository and enable it for decomposition prior to decomposing any XML documents against it.
-
Migration of the
stored procedure call: The new stored procedure call
looks as follows:
xdbDecompXML ( rschema , xmlschemaname , xmldoc , documentid , validation ,
reserved , reserved , reserved ) |
rschema:
The
relational schema within which the XSRObject is created. The
XSRObject is created as a consequence of registering and completing
the XML schema, which is annotated for decomposition in this
case.
xmlschemaname:
The name of the XSRObject, which contains the XML schema that
describes, through annotations, how the XML document should be
decomposed.
xmldoc:
This
parameter is of type BLOB that is used to pass the XML document to
be decomposed.
documentid:
An identifier for the XML document. This is used to report diagnostic
information in the db2diag.log and is also returned as part of the
error message.
validation:
Should be set to one if the XML document should be validated as it is being
shred. It should be set to zero otherwise.
reserved:
All the reserved parameters should be passed in as null.
There are five
variations of this stored procedure based on the size of the XML
document to be decomposed. All of them have the same parameters,
except for the size of BLOB used to pass in the XML document.
Table 2 describes the different variations of the stored procedure:
Table 2. Different variations of the stored procedure
|
Stored procedure
name
|
Corresponding XML Extender stored procedure
|
Max document size that can be decomposed
| |
SYSPROC.XDBDECOMPXML
|
DB2XML.DXXSHREDXML
|
1MB
| |
SYSPROC.XDBDECOMPXML10MB
|
n/a
|
10MB
| |
SYSPROC.XDBDECOMPXML25MB
|
n/a
|
25MB
| |
SYSPROC.XDBDECOMPXML50MB
|
n/a
|
50MB
| |
SYSPROC.XDBDECOMPXML75MB
|
n/a
|
75MB
| |
SYSPROC.XDBDECOMPXML100MB
|
DB2XML.DXXSHREDXML100MB
|
100MB
|
Note that the annotated XML schema
stored procedures belong to the SYSPROC schema and are built in.
Its use does not require enabling of the database, as in the case of XML
Extender.
-
Error handling: The new stored
procedure now returns SQLCA with proper SQLCODE and SQLSTATE that
is more inline with the error handling framework when using SQL.
This allows the applications to use existing infrastructure, if
available, to handle SQL errors. The SQLCA can be used to format
and build localized messages.
Note that like all new XML
related features introduced in the DB2 Viper release, the new annotated
XML schema decomposition is also supported only on Unicode
databases and on single-partitioned databases.
XML schema converter tool
The DAD to annotated XML schema converter
utility helps users convert their XML schemas to annotated XML
schemas based on the mapping rules described in the DAD. The XML
schema and DAD must be describing the same set of XML
documents, although it is possible and acceptable that the XML
schema may describe a superset of documents described by the DAD.
Users that do not have an XML schema can easily generate XML
schemas from corresponding DTDs and even XML documents using tools
available freely on the internet. DAD to annotated XML schema converter tool can then take the XML
schema and the DAD as input to produce an annotated XML schema. The
tool supports import, include, and redefine construct of the XML
schema. (In other words, if an XML schema is spread across many XML schema
documents through import, include, or redefine, only the path for the
primary schema document -- the document through which all XML schema
documents can be reached through either import, include, or
redefine -- is needed.) The tool will annotate element/attribute
declarations across schema documents. Note that since DADs
do not support namespaces, it is impossible to have the use of
import construct in this scenario.
Usage:
DAD2AS dad_filename.dad primary_schema_doc.xsd defaultSQLSchemaName
|
This utility takes three inputs -- a DAD file,
the primary XML schema document, and the default SQL schema in which
the tables reside. Based on the mapping information specified in
the DAD, it converts the specified XML schema into an annotated XML
schema, which can be used with the annotated XML schema decomposition
feature of DB2 Viper.
Parameters
The
specified (RDB_NODE) DAD, which may currently be in use with
XML Extender shredding, is assumed to be a valid DAD. It provides
the mapping information to this tool. The utility will read the DAD
file, infer the mapping, and apply the same mapping to the
appropriate elements/attributes in the XML schema provided as the
second parameter.
The XML
schema must be in accordance with the DAD; there should not be
an element/attribute used in any context in the DAD that is not
defined in the same context in the XML schema. In other words, the
DAD and XML schema must be such that they work on the same family
of XML documents. (If the users do not have an XML schema but have
a DTD, they can use various tools to convert a DTD to XML schema.)
The tool assumes that none of the elements/attributes defined in
the XML schema will be in any namespace, as the namespace
qualification of elements/attributes is not supported in DAD
either.
Users are advised to keep a copy of their schema
documents before running this tool since this tool will modify the
primary schema document and any document that is part of the schema
that is referenced using xsd:include, xsd:import, or
xsd:redefine.
The
third parameter is used to qualify any unqualified table
references. If a reference to a table is not qualified in the DAD,
it will be assumed to reside in the specified default SQL schema.
However, this parameter is required even if all the tables in the
DAD are qualified.
Note: Running the tool multiple times on the
same schema documents will result in error during decomposition
enablement in XSR, as on each run the tool will add the same set of
mappings to the same set of element/attributes.
Tool prerequisites
-
JRE 1.4.2 or JRE 1.5
-
JARs for Eclipse Modeling Framework
-
JARs for XSD plug-in
-
JARS for xml4j-4_2_2 or equivalent Xerces-J.
The links to download these tools are listed below.
How to run it
Being written in Java™, this tool will work on all platforms supported by JRE. However, .bat files are provided for convenience to be used on Windows®. The .bat files sets the PATH and CLASSPATH variables. Users should set the paths
of the JARs in CLASSPATH and the JDK in accordance with their
installation. There are two flavors of the .bat file, namely,
Dad2AS_Eclipse.bat and Dad2AS_DB2DWB.bat. The file
Dad2AS_Eclipse.bat is customized for users that download Eclipse
and related plug-ins directly from www.eclipse.org, and Dad2AS_DB2DWB.bat
is for users that have downloaded DB2 Developer Workbench. Note
that if DB2 Developer Workbench is downloaded, all the required
plug-ins will come with it automatically.
Sample
A sample is also provided in the attached
file in the Download section. Once you have unzipped the download, the sample can be found in the sample sub-directory. The sample has a DAD file, the
original XML schema, and the converted annotated XML schema that was
produced using this tool.
The DAD file is Order.dad.
The original XSD file is Order.xsd
The annotated XML schema that is create using
this tool is Order_AS.xsd
Note that after inserting annotations in
Order_AS.xsd, it has
been run through a pretty-print XML tool for convenience.
Limitations of the tool
- Does not
add db2-xdb:condition annotation corresponding to element/attribute declaration for a condition
specified on a non-root element in the DAD using the <condition> construct in
DAD. This can either be added manually, or the source can be
modified to accomplish this as well.
Conclusion
Annotated XML schema decomposition is a new
feature introduced in DB2 Viper. It has been written from scratch
with new algorithms and mapping language. The mapping language and
algorithms together provide a highly flexible, efficient, and methodical way of decomposing XML documents. It allows XML schemas
that differ significantly in shape from the target relational
schema to be decomposed. Early performance comparisons already show that the new annotated XML schema decomposition feature is much faster in running time than the XML Extender Decomposition. In light of all the performance benefits, flexibility and predictability of results offered by this new feature, it will be beneficial to migrate to annotated XML schema decomposition in DB2
Viper.
The DAD to annotated XML schema generator tool will help users in
migration by converting the DAD to an equivalent annotated XML
schema, thus making the task of migration much easier.
Download | Description | Name | Size | Download method |
|---|
| DAD to annotated XML schema converter tool | DAD2AS.zip | 30KB | HTTP |
|---|
Resources Learn
Get products and technologies
Discuss
About the author  | |  | Mayank Pradhan is a staff software engineer at IBM, working in the XML Technology group. He is currently part of the development team for the annotated XML schema decomposition function being introduced in DB2 Viper. Previously, he was also part of the XML Extender development team. |
Rate this page
|  |