 | Level: Intermediate Mayank Pradhan (mayankpr@us.ibm.com), Staff Software Engineer, IBM
20 Apr 2006 The IBM® DB2® Viper release brings many new XML-related (and non-XML) features. One such feature is the annotated XML schema decomposition that allows you to decompose their XML documents into relational tables. The annotated XML schema supports various mapping constructs that allow you to map elements/attributes defined in the XML schema to table-column pairs in the relational schema. For large XML schemas consisting of many XML schema documents, manual annotation can be a cumbersome task. Get an introduction to the tool, DefaultAnnotater, that allows you to create default mapping and a default relational schema into which corresponding XML documents can be decomposed. This article provides a good starting platform for not only trying out the new function, but also further enhancing the mapping in a given XML schema.
Introduction to annotated XML schema decomposition
The annotated XML schema decomposition feature,
introduced in DB2 Viper, can be used to decompose XML documents,
wholly or partially, into relational tables. It uses annotations in
XML Schema as the mapping language to map information in an XML
document to relational tables. The new XML decomposition feature
uses XML Schema as the platform to obtain mapping information
because XML Schema is an open standard and requires fewer addition of
proprietary features in the mapping language specification.
This allows for a shorter learning curve as opposed to an equivalent
mapping specification in a completely proprietary language. Note
that annotations added to the XML schema do not participate in
validation of corresponding XML documents. This allows you to use
the same XML schema for mapping and validation of XML
documents.
The new XML decomposition feature requires
that the annotated XML schema be registered in the XML Schema
Repository. XML Schema Repository is also a new feature introduced
in DB2 Viper. As the name suggests, it is a repository for XML
schemas that may consist of one or more XML schema documents. The
scope of XML schemas registered in XML Schema
Repository is limited to the database.
Any XML schema that is registered with XML Schema
Repository can be used for two
purposes:
- Validation of XML documents as they are inserted into XML type columns
- Validation of XML documents as they are being decomposed into relational tables
Any XML schema registered in XML Schema
Repository can be
enabled to be used with XML decomposition if the XML schema has at
least one decomposition related annotation present. When an XML
schema is enabled for decomposition, checks are made to ensure the
correctness of the annotations, compatibility of XML schema types
to DB2 column type, and existence of relational tables and columns
specified in annotations. If the annotated XML schema is deemed
valid, mapping information is extracted and stored in catalog
tables in a ready-to-use binary format.
Finally, any XML document that conforms to the
annotated XML schema can now be decomposed into relational tables,
as per the specified mapping.
The annotated XML schema mapping language is
designed to serve users who already have an existing
relational schema with several applications already running off of
the relational schema and now want to consume data from XML
documents whose structure is expressed using an XML schema into
the existing relational schema. The structure of the XML schema
may be very different from the relational schema, as the users may
have very less or absolutely no control on the design of the XML
schema. Consequently, the mapping language is designed to provide
higher flexibility and granular control over the entire process of
decomposition. There are 11 different mapping constructs, some of
which even have multiple enumerated values, to express mapping from
elements and attributes declared in XML Schema to table-column
pairs in the relational schema. Each of these constructs is
designed to primarily serve a well-defined purpose. Most of these constructs are a
result of feature requests that came from customers. Table 1 lists these constructs and their primary goal:
Table 1. Mapping constructs and their primary goals
|
SNo.
|
Annotation construct
|
Purpose
| |
Mapping XML elements/attributes to
table-column pairs
|
|
db2-xdb:rowSet
|
Can be used to specify the logical or physical
container (a physical container can be a relational table) of
logically associated set of rows.
|
|
db2-xdb:column
|
Must be used to specify the column into which
the information associated with element/attribute should be
decomposed.
|
|
db2-xdb:defaultSQLSchema
|
Should be used to specify the SQL schema of
the relational tables involved.
|
|
db2-xdb:rowSetMapping
|
Groups all associated annotation
constructs that specify a single mapping to a rowSet-column pair.
One or more db2-xdb:rowSetMapping constructs is allowed
on the same element/attribute declaration that maps to different
rowset-column pair.
|
|
db2-xdb:table
|
Enables users to map multiple
items from various parts of the XML document to be decomposed into
the same table-column pair as different rows.
| |
Conditional decomposition
|
|
db2-xdb:locationPath
|
Allows users to specify different target
(tables) based on the context in which the element/attribute
appears. This permits users to annotate elements and attributes
declared in reusable types and groups to be decomposed based on the
context.
|
|
db2-xdb:condition
|
Allows users to filter the data to be inserted
based on the content of element/attribute in the XML document.
| |
Control on data to be decomposed
|
|
db2-xdb:contentHandling
|
Allows users to select the content of an
element from the following choices:
-
stringValue: The concatenation of all descendant text nodes
-
textValue: The concatenation of all child text nodes
-
serializeSubtree: The serialized string, in its markup form, of everything between this
element's start and end tags, inclusive of the start and end tags of the element itself, as well as comments and processing
instruction
| |
Transformation of data to be
decomposed
|
|
db2-xdb:normalization
|
Enables users to specify the normalization
behavior for the content of the XML element or attribute that is
being mapped to a character target column before the content is
inserted. The following choices are provided:
-
canonical: The XML value is converted
to its canonical form, according to its XML schema type, before
being inserted into the target column
-
original: The original character
data
-
whitespaceStrip: All leading and
trailing whitespace removed, and consecutive whitespace is
collapsed into a single whitespace character before being inserted
into the target column
|
|
db2-xdb:expression
|
Allows users to specify a customized
expression, the result of which is inserted into the table this
element is mapped to.
|
|
db2-xdb:truncate
|
Allows users to specify whether truncation is
permitted when an XML value is inserted into a character target
column.
| |
db2-xdb is the prefix used for
all our annotations. Users can use any prefix they choose. However,
the prefix must be bound to the namespace URI http://www.ibm.com/xmlns/prod/db2/xdb1. Otherwise, the annotations will be ignored by decomposition
registration and runtime engine.
|
For convenience, all of the listed annotations
can be specified as non-native attributes of the element/attribute
declarations or as a child of xsd:annotation/xsd:appinfo
hierarchy using the db2-xdb:rowSetMapping construct.
For more details on each of these annotations refer to the DB2
documentation.
Besides the various mapping constructs
introduced in annotated XML schema decomposition, it also boasts of
a new algorithm that can determine the one-to-one or one-to-many
relationships without the introduction of any explicit
(user-specified) mapping construct. The algorithm can determine the
relationship by looking at the maxOccurs property and the model
groups involved in the items mapped to the same rowSet. Unlike
other decomposition solutions, due to this algorithm, there are no
constraints that allow only sibling or child elements/attributes to
be mapped to the same table. As long as all the elements/attributes
involved in mapping to the same rowSet forms a legal one-to-one or
one-to-many relationship, the elements/attributes mapped to the same
table can be from any part of the XML Schema. The new algorithm, by
way of detecting a one-to-many relationship, allows multi-valued
dependencies to be decomposed into relational tables. The algorithm
not only creates rows based on the cardinality of
elements/attributes involved, but also creates rows based on the type of the
model group.
The new XML decomposition also has a strong
type validator and type conversion engine. It will disallow, during
registration, elements/attributes from being decomposed into target
columns if the declared schema type is deemed incompatible with the
corresponding target columns type. However, it allows for type
conversion if possible. For instance, an element/attribute of type
integer can be mapped to a column of type character.
Due to the new architecture and algorithm,
early performance comparisons indicate that new annotated schema
XML decomposition is much faster than XML Extender
Decomposition.
Default mapping generator tool
The problem of mapping existing XML schema to
an existing relational schema is much harder than creating a
default relational schema based on the XML schema and then
decomposing corresponding XML documents into it. Decomposition of
XML documents into a specially created relational schema is only a
special case of the harder and more generic problem that the new
feature claims to solve. However, with a new technology like this, it is useful to have default mapping. This not only helps users familiarize with the new mapping constructs but also provides a platform to further customize the applications based on their needs. This tool can create a relational schema and insert annotations in the input XML schema accordingly. In
production environments, default mapping almost always do not suffice;
some degree of tuning of the mapping is usually required. The
motive behind keeping the default mapping logic outside of
annotated XML schema decomposition allows users to
build upon the default mapping and fine-tune it based on their
requirements. Users are free to add any of the annotation
constructs introduced in the table above on top of the default
mapping annotations inserted in the XML schema. The tool described
in this section allows users to generate default mapping and
relational schema for any given XML Schema.
The default mapping generator utility allows
users to decompose XML documents without having to write a single
line of annotation in the XML schema. The tool requires two
inputs:
- The primary XML schema document: The XML schema document through which all XML schema documents can be
reached by exploring the import/include/redefine constructs
- The default relational schema for all the tables involved
It then produces the following two outputs:
- It prints out the DDLs for the creation of
relational schema onto the standard output
- It inserts the required annotations in the
XML schema so as to accomplish the mapping of all
elements/attributes into the relational schema returned as output
The tool starts at the global element
declaration and digs through the hierarchy to annotate each
element/attribute on its way; the same is done for each global
element declaration. As it traverses through the XML schema, it also
creates the DDLs for the relational schema. The name of each
complexType element
that it encounters during traversal is treated as a new table name.
The names of child attributes and elements of simpleType are the names of the
columns of that table. The tool also inserts annotations in the
element/attribute declaration in XML Schema to map it to the
corresponding table-column pair. Another important decision to make
during the generation of DDLs is the selection of data type for the
columns. The tool makes a default choice. The following table
indicates the DB2 type that will be selected for a column when an
XML schema type is encountered.
|
Sno.
|
XML Schema element/attribute type
|
DB2 target column type
|
|
anyType
|
VARCHAR(20)
|
|
anySimpleType
|
VARCHAR(20)
|
|
Duration
|
INTEGER
|
|
dateTime
|
TIMESTAMP*
|
|
time
|
TIME*
|
|
date
|
DATE*
|
|
gMonth
|
INTEGER
|
|
gYear
|
INTEGER
|
|
gYearMonth
|
INTEGER
|
|
boolean
|
CHAR(5)
|
|
base64Binary
|
BLOB(1M)
|
|
hexBinary
|
BLOB(1M)
|
|
anyURI
|
VARCHAR(128)
|
|
QName
|
VARCHAR(256)
|
|
NOTATION
|
VARCHAR(256)
|
|
float
|
DOUBLE
|
|
double
|
DOUBLE
|
|
decimal
|
DOUBLE
|
|
integer
|
BIGINT
|
|
long
|
BIGINT
|
|
int
|
INTEGER
|
|
short
|
SMALLINT
|
|
byte
|
SMALLINT
|
|
nonPositiveInteger
|
BIGINT
|
|
negativeInteger
|
BIGINT
|
|
nonNegativeInteger
|
BIGINT
|
|
positiveInteger
|
BIGINT
|
|
unsignedLong
|
BIGINT
|
|
unsignedInt
|
BIGINT
|
|
unsignedShort
|
INTEGER
|
|
unsignedByte
|
SHORT
|
|
string
|
VARCHAR(128)
|
|
normalizedString
|
VARCHAR(128)
|
|
token
|
VARCHAR(20)
|
|
language
|
VARCHAR(20)
|
|
Name
|
VARCHAR(20)
|
|
NCName
|
VARCHAR(20)
|
|
ID
|
VARCHAR(20)
|
|
IDREF
|
VARCHAR(20)
|
|
IDREFS
|
VARCHAR(20)
|
|
ENTITY
|
VARCHAR(20)
|
|
ENTITIES
|
VARCHAR(128)
|
|
NMTOKEN
|
VARCHAR(20)
|
|
NMTOKENS
|
VARCHAR(128)
|
|
union
|
VARCHAR(128)
|
|
list
|
VARCHAR(128)
| |
Note:
* indicates that the db2-xdb:truncate annotation is used for these
datatypes. This allows the time zone information to be truncated, as
DB2 data types do not support time zones.
|
This is, of course, the table of default mapping
of data types. You can change this by either modifying the DDL
statements produced or, more permanently, by modifying the array
XSD2DB2[] in file
TableCreator.java in
the attached source.
Usage:
DefaultAnnotation primary_schema_doc.xsd defaultSQLSchemaName
|
This utility takes two inputs -- the primary
XML Schema Document and the default SQL Schema in which the tables
reside. Based on the algorithm described above, the tool will create
and print DDLs on the standard output device and insert
corresponding annotations to map the element/attribute to the
table-column. The annotated XML schema can then be registered with
XSR and enabled for decomposition. You can then use created XSRObject to decompose XML documents.
Parameters
The XML schema: The primary XML schema
document through which all related schema documents can be
explored. Users are advised to keep a separate copy of their schema
documents before running this tool since this tool will modify the
primary schema document and any document that is part of the schema
that is referenced using xsd:include, xsd:import, or
xsd:redefine.
The second parameter is used to qualify all table references.
Note: Running the tool multiple times on the
same schema documents will result in error during decomposition
enablement in XML Schema Repository, as on each run the tool will add the same set of
mappings to the same set of element/attributes.
Tool prerequisites
- Java™ Runtime Environment (JRE) 1.4.2 or JRE
1.5
- Java Archive (JAR) files for Eclipse Modeling Framework
- JARs for XSD plugin
- JARS for xml4j-4_2_2 or equivalent Xerces-J.
The links to obtain these tools are listed in Resources.
How to run it
Being written in Java, this tool will work on all platforms supported by JRE. However, .bat files are provided for convenience to be used on Windows® platform. The .bat files set the PATH and
CLASSPATH variables. You should set the paths of the JARs in
CLASSPATH and the JDK in accordance with their installation. There
are two flavors of the .bat file, namely, DefAS_Eclipse.bat and
DefAS _DB2DWB.bat. The file DefAS_Eclipse.bat is customized for
users that download Eclipse and related plug-ins directly from
www.eclipse.org, and
DefAS_DB2DWB.bat is for users that have DB2 Developer
Workbench installed. Note that if DB2 Developer Workbench is installed on the same machine, all
the required plug-ins will come with it automatically.
Sample
A sample is also provided in the attached
file in the Download section. Once you have unzipped the download, the sample can be found in the sample sub-directory.
The sample has the original XML schema, the converted annotated XML
schema, and the DDL containing the new relational schema that was
produced using this tool.
The original XSD file is Order.xsd.
The annotated XML schema that is create using this tool is Order_AS.xsd.
The new relational schema is Order.sql.
Note that after inserting annotations in
Order_AS.xsd, it has
been run through a pretty-print XML tool for convenience.
Limitations of the tool
- The default mapping may result in loss of
information related to association between entities. For instance,
if a complexType element PurchaseOrder has a child element Customer
of complexType, then the purchase order related information, such
as POID, Date, and so on, will be decomposed into the PurchaseOrder table and
Customer related information, such as CustID, CustName, and so on, will be
stored in the Customers table. But the fact that a certain customer was
referenced in context of the purchase order will be lost. However,
this limitation can be overcome by manually inserting the
annotations. In this case, the association information can be
recorded by creating another table that stores the association
PO_Cust. Then add annotations on the POID element or attribute and
to CustID element/attribute, so as to map them to the table
PO_Cust.
- The tool does not support annotation of recursive XML schemas.
In such cases, you may get an error from JRE indicating that the process has run out of stack space.
- The tool does not use many of the annotation
constructs.
Steps to run the sample
- Run the tool on any XML schema and capture
the DDLs written to the standard output.
- Run the DDLs to create the new relational
schema.
- Register the XML schema and enable it for
decomposition using the following CLP commands:
REGISTER XML SCHEMA order.xsd from
c:\xsd\order.xsd AS po.order COMPLETE ENABLE DECOMPOSITION |
- If step 3 goes through successfully, then
decompose the XML document using the following command:
DECOMPOSE XML DOCUMENT c:\xml\order.xml
XMLSCHEMA po.order VALIDATE |
Or call the stored procedure to decompose the
XML document. The stored procedure and its description is as
follows:
xdbDecompXML ( rschema , xmlschemaname, xmldoc , documentid , validation ,
reserved , reserved ,reserved ) |
rschema
: The
relational schema within which the XSRObject is created. The
XSRObject is created as a consequence of registering and completing
the XML Schema, which is annotated for decomposition in this
case.
xmlschemaname
: The
name of the XSRObject, which contains the XML schema that
describes, through annotations, how the XML document should be
decomposed.
xmldoc
: This
parameter is of type BLOB that is used to pass the XML document to
be decomposed.
documentid
: An
identifier for the XML document. This is used to report diagnostic
information in the db2diag.log and is also returned as part of the
error message.
validation
: Should
be set to one if the XML document should be validated as it is being
shred. It should be zero otherwise.
reserved
: All the
reserved parameters should be passed in as null.
There are five
variations of this stored procedure based on the size of the XML
document to be decomposed. All of them have the same parameters,
except for the size of BLOB used to pass in the XML document.
Table 2 describes the different variations of the stored procedure:
Table 2. Different variations of the stored procedure
|
Stored procedure name
|
Max document size that can be decomposed
| |
SYSPROC.XDBDECOMPXML
|
1MB
| |
SYSPROC.XDBDECOMPXML10MB
|
10MB
| |
SYSPROC.XDBDECOMPXML25MB
|
25MB
| |
SYSPROC.XDBDECOMPXML50MB
|
50MB
| |
SYSPROC.XDBDECOMPXML75MB
|
75MB
| |
SYSPROC.XDBDECOMPXML100MB
|
100MB
|
Note that the
annotated XML schema stored procedures belong to the SYSPROC schema
and are built in. Its use does not require enabling of the database,
as in case of XML Extender.
These four steps can be used as a template to
decompose documents. The only difference that may occur is when an
XML schema consists of multiple XML schema documents, as all the schema documents need to be registered before completing
the XML schema.
Conclusion
Annotated XML schema decomposition is a new feature being introduced in the DB2 Viper release. It has been written with new algorithms and introduces a new mapping language.
The mapping language and algorithms together provide a highly
flexible, efficient, and methodical way of decomposing XML
documents. It allows XML schemas that differ significantly in shape
from the target relational schema to be decomposed. The tool,
DefaultAnnotater, allows you to get quickly started with the new
function by creating a default relational schema and inserting
corresponding annotations in the XML Schema. The tool also helps
you quickly learn the various constructs on Annotated XML Schema
Decomposition.
Download | Description | Name | Size | Download method |
|---|
| Source and sample for DefaultAnnotater tool | DefaultAnnotater.zip | 25KB | HTTP |
|---|
Resources Learn
Get products and technologies
-
Download the Eclipse SDK.
-
Download Xerces-J.
-
Build your next development project with
IBM
trial software, available for download directly from developerWorks.
-
Now you can use DB2 for free. Download DB2 Express-C, a no-charge version of DB2 Express Edition for the community that offers the same core data features as DB2 Express Edtion and provides a solid base to build and deploy applications.
Discuss
About the author  | |  |
Mayank Pradhan is a staff software engineer at IBM, working in the XML Technology group. He is currently part of the development team for the annotated XML schema decomposition function being introduced in DB2 Viper. Previously, he was also part of the XML Extender development team. |
Rate this page
|  |