Introduction to annotated XML schema decomposition
The annotated XML schema decomposition feature, introduced in DB2 Viper, can be used to decompose XML documents, wholly or partially, into relational tables. It uses annotations in XML Schema as the mapping language to map information in an XML document to relational tables. The new XML decomposition feature uses XML Schema as the platform to obtain mapping information because XML Schema is an open standard and requires fewer addition of proprietary features in the mapping language specification. This allows for a shorter learning curve as opposed to an equivalent mapping specification in a completely proprietary language. Note that annotations added to the XML schema do not participate in validation of corresponding XML documents. This allows you to use the same XML schema for mapping and validation of XML documents.
The new XML decomposition feature requires that the annotated XML schema be registered in the XML Schema Repository. XML Schema Repository is also a new feature introduced in DB2 Viper. As the name suggests, it is a repository for XML schemas that may consist of one or more XML schema documents. The scope of XML schemas registered in XML Schema Repository is limited to the database. Any XML schema that is registered with XML Schema Repository can be used for two purposes:
- Validation of XML documents as they are inserted into XML type columns
- Validation of XML documents as they are being decomposed into relational tables
Any XML schema registered in XML Schema Repository can be enabled to be used with XML decomposition if the XML schema has at least one decomposition related annotation present. When an XML schema is enabled for decomposition, checks are made to ensure the correctness of the annotations, compatibility of XML schema types to DB2 column type, and existence of relational tables and columns specified in annotations. If the annotated XML schema is deemed valid, mapping information is extracted and stored in catalog tables in a ready-to-use binary format.
Finally, any XML document that conforms to the annotated XML schema can now be decomposed into relational tables, as per the specified mapping.
The annotated XML schema mapping language is designed to serve users who already have an existing relational schema with several applications already running off of the relational schema and now want to consume data from XML documents whose structure is expressed using an XML schema into the existing relational schema. The structure of the XML schema may be very different from the relational schema, as the users may have very less or absolutely no control on the design of the XML schema. Consequently, the mapping language is designed to provide higher flexibility and granular control over the entire process of decomposition. There are 11 different mapping constructs, some of which even have multiple enumerated values, to express mapping from elements and attributes declared in XML Schema to table-column pairs in the relational schema. Each of these constructs is designed to primarily serve a well-defined purpose. Most of these constructs are a result of feature requests that came from customers. Table 1 lists these constructs and their primary goal:
Table 1. Mapping constructs and their primary goals
| SNo. | Annotation construct | Purpose |
|---|---|---|
|
Mapping XML elements/attributes to table-column pairs | ||
|
|
db2-xdb:rowSet |
Can be used to specify the logical or physical container (a physical container can be a relational table) of logically associated set of rows. |
|
|
db2-xdb:column |
Must be used to specify the column into which the information associated with element/attribute should be decomposed. |
|
|
db2-xdb:defaultSQLSchema |
Should be used to specify the SQL schema of the relational tables involved. |
|
|
db2-xdb:rowSetMapping |
Groups all associated annotation constructs that specify a single mapping to a rowSet-column pair. One or more db2-xdb:rowSetMapping constructs is allowed on the same element/attribute declaration that maps to different rowset-column pair. |
|
|
db2-xdb:table |
Enables users to map multiple items from various parts of the XML document to be decomposed into the same table-column pair as different rows. |
|
Conditional decomposition | ||
|
|
db2-xdb:locationPath |
Allows users to specify different target (tables) based on the context in which the element/attribute appears. This permits users to annotate elements and attributes declared in reusable types and groups to be decomposed based on the context. |
|
|
db2-xdb:condition |
Allows users to filter the data to be inserted based on the content of element/attribute in the XML document. |
|
Control on data to be decomposed | ||
|
|
db2-xdb:contentHandling |
Allows users to select the content of an element from the following choices:
|
|
Transformation of data to be decomposed | ||
|
|
db2-xdb:normalization |
Enables users to specify the normalization behavior for the content of the XML element or attribute that is being mapped to a character target column before the content is inserted. The following choices are provided:
|
|
|
db2-xdb:expression |
Allows users to specify a customized expression, the result of which is inserted into the table this element is mapped to. |
|
|
db2-xdb:truncate |
Allows users to specify whether truncation is permitted when an XML value is inserted into a character target column. |
|
db2-xdb is the prefix used for all our annotations. Users can use any prefix they choose. However, the prefix must be bound to the namespace URI http://www.ibm.com/xmlns/prod/db2/xdb1. Otherwise, the annotations will be ignored by decomposition registration and runtime engine. | ||
For convenience, all of the listed annotations can be specified as non-native attributes of the element/attribute declarations or as a child of xsd:annotation/xsd:appinfo hierarchy using the db2-xdb:rowSetMapping construct. For more details on each of these annotations refer to the DB2 documentation.
Besides the various mapping constructs introduced in annotated XML schema decomposition, it also boasts of a new algorithm that can determine the one-to-one or one-to-many relationships without the introduction of any explicit (user-specified) mapping construct. The algorithm can determine the relationship by looking at the maxOccurs property and the model groups involved in the items mapped to the same rowSet. Unlike other decomposition solutions, due to this algorithm, there are no constraints that allow only sibling or child elements/attributes to be mapped to the same table. As long as all the elements/attributes involved in mapping to the same rowSet forms a legal one-to-one or one-to-many relationship, the elements/attributes mapped to the same table can be from any part of the XML Schema. The new algorithm, by way of detecting a one-to-many relationship, allows multi-valued dependencies to be decomposed into relational tables. The algorithm not only creates rows based on the cardinality of elements/attributes involved, but also creates rows based on the type of the model group.
The new XML decomposition also has a strong type validator and type conversion engine. It will disallow, during registration, elements/attributes from being decomposed into target columns if the declared schema type is deemed incompatible with the corresponding target columns type. However, it allows for type conversion if possible. For instance, an element/attribute of type integer can be mapped to a column of type character.
Due to the new architecture and algorithm, early performance comparisons indicate that new annotated schema XML decomposition is much faster than XML Extender Decomposition.
Default mapping generator tool
The problem of mapping existing XML schema to an existing relational schema is much harder than creating a default relational schema based on the XML schema and then decomposing corresponding XML documents into it. Decomposition of XML documents into a specially created relational schema is only a special case of the harder and more generic problem that the new feature claims to solve. However, with a new technology like this, it is useful to have default mapping. This not only helps users familiarize with the new mapping constructs but also provides a platform to further customize the applications based on their needs. This tool can create a relational schema and insert annotations in the input XML schema accordingly. In production environments, default mapping almost always do not suffice; some degree of tuning of the mapping is usually required. The motive behind keeping the default mapping logic outside of annotated XML schema decomposition allows users to build upon the default mapping and fine-tune it based on their requirements. Users are free to add any of the annotation constructs introduced in the table above on top of the default mapping annotations inserted in the XML schema. The tool described in this section allows users to generate default mapping and relational schema for any given XML Schema.
The default mapping generator utility allows users to decompose XML documents without having to write a single line of annotation in the XML schema. The tool requires two inputs:
- The primary XML schema document: The XML schema document through which all XML schema documents can be reached by exploring the import/include/redefine constructs
- The default relational schema for all the tables involved
It then produces the following two outputs:
- It prints out the DDLs for the creation of relational schema onto the standard output
- It inserts the required annotations in the XML schema so as to accomplish the mapping of all elements/attributes into the relational schema returned as output
The tool starts at the global element declaration and digs through the hierarchy to annotate each element/attribute on its way; the same is done for each global element declaration. As it traverses through the XML schema, it also creates the DDLs for the relational schema. The name of each complexType element that it encounters during traversal is treated as a new table name. The names of child attributes and elements of simpleType are the names of the columns of that table. The tool also inserts annotations in the element/attribute declaration in XML Schema to map it to the corresponding table-column pair. Another important decision to make during the generation of DDLs is the selection of data type for the columns. The tool makes a default choice. The following table indicates the DB2 type that will be selected for a column when an XML schema type is encountered.
| Sno. | XML Schema element/attribute type | DB2 target column type |
|---|---|---|
|
|
anyType |
VARCHAR(20) |
|
|
anySimpleType |
VARCHAR(20) |
|
|
Duration |
INTEGER |
|
|
dateTime |
TIMESTAMP* |
|
|
time |
TIME* |
|
|
date |
DATE* |
|
|
gMonth |
INTEGER |
|
|
gYear |
INTEGER |
|
|
gYearMonth |
INTEGER |
|
|
boolean |
CHAR(5) |
|
|
base64Binary |
BLOB(1M) |
|
|
hexBinary |
BLOB(1M) |
|
|
anyURI |
VARCHAR(128) |
|
|
QName |
VARCHAR(256) |
|
|
NOTATION |
VARCHAR(256) |
|
|
float |
DOUBLE |
|
|
double |
DOUBLE |
|
|
decimal |
DOUBLE |
|
|
integer |
BIGINT |
|
|
long |
BIGINT |
|
|
int |
INTEGER |
|
|
short |
SMALLINT |
|
|
byte |
SMALLINT |
|
|
nonPositiveInteger |
BIGINT |
|
|
negativeInteger |
BIGINT |
|
|
nonNegativeInteger |
BIGINT |
|
|
positiveInteger |
BIGINT |
|
|
unsignedLong |
BIGINT |
|
|
unsignedInt |
BIGINT |
|
|
unsignedShort |
INTEGER |
|
|
unsignedByte |
SHORT |
|
|
string |
VARCHAR(128) |
|
|
normalizedString |
VARCHAR(128) |
|
|
token |
VARCHAR(20) |
|
|
language |
VARCHAR(20) |
|
|
Name |
VARCHAR(20) |
|
|
NCName |
VARCHAR(20) |
|
|
ID |
VARCHAR(20) |
|
|
IDREF |
VARCHAR(20) |
|
|
IDREFS |
VARCHAR(20) |
|
|
ENTITY |
VARCHAR(20) |
|
|
ENTITIES |
VARCHAR(128) |
|
|
NMTOKEN |
VARCHAR(20) |
|
|
NMTOKENS |
VARCHAR(128) |
|
|
union |
VARCHAR(128) |
|
|
list |
VARCHAR(128) |
|
Note: * indicates that the db2-xdb:truncate annotation is used for these datatypes. This allows the time zone information to be truncated, as DB2 data types do not support time zones. | ||
This is, of course, the table of default mapping of data types. You can change this by either modifying the DDL statements produced or, more permanently, by modifying the array XSD2DB2[] in file TableCreator.java in the attached source.
Usage: DefaultAnnotation primary_schema_doc.xsd defaultSQLSchemaName |
This utility takes two inputs -- the primary XML Schema Document and the default SQL Schema in which the tables reside. Based on the algorithm described above, the tool will create and print DDLs on the standard output device and insert corresponding annotations to map the element/attribute to the table-column. The annotated XML schema can then be registered with XSR and enabled for decomposition. You can then use created XSRObject to decompose XML documents.
The XML schema: The primary XML schema document through which all related schema documents can be explored. Users are advised to keep a separate copy of their schema documents before running this tool since this tool will modify the primary schema document and any document that is part of the schema that is referenced using xsd:include, xsd:import, or xsd:redefine.
The second parameter is used to qualify all table references.
Note: Running the tool multiple times on the same schema documents will result in error during decomposition enablement in XML Schema Repository, as on each run the tool will add the same set of mappings to the same set of element/attributes.
- Java™ Runtime Environment (JRE) 1.4.2 or JRE 1.5
- Java Archive (JAR) files for Eclipse Modeling Framework
- JARs for XSD plugin
- JARS for xml4j-4_2_2 or equivalent Xerces-J.
The links to obtain these tools are listed in Resources.
Being written in Java, this tool will work on all platforms supported by JRE. However, .bat files are provided for convenience to be used on Windows® platform. The .bat files set the PATH and CLASSPATH variables. You should set the paths of the JARs in CLASSPATH and the JDK in accordance with their installation. There are two flavors of the .bat file, namely, DefAS_Eclipse.bat and DefAS _DB2DWB.bat. The file DefAS_Eclipse.bat is customized for users that download Eclipse and related plug-ins directly from www.eclipse.org, and DefAS_DB2DWB.bat is for users that have DB2 Developer Workbench installed. Note that if DB2 Developer Workbench is installed on the same machine, all the required plug-ins will come with it automatically.
A sample is also provided in the attached file in the Download section. Once you have unzipped the download, the sample can be found in the sample sub-directory. The sample has the original XML schema, the converted annotated XML schema, and the DDL containing the new relational schema that was produced using this tool.
The original XSD file is Order.xsd.
The annotated XML schema that is create using this tool is Order_AS.xsd.
The new relational schema is Order.sql.
Note that after inserting annotations in Order_AS.xsd, it has been run through a pretty-print XML tool for convenience.
- The default mapping may result in loss of information related to association between entities. For instance, if a complexType element PurchaseOrder has a child element Customer of complexType, then the purchase order related information, such as POID, Date, and so on, will be decomposed into the PurchaseOrder table and Customer related information, such as CustID, CustName, and so on, will be stored in the Customers table. But the fact that a certain customer was referenced in context of the purchase order will be lost. However, this limitation can be overcome by manually inserting the annotations. In this case, the association information can be recorded by creating another table that stores the association PO_Cust. Then add annotations on the POID element or attribute and to CustID element/attribute, so as to map them to the table PO_Cust.
- The tool does not support annotation of recursive XML schemas. In such cases, you may get an error from JRE indicating that the process has run out of stack space.
- The tool does not use many of the annotation constructs.
- Run the tool on any XML schema and capture the DDLs written to the standard output.
- Run the DDLs to create the new relational schema.
- Register the XML schema and enable it for
decomposition using the following CLP commands:
REGISTER XML SCHEMA order.xsd from c:\xsd\order.xsd AS po.order COMPLETE ENABLE DECOMPOSITION
- If step 3 goes through successfully, then
decompose the XML document using the following command:
DECOMPOSE XML DOCUMENT c:\xml\order.xml XMLSCHEMA po.order VALIDATE
Or call the stored procedure to decompose the XML document. The stored procedure and its description is as follows:
xdbDecompXML ( rschema , xmlschemaname, xmldoc , documentid , validation , reserved , reserved ,reserved )
rschema : The relational schema within which the XSRObject is created. The XSRObject is created as a consequence of registering and completing the XML Schema, which is annotated for decomposition in this case.
xmlschemaname : The name of the XSRObject, which contains the XML schema that describes, through annotations, how the XML document should be decomposed.
xmldoc : This parameter is of type BLOB that is used to pass the XML document to be decomposed.
documentid : An identifier for the XML document. This is used to report diagnostic information in the db2diag.log and is also returned as part of the error message.
validation : Should be set to one if the XML document should be validated as it is being shred. It should be zero otherwise.
reserved : All the reserved parameters should be passed in as null.
There are five variations of this stored procedure based on the size of the XML document to be decomposed. All of them have the same parameters, except for the size of BLOB used to pass in the XML document. Table 2 describes the different variations of the stored procedure:
Table 2. Different variations of the stored procedure
| Stored procedure name | Max document size that can be decomposed |
|---|---|
|
SYSPROC.XDBDECOMPXML |
1MB |
|
SYSPROC.XDBDECOMPXML10MB |
10MB |
|
SYSPROC.XDBDECOMPXML25MB |
25MB |
|
SYSPROC.XDBDECOMPXML50MB |
50MB |
|
SYSPROC.XDBDECOMPXML75MB |
75MB |
|
SYSPROC.XDBDECOMPXML100MB |
100MB |
Note that the annotated XML schema stored procedures belong to the SYSPROC schema and are built in. Its use does not require enabling of the database, as in case of XML Extender.
These four steps can be used as a template to decompose documents. The only difference that may occur is when an XML schema consists of multiple XML schema documents, as all the schema documents need to be registered before completing the XML schema.
Annotated XML schema decomposition is a new feature being introduced in the DB2 Viper release. It has been written with new algorithms and introduces a new mapping language. The mapping language and algorithms together provide a highly flexible, efficient, and methodical way of decomposing XML documents. It allows XML schemas that differ significantly in shape from the target relational schema to be decomposed. The tool, DefaultAnnotater, allows you to get quickly started with the new function by creating a default relational schema and inserting corresponding annotations in the XML Schema. The tool also helps you quickly learn the various constructs on Annotated XML Schema Decomposition.
| Description | Name | Size | Download method |
|---|---|---|---|
| Source and sample for DefaultAnnotater tool | DefaultAnnotater.zip | 25KB | HTTP |
Information about download methods
Learn
-
developerWorks DB2 UDB page: Learn more about DB2 Viper.
-
Information management and XML technology page: Find articles and resources for implementing XML technology with DB2.
-
developerWorks Information Management zone: Find more resources
for DB2 UDB developers and administrators.
-
Stay current with developerWorks
technical events and webcasts.
-
Learn about DB2 Express-C, the no-charge version of DB2 Express Edition for the community.
Get products and technologies
-
Download the Eclipse SDK.
-
Download Xerces-J.
-
Build your next development project with
IBM
trial software, available for download directly from developerWorks.
-
Now you can use DB2 for free. Download DB2 Express-C, a no-charge version of DB2 Express Edition for the community that offers the same core data features as DB2 Express Edtion and provides a solid base to build and deploy applications.
Discuss
- Participate in the discussion forum.
-
developerWorks blogs: Get involved in
the developerWorks community.
Comments (Undergoing maintenance)





