Skip to main content

Default mapping for annotated XML schema

Get a fast start with annotated XML schema decomposition using the DefaultAnnotater tool

Mayank Pradhan (mayankpr@us.ibm.com), Staff Software Engineer, IBM
Mayank Pradhan is a staff software engineer at IBM, working in the XML Technology group. He is currently part of the development team for the annotated XML schema decomposition function being introduced in DB2 Viper. Previously, he was also part of the XML Extender development team.

Summary:  The IBM® DB2® Viper release brings many new XML-related (and non-XML) features. One such feature is the annotated XML schema decomposition that allows you to decompose their XML documents into relational tables. The annotated XML schema supports various mapping constructs that allow you to map elements/attributes defined in the XML schema to table-column pairs in the relational schema. For large XML schemas consisting of many XML schema documents, manual annotation can be a cumbersome task. Get an introduction to the tool, DefaultAnnotater, that allows you to create default mapping and a default relational schema into which corresponding XML documents can be decomposed. This article provides a good starting platform for not only trying out the new function, but also further enhancing the mapping in a given XML schema.

Date:  20 Apr 2006
Level:  Intermediate
Activity:  904 views

Introduction to annotated XML schema decomposition

The annotated XML schema decomposition feature, introduced in DB2 Viper, can be used to decompose XML documents, wholly or partially, into relational tables. It uses annotations in XML Schema as the mapping language to map information in an XML document to relational tables. The new XML decomposition feature uses XML Schema as the platform to obtain mapping information because XML Schema is an open standard and requires fewer addition of proprietary features in the mapping language specification. This allows for a shorter learning curve as opposed to an equivalent mapping specification in a completely proprietary language. Note that annotations added to the XML schema do not participate in validation of corresponding XML documents. This allows you to use the same XML schema for mapping and validation of XML documents.

The new XML decomposition feature requires that the annotated XML schema be registered in the XML Schema Repository. XML Schema Repository is also a new feature introduced in DB2 Viper. As the name suggests, it is a repository for XML schemas that may consist of one or more XML schema documents. The scope of XML schemas registered in XML Schema Repository is limited to the database. Any XML schema that is registered with XML Schema Repository can be used for two purposes:

  1. Validation of XML documents as they are inserted into XML type columns
  2. Validation of XML documents as they are being decomposed into relational tables

Any XML schema registered in XML Schema Repository can be enabled to be used with XML decomposition if the XML schema has at least one decomposition related annotation present. When an XML schema is enabled for decomposition, checks are made to ensure the correctness of the annotations, compatibility of XML schema types to DB2 column type, and existence of relational tables and columns specified in annotations. If the annotated XML schema is deemed valid, mapping information is extracted and stored in catalog tables in a ready-to-use binary format.

Finally, any XML document that conforms to the annotated XML schema can now be decomposed into relational tables, as per the specified mapping.

The annotated XML schema mapping language is designed to serve users who already have an existing relational schema with several applications already running off of the relational schema and now want to consume data from XML documents whose structure is expressed using an XML schema into the existing relational schema. The structure of the XML schema may be very different from the relational schema, as the users may have very less or absolutely no control on the design of the XML schema. Consequently, the mapping language is designed to provide higher flexibility and granular control over the entire process of decomposition. There are 11 different mapping constructs, some of which even have multiple enumerated values, to express mapping from elements and attributes declared in XML Schema to table-column pairs in the relational schema. Each of these constructs is designed to primarily serve a well-defined purpose. Most of these constructs are a result of feature requests that came from customers. Table 1 lists these constructs and their primary goal:


Table 1. Mapping constructs and their primary goals
SNo. Annotation construct Purpose

Mapping XML elements/attributes to table-column pairs

db2-xdb:rowSet

Can be used to specify the logical or physical container (a physical container can be a relational table) of logically associated set of rows.

db2-xdb:column

Must be used to specify the column into which the information associated with element/attribute should be decomposed.

db2-xdb:defaultSQLSchema

Should be used to specify the SQL schema of the relational tables involved.

db2-xdb:rowSetMapping

Groups all associated annotation constructs that specify a single mapping to a rowSet-column pair. One or more db2-xdb:rowSetMapping constructs is allowed on the same element/attribute declaration that maps to different rowset-column pair.

db2-xdb:table

Enables users to map multiple items from various parts of the XML document to be decomposed into the same table-column pair as different rows.

Conditional decomposition

db2-xdb:locationPath

Allows users to specify different target (tables) based on the context in which the element/attribute appears. This permits users to annotate elements and attributes declared in reusable types and groups to be decomposed based on the context.

db2-xdb:condition

Allows users to filter the data to be inserted based on the content of element/attribute in the XML document.

Control on data to be decomposed

db2-xdb:contentHandling

Allows users to select the content of an element from the following choices:

  • stringValue: The concatenation of all descendant text nodes
  • textValue: The concatenation of all child text nodes
  • serializeSubtree: The serialized string, in its markup form, of everything between this element's start and end tags, inclusive of the start and end tags of the element itself, as well as comments and processing instruction

Transformation of data to be decomposed

db2-xdb:normalization

Enables users to specify the normalization behavior for the content of the XML element or attribute that is being mapped to a character target column before the content is inserted. The following choices are provided:

  • canonical: The XML value is converted to its canonical form, according to its XML schema type, before being inserted into the target column
  • original: The original character data
  • whitespaceStrip: All leading and trailing whitespace removed, and consecutive whitespace is collapsed into a single whitespace character before being inserted into the target column

db2-xdb:expression

Allows users to specify a customized expression, the result of which is inserted into the table this element is mapped to.

db2-xdb:truncate

Allows users to specify whether truncation is permitted when an XML value is inserted into a character target column.

db2-xdb is the prefix used for all our annotations. Users can use any prefix they choose. However, the prefix must be bound to the namespace URI http://www.ibm.com/xmlns/prod/db2/xdb1. Otherwise, the annotations will be ignored by decomposition registration and runtime engine.

For convenience, all of the listed annotations can be specified as non-native attributes of the element/attribute declarations or as a child of xsd:annotation/xsd:appinfo hierarchy using the db2-xdb:rowSetMapping construct. For more details on each of these annotations refer to the DB2 documentation.

Besides the various mapping constructs introduced in annotated XML schema decomposition, it also boasts of a new algorithm that can determine the one-to-one or one-to-many relationships without the introduction of any explicit (user-specified) mapping construct. The algorithm can determine the relationship by looking at the maxOccurs property and the model groups involved in the items mapped to the same rowSet. Unlike other decomposition solutions, due to this algorithm, there are no constraints that allow only sibling or child elements/attributes to be mapped to the same table. As long as all the elements/attributes involved in mapping to the same rowSet forms a legal one-to-one or one-to-many relationship, the elements/attributes mapped to the same table can be from any part of the XML Schema. The new algorithm, by way of detecting a one-to-many relationship, allows multi-valued dependencies to be decomposed into relational tables. The algorithm not only creates rows based on the cardinality of elements/attributes involved, but also creates rows based on the type of the model group.

The new XML decomposition also has a strong type validator and type conversion engine. It will disallow, during registration, elements/attributes from being decomposed into target columns if the declared schema type is deemed incompatible with the corresponding target columns type. However, it allows for type conversion if possible. For instance, an element/attribute of type integer can be mapped to a column of type character.

Due to the new architecture and algorithm, early performance comparisons indicate that new annotated schema XML decomposition is much faster than XML Extender Decomposition.


Default mapping generator tool

The problem of mapping existing XML schema to an existing relational schema is much harder than creating a default relational schema based on the XML schema and then decomposing corresponding XML documents into it. Decomposition of XML documents into a specially created relational schema is only a special case of the harder and more generic problem that the new feature claims to solve. However, with a new technology like this, it is useful to have default mapping. This not only helps users familiarize with the new mapping constructs but also provides a platform to further customize the applications based on their needs. This tool can create a relational schema and insert annotations in the input XML schema accordingly. In production environments, default mapping almost always do not suffice; some degree of tuning of the mapping is usually required. The motive behind keeping the default mapping logic outside of annotated XML schema decomposition allows users to build upon the default mapping and fine-tune it based on their requirements. Users are free to add any of the annotation constructs introduced in the table above on top of the default mapping annotations inserted in the XML schema. The tool described in this section allows users to generate default mapping and relational schema for any given XML Schema.

The default mapping generator utility allows users to decompose XML documents without having to write a single line of annotation in the XML schema. The tool requires two inputs:

  1. The primary XML schema document: The XML schema document through which all XML schema documents can be reached by exploring the import/include/redefine constructs
  2. The default relational schema for all the tables involved

It then produces the following two outputs:

  1. It prints out the DDLs for the creation of relational schema onto the standard output
  2. It inserts the required annotations in the XML schema so as to accomplish the mapping of all elements/attributes into the relational schema returned as output

The tool starts at the global element declaration and digs through the hierarchy to annotate each element/attribute on its way; the same is done for each global element declaration. As it traverses through the XML schema, it also creates the DDLs for the relational schema. The name of each complexType element that it encounters during traversal is treated as a new table name. The names of child attributes and elements of simpleType are the names of the columns of that table. The tool also inserts annotations in the element/attribute declaration in XML Schema to map it to the corresponding table-column pair. Another important decision to make during the generation of DDLs is the selection of data type for the columns. The tool makes a default choice. The following table indicates the DB2 type that will be selected for a column when an XML schema type is encountered.

Sno. XML Schema element/attribute type DB2 target column type

anyType

VARCHAR(20)

anySimpleType

VARCHAR(20)

Duration

INTEGER

dateTime

TIMESTAMP*

time

TIME*

date

DATE*

gMonth

INTEGER

gYear

INTEGER

gYearMonth

INTEGER

boolean

CHAR(5)

base64Binary

BLOB(1M)

hexBinary

BLOB(1M)

anyURI

VARCHAR(128)

QName

VARCHAR(256)

NOTATION

VARCHAR(256)

float

DOUBLE

double

DOUBLE

decimal

DOUBLE

integer

BIGINT

long

BIGINT

int

INTEGER

short

SMALLINT

byte

SMALLINT

nonPositiveInteger

BIGINT

negativeInteger

BIGINT

nonNegativeInteger

BIGINT

positiveInteger

BIGINT

unsignedLong

BIGINT

unsignedInt

BIGINT

unsignedShort

INTEGER

unsignedByte

SHORT

string

VARCHAR(128)

normalizedString

VARCHAR(128)

token

VARCHAR(20)

language

VARCHAR(20)

Name

VARCHAR(20)

NCName

VARCHAR(20)

ID

VARCHAR(20)

IDREF

VARCHAR(20)

IDREFS

VARCHAR(20)

ENTITY

VARCHAR(20)

ENTITIES

VARCHAR(128)

NMTOKEN

VARCHAR(20)

NMTOKENS

VARCHAR(128)

union

VARCHAR(128)

list

VARCHAR(128)

Note: * indicates that the db2-xdb:truncate annotation is used for these datatypes. This allows the time zone information to be truncated, as DB2 data types do not support time zones.

This is, of course, the table of default mapping of data types. You can change this by either modifying the DDL statements produced or, more permanently, by modifying the array XSD2DB2[] in file TableCreator.java in the attached source.

Usage:

DefaultAnnotation primary_schema_doc.xsd defaultSQLSchemaName

This utility takes two inputs -- the primary XML Schema Document and the default SQL Schema in which the tables reside. Based on the algorithm described above, the tool will create and print DDLs on the standard output device and insert corresponding annotations to map the element/attribute to the table-column. The annotated XML schema can then be registered with XSR and enabled for decomposition. You can then use created XSRObject to decompose XML documents.

Parameters

The XML schema: The primary XML schema document through which all related schema documents can be explored. Users are advised to keep a separate copy of their schema documents before running this tool since this tool will modify the primary schema document and any document that is part of the schema that is referenced using xsd:include, xsd:import, or xsd:redefine.

The second parameter is used to qualify all table references.

Note: Running the tool multiple times on the same schema documents will result in error during decomposition enablement in XML Schema Repository, as on each run the tool will add the same set of mappings to the same set of element/attributes.

Tool prerequisites

  1. Java™ Runtime Environment (JRE) 1.4.2 or JRE 1.5
  2. Java Archive (JAR) files for Eclipse Modeling Framework
  3. JARs for XSD plugin
  4. JARS for xml4j-4_2_2 or equivalent Xerces-J.

The links to obtain these tools are listed in Resources.

How to run it

Being written in Java, this tool will work on all platforms supported by JRE. However, .bat files are provided for convenience to be used on Windows® platform. The .bat files set the PATH and CLASSPATH variables. You should set the paths of the JARs in CLASSPATH and the JDK in accordance with their installation. There are two flavors of the .bat file, namely, DefAS_Eclipse.bat and DefAS _DB2DWB.bat. The file DefAS_Eclipse.bat is customized for users that download Eclipse and related plug-ins directly from www.eclipse.org, and DefAS_DB2DWB.bat is for users that have DB2 Developer Workbench installed. Note that if DB2 Developer Workbench is installed on the same machine, all the required plug-ins will come with it automatically.


Sample

A sample is also provided in the attached file in the Download section. Once you have unzipped the download, the sample can be found in the sample sub-directory. The sample has the original XML schema, the converted annotated XML schema, and the DDL containing the new relational schema that was produced using this tool.

The original XSD file is Order.xsd.

The annotated XML schema that is create using this tool is Order_AS.xsd.

The new relational schema is Order.sql.

Note that after inserting annotations in Order_AS.xsd, it has been run through a pretty-print XML tool for convenience.

Limitations of the tool

  1. The default mapping may result in loss of information related to association between entities. For instance, if a complexType element PurchaseOrder has a child element Customer of complexType, then the purchase order related information, such as POID, Date, and so on, will be decomposed into the PurchaseOrder table and Customer related information, such as CustID, CustName, and so on, will be stored in the Customers table. But the fact that a certain customer was referenced in context of the purchase order will be lost. However, this limitation can be overcome by manually inserting the annotations. In this case, the association information can be recorded by creating another table that stores the association PO_Cust. Then add annotations on the POID element or attribute and to CustID element/attribute, so as to map them to the table PO_Cust.
  2. The tool does not support annotation of recursive XML schemas. In such cases, you may get an error from JRE indicating that the process has run out of stack space.
  3. The tool does not use many of the annotation constructs.

Steps to run the sample

  1. Run the tool on any XML schema and capture the DDLs written to the standard output.
  2. Run the DDLs to create the new relational schema.
  3. Register the XML schema and enable it for decomposition using the following CLP commands:

    REGISTER XML SCHEMA order.xsd from
    c:\xsd\order.xsd AS po.order COMPLETE ENABLE DECOMPOSITION

  4. If step 3 goes through successfully, then decompose the XML document using the following command:

    DECOMPOSE XML DOCUMENT c:\xml\order.xml
    XMLSCHEMA po.order VALIDATE

    Or call the stored procedure to decompose the XML document. The stored procedure and its description is as follows:

    xdbDecompXML ( rschema , xmlschemaname, xmldoc , documentid , validation ,
    reserved , reserved ,reserved )

    rschema : The relational schema within which the XSRObject is created. The XSRObject is created as a consequence of registering and completing the XML Schema, which is annotated for decomposition in this case.

    xmlschemaname : The name of the XSRObject, which contains the XML schema that describes, through annotations, how the XML document should be decomposed.

    xmldoc : This parameter is of type BLOB that is used to pass the XML document to be decomposed.

    documentid : An identifier for the XML document. This is used to report diagnostic information in the db2diag.log and is also returned as part of the error message.

    validation : Should be set to one if the XML document should be validated as it is being shred. It should be zero otherwise.

    reserved : All the reserved parameters should be passed in as null.

There are five variations of this stored procedure based on the size of the XML document to be decomposed. All of them have the same parameters, except for the size of BLOB used to pass in the XML document. Table 2 describes the different variations of the stored procedure:


Table 2. Different variations of the stored procedure
Stored procedure name Max document size that can be decomposed

SYSPROC.XDBDECOMPXML

1MB

SYSPROC.XDBDECOMPXML10MB

10MB

SYSPROC.XDBDECOMPXML25MB

25MB

SYSPROC.XDBDECOMPXML50MB

50MB

SYSPROC.XDBDECOMPXML75MB

75MB

SYSPROC.XDBDECOMPXML100MB

100MB



Note that the annotated XML schema stored procedures belong to the SYSPROC schema and are built in. Its use does not require enabling of the database, as in case of XML Extender.

These four steps can be used as a template to decompose documents. The only difference that may occur is when an XML schema consists of multiple XML schema documents, as all the schema documents need to be registered before completing the XML schema.


Conclusion

Annotated XML schema decomposition is a new feature being introduced in the DB2 Viper release. It has been written with new algorithms and introduces a new mapping language. The mapping language and algorithms together provide a highly flexible, efficient, and methodical way of decomposing XML documents. It allows XML schemas that differ significantly in shape from the target relational schema to be decomposed. The tool, DefaultAnnotater, allows you to get quickly started with the new function by creating a default relational schema and inserting corresponding annotations in the XML Schema. The tool also helps you quickly learn the various constructs on Annotated XML Schema Decomposition.



Download

DescriptionNameSizeDownload method
Source and sample for DefaultAnnotater toolDefaultAnnotater.zip25KBHTTP

Information about download methods


Resources

Learn

Get products and technologies

  • Download the Eclipse SDK.

  • Download Xerces-J.

  • Build your next development project with IBM trial software, available for download directly from developerWorks.

  • Now you can use DB2 for free. Download DB2 Express-C, a no-charge version of DB2 Express Edition for the community that offers the same core data features as DB2 Express Edtion and provides a solid base to build and deploy applications.

Discuss

About the author

Mayank Pradhan is a staff software engineer at IBM, working in the XML Technology group. He is currently part of the development team for the annotated XML schema decomposition function being introduced in DB2 Viper. Previously, he was also part of the XML Extender development team.

Comments (Undergoing maintenance)



Trademarks  |  My developerWorks terms and conditions

Help: Update or add to My dW interests

What's this?

This little timesaver lets you update your My developerWorks profile with just one click! The general subject of this content (AIX and UNIX, Information Management, Lotus, Rational, Tivoli, WebSphere, Java, Linux, Open source, SOA and Web services, Web development, or XML) will be added to the interests section of your profile, if it's not there already. You only need to be logged in to My developerWorks.

And what's the point of adding your interests to your profile? That's how you find other users with the same interests as yours, and see what they're reading and contributing to the community. Your interests also help us recommend relevant developerWorks content to you.

View your My developerWorks profile

Return from help

Help: Remove from My dW interests

What's this?

Removing this interest does not alter your profile, but rather removes this piece of content from a list of all content for which you've indicated interest. In a future enhancement to My developerWorks, you'll be able to see a record of that content.

View your My developerWorks profile

Return from help

static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management, XML
ArticleID=108746
ArticleTitle=Default mapping for annotated XML schema
publish-date=04202006
author1-email=mayankpr@us.ibm.com
author1-email-cc=

My developerWorks community

Tags

Help
Use the search field to find all types of content in My developerWorks with that tag.

Use the slider bar to see more or fewer tags.

Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere).

My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Use the search field to find all types of content in My developerWorks with that tag. Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere). My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Special offers