Introduction to annotated XML schema decomposition
The annotated XML schema decomposition feature introduced in DB2 Viper can be used to decompose XML documents, wholly or partially, into relational tables. It uses annotations in XML schema as the mapping language to map information in an XML document to relational tables. The new XML decomposition feature uses XML Schema as the platform to obtain mapping information. Since, XML Schema is an open standard, it only requires the addition of a few proprietary features in the mapping language specification. This allows for a shorter learning curve as opposed to an equivalent mapping specification in a completely proprietary language. Note that annotations added to the XML Schema do not participate in validation of corresponding XML documents. This allows users to use the same XML schema for mapping and validation of XML documents.
The new XML decomposition feature requires that the annotated XML schema be registered in the XML Schema Repository. XML Schema Repository is also a new feature introduced in DB2 Viper. As the name suggests, it is a repository for XML schemas that may consist of one or more XML Schema documents. The scope of XML schemas registered in XSR is limited to the database. Any XML schema that is registered with XSR can be used for two purposes:
- Validation of XML documents as they are inserted into XML type columns
- Validation of XML documents as they are being decomposed into relational tables
Any XML schema registered in XSR can be enabled to be used with XML decomposition if the XML schema has at least one decomposition-related annotation present. When an XML schema is enabled for decomposition, checks are made to ensure the correctness of the annotations, compatibility of XML Schema types to DB2 column type and existence of relational tables and columns specified in annotations. If the annotated XML schema is deemed valid, mapping information is extracted and stored in catalog tables in a ready-to-use binary format.
Finally, any XML document that conforms to the annotated XML schema can now be decomposed into relational tables as per the specified mapping.
The annotated XML schema mapping language is designed with the aim to serve users who already have an existing relational schema with several applications and now want to consume data from XML documents into the existing relational schema. The structure of the XML documents, as expressed by an XML schema, could be very different from the relational schema, as the users may have less or no control of the design of XML Schema. Consequently, the mapping language is designed to provide greater flexibility and granular control over the entire process of decomposition. There are 11 different mapping constructs (some of them even have multiple enumerated values) to express mapping from elements and attributes declared in XML Schema to table-column pairs in the relational schema. Each of these constructs is designed to primarily serve a well-defined purpose. Most of these constructs are a result of feature requests that came from XML Extender customers. Table 1 lists these constructs and their primary goal:
Table 1. Mapping constructs and their primary goals
|SNo.||Annotation construct||Purpose||Analogous DAD construct||Brief description|
Mapping XML elements/attributes to table-column pairs
Can be used to specify the logical or physical container (a physical container can be a relational table) of a logically associated set of rows.
This is used to specify the table to which the element/attribute is to be mapped.
Must be used to specify the column into which the information associated with element/attribute should be decomposed into.
This is used in conjunction with the abovetableconstruct in DAD to specify the table-column pair to which the element/attribute is mapped.
Should be used to specify the SQL schema of the relational tables involved.
The tables need to be qualified either explicitly, for every usage, or they will be automatically qualified by the user invoking the shred stored procedure.
Groups all associated annotation constructs that specify a single mapping to a rowSet-column pair. One or more db2-xdb:rowSetMapping constructs is allowed on the same element/attribute declaration that maps to a different rowset-column pair.
Allows a table-column pair mapping to be grouped. Only one such mapping is permitted on any element/attribute node.
This annotation enables users to map multiple items from various parts of the XML document to be decomposed into the same table-column pair as different rows.
Such an operation cannot be done using any of the DAD constructs.
Allows users to specify different targets (tables) based on the context in which the element/attribute appears. This permits users to annotate elements and attributes declared in reusable types and groups to be decomposed based on the context.
The locationPath is implicit in the DAD itself, as the entire XML hierarchy needs to be repeated using the element_node/attribute_node construct.
Allows users to filter the data to be inserted based on the content of element/attribute in the XML document.
This construct is overloaded in DAD. It has the following 2 purposes, depending on where it is specified.
Control on data to be decomposed
Allows users to select the content of an element from the following choices:
This cannot be done in the shredding framework of XML Extender.
Transformation of data to be decomposed
Enables users to specify the normalization behavior for the content of the XML element or attribute that is being mapped to a character target column before the content is inserted. The following choices are provided:
XML Extender shred does not perform any kind of normalization before insertion into the specified table-column.
Allows users to specify a customized expression, the result of which is inserted into the table this element is mapped to.
User specified data transformation cannot be done during shredding with XML Extender framework. If such an operation is required, either the users have to transform their XML document, using XSL, before shredding it or write triggers to modify the data before it is inserted.
Allows users to specify whether truncation is permitted when an XML value is inserted into a character target column.
This operation is also not permitted in XML Extender.
Note: db2-xdb is the prefix used for all our annotations. Users can use any prefix they choose; however, the prefix must be bound to the namespace URI http://www.ibm.com/xmlns/prod/db2/xdb1, otherwise the annotations will be ignored by decomposition registration and runtime engine.
XML Extender shredding framework does not support element/attributes to be in any namespace.
For convenience, all of the listed annotations can be specified as non-native attributes of the element/attribute declarations or as a child of xdb:annotation/xdb:appinfo hierarchy using the db2-xdb:rowSetMapping construct. For more details on each of these annotations, refer to the DB2 documentation.
Besides the various mapping constructs introduced in annotated XML schema decomposition, it also boasts of a new algorithm that can determine the one-to-one or one-to-many relationships without the introduction of any explicit (user-specified) mapping construct. The algorithm can determine the relationship by looking at the maxOccurs property and the model groups involved in the items mapped to the same rowSet. Unlike other decomposition solutions, due to this algorithm, there are no constraints that allow only sibling or child elements/attributes to be mapped to the same table. As long as all the elements/attributes involved in mapping to the same rowSet forms a legal one-to-one or one-to-many relationship, the elements/attributes mapped to the same table can be from any part of the XML schema. The new algorithm, by way of detecting a one-to-many relationship, allows multi-valued dependencies to be decomposed into relational tables. The algorithm not only creates rows based on the cardinality of elements/attributes involved, but also creates rows based on the type of the model group.
The new XML decomposition also has a strong type validator and type conversion engine. It will disallow, during registration, elements/attributes from being decomposed into target columns if the declared schema type is deemed incompatible with the corresponding target columns type. However, it allows for type conversion if possible. For instance an element/attribute of type integer can be mapped to a column of type character.
Due to the new architecture and algorithm, early performance comparisons indicate that the new annotated schema XML decomposition is much faster than XML Extender Decomposition.
Following are the primary migration considerations when migrating from XML Extender Decomposition to annotated XML schema decomposition:
- Migration of mapping document: Users need to migrate their DAD to annotated XML schema in order to use the new functions. This is perhaps the biggest part of the migration. However, there is help in the form of a tool that is described below. The tool will help users convert their XML schemas to annotated XML schema documents based on the DAD.
- Addition of new registration step: Users now need to register the annotated XML schema in the XML Schema Repository. This is analogous to the step of enable_collection in XML Extender. But since this was an optional step in XML Extender, some users might not have a collection at all. However, when using annotated XML schema decomposition, it is mandatory to register the XML schema in the XML Schema Repository and enable it for decomposition prior to decomposing any XML documents against it.
- Migration of the
stored procedure call: The new stored procedure call
looks as follows:
xdbDecompXML ( rschema , xmlschemaname , xmldoc , documentid , validation , reserved , reserved , reserved )
rschema: The relational schema within which the XSRObject is created. The XSRObject is created as a consequence of registering and completing the XML schema, which is annotated for decomposition in this case.
xmlschemaname: The name of the XSRObject, which contains the XML schema that describes, through annotations, how the XML document should be decomposed.
xmldoc: This parameter is of type BLOB that is used to pass the XML document to be decomposed.
documentid: An identifier for the XML document. This is used to report diagnostic information in the db2diag.log and is also returned as part of the error message.
validation: Should be set to one if the XML document should be validated as it is being shred. It should be set to zero otherwise.
reserved: All the reserved parameters should be passed in as null.
There are five variations of this stored procedure based on the size of the XML document to be decomposed. All of them have the same parameters, except for the size of BLOB used to pass in the XML document. Table 2 describes the different variations of the stored procedure:
Table 2. Different variations of the stored procedure
|Stored procedure name||Corresponding XML Extender stored procedure||Max document size that can be decomposed|
Note that the annotated XML schema stored procedures belong to the SYSPROC schema and are built in. Its use does not require enabling of the database, as in the case of XML Extender.
- Error handling: The new stored procedure now returns SQLCA with proper SQLCODE and SQLSTATE that is more inline with the error handling framework when using SQL. This allows the applications to use existing infrastructure, if available, to handle SQL errors. The SQLCA can be used to format and build localized messages.
Note that like all new XML related features introduced in the DB2 Viper release, the new annotated XML schema decomposition is also supported only on Unicode databases and on single-partitioned databases.
XML schema converter tool
The DAD to annotated XML schema converter utility helps users convert their XML schemas to annotated XML schemas based on the mapping rules described in the DAD. The XML schema and DAD must be describing the same set of XML documents, although it is possible and acceptable that the XML schema may describe a superset of documents described by the DAD. Users that do not have an XML schema can easily generate XML schemas from corresponding DTDs and even XML documents using tools available freely on the internet. DAD to annotated XML schema converter tool can then take the XML schema and the DAD as input to produce an annotated XML schema. The tool supports import, include, and redefine construct of the XML schema. (In other words, if an XML schema is spread across many XML schema documents through import, include, or redefine, only the path for the primary schema document -- the document through which all XML schema documents can be reached through either import, include, or redefine -- is needed.) The tool will annotate element/attribute declarations across schema documents. Note that since DADs do not support namespaces, it is impossible to have the use of import construct in this scenario.
Usage: DAD2AS dad_filename.dad primary_schema_doc.xsd defaultSQLSchemaName
This utility takes three inputs -- a DAD file, the primary XML schema document, and the default SQL schema in which the tables reside. Based on the mapping information specified in the DAD, it converts the specified XML schema into an annotated XML schema, which can be used with the annotated XML schema decomposition feature of DB2 Viper.
The specified (RDB_NODE) DAD, which may currently be in use with XML Extender shredding, is assumed to be a valid DAD. It provides the mapping information to this tool. The utility will read the DAD file, infer the mapping, and apply the same mapping to the appropriate elements/attributes in the XML schema provided as the second parameter.
The XML schema must be in accordance with the DAD; there should not be an element/attribute used in any context in the DAD that is not defined in the same context in the XML schema. In other words, the DAD and XML schema must be such that they work on the same family of XML documents. (If the users do not have an XML schema but have a DTD, they can use various tools to convert a DTD to XML schema.) The tool assumes that none of the elements/attributes defined in the XML schema will be in any namespace, as the namespace qualification of elements/attributes is not supported in DAD either.
Users are advised to keep a copy of their schema documents before running this tool since this tool will modify the primary schema document and any document that is part of the schema that is referenced using xsd:include, xsd:import, or xsd:redefine.
The third parameter is used to qualify any unqualified table references. If a reference to a table is not qualified in the DAD, it will be assumed to reside in the specified default SQL schema. However, this parameter is required even if all the tables in the DAD are qualified.
Note: Running the tool multiple times on the same schema documents will result in error during decomposition enablement in XSR, as on each run the tool will add the same set of mappings to the same set of element/attributes.
- JRE 1.4.2 or JRE 1.5
- JARs for Eclipse Modeling Framework
- JARs for XSD plug-in
- JARS for xml4j-4_2_2 or equivalent Xerces-J.
The links to download these tools are listed below.
How to run it
Being written in Java™, this tool will work on all platforms supported by JRE. However, .bat files are provided for convenience to be used on Windows®. The .bat files sets the PATH and CLASSPATH variables. Users should set the paths of the JARs in CLASSPATH and the JDK in accordance with their installation. There are two flavors of the .bat file, namely, Dad2AS_Eclipse.bat and Dad2AS_DB2DWB.bat. The file Dad2AS_Eclipse.bat is customized for users that download Eclipse and related plug-ins directly from www.eclipse.org, and Dad2AS_DB2DWB.bat is for users that have downloaded DB2 Developer Workbench. Note that if DB2 Developer Workbench is downloaded, all the required plug-ins will come with it automatically.
A sample is also provided in the attached file in the Download section. Once you have unzipped the download, the sample can be found in the sample sub-directory. The sample has a DAD file, the original XML schema, and the converted annotated XML schema that was produced using this tool.
The DAD file is Order.dad.
The original XSD file is Order.xsd
The annotated XML schema that is create using this tool is Order_AS.xsd
Note that after inserting annotations in Order_AS.xsd, it has been run through a pretty-print XML tool for convenience.
Limitations of the tool
- Does not add db2-xdb:condition annotation corresponding to element/attribute declaration for a condition specified on a non-root element in the DAD using the <condition> construct in DAD. This can either be added manually, or the source can be modified to accomplish this as well.
Annotated XML schema decomposition is a new feature introduced in DB2 Viper. It has been written from scratch with new algorithms and mapping language. The mapping language and algorithms together provide a highly flexible, efficient, and methodical way of decomposing XML documents. It allows XML schemas that differ significantly in shape from the target relational schema to be decomposed. Early performance comparisons already show that the new annotated XML schema decomposition feature is much faster in running time than the XML Extender Decomposition. In light of all the performance benefits, flexibility and predictability of results offered by this new feature, it will be beneficial to migrate to annotated XML schema decomposition in DB2 Viper. The DAD to annotated XML schema generator tool will help users in migration by converting the DAD to an equivalent annotated XML schema, thus making the task of migration much easier.
|DAD to annotated XML schema converter tool||DAD2AS.zip||30KB|
- developerWorks DB2 UDB page: Learn more about DB2 Viper.
- Information management and XML technology page: Find articles and resources for implementing XML technology with DB2.
- developerWorks Information Management zone: Find more resources for DB2 UDB developers and administrators.
- Stay current with developerWorks technical events and webcasts.
Get products and technologies
- Download the DB2 Viper to test XML schema decomposition for yourself.
- Download the Eclipse SDK.
- Download Xerces-J.
- Build your next development project with IBM trial software, available for download directly from developerWorks.
- Participate in the discussion forum.
- developerWorks blogs: Get involved in the developerWorks community.
Dig deeper into Information management on developerWorks
Experiment with new directions in software development.
Read and subscribe for the best and latest technical info to help you deal with your development challenges.
Software development in the cloud. Register today and get free private projects through 2014.
Evaluate IBM software and solutions, and transform challenges into opportunities.