XML data integrity in a DB2 database can be viewed in several ways. At a very basic level, before inserting an XML value into a table, you may want to make sure that the data type of the various element values are correct. At a deeper level, you may want to have constraints such as uniqueness of an element value, multiplicity of an element, or restriction of the values of an element. When XML data is stored along with relational data, you may want to ensure that some of the element values in the XML also go to relational columns. The reason for this might be that these columns may be a part of the primary key or foreign key.
DB2 provides a number of different solutions to these integrity requirements. Although you have an option of validating the XML document against a schema before insertion, triggers can be used to do this validation automatically without the knowledge of the user. You can use check constraints to force the user to validate the XML value with a pre-registered schema. Before you use any of these options, an XML schema needs to be registered with the database. To store a node value into a relational column to maintain uniqueness, you can shred the appropriate node value into a relational data type using an SQL/XML function and insert it into a relational column. In the following sections, we will talk about all these options in detail with code examples.
XML schema registration and validation
An XML schema is used to define the structure of an XML document, its elements, its attributes, and data types of elements or attributes. Furthermore, it allows you to specify default element values and restrictions on elements such as multiplicity or uniqueness. Once defined,it can be used for validating XML documents against the XML Schema specified. Listing 1 shows an example of an "employee" schema. We will be using this schema in later parts of this article.
Listing 1. Employee schema
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
elementFormDefault="qualified" attributeFormDefault="unqualified">
<xs:element name = "employee">
<xs:complexType>
<xs:sequence>
<xs:element name = "name" type = "xs:string"/>
<xs:element name = "dept" type = "xs:string"/>
<xs:element name = "salary" type = "xs:double"/>
<xs:element name = "designation" type = "xs:string"/>
<xs:element name = "date-of-join" type = "xs:date"/>
<xs:element name = "address">
<xs:complexType>
<xs:sequence>
<xs:element name = "street" type = "xs:string"/>
<xs:element name = "city" type = "xs:string"/>
<xs:element name = "state" type = "xs:string"/>
<xs:element name = "country" type = "xs:string"/>
<xs:element name = "zipcode" type = "xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
<xs:attribute name = "id" type = "xs:string" use = "required"/>
</xs:complexType>
</xs:element>
</xs:schema>
|
An XML schema should be registered in the DB2 database before using the schema for validation. Listing 2 shows the CLP commands to register the employee schema, assuming that it is stored in a file employee.xsd in your working directory.
Listing 2. Registering the XML schema
REGISTER XMLSCHEMA http://posample.org FROM employee.xsd as employee; COMPLETE XMLSCHEMA employee; |
Once the schema is registered successfully and given a relational name, you may use it for validation.
Validation is a process of checking whether the instance document satisfies structural, data type, and content constraints specified in the XML schema. Validation eliminates invalid data before being inserted into the database. It ensures that XML documents conform to the rules provided by XML schemas, as well as being well-formed.
In DB2, validating XML documents against XML schemas is optional. For example, if XML documents are received from a trusted source, that validation may not be needed. On the contrary, if XML documents are received from an unknown source, be careful to validate XML documents to maintain data integrity.
The XMLVALIDATE function is used to validate XML documents before inserting or updating XML documents into a table. As mentioned previously, XML schemas should be registered before using them for validation.
The XMLVALIDATE function validates documents either implicitly or explicitly.
-
Implicit validation: The XML document specifies the schema location and namespace using the xsi:schemaLocation attribute or specifies only schema
location
using xsi:noNamespaceSchemaLocation attribute. DB2 takes these input values and searches the catalog tables to identify XML schema needed to validate
XML document.
For example, you could insert the following XML documents into an existing table named EMPLOYEE having two columns, one of type integer and the second
of type XML.
Listing 3. Implicit validationINSERT INTO employee VALUES (101, '<employee xmlns:xsi="http://www/w3.org/2001/XMLSchema" xsi:noNamespaceSchemaLocation="employee.xsd" id = "101"> <name>Sonal</name> <dept>Sales</dept> <salary>10000</salary> <designation>Sales Executive</designation> <date-of-join>2007-10-10</date-of-join> <address> <street>M.G. Road</street> <city>Bangalore</city> <state>Karnataka</state> <country>India</country> <zipcode>560001</zipcode> </address> </employee>')
- Explicit validation: The XML document explicitly specifies the SQL identifier or namespace and schema location information of the
XML schema using the ACCODING TO XMLSCHEMA ID clause in the XMLVALIDATE function.
The example below shows explicit validation.
Listing 4. Explicit validationINSERT INTO employee VALUES (101, XMLVALIDATE(XMLPARSE(document '<employee id="101"> <name>Sonal</name> <dept>Sales</dept> <salary>10000</salary> <designation>Sales Executive</designation> <date-of-join>2007-10-10</date-of-join> <address> <street>M.G. Road</street> <city>Bangalore</city> <state>Karnataka</state> <country>India</country> <zipcode>560001</zipcode> </address> </employee>') ACCORDING TO XMLSCHEMA ID employee))
In DB2, XML document validation is done on a per document basis, not at column level. A table having an XML column need not have all the documents pertaining to one single XML schema. It can have XML documents that can pertain to different schemas. You can check whether documents are validated or not using the VALIDATED predicate.
Validation through check constraints on XML columns
Consider a scenario where you want to bind a set of XML schemas with an XML column. However the responsibility of validating the XML value lies with the application. In this case, the DBA can put a check constraint on the XML column and ensure that the application uses one of the pre-registered schemas for validation. This will also ensure that the XML document always gets validated.
Check constraints are constraints on table columns that are used to restrict the user to a particular check condition. While performing inserts or updates on a column, only when the check condition evaluates to true can you perform the operation on the table.
Creating a table with check constraint on an XML column
Check constraints on XML give you the ability to specify constraints on an XML column in terms of XML schema validation, to ensure that only validated XML documents with correct schemas are stored in the column. This ensures the consistency of the information stored in XML columns. It also gives the ability to define XML expressions on the XML values as they are validated according to a pre-defined set of schemas. The IS VALIDATED or IS NOT VALIDATED predicates with the ACCORDING TO XMLSCHEMA clause can be used to verify whether the document is validated or not validated against the specified schemas.
Check constraints can also list a set of schemas. This means the XML document can be validated against one of the XML schemas specified. By providing multiple schemas in the check constraint, you gain the flexibilty of using one of the schemas for validation to help in associating multiple schemas with a particular XML column. The example in Listing 5 shows how a single schema can be specified in a check constraint.
Listing 5. Create check constraint on an XML column
CREATE TABLE customer (cust_id INTEGER NOT NULL,
info XML CONSTRAINT val_check
CHECK(info IS VALIDATED ACCORDING TO XMLSCHEMA ID customer))
|
The above example creates a check constraint on the column INFO which is of type XML. The check constraint ensures that the document is validated according to customer schema (assuming that the customer schema is registered in DB2). Note that the check constraint evaluates whether the XML document is validated or not. It does not perform the actual validation.
The example in Listing 6 shows how multiple schemas can be listed while creating one particular check constraint.
Listing 6. Create check constraint on an XML column with multiple schemas
CREATE TABLE customer (cust_id INTEGER NOT NULL,
cust_info XML CONSTRAINT val_check
CHECK (cust_info IS VALIDATED ACCORDING TO XMLSCHEMA IN
(ID registered_customers, ID online_customers, ID regular_customers)))
|
The insert statement in the following example fails because it validates the XML document according to a schema not specified in the check constraint clause.
Listing 7. check constraint violation
INSERT INTO customer VALUES (1001, XMLVALIDATE(XMLPARSE(cust_info) ACCORDING TO XMLSCHEMA ID cust)) |
Inserting XML values through views
XML documents can also be inserted through views into their respective base tables. When an insert is performed on an XML column through a view, the check constraint of the base table is applied on the XML document, and only when it satisfies the check constraint does the XML document gets inserted into the table. The example below creates a view on two tables having XML documents.
Listing 8. Insert XML documents through views
CREATE TABLE distributor (dist_id INTEGER, dist_info XML CONSTRAINT dist_check CHECK(dist_info IS VALIDATED ACCORDING TO XMLSCHEMA ID distributor)) CREATE VIEW cust_dist_view(id, details) AS (SELECT id, info FROM customer UNION ALL SELECT dist_id, dist_info FROM distributor) INSERT INTO cust_dist_view VALUES (id, XMLVALIDATE(XMLPARSE(details) ACCORDING TO XMLSCHEMA ID distributor)) INSERT INTO cust_dist_view VALUES (id, XMLVALIDATE(XMLPARSE(details) ACCORDING TO XMLSCHEMA ID registered_customers)) |
The CREATE VIEW statement creates a view of the customer and distributor tables. The first insert statement inserts data into DISTRIBUTOR table if the XML data stored in DETAILS validates against the XML schema DISTRIBUTOR. This second insert statement inserts data into CUSTOMER table if the XML data stored in DETAILS validates against the XML schema REGISTERED_CUSTOMERS.
Check constraint and view dependency on XML schema
Check constraints and views created on top of a table with check constraints on an XML column are dependent on XML schemas. If the schema is dropped, the check constraint cannot check the validation against the dropped schema and hence the insert succeeds.
Listing 9. Check constraint and view dependency on XML schema
DROP XSROBJECT customer DROP VIEW cust_dist_view INSERT INTO customer VALUES(201, cust_info) INSERT INTO cust_dist_view VALUES (202, details); |
The insert operations on CUSTOMER table and CUST_DIST_VIEW will succeed without any validation.
Enforcing automatic validation through triggers
In a situation where an application is developed in such a way that it is difficult to change the insert statements to do an explicit validation, the DBA can write a trigger which will be activated on every insert or update operation to do the validation automatically.
DB2 9.5 introduced a concept of enforced automatic validation of incoming XML documents using BEFORE triggers. Whenever an insert or update operation is performed on a table, automatic validation ensures that XML documents get validated against the specified XML schema. This automatic validation is done in the action part of the trigger statement using the XMLVALIDATE function.
Creating a BEFORE trigger statement to enforce automatic validation
Whenever an XML document is inserted or updated in a table, the trigger gets executed. The action part of the trigger statement includes the XMLVALIDATE function to validate the XML documents against the XML schema specified.
For example, an employee table contains employee ID and employee information. Employee ID column EMP_ID is of type integer, and employee information column INFO is of type XML. The INFO column contains employee information such as name, designation, salary, date of joining, department and contact details represented in an XML document. Automatic validation on incoming XML documents can be enforced as follows.
Listing 10. Validate employee information using BEFORE trigger statement
CREATE TABLE employee (emp_id INTEGER, info XML); CREATE TRIGGER val_trig NO CASCADE BEFORE INSERT on employee REFERENCING NEW AS n FOR EACH ROW MODE DB2 SQL BEGIN ATOMIC set (n.info) = XMLVALIDATE (n.info ACCORDING TO XMLSCHEMA ID employee); END |
The XMLVALIDATE function in the action part of trigger statement validates the employee information specified in the INFO column. The ACCORDING TO XMLSCHEMA clause is used to validate INFO against the employee XML schema. When an XML document is inserted into the EMPLOYEE table, a BEFORE INSERT trigger statement is executed and the XML document is validated against the employee schema. Only if validation succeeds will the record be inserted into the table.
Note: If the trigger fails to validate the XML document, the document does not get inserted. It does not return any error message.
Using IS {NOT} VALIDATED predicates in WHEN clause
There is always a possibility that the application validates the XML document before inserting it. However, there can also be a trigger defined to do the automatic validation. In such a situation, revalidating the XML document incurs extra overhead.
In this case, a trigger can be implemented to first verify that the validation has already been done. If so, it skips the validation process by using the WHEN clause in the trigger body. Only if the WHEN clause evaluates to true will the action part of trigger statement be executed. The following example demonstrates the use of the WHEN clause in the trigger.
Listing 11. Using a WHEN clause in a BEFORE UPDATE trigger statement
CREATE TRIGGER val_trig NO CASCADE BEFORE UPDATE ON employee REFERENCING NEW AS n FOR EACH ROW MODE DB2 SQL WHEN (n.info IS NOT VALIDATED ACCORDING TO XMLSCHEMA ID employee) BEGIN ATOMIC set (n.info) = XMLVALIDATE(n.info ACCORDING TO XMLSCHEMA ID employee); END |
The IS VALIDATED predicate returns true if the document is already validated against the specified schema and returns false otherwise. IS NOT VALIDATED predicate returns true if the XML document is not validated against specified schema. If the XML document is already validated, it returns false. Hence, if the document is already validated, it does not validate again. If the document is not validated, then it executes the action part of trigger statement and validates the XML document according to the specified schema.
In code listing 11, the WHEN clause checks to determine whether or not the XML document INFO is not validated against the EMPLOYEE schema. If it is true, then the INFO document gets validated against the EMPLOYEE schema in the trigger statement. Otherwise the action part of the trigger statement does not get executed.
Validating XML document against a schema located at a particular URI
We can validate XML documents against XML schema located at a particular URI. The ACCORDING TO XMLSCHEMA clause is used to validate the XML document INFO against the URI specified as part of XMLVALIDATE function. Listing 12 shows an example.
Listing 12. Validating employee information against a specified URI
CREATE TRIGGER val_trig NO CASCADE BEFORE INSERT ON employee
REFERENCING NEW as n
FOR EACH ROW MODE DB2SQL
BEGIN ATOMIC
set n.info = XMLVALIDATE(n.info ACCORDING TO XMLSCHEMA URI
'http://posample1.org');
END
|
The XMLVALIDATE function validates the employee information against the schema specified at URI http://posample1.org
Restriction of XML functions on transition variables
DB2 9.5 BEFORE triggers have a restriction against using functions on transition variables. If you use any other XML functions on transition variables other than XMLVALIDATE, the trigger creation will fail. For example:
Listing 13. Using XMLELEMENT function on a transition variable
CREATE TRIGGER val_trig NO CASCADE BEFORE INSERT ON employee REFERENCING NEW AS n FOR EACH ROW MODE DB2SQL BEGIN ATOMIC set (n.id) = 1000; set (n.info) = XMLDOCUMENT(XMLELEMENT(name Russel, n.info)); END |
This CREATE TRIGGER statement will fail because the action part of the trigger is using the XMLELEMENT function on transition variable n.info.
Using IS VALIDATED and IS NOT VALIDATED predicates in SELECT statements
Validation is optional in DB2. A table can store a combination of validated and not validated XML documents in a single XML column. The IS VALIDATED predicate returns true if the XML document is already validated against any specific schema or against any registered schemas, else it will return false. The IS NOT VALIDATED predicate returns true if the XML document is not validated against specified schema or any of the registered schemas, or else it will return false. These predicates can also be used in the WHERE clause of the SELECT statement.
Listing 14. Using IS VALIDATED predicate
SELECT cust_id, cust_info FROM customer WHERE cust_info IS VALIDATED |
This statement will retrieve only those CUST_INFO documents from CUSTOMER table which are validated against any of the registered schemas. You can use the ACCORDING TO clause to retrieve only those XML documents which are validated against a specified schema as shown in Listing 15:
Listing 15. Using IS VALIDATED predicate with ACCORDING TO clause
SELECT cust_id, cust_info FROM customer WHERE cust_info IS VALIDATED ACCORDING TO XMLSCHEMA ID registered_customers |
This statement will retrieve only those CUST_INFO XML documents from CUSTOMER table which are validated against the schema REGISTERED_CUSTOMERS.
Listing 16. Using IS NOT VALIDATED predicate
SELECT cust_id, cust_info FROM customer WHERE info IS NOT VALIDATED |
This statement will retrieve CUST_INFO documents from the CUSTOMER table which are not validated against a registered XML schema. We can use the ACCORDING TO clause with the IS NOT VALIDATED predicate as shown in Listing 17:
Listing 17. Using the IS NOT VALIDATED predicate with the ACCORDING TO clause
SELECT cust_id, cust_info FROM customer WHERE info IS NOT VALIDATED ACCORDING TO XMLSCHEMA ID registered_customers |
This statement retrieves all CUST_INFO XML documents from the CUSTOMER table which are not validated against the XML schema REGISTERED_CUSTOMERS.
Shredding XML values to relational columns
Constraints such as primary key - foreign key relationships and uniqueness of values are not possible on particular values of an XML document. However if the application has such a requirement for a particular element in an XML document, it is possible to shred that value of the element into a relational column, and then apply the constraints accordingly. The code in Listing 18 selects the values from the EMPLOYEE table and insert it into a temporary table by shredding the values from the INFO XML column of the table EMPLOYEE.
Listing 18. Shredding an XML element value to a relational column using SQL/XML
INSERT INTO TEMP (SELECT id,info FROM
employee, xmltable('$INFO/employee' COLUMNS id INT PATH '@id') AS t)
|
However, this kind of shredding creates duplicate data. Therefore you need to take care to make sure data is consistent across multiple locations. Here, the ID value in the shredded relational column should always match with the value in the XML column. This kind of consistency can be maintained by creating a stored procedure for INSERT and UPDATE operations on this column. Whenever a user inserts an XML value, shredding can be used to get the data from the XML value and store it in the corresponding relational columns. Similarly, an UPDATE operation can update the relational value too. The opposite operations can either be restricted or can be done using similar stored procedures (update in the relational column updates the XML document too). Listing 19 shows an example of a stored procedure that shreds the data and then inserts the correct values in the columns.
Listing 19. The INSERT stored procedure
CREATE PROCEDURE INSERTEMPLOYEE (XML INFO)
P1: BEGIN
DECLARE P1 INTEGER;
SELECT ID INTO P1 FROM XMLTABLE('$INFO/employee' PASSING INFO AS "INFO"
COLUMNS ID INT PATH '@id') AS T;
INSERT INTO TEMP
VALUES (ID, INFO);
END P1
|
DB2 9 supports XML data in its native format and offers various options to query data. In this article, we looked at many new ways to make sure that XML data remains consistent when the data is queried, inserted, updated or deleted. The trigger support for XML allows the use of XMLVALIDATE function to implement automatic validation. You can use any of these options to make sure the data remains consistent and correct during its life cycle.
The authors would like to thank Susan Malaika and Iti Rawat for reviewing this article and giving us their valuable suggestions.
Learn
-
"Evolving your XML schemas using DB2 pureXML"
(developerWorks, Mar 2008): Handle your XML schema changes using DB2 XSR feature.
-
"
Overview of new DB2 Version 9.5 pureXML enhancements" (developerWorks, Nov 2007):
Learn about DB2 version 9.5 pureXML features and enhancements.
-
"
Shred XML documents using DB2 pureXML"(developerWorks, Jan 2008):
Learn about two methods for XML decomposition in DB2 for Linux, UNIX, and Windows.
- Browse the
technology bookstore
for books on these and other technical topics.
Get products and technologies
- Download
DB2 Express-C,
a no-charge version of DB2 Express database server for the community that includes pureXML.
- Download
a free trial version of DB2 for Linux, UNIX, and
Windows..
- Download
IBM product evaluation versions
and get your hands on application development tools and middleware products from
DB2, Lotus®, Rational®, Tivoli®, and
WebSphere®.
Discuss
- Participate in the discussion forum.
- Check out
developerWorks
blogs and
get involved in the
developerWorks community.

Manoj Sardana is a staff software engineer working with IBM India software labs. At present he works for the pureQuery development team. In the past, he developed sample applications for new features of DB2. He also worked for functional verification testing project for DB2. He is an IBM certified application developer and advanced database administrator for DB2 9. He is also IBM certified solution developer for XML and related technology.






