Skip to main content

New options for XML data integrity and validation in DB2 V9.5

Validation through CHECK constraint, trigger support

Manoj Sardana (msardana@in.ibm.com), Staff Software Engineer, IBM
Manoj Sardana
Manoj Sardana is a staff software engineer working with IBM India software labs. At present he works for the pureQuery development team. In the past, he developed sample applications for new features of DB2. He also worked for functional verification testing project for DB2. He is an IBM certified application developer and advanced database administrator for DB2 9. He is also IBM certified solution developer for XML and related technology.
Madhavi Kaza (madhkaza@in.ibm.com), Software Engineer, IBM
Madhavi Kaza photo
Madhavi Kaza works for the DB2 ID Samples and Technology demonstration team in the India Software Labs. She works on samples development for XML features. She has developed various XML samples for DB2 9 and 9.5 XML features and is now developing sample applications for DB2 9.7.

Summary:  IBM® DB2® for Linux®, UNIX®, and Windows® introduced pureXML® technology in Version 9. This technology allows you to store XML data in its native form, maintaining the hierarchical structure and permitting you to query it using SQL/XML and XQuery. Just as with relational data, the ability to ensure the integrity of XML data stored in DB2 is mandatory. The methods and mechanisms for ensuring the integrity of XML data, however, are a bit different than the methods for traditional relational data integrity. In this article, you'll learn about XML data integrity in the context of DB2, and explore the various options you have to guarantee integrity in different scenarios.

Date:  27 Aug 2009
Level:  Intermediate
Activity:  2878 views
Comments:  

XML integrity

XML data integrity in a DB2 database can be viewed in several ways. At a very basic level, before inserting an XML value into a table, you may want to make sure that the data type of the various element values are correct. At a deeper level, you may want to have constraints such as uniqueness of an element value, multiplicity of an element, or restriction of the values of an element. When XML data is stored along with relational data, you may want to ensure that some of the element values in the XML also go to relational columns. The reason for this might be that these columns may be a part of the primary key or foreign key.

DB2 provides a number of different solutions to these integrity requirements. Although you have an option of validating the XML document against a schema before insertion, triggers can be used to do this validation automatically without the knowledge of the user. You can use check constraints to force the user to validate the XML value with a pre-registered schema. Before you use any of these options, an XML schema needs to be registered with the database. To store a node value into a relational column to maintain uniqueness, you can shred the appropriate node value into a relational data type using an SQL/XML function and insert it into a relational column. In the following sections, we will talk about all these options in detail with code examples.


XML schema registration and validation

An XML schema is used to define the structure of an XML document, its elements, its attributes, and data types of elements or attributes. Furthermore, it allows you to specify default element values and restrictions on elements such as multiplicity or uniqueness. Once defined,it can be used for validating XML documents against the XML Schema specified. Listing 1 shows an example of an "employee" schema. We will be using this schema in later parts of this article.


Listing 1. Employee schema
				
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" 
		elementFormDefault="qualified" attributeFormDefault="unqualified">
<xs:element name = "employee">
    <xs:complexType>
        <xs:sequence>
            <xs:element name = "name" type = "xs:string"/>
            <xs:element name = "dept" type = "xs:string"/>
            <xs:element name = "salary" type = "xs:double"/>
            <xs:element name = "designation" type = "xs:string"/>
            <xs:element name = "date-of-join" type = "xs:date"/>
            <xs:element name = "address">
                 <xs:complexType>
                     <xs:sequence>
                            <xs:element name = "street" type = "xs:string"/>
                            <xs:element name = "city" type = "xs:string"/>
                            <xs:element name = "state" type = "xs:string"/>
                            <xs:element name = "country" type = "xs:string"/>
                            <xs:element name = "zipcode" type = "xs:string"/>
                     </xs:sequence>
                 </xs:complexType>
           </xs:element>
         </xs:sequence>
         <xs:attribute name = "id" type = "xs:string" use = "required"/>
      </xs:complexType>
   </xs:element>
</xs:schema>
		

An XML schema should be registered in the DB2 database before using the schema for validation. Listing 2 shows the CLP commands to register the employee schema, assuming that it is stored in a file employee.xsd in your working directory.


Listing 2. Registering the XML schema
				
REGISTER XMLSCHEMA http://posample.org FROM employee.xsd as employee;
COMPLETE XMLSCHEMA employee; 
			

Once the schema is registered successfully and given a relational name, you may use it for validation.

Validation is a process of checking whether the instance document satisfies structural, data type, and content constraints specified in the XML schema. Validation eliminates invalid data before being inserted into the database. It ensures that XML documents conform to the rules provided by XML schemas, as well as being well-formed.

In DB2, validating XML documents against XML schemas is optional. For example, if XML documents are received from a trusted source, that validation may not be needed. On the contrary, if XML documents are received from an unknown source, be careful to validate XML documents to maintain data integrity.

The XMLVALIDATE function is used to validate XML documents before inserting or updating XML documents into a table. As mentioned previously, XML schemas should be registered before using them for validation.

The XMLVALIDATE function validates documents either implicitly or explicitly.

  • Implicit validation: The XML document specifies the schema location and namespace using the xsi:schemaLocation attribute or specifies only schema location using xsi:noNamespaceSchemaLocation attribute. DB2 takes these input values and searches the catalog tables to identify XML schema needed to validate XML document. For example, you could insert the following XML documents into an existing table named EMPLOYEE having two columns, one of type integer and the second of type XML.


    Listing 3. Implicit validation
    				
    INSERT INTO employee VALUES (101, 
    '<employee xmlns:xsi="http://www/w3.org/2001/XMLSchema" 
    			xsi:noNamespaceSchemaLocation="employee.xsd" id = "101">
    	<name>Sonal</name>
    	<dept>Sales</dept>
    	<salary>10000</salary>
    	<designation>Sales Executive</designation>
    	<date-of-join>2007-10-10</date-of-join>
    	<address>
      		<street>M.G. Road</street>
      		<city>Bangalore</city>
      		<state>Karnataka</state>
      		<country>India</country>
      		<zipcode>560001</zipcode>
    	</address>
    </employee>')
    			

  • Explicit validation: The XML document explicitly specifies the SQL identifier or namespace and schema location information of the XML schema using the ACCODING TO XMLSCHEMA ID clause in the XMLVALIDATE function. The example below shows explicit validation.


    Listing 4. Explicit validation
    				
    INSERT INTO employee VALUES (101, XMLVALIDATE(XMLPARSE(document 
             '<employee id="101">
                <name>Sonal</name>
                <dept>Sales</dept>
                <salary>10000</salary>
                <designation>Sales Executive</designation>
                <date-of-join>2007-10-10</date-of-join>
                <address>
                    <street>M.G. Road</street> 
                    <city>Bangalore</city>
                    <state>Karnataka</state>
                    <country>India</country>
                    <zipcode>560001</zipcode>
                </address>
             </employee>') ACCORDING TO XMLSCHEMA ID employee))
    			

In DB2, XML document validation is done on a per document basis, not at column level. A table having an XML column need not have all the documents pertaining to one single XML schema. It can have XML documents that can pertain to different schemas. You can check whether documents are validated or not using the VALIDATED predicate.


Validation through check constraints on XML columns

Consider a scenario where you want to bind a set of XML schemas with an XML column. However the responsibility of validating the XML value lies with the application. In this case, the DBA can put a check constraint on the XML column and ensure that the application uses one of the pre-registered schemas for validation. This will also ensure that the XML document always gets validated.

Check constraints are constraints on table columns that are used to restrict the user to a particular check condition. While performing inserts or updates on a column, only when the check condition evaluates to true can you perform the operation on the table.

Creating a table with check constraint on an XML column

Check constraints on XML give you the ability to specify constraints on an XML column in terms of XML schema validation, to ensure that only validated XML documents with correct schemas are stored in the column. This ensures the consistency of the information stored in XML columns. It also gives the ability to define XML expressions on the XML values as they are validated according to a pre-defined set of schemas. The IS VALIDATED or IS NOT VALIDATED predicates with the ACCORDING TO XMLSCHEMA clause can be used to verify whether the document is validated or not validated against the specified schemas.

Check constraints can also list a set of schemas. This means the XML document can be validated against one of the XML schemas specified. By providing multiple schemas in the check constraint, you gain the flexibilty of using one of the schemas for validation to help in associating multiple schemas with a particular XML column. The example in Listing 5 shows how a single schema can be specified in a check constraint.


Listing 5. Create check constraint on an XML column
				
CREATE TABLE customer (cust_id INTEGER NOT NULL, 
      info XML CONSTRAINT val_check 
      CHECK(info IS VALIDATED ACCORDING TO XMLSCHEMA ID customer))
			

The above example creates a check constraint on the column INFO which is of type XML. The check constraint ensures that the document is validated according to customer schema (assuming that the customer schema is registered in DB2). Note that the check constraint evaluates whether the XML document is validated or not. It does not perform the actual validation.

The example in Listing 6 shows how multiple schemas can be listed while creating one particular check constraint.


Listing 6. Create check constraint on an XML column with multiple schemas
				
CREATE TABLE customer (cust_id INTEGER NOT NULL,
          cust_info XML CONSTRAINT val_check 
          CHECK (cust_info IS VALIDATED ACCORDING TO XMLSCHEMA IN 
          (ID registered_customers, ID online_customers, ID regular_customers)))
          

The insert statement in the following example fails because it validates the XML document according to a schema not specified in the check constraint clause.


Listing 7. check constraint violation
				
INSERT INTO customer 
VALUES (1001, XMLVALIDATE(XMLPARSE(cust_info) 
ACCORDING TO XMLSCHEMA ID cust))
			

Inserting XML values through views

XML documents can also be inserted through views into their respective base tables. When an insert is performed on an XML column through a view, the check constraint of the base table is applied on the XML document, and only when it satisfies the check constraint does the XML document gets inserted into the table. The example below creates a view on two tables having XML documents.


Listing 8. Insert XML documents through views
				
CREATE TABLE distributor (dist_id INTEGER, dist_info XML 
	CONSTRAINT dist_check 
	CHECK(dist_info IS VALIDATED ACCORDING TO XMLSCHEMA ID distributor))
		
CREATE VIEW cust_dist_view(id, details) AS 
(SELECT id, info FROM customer 
UNION ALL
SELECT dist_id, dist_info FROM distributor)

INSERT INTO cust_dist_view VALUES (id, XMLVALIDATE(XMLPARSE(details)
ACCORDING TO XMLSCHEMA ID distributor))

INSERT INTO cust_dist_view VALUES (id, XMLVALIDATE(XMLPARSE(details) 
ACCORDING TO XMLSCHEMA ID registered_customers))
			

The CREATE VIEW statement creates a view of the customer and distributor tables. The first insert statement inserts data into DISTRIBUTOR table if the XML data stored in DETAILS validates against the XML schema DISTRIBUTOR. This second insert statement inserts data into CUSTOMER table if the XML data stored in DETAILS validates against the XML schema REGISTERED_CUSTOMERS.

Check constraint and view dependency on XML schema

Check constraints and views created on top of a table with check constraints on an XML column are dependent on XML schemas. If the schema is dropped, the check constraint cannot check the validation against the dropped schema and hence the insert succeeds.


Listing 9. Check constraint and view dependency on XML schema
				
DROP XSROBJECT customer
DROP VIEW cust_dist_view
INSERT INTO customer VALUES(201, cust_info)
INSERT INTO cust_dist_view VALUES (202, details);

			

The insert operations on CUSTOMER table and CUST_DIST_VIEW will succeed without any validation.


Enforcing automatic validation through triggers

In a situation where an application is developed in such a way that it is difficult to change the insert statements to do an explicit validation, the DBA can write a trigger which will be activated on every insert or update operation to do the validation automatically.

DB2 9.5 introduced a concept of enforced automatic validation of incoming XML documents using BEFORE triggers. Whenever an insert or update operation is performed on a table, automatic validation ensures that XML documents get validated against the specified XML schema. This automatic validation is done in the action part of the trigger statement using the XMLVALIDATE function.

Creating a BEFORE trigger statement to enforce automatic validation

Whenever an XML document is inserted or updated in a table, the trigger gets executed. The action part of the trigger statement includes the XMLVALIDATE function to validate the XML documents against the XML schema specified.

For example, an employee table contains employee ID and employee information. Employee ID column EMP_ID is of type integer, and employee information column INFO is of type XML. The INFO column contains employee information such as name, designation, salary, date of joining, department and contact details represented in an XML document. Automatic validation on incoming XML documents can be enforced as follows.


Listing 10. Validate employee information using BEFORE trigger statement
				
CREATE TABLE employee (emp_id INTEGER, info XML);                                

CREATE TRIGGER val_trig NO CASCADE BEFORE INSERT on employee
REFERENCING NEW AS n
FOR EACH ROW MODE DB2 SQL 
BEGIN ATOMIC
set (n.info) = XMLVALIDATE (n.info ACCORDING TO XMLSCHEMA ID employee);
END
			

The XMLVALIDATE function in the action part of trigger statement validates the employee information specified in the INFO column. The ACCORDING TO XMLSCHEMA clause is used to validate INFO against the employee XML schema. When an XML document is inserted into the EMPLOYEE table, a BEFORE INSERT trigger statement is executed and the XML document is validated against the employee schema. Only if validation succeeds will the record be inserted into the table.

Note: If the trigger fails to validate the XML document, the document does not get inserted. It does not return any error message.

Using IS {NOT} VALIDATED predicates in WHEN clause

There is always a possibility that the application validates the XML document before inserting it. However, there can also be a trigger defined to do the automatic validation. In such a situation, revalidating the XML document incurs extra overhead.

In this case, a trigger can be implemented to first verify that the validation has already been done. If so, it skips the validation process by using the WHEN clause in the trigger body. Only if the WHEN clause evaluates to true will the action part of trigger statement be executed. The following example demonstrates the use of the WHEN clause in the trigger.


Listing 11. Using a WHEN clause in a BEFORE UPDATE trigger statement
				
CREATE TRIGGER val_trig NO CASCADE BEFORE UPDATE ON employee
REFERENCING NEW AS n
FOR EACH ROW MODE DB2 SQL
WHEN (n.info IS NOT VALIDATED ACCORDING TO XMLSCHEMA ID employee)
BEGIN ATOMIC
set (n.info) = XMLVALIDATE(n.info ACCORDING TO XMLSCHEMA ID employee);
END
			

The IS VALIDATED predicate returns true if the document is already validated against the specified schema and returns false otherwise. IS NOT VALIDATED predicate returns true if the XML document is not validated against specified schema. If the XML document is already validated, it returns false. Hence, if the document is already validated, it does not validate again. If the document is not validated, then it executes the action part of trigger statement and validates the XML document according to the specified schema.

In code listing 11, the WHEN clause checks to determine whether or not the XML document INFO is not validated against the EMPLOYEE schema. If it is true, then the INFO document gets validated against the EMPLOYEE schema in the trigger statement. Otherwise the action part of the trigger statement does not get executed.

Validating XML document against a schema located at a particular URI

We can validate XML documents against XML schema located at a particular URI. The ACCORDING TO XMLSCHEMA clause is used to validate the XML document INFO against the URI specified as part of XMLVALIDATE function. Listing 12 shows an example.


Listing 12. Validating employee information against a specified URI
				
CREATE TRIGGER val_trig NO CASCADE BEFORE INSERT ON employee 
REFERENCING NEW as n
FOR EACH ROW MODE DB2SQL 
BEGIN ATOMIC 
  set n.info = XMLVALIDATE(n.info ACCORDING TO XMLSCHEMA URI 
                     'http://posample1.org');  
END
			

The XMLVALIDATE function validates the employee information against the schema specified at URI http://posample1.org

Restriction of XML functions on transition variables

DB2 9.5 BEFORE triggers have a restriction against using functions on transition variables. If you use any other XML functions on transition variables other than XMLVALIDATE, the trigger creation will fail. For example:


Listing 13. Using XMLELEMENT function on a transition variable
				
CREATE TRIGGER val_trig NO CASCADE BEFORE INSERT ON employee
REFERENCING NEW AS n 
FOR EACH ROW MODE DB2SQL 
BEGIN ATOMIC
set (n.id) = 1000;
set (n.info) = XMLDOCUMENT(XMLELEMENT(name Russel, n.info));
END
			

This CREATE TRIGGER statement will fail because the action part of the trigger is using the XMLELEMENT function on transition variable n.info.


Using IS VALIDATED and IS NOT VALIDATED predicates in SELECT statements

Validation is optional in DB2. A table can store a combination of validated and not validated XML documents in a single XML column. The IS VALIDATED predicate returns true if the XML document is already validated against any specific schema or against any registered schemas, else it will return false. The IS NOT VALIDATED predicate returns true if the XML document is not validated against specified schema or any of the registered schemas, or else it will return false. These predicates can also be used in the WHERE clause of the SELECT statement.


Listing 14. Using IS VALIDATED predicate
				
SELECT cust_id, cust_info FROM customer 
WHERE cust_info IS VALIDATED
			

This statement will retrieve only those CUST_INFO documents from CUSTOMER table which are validated against any of the registered schemas. You can use the ACCORDING TO clause to retrieve only those XML documents which are validated against a specified schema as shown in Listing 15:


Listing 15. Using IS VALIDATED predicate with ACCORDING TO clause
				
SELECT cust_id, cust_info FROM customer
WHERE cust_info IS VALIDATED ACCORDING TO XMLSCHEMA ID registered_customers
			

This statement will retrieve only those CUST_INFO XML documents from CUSTOMER table which are validated against the schema REGISTERED_CUSTOMERS.


Listing 16. Using IS NOT VALIDATED predicate
				
SELECT cust_id, cust_info FROM customer
WHERE info IS NOT VALIDATED 
			

This statement will retrieve CUST_INFO documents from the CUSTOMER table which are not validated against a registered XML schema. We can use the ACCORDING TO clause with the IS NOT VALIDATED predicate as shown in Listing 17:


Listing 17. Using the IS NOT VALIDATED predicate with the ACCORDING TO clause
				
SELECT cust_id, cust_info FROM customer
WHERE info IS NOT VALIDATED ACCORDING TO XMLSCHEMA ID registered_customers
			

This statement retrieves all CUST_INFO XML documents from the CUSTOMER table which are not validated against the XML schema REGISTERED_CUSTOMERS.


Shredding XML values to relational columns

Constraints such as primary key - foreign key relationships and uniqueness of values are not possible on particular values of an XML document. However if the application has such a requirement for a particular element in an XML document, it is possible to shred that value of the element into a relational column, and then apply the constraints accordingly. The code in Listing 18 selects the values from the EMPLOYEE table and insert it into a temporary table by shredding the values from the INFO XML column of the table EMPLOYEE.


Listing 18. Shredding an XML element value to a relational column using SQL/XML
				
INSERT INTO TEMP (SELECT id,info FROM 
     employee, xmltable('$INFO/employee' COLUMNS id INT PATH '@id') AS t)
			

However, this kind of shredding creates duplicate data. Therefore you need to take care to make sure data is consistent across multiple locations. Here, the ID value in the shredded relational column should always match with the value in the XML column. This kind of consistency can be maintained by creating a stored procedure for INSERT and UPDATE operations on this column. Whenever a user inserts an XML value, shredding can be used to get the data from the XML value and store it in the corresponding relational columns. Similarly, an UPDATE operation can update the relational value too. The opposite operations can either be restricted or can be done using similar stored procedures (update in the relational column updates the XML document too). Listing 19 shows an example of a stored procedure that shreds the data and then inserts the correct values in the columns.


Listing 19. The INSERT stored procedure
				
    	CREATE PROCEDURE INSERTEMPLOYEE (XML INFO)
P1: BEGIN
	DECLARE P1 INTEGER;

SELECT ID INTO P1 FROM XMLTABLE('$INFO/employee' PASSING INFO AS "INFO" 
                COLUMNS ID INT PATH '@id') AS T;

INSERT INTO TEMP
  VALUES (ID, INFO);

END P1
    	


Summary

DB2 9 supports XML data in its native format and offers various options to query data. In this article, we looked at many new ways to make sure that XML data remains consistent when the data is queried, inserted, updated or deleted. The trigger support for XML allows the use of XMLVALIDATE function to implement automatic validation. You can use any of these options to make sure the data remains consistent and correct during its life cycle.


Acknowledgement

The authors would like to thank Susan Malaika and Iti Rawat for reviewing this article and giving us their valuable suggestions.


Resources

Learn

Get products and technologies

Discuss

About the authors

Manoj Sardana

Manoj Sardana is a staff software engineer working with IBM India software labs. At present he works for the pureQuery development team. In the past, he developed sample applications for new features of DB2. He also worked for functional verification testing project for DB2. He is an IBM certified application developer and advanced database administrator for DB2 9. He is also IBM certified solution developer for XML and related technology.

Madhavi Kaza photo

Madhavi Kaza works for the DB2 ID Samples and Technology demonstration team in the India Software Labs. She works on samples development for XML features. She has developed various XML samples for DB2 9 and 9.5 XML features and is now developing sample applications for DB2 9.7.

Comments



Trademarks  |  My developerWorks terms and conditions

Help: Update or add to My dW interests

What's this?

This little timesaver lets you update your My developerWorks profile with just one click! The general subject of this content (AIX and UNIX, Information Management, Lotus, Rational, Tivoli, WebSphere, Java, Linux, Open source, SOA and Web services, Web development, or XML) will be added to the interests section of your profile, if it's not there already. You only need to be logged in to My developerWorks.

And what's the point of adding your interests to your profile? That's how you find other users with the same interests as yours, and see what they're reading and contributing to the community. Your interests also help us recommend relevant developerWorks content to you.

View your My developerWorks profile

Return from help

Help: Remove from My dW interests

What's this?

Removing this interest does not alter your profile, but rather removes this piece of content from a list of all content for which you've indicated interest. In a future enhancement to My developerWorks, you'll be able to see a record of that content.

View your My developerWorks profile

Return from help

static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management, XML
ArticleID=422495
ArticleTitle=New options for XML data integrity and validation in DB2 V9.5
publish-date=08272009
author1-email=msardana@in.ibm.com
author1-email-cc=
author2-email=madhkaza@in.ibm.com
author2-email-cc=

My developerWorks community

Tags

Help
Use the search field to find all types of content in My developerWorks with that tag.

Use the slider bar to see more or fewer tags.

Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere).

My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Use the search field to find all types of content in My developerWorks with that tag. Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere). My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Special offers