XML validation

XML validation is the process of determining whether the structure, content, and data types of an XML document are valid. XML validation also strips off ignorable whitespace in the XML document.

Validation is optional but highly recommended when data integrity is in question, since it ensures that XML documents abide by the rules provided by their XML schemas on top of being well-formed.

Note that you can validate your XML documents against XML schemas only. You cannot validate an XML document against a DTD.

To validate an XML document, use the XMLVALIDATE function. You can specify XMLVALIDATE with an SQL statement that inserts or updates XML documents in a Db2® database. For automatic validation of XML documents, a BEFORE trigger on an XML column can also invoke the XMLVALIDATE function. To enforce validation of XML documents, you create a check constraint.

Before you can invoke the XMLVALIDATE function, all schema documents that make up an XML schema must be registered in the built-in XML schema repository. An XML document itself does not need to be in a database in order for you to be able to validate it with XMLVALIDATE.

XML validation and ignorable whitespace

According to the XML standard, whitespace is space characters (U+0020), carriage returns (U+000D), line feeds (U+000A), or tabs (U+0009) that are in the document to improve readability. When any of these characters appear as part of a text string, they are not considered to be whitespace.

Ignorable whitespace is whitespace that can be eliminated from the XML document. The XML schema document determines which whitespace is ignorable whitespace. If an XML document defines an element-only complex type (an element that contains only other elements), the whitespace between the elements is ignorable. If the XML schema defines a simple element that contains a non-string type, the whitespace within that element is ignorable.

Example: The description element in the sample product.xsd XML schema document is defined like this:
<xs:element name="description" minOccurs="0" maxOccurs="unbounded">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="name" type="xs:string" minOccurs="0" />
      <xs:element name="details" type="xs:string" minOccurs="0" />
      <xs:element name="price" type="xs:decimal" minOccurs="0" />
      <xs:element name="weight" type="xs:string" minOccurs="0" />
 ...
 </xs:complexType>
</xs:element>
The description element has an element-only complex type because it contains only other elements. Therefore, whitespace between elements in a description element is ignorable whitespace. The price element can also contain ignorable whitespace because it is a simple element that contains a non-string type.

In the XMLVALIDATE function, you can explicitly specify the XML schema document to use for validation. If you do not specify an XML schema document, the Db2 database server looks in the input document for an xsi:schemaLocation or xsi:noNamespaceSchemaLocation attribute that identifies the XML schema document. xsi:schemaLocation or xsi:noNamespaceSchemaLocation attributes are defined by the XML Schema specification, and are called XML schema hints. An xsi:schemaLocation attribute contains one or more pairs of values that help to locate the XML schema document. The first value in each pair is a namespace, and the second value is a hint that indicates where to find the XML schema for the namespace. An xsi:noNamespaceSchemaLocation value contains only a hint. If an XML schema document is specified in the XMLVALIDATE function, it overrides the xsi:schemaLocation or xsi:noNamespaceSchemaLocation attribute.

The following examples assume that schema product is registered in the XML schema repository (XSR). You might use CLP statements like this to complete the registration:
REGISTER XMLSCHEMA http://posample.org/product.xsd FROM product.xsd \
AS myschema.product 
COMPLETE XMLSCHEMA myschema.product
Alternatively, because the XML schema consists of a single schema document, you can use a single statement to register the XML schema and complete registration:
REGISTER XMLSCHEMA http://posample.org/product.xsd FROM product.xsd \
AS myschema.product COMPLETE
Example: Suppose that you create table MyProduct like this:
CREATE TABLE MyProduct LIKE Product
You want to insert the following document into XML column Info in the MyProduct table using a dynamic SQL application, and you want to validate the XML data against the XML schema document product.xsd, which is located in the XML schema repository on the same database server as the MyProduct table.
<product xmlns="http://posample.org" pid=''110-100-01'' >
 <description>
 <name>Anvil</name>
 <details>Very heavy</details>
 <price>         9.99            </price>
 <weight>1 kg</weight>
 </description>
</product>'
In your INSERT statement, the XMLVALIDATE function specifies the XML schema to use for validation:

Insert into MyProduct 
 (pid, name, Price, PromoPrice, PromoStart, PromoEnd, description) 
 values ( '110-100-01','Anvil', 9.99, 7.99, '11-02-2004','12-02-2004', 
 XMLVALIDATE(? ACCORDING TO XMLSCHEMA ID myschema.product))
When you retrieve the stored data, you can see where XMLVALIDATE removes ignorable whitespace. The retrieved data is a single line with the following content:
<product xmlns="http://posample.org" pid="110-100-01"><description><name>Anvil
</name><details>Very heavy</details><price>9.99</price><weight>1 kg</weight>
</description></product>
The product schema defines the whitespace around the name, details, price, and weight elements, and the whitespace within the price element as ignorable whitespace, so XMLVALIDATE removes it.

If you need to ensure that you insert only documents that are validated into an XML column or retrieve only documents that are validated from an XML column, use the VALIDATED predicate.

To test whether or not an XML document has been validated before inserting or updating the document, create a check constraint that contains the VALIDATED predicate on the XML column. To retrieve only validated documents from an XML column, or to retrieve only those documents that have been inserted without validation, you use the VALIDATED predicate in a WHERE clause. If you need to check whether or not an XML document has been validated according to certain XML schemas, include the XML schemas with the VALIDATED predicate in the ACCORDING TO XMLSCHEMA clause.

The VALIDATED predicate can also be used as part of a trigger. To trigger validation of XML documents that have not yet been validated before you insert or update them in an XML column, create a BEFORE trigger that contains the VALIDATED predicate on the XML column in the WHEN clause to invoke the XMLVALIDATE function.

Example: Suppose that you want to retrieve only validated XML documents from the Info column of the MyCustomer table. Execute SELECT statements like this one:
SELECT Info FROM MyCustomer WHERE Info IS VALIDATED
Example: Suppose that you want to insert only validated XML documents into the Info column of the MyCustomer table. You can define a check constraint to enforce this condition. Alter the MyCustomer table in the following way:
ALTER TABLE MyCustomer ADD CONSTRAINT CK_VALIDATED CHECK (Info IS VALIDATED)
Issuing this statement, however, makes the use of the VALIDATED predicate in the previous example unnecessary, as only valid documents would be successfully inserted or updated in the table.
Example: Suppose that you want to validate the following document with the customer schema, but you do not want to store it in a database.
<customerinfo xml:space="default" 
  xmlns="http://posample.org" 
  Cid='1011'>
  <name>Kathy Smith</name>
  <addr country='Canada'>
  <street>25 Rosewood</street>
  <city>Toronto</city>
  <prov-state>Ontario</prov-state>
  <pcode-zip>M6W 1E6</pcode-zip>
  </addr>
  <phone type='work'>416-555-6676</phone>
</customerinfo>
Assume that you have assigned the document to an application variable. You can use a VALUES statement like this to do the validation:
VALUES XMLVALIDATE(? according to xmlschema id myschema.customer)
This document is valid according to the XML schema, so the VALUES statement returns a result table that contains the document. If the document is not valid, VALUES returns an SQL error.