XML schemas and index key generation

You should examine your XML schemas so that you can create indexes on XML columns with data types that match your XML schemas data type specifications. The queries that you want to run should also be taken into account when deciding which XML patterns to choose for your indexes.

If an XML schema is used, then the structure of XML documents to be stored in an XML column will be validated so that the data types of the elements and attributes in the XML documents are constrained against the XML schema. If a document does not match the schema's specifications, then the document is rejected by the parser. For example, if the schema specifies that an attribute is constrained to the DOUBLE data type and the value of the document's attribute is ABC, then the document is rejected. If an XML schema is not used, then document data is not validated by the parser and is considered to be untyped data.

For example, suppose the following XML schema is used:

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"> 

<xsd:element name="product" type="ProdType"/>
           
<xsd:simpleType name="ColorType">
  <xsd:restriction base="xsd:string">
    <xsd:maxLength value='20'/>
  </xsd:restriction>
</xsd:simpleType>

<xsd:complexType name="ProdType">
  <xsd:sequence>
    <xsd:element name="name"    type="xsd:string" />
    <xsd:element name="SKU"     type="xsd:string" />
    <xsd:element name="price"   type="xsd:integer" />
    <xsd:element name="comment" type="xsd:string" />
  </xsd:sequence>
  <xsd:attribute name="color"   type="ColorType" />
  <xsd:attribute name="weight"  type="xsd:integer" />
</xsd:complexType>
</xsd:schema>

After looking at the queries you need to issue, you may decide they need indexes on price and color. Analyzing the queries will help you decide what XML pattern expression to include in your CREATE INDEX statement. The XML schema provides guidance on what data type to pick for the index: you can tell that the price element is a decimal number, so the numeric data type of DECIMAL can be chosen for the index priceindex and the color attribute is a string so the data type of VARCHAR can be chosen for the index colorindex.

XQUERY for $i in db2-fn:xmlcolumn('COMPANY.PRODUCTDOCS')/product[price > 5.00] 
   return $i/name
XQUERY for $i in db2-fn:xmlcolumn('COMPANY.PRODUCTDOCS')/product[@color = 'pink'] 
   return $i/name
CREATE INDEX priceindex on company(productdocs)
      GENERATE KEY USING XMLPATTERN '/product/price' AS DECIMAL(7,2)
CREATE INDEX colorindex on company(productdocs)
      GENERATE KEY USING XMLPATTERN '//@color' AS SQL VARCHAR(80) 

The schema may also specify other constraints for the string data type, such as maxLength which is shown in the example under ColorType where the string is restricted to 20 unicode characters. Since the CREATE INDEX statement specifies the VARCHAR length in bytes and not in characters, the schema length may be multiplied by a factor of 4 to calculate the maximum number of bytes that will be required to store the longest string allowed by the schema in the index. In this case, 4*20 = 80 so VARCHAR(80) is chosen for colorindex.

If the schema does not specify a length restriction for a string data type and the maximum string lengths for the values in the documents are not known, then you can use the maximum length allowed by the page size used by the index. An index stores strings of varying lengths, but since only the actual number of bytes required for each string is stored there is no storage penalty for specifying a longer maximum length than is needed. However, larger key buffers in memory do need to be allocated to handle the maximum key size during index scans. See the CREATE INDEX statement for the list of maximum allowed lengths for an index on an XML column specifying the VARCHAR data type.

If the maximum length for the VARCHAR data type is not sufficiently long to index the document values, then the VARCHAR HASHED data type may be used, which does not have a length limit. However, indexes using VARCHAR HASHED can be used only for equality look-ups and not for range scans. Note that documents which contain strings longer than the length specified for VARCHAR(integer) will be rejected.

The XML schema may also specify default attribute and element values. If there are no corresponding values specified in the XML document and the document is validated, then the default values from the schema are used when the document is stored. These default values will be indexed, along with the other values that were in the original input document. If the document is not validated, then the default values are not added to document and they are not indexed.