One of the key differences of XML from the Large Object (LOB) types in DB2 is that XML data can be indexed. The XML indexes supported in DB2 9 are also called XML value indexes, and queries on XML can use these indexes for performance.
An XML index example
An XML index is used to provide a mapping from a node value to its location. An XPath expression is required to specify the nodes within a document to be indexed. Two data types are supported for XML indexes in DB2 9 for z/OS: DECFLOAT for numeric values and VARCHAR(n) for string values. For example, the following CREATE INDEX DDL creates an XML index on TransRefGUID of the REQUESTXML column of ACORD data, as VARCHAR(24):
CREATE INDEX ACORD.ACORDINDEX1 ON ACORD.REQUEST(REQUESTXML)GENERATE KEYS USING XMLPATTERN'declare default element namespace "http://ACORD.org/Standards/Life/2"; /TXLife/TXLifeRequest/TransRefGUID' as SQL VARCHAR(24)
DB2 will take the value of /TXLife/TXLifeRequest/TransRefGUID as the key, and map to its logical location (DOCID, NODEID), and physical location (RID), using the existing B+-tree index infrastructure.This index can be used for queries that search on TransRefGUID. For example, the following XMLEXISTS predicate can potentially use this index:
XMLExists('declare default element namespace "http://ACORD.org/Standards/Life/2"; /TXLife/TXLifeRequest[TransRefGUID="2004-1217-141016-000012"]'PASSING REQUEST.REQUESTXML)
From queries, it is easy to figure out what kind of XML indexes can speed up the queries - by concatenating path steps from within a predicate to the steps outside the predicate, and using data types consistent between the queries and indexes.
Before explaining the index access procedure, I'd like to review the basic XML storage scheme so you can understand it better.
Basic XML storage scheme in DB2 9 for z/OS
The following picture depicts the high-level storage scheme for XML data.
At high-level, the XML data is stored in a separate table space, just like LOB data. The real XML data is stored in the XMLDATA column of the internal XML table. It contains the hierarchical data in records that can fit in 16KB pages. In order to support free movement of data records, logical links using NODEIDs are used. That's why we need a NODEID index to link records for a document. Similarly, since utilities, such as REORG, can be applied to the base table space and the XML table space independently, XML indexes do not contain base table RIDs, but XML table RIDs. In order to get to base table rows from XML indexes, we need the DOCID index on the base table (see below). That's why DB2 always creates a DOCID index on the base table and a NODEID index on the XML table as part of the storage scheme, although they are for totally different purposes. By XML indexes, we refer to XML value indexes created by users.
Basic XML index access plans
If you use EXPLAIN for a query and select some key columns from the PLAN_TABLE, you will see some new access type in the ACCESSTYPE column for SQL/XML queries. They are the following:
- DX: DOCID list access. DB2 searches an XML index and retrieves all the qualified DOCIDs. Then the DOCIDs are sorted and duplicates are removed. The DOCID list is further converted to a RID list of the base table through the DOCID index. Then the base table rows are fetched.
- DI: DOCID ANDing. DB2 intersects two DOCID lists from two XML indexes for conjunction of two XPath predicates.
- DU: DOCID ORing. DB2 unions two DOCID lists from two XML indexes for disjunction of two XPath predicates.
- M: Multi-index access.
If you see "R" (R-Scan) in the ACCESSTYPE for a table with an XMLEXISTS predicate, then DOCSCAN is applied for the XML column. No new type was introduced for the scan.
Here is an example for an index ANDing plan:
+---------------------------------------------------------------------+ | PLANNO | ACCESSTYPE | MATCHCOLS | ACCESSNAME | MIXOPSEQ | +---------------------------------------------------------------------+1_| 1 | M | 0 | | 0 |2_| 1 | DX | 1 | ACORDINDEX2 | 1 |3_| 1 | DX | 1 | ACORDINDEX1 | 2 |4_| 1 | DI | 0 | | 3 | +---------------------------------------------------------------------+
The following diagram illustrate the process of using the index ANDing plan. Step 4 is where a DOCID index is always used.
These are the basic XML index access plans. For example, NODEID and RID from XML indexes are not used for queries today. We are enhancing plans for better query performance, so expect more methods in the future.
Somethings specific to XML indexes
The same principles for relational indexes apply to XML indexes, such as create indexes only needed by queries and use REBUILD INDEX. The following are some unique features of XML indexes:
- Depending on the XPath in the XMLPATTERN of the CREATE INDEX statement and XML documents, DB2 may generate zero, one or more key entries for a document in an XML index.
- For a numeric index (DECFLOAT type), if a node identified by the XMLPATTERN contains a value that cannot be cast to a number, it is ignored. But insert will be successful. For example, no index entry is generated for
XLif the index is DECFLOAT type.
- However, for a string index (VARCHAR(n) type), if a node identified by the XMLPATTERN contains a value that is longer than n, then the document will be rejected, or CREATE INDEX will fail. The maximum n is 1000.
- You can create as many indexes as you like on a single XML column. You could even create multiple indexes with different types on the same XMLPATTERN. This provides much flexibility for XML.
- Use fully specified XPath whenever possible to reduce the cost of maintaining the indexes.
- Index leaf nodes or nodes close to leaves. For non-leaf nodes, the key values are concatenated values from descendants, and their usage is a bit weird. For example, index on name for <name><firstname>John<firstname><lastname>Doe<lastname><name> will result in "JohnDoe" as the key, and you have to use [name="JohnDoe"] in queries to qualify for the index-based search. In addition, DB2 9 does not support a node across record boundaries.
To produce consistent query results with or without XML indexes, DB2 tries to tolerate cast errors during XMLEXISTS predicate numeric comparison. For example, if a node size contains "XL", comparison [size > 10] will tolerate the "XL" value, which is equivalent to evaluate to false.
DECFLOAT is used instead of DOUBLE for the numeric index type due to its precision. Date and time are not yet supported, but you can use string indexes if you use ISO format (which is required in XML) without timezone or always use the same timezone for the data, and use string comparison in the queries to search the documents.
If you never search inside XML documents, but get XML data in and out as a whole, you probably don't need to use the XML type, since VARCHAR or VARBINARY or LOB types can serve the purpose.
To summarize, XML indexes use XPath to identify nodes to be indexed, and can be used for queries with XMLEXISTS and XMLTABLE predicates.
-Guogen (Gene) Zhang (GGZ)[Read More]