Join predicate conversion

Join predicates should be converted to the appropriate data type on both sides.

What join predicates can preclude index usage?

Consider two tables with XML columns for customer information and purchase orders, respectively:

CREATE TABLE customer(info XML);

CREATE TABLE PurchaseOrder(POrder XML);

The XML documents that contain customer information include an attribute @cid, the numeric customer ID (cid). The XML documents that contain purchase order information also include @cid attributes, so that each order is uniquely associated with a particular customer. Because customers and orders are expected to be frequently searched by cid, it makes sense to define indexes:

CREATE UNIQUE INDEX idx1 ON customer(info) 
   GENERATE KEY USING XMLPATTERN '/customerinfo/@cid' AS SQL INTEGER;

CREATE INDEX idx2 ON PurchaseOrder(POrder) 
   GENERATE KEY USING XMLPATTERN '/porder/@cid' AS SQL INTEGER;

We want to find the purchase orders for all customers in a specific ZIP code. Intuitively, we can write the query like this:

XQUERY
   for $i in db2-fn:xmlcolumn("CUSTOMER.INFO")/customerinfo
   for $j in db2-fn:xmlcolumn("PURCHASEORDER.PORDER")/porder[@cid = $i/@cid]
   where $i/zipcode = "95141" 
   return $j;

Note that the join predicate @cid = $i/@cid requires the purchase order's cid to be equal to the customer cid.

This query returns the correct result, but neither of the two indexes can be used. The query is executed as a nested loop join with table scans for both tables. To avoid repeated table scans, a single table scan on customer to find all customers within ZIP code 95141 is preferable, followed by index lookups into the purchase order table using @cid. Note that it is necessary to scan the customer table since we have no index on zipcode.

The index is not used, because it would be incorrect to do so. If the index were used, Db2® may miss some matching purchase orders and return an incomplete result. This is because some values in the @cid attribute could potentially be non numeric. For example, @cid could equal YPS and thus not be included in the numeric index which is defined AS SQL INTEGER.

Note: If a value of an indexed node cannot be converted to the specified index data type, the index entry for that value is not inserted and no error or warning is raised.
Enabling index usage with join predicates
It is possible to enable the index, which is desirable if we are certain that all our @cid values are numeric. The index will be used if we convert the join predicate explicitly to match the type of the index:
XQUERY
   for $i in db2-fn:xmlcolumn("CUSTOMER.INFO")/customerinfo
   for $j in db2-fn:xmlcolumn("PURCHASEORDER.PORDER")/porder
   where $i/@cid/xs:int(.) = $j/@cid/xs:int(.)
   and $i/zipcode = "95141"
   return $j;

Intuitively, the conversion advises Db2 that it should consider matching only @cid attributes which are convertible to INTEGER. With this directive, we can be sure that all required matches are represented in the index defined AS SQL INTEGER, and thus that it is safe to use that index. If there does exist a non numeric @cid value in one of the documents, then the conversion will fail with a run-time error.

Note that within XQuery, casting works only for singletons. Especially for elements (a,b, and c in the following example), it is recommended that you convert them like this:

/a/b/c/xs:double(.)

If you were to convert the elements as follows, a run-time error would result if multiple elements c exist under any given element b:

/a/b/xs:double(c)

For indexes that are defined AS SQL VARCHAR, the corresponding join predicates need to convert the compared values to the xs:string data type using the fn:string() function. The same applies to DATE and TIMESTAMP indexes. The following example shows how the fn:string() function is used in a string join:

XQUERY
   for $i in db2-fn:xmlcolumn("CUSTOMER.INFO")/customerinfo
   for $j in db2-fn:xmlcolumn("PURCHASEORDER.PORDER")/porder
   where $i/zipcode/fn:string(.) = $j/supplier/zip/fn:string(.)
   return <pair>{$i}{$j}</pair>
Summary of conversion rules for join predicates
The following table shows a summary of how join predicates should be converted to the appropriate data type on both sides in order to enable index usage.
Table 1. Conversion rules for join predicates
Index SQL type Convert join predicate to XML type
DOUBLE xs:double
DECIMAL xs:decimal
INTEGER xs:int
VARCHAR integer, VARCHAR HASHED xs:string
DATE xs:date
TIMESTAMP xs:dateTime