Join predicate conversion
Join predicates should be converted to the appropriate data type on both sides.
- What join predicates can preclude index usage?
Consider two tables with XML columns for customer information and purchase orders, respectively:
CREATE TABLE customer(info XML); CREATE TABLE PurchaseOrder(POrder XML);The XML documents that contain customer information include an attribute @cid, the numeric customer ID (cid). The XML documents that contain purchase order information also include @cid attributes, so that each order is uniquely associated with a particular customer. Because customers and orders are expected to be frequently searched by cid, it makes sense to define indexes:
CREATE UNIQUE INDEX idx1 ON customer(info) GENERATE KEY USING XMLPATTERN '/customerinfo/@cid' AS SQL INTEGER; CREATE INDEX idx2 ON PurchaseOrder(POrder) GENERATE KEY USING XMLPATTERN '/porder/@cid' AS SQL INTEGER;We want to find the purchase orders for all customers in a specific ZIP code. Intuitively, we can write the query like this:
XQUERY for $i in db2-fn:xmlcolumn("CUSTOMER.INFO")/customerinfo for $j in db2-fn:xmlcolumn("PURCHASEORDER.PORDER")/porder[@cid = $i/@cid] where $i/zipcode = "95141" return $j;Note that the join predicate @cid = $i/@cid requires the purchase order's cid to be equal to the customer cid.
This query returns the correct result, but neither of the two indexes can be used. The query is executed as a nested loop join with table scans for both tables. To avoid repeated table scans, a single table scan on customer to find all customers within ZIP code 95141 is preferable, followed by index lookups into the purchase order table using @cid. Note that it is necessary to scan the customer table since we have no index on zipcode.
The index is not used, because it would be incorrect to do so. If the index were used, Db2® may miss some matching purchase orders and return an incomplete result. This is because some values in the @cid attribute could potentially be non numeric. For example, @cid could equal YPS and thus not be included in the numeric index which is defined AS SQL INTEGER.
Note: If a value of an indexed node cannot be converted to the specified index data type, the index entry for that value is not inserted and no error or warning is raised.
- Enabling index usage with join predicates
- It is possible to enable the index, which is desirable if we are certain that all our @cid
values are numeric. The index will be used if we convert the join predicate explicitly to match the
type of the
index:
XQUERY for $i in db2-fn:xmlcolumn("CUSTOMER.INFO")/customerinfo for $j in db2-fn:xmlcolumn("PURCHASEORDER.PORDER")/porder where $i/@cid/xs:int(.) = $j/@cid/xs:int(.) and $i/zipcode = "95141" return $j;Intuitively, the conversion advises Db2 that it should consider matching only @cid attributes which are convertible to INTEGER. With this directive, we can be sure that all required matches are represented in the index defined AS SQL INTEGER, and thus that it is safe to use that index. If there does exist a non numeric @cid value in one of the documents, then the conversion will fail with a run-time error.
Note that within XQuery, casting works only for singletons. Especially for elements (a,b, and c in the following example), it is recommended that you convert them like this:
/a/b/c/xs:double(.)If you were to convert the elements as follows, a run-time error would result if multiple elements c exist under any given element b:
/a/b/xs:double(c)For indexes that are defined AS SQL VARCHAR, the corresponding join predicates need to convert the compared values to the xs:string data type using the fn:string() function. The same applies to DATE and TIMESTAMP indexes. The following example shows how the fn:string() function is used in a string join:
XQUERY for $i in db2-fn:xmlcolumn("CUSTOMER.INFO")/customerinfo for $j in db2-fn:xmlcolumn("PURCHASEORDER.PORDER")/porder where $i/zipcode/fn:string(.) = $j/supplier/zip/fn:string(.) return <pair>{$i}{$j}</pair>
- Summary of conversion rules for join predicates
- The following table shows a summary of how join predicates should
be converted to the appropriate data type on both sides in order to
enable index usage.
Table 1. Conversion rules for join predicates Index SQL type Convert join predicate to XML type DOUBLE xs:double DECIMAL xs:decimal INTEGER xs:int VARCHAR integer, VARCHAR HASHED xs:string DATE xs:date TIMESTAMP xs:dateTime