Compiler rewrite example: Operation movement - Predicate pushdown for combined SQL/XQuery statements

One fundamental technique for the optimization of relational SQL queries is to move predicates in the WHERE clause of an enclosing query block into an enclosed lower query block (for example, a view), thereby enabling early data filtering and potentially better index usage.

This is even more important in partitioned database environments, because early filtering potentially reduces the amount of data that must be shipped between database partitions.

Similar techniques can be used to move predicates or XPath filters inside of an XQuery. The basic strategy is always to move filtering expressions as close to the data source as possible. This optimization technique is called predicate pushdown in SQL and extraction pushdown (for filters and XPath extractions) in XQuery.

Because the data models employed by SQL and XQuery are different, you must move predicates, filters, or extractions across the boundary between the two languages. Data mapping and casting rules have to be considered when transforming an SQL predicate into a semantically equivalent filter and pushing it down into the XPath extraction. The following examples address the pushdown of relation predicates into XQuery query blocks.

Consider the following two XML documents containing customer information:
Document 1                   Document 2

<customer>                   <customer>
  <name>John</name>            <name>Michael</name>
  <lastname>Doe</lastname>     <lastname>Miller </lastname>
  <date_of_birth>              <date_of_birth>
	  1976-10-10               	  1975-01-01
  </date_of_birth>             </date_of_birth>
  <address>                    <address>
    <zip>95141.0</zip>           <zip>95142.0</zip>
  </address>                   </address>
<volume>80000.0</volume>     <volume>100000.00</volume>
</customer>                  </customer>
<customer>                   <customer>
  <name>Jane</name>            <name>Michaela</name>
  <lastname>Doe</lastname>     <lastname>Miller</lastname>
  <date_of_birth>              <date_of_birth>
    1975-01-01               	  1980-12-23
  </date_of_birth>             </date_of_birth>
  <address>                    <address>
    <zip>95141.4</zip>           <zip>95140.5</zip>
  </address>                   </address>
<volume>50000.00</volume>    <volume>100000</volume>
</customer>                  </customer>

Example - Pushing INTEGER predicates

Consider the following query:
   select temp.name, temp.zip
     from xmltable('db2-fn:xmlcolumn("T.XMLDOC")'
       columns name varchar(20) path 'customer/name',
       zip integer path 'customer/zip'
       ) as temp
     where zip = 95141
To use possible indexes on T.XMLDOC and to filter out records that are not needed early on, the zip = 95141 predicate will be internally converted into the following equivalent XPATH filtering expression:
T.XMLCOL/customer/zip[. >= 95141.0 and . < 95142.0] 

Because schema information for XML fragments is not used by the compiler, it cannot be assumed that ZIP contains integers only. It is possible that there are other numeric values with a fractional part and a corresponding double XML index on this specific XPath extraction. The XML2SQL cast would handle this transformation by truncating the fractional part before casting the value to INTEGER. This behavior must be reflected in the pushdown procedure, and the predicate must be changed to remain semantically correct.

Example - Pushing DECIMAL(x,y) predicates

Consider the following query:
   select temp.name, temp.volume
     from xmltable('db2-fn:xmlcolumn("T.XMLDOC")'
       columns name varchar(20) path 'customer/name',
       volume decimal(10,2) path 'customer/volume'
       ) as temp
     where volume = 100000.00
To use possible DECIMAL or DOUBLE indexes on T.XMLDOC and to filter out records that are not needed early on, similar to the handling for the INTEGER type, the volume = 100000.00 predicate will be internally converted into the following XPATH range filtering expression:
T.XMLCOL/customer/volume[.>=100000.00 and .<100000.01]

Example - Pushing VARCHAR(n) predicates

Consider the following query:
   select temp.name, temp.lastname
     from xmltable('db2-fn:xmlcolumn("T.XMLDOC")'
       columns name varchar(20) path 'customer/name',
       lastname varchar(20) path 'customer/lastname'
       ) as temp
     where lastname = 'Miller'
To use possible indexes on T.XMLDOC and to filter out records that are not needed early on, the lastname = 'Miller' predicate will be internally converted into an equivalent XPATH filtering expression. A high-level representation of this expression is: :
T.XMLCOL/customer/lastname[. >= rtrim("Miller") and . < 
   RangeUpperBound("Miller", 20)]

Trailing blanks are treated differently in SQL than in XPath or XQuery. The original SQL predicate will not distinguish between the two customers whose last name is Miller, even if one of them (Michael) has a trailing blank. Consequently, both customers are returned, which would not be the case if an unchanged predicate were pushed down.

The solution is to transform the predicate into a range filter.
  • The first boundary is created by truncating all trailing blanks from the comparison value, using the RTRIM() function.
  • The second boundary is created by looking up all possible strings that are greater than or equal to Miller, so that all strings that begin with Miller are located. Therefore, the original string is replaced with an upperbound string that represents this second boundary.