Compiler rewrite example: Operation movement - Predicate pushdown for combined SQL/XQuery statements
One fundamental technique for the optimization of relational SQL queries is to move predicates in the WHERE clause of an enclosing query block into an enclosed lower query block (for example, a view), thereby enabling early data filtering and potentially better index usage.
This is even more important in partitioned database environments, because early filtering potentially reduces the amount of data that must be shipped between database partitions.
Similar techniques can be used to move predicates or XPath filters inside of an XQuery. The basic strategy is always to move filtering expressions as close to the data source as possible. This optimization technique is called predicate pushdown in SQL and extraction pushdown (for filters and XPath extractions) in XQuery.
Because the data models employed by SQL and XQuery are different, you must move predicates, filters, or extractions across the boundary between the two languages. Data mapping and casting rules have to be considered when transforming an SQL predicate into a semantically equivalent filter and pushing it down into the XPath extraction. The following examples address the pushdown of relation predicates into XQuery query blocks.
Document 1 Document 2
<customer> <customer>
<name>John</name> <name>Michael</name>
<lastname>Doe</lastname> <lastname>Miller </lastname>
<date_of_birth> <date_of_birth>
1976-10-10 1975-01-01
</date_of_birth> </date_of_birth>
<address> <address>
<zip>95141.0</zip> <zip>95142.0</zip>
</address> </address>
<volume>80000.0</volume> <volume>100000.00</volume>
</customer> </customer>
<customer> <customer>
<name>Jane</name> <name>Michaela</name>
<lastname>Doe</lastname> <lastname>Miller</lastname>
<date_of_birth> <date_of_birth>
1975-01-01 1980-12-23
</date_of_birth> </date_of_birth>
<address> <address>
<zip>95141.4</zip> <zip>95140.5</zip>
</address> </address>
<volume>50000.00</volume> <volume>100000</volume>
</customer> </customer>
Example - Pushing INTEGER predicates
select temp.name, temp.zip
from xmltable('db2-fn:xmlcolumn("T.XMLDOC")'
columns name varchar(20) path 'customer/name',
zip integer path 'customer/zip'
) as temp
where zip = 95141
zip
= 95141
predicate will be internally converted into the following
equivalent XPATH filtering expression:T.XMLCOL/customer/zip[. >= 95141.0 and . < 95142.0]
Because schema information for XML fragments is not used by the compiler, it cannot be assumed that ZIP contains integers only. It is possible that there are other numeric values with a fractional part and a corresponding double XML index on this specific XPath extraction. The XML2SQL cast would handle this transformation by truncating the fractional part before casting the value to INTEGER. This behavior must be reflected in the pushdown procedure, and the predicate must be changed to remain semantically correct.
Example - Pushing DECIMAL(x,y) predicates
select temp.name, temp.volume
from xmltable('db2-fn:xmlcolumn("T.XMLDOC")'
columns name varchar(20) path 'customer/name',
volume decimal(10,2) path 'customer/volume'
) as temp
where volume = 100000.00
volume
= 100000.00
predicate will be internally converted into the
following XPATH range filtering expression:T.XMLCOL/customer/volume[.>=100000.00 and .<100000.01]
Example - Pushing VARCHAR(n) predicates
select temp.name, temp.lastname
from xmltable('db2-fn:xmlcolumn("T.XMLDOC")'
columns name varchar(20) path 'customer/name',
lastname varchar(20) path 'customer/lastname'
) as temp
where lastname = 'Miller'
lastname
= 'Miller'
predicate will be internally converted into an
equivalent XPATH filtering expression. A high-level representation
of this expression is: : T.XMLCOL/customer/lastname[. >= rtrim("Miller") and . <
RangeUpperBound("Miller", 20)]
Trailing blanks
are treated differently in SQL than in XPath or XQuery. The original
SQL predicate will not distinguish between the two customers whose
last name is Miller
, even if one of them (Michael) has a trailing
blank. Consequently, both customers are returned, which would not
be the case if an unchanged predicate were pushed down.
- The first boundary is created by truncating all trailing blanks from the comparison value, using the RTRIM() function.
- The second boundary is created by looking
up all possible strings that are greater than or equal to
Miller
, so that all strings that begin withMiller
are located. Therefore, the original string is replaced with an upperbound string that represents this second boundary.