Automatic query rewrite—complex examples
These examples can help you understand how Db2 applies automatic query rewrite to avoid costly aggregations and joins against large fact tables.
The data warehouse records transactions that are made with credit cards. Each transaction consists of a set of items that are purchased together. At the center of the data warehouse are two large fact tables. TRANS records the set of credit card purchase transactions. TRANSITEM records the information about the items that are purchased. Together, these two fact tables are the hub of the star schema. The star schema is a multi-fact star schema because it contains these two fact tables. The fact tables are continuously updated for each new credit card transaction.
In addition to the two fact tables, the schema contains four dimensions that describe transactions: product, location, account, and time.
- The product dimension consists of two normalized tables, PGROUP and PLINE, that represent the product group and product line.
- The location dimension consists of a single, denormalized table, LOC, that contains city, state, and country.
- The account dimension consists of two normalized tables, ACCT and CUST, that represent the account and the customer.
- The time dimension consists of the TRANS table that contains day, month, and year.
Analysts of such a credit card application are often interested
in the aggregation of the sales data. Their queries typically perform
joins of one or more dimension tables with fact tables. The fact tables
contain significantly more rows than the dimension tables, and complicated
queries that involve large fact tables can be very costly. In many
cases, you can use materialized query table to summarize and store
information from the fact tables. Using materialized query tables
can help you avoid costly aggregations and joins against large fact
tables.
Example 1
An analyst submits the following
query to count the number of transactions that are made in the United
States for each credit card. The analyst requests the results grouped
by credit card account, state, and year:
UserQ1
------
SELECT T.ACCTID, L.STATE, T.YEAR, COUNT(*) AS CNT
FROM TRANS T, LOC L
WHERE T.LOCID = L.ID AND
L.COUNTRY = 'USA'
GROUP BY T.ACCTID, L.STATE, T.YEAR;Assume that the following CREATE TABLE statement created a materialized query table named TRANSCNT:
CREATE TABLE TRANSCNT AS
(SELECT ACCTID, LOCID, YEAR, COUNT(*) AS CNT
FROM TRANS
GROUP BY ACCTID, LOCID, YEAR )
DATA INITIALLY DEFERRED
REFRESH DEFERRED;If you enable automatic query rewrite, Db2 can rewrite UserQ1 as NewQ1. NewQ1 accesses the TRANSCNT materialized query table instead of the TRANS fact table.
NewQ1
-----
SELECT A.ACCTID, L.STATE, A.YEAR, SUM(A.CNT) AS CNT
FROM TRANSCNT A, LOC L
WHERE A.LOCID = L.ID AND
L.COUNTRY = 'USA'
GROUP BY A.ACCTID, L.STATE, A.YEAR;- The TRANS table is common to both UserQ1 and TRANSCNT.
- Db2 can derive the columns of the query result from TRANSCNT.
- The GROUP BY in the query requests data that are grouped at a higher level than the level in the definition of TRANSCNT.
Example 2
UserQ2
------
SELECT T.ID, TI.QUANTITY * TI.PRICE * (1 - TI.DISCOUNT) AS AMT
FROM TRANSITEM TI, TRANS T, PGROUP PG
WHERE TI.TRANSID = T.ID AND
TI.PGID = PG.ID AND
TI.PRICE > 100 AND
TI.DISCOUNT > 0.1 AND
PG.NAME = 'TV';If you define the following materialized query table TRANSIAB, Db2 can rewrite UserQ2 as NewQ2:
TRANSIAB
--------
CREATE TABLE TRANSIAB AS
(SELECT TI.TRANSID, TI.PRICE, TI.DISCOUNT, TI.PGID,
L.COUNTRY, TI.PRICE * TI.QUANTITY as VALUE
FROM TRANSITEM TI, TRANS T, LOC L
WHERE TI.TRANSID = T.ID AND
T.LOCID = L.ID AND
TI.PRICE > 1 AND
TI.DISCOUNT > 0.1)
DATA INITIALLY DEFERRED
REFRESH DEFERRED;
NewQ2
-----
SELECT A.TRANSID, A.VALUE * (1 - A.DISCOUNT) as AM
FROM TRANSIAB A, PGROUP PG
WHERE A.PGID = PG.ID AND
A.PRICE > 100 AND
PG.NAME = 'TV';Db2 can rewrite UserQ2 as a new query that uses materialized query table TRANSIAB because of the following reasons:
- Although the predicate
T.LOCID = L.IDappears only in the materialized query table, it does not result in rows that Db2 might discard. The referential constraint between the TRANS.LOCID and LOC.ID columns makes the join between TRANS and LOC in the materialized query table definition lossless. The join is lossless only if the foreign key in the constraint is NOT NULL. - The predicates
TI.TRANSID = T.IDandTI.DISCOUNT > 0.1appear in both the user query and the TRANSIAB fullselect. - The fullselect predicate
TI.PRICE >1in TRANSIAB subsumes the user query predicateTI.PRICE > 100in UserQ2. Because the fullselect predicate is more inclusive than the user query predicate, Db2 can compute the user query predicate from TRANSIAB. - The user query predicate
PG.NAME = 'TV'refers to a table that is not in the TRANSIAB fullselect. However, Db2 can compute the predicate from the PGROUP table. A predicate likePG.NAME = 'TV'does not disqualify other predicates in a query from qualifying for automatic query rewrite. In this case PGROUP is a relatively small dimension table, so a predicate that refers to the table is not overly costly. - Db2 can
derive the query result from the materialized query table definition,
even when the derivation is not readily apparent:
- Db2 derives
T.ID in the query from T.TRANSID in the TRANSIAB fullselect. Although
these two columns originate from different tables, they are equivalent
because of the predicate
T.TRANSID = T.ID. Db2 recognizes such column equivalency through join predicates. Thus, Db2 derives T.ID from T.TRANSID, and the query qualifies for automatic query rewrite. - Db2 derives AMT in the query UserQ2 from DISCOUNT and VALUE in the TRANSIAB fullselect.
- Db2 derives
T.ID in the query from T.TRANSID in the TRANSIAB fullselect. Although
these two columns originate from different tables, they are equivalent
because of the predicate
Example 3
This example shows how Db2 matches
GROUP BY items and aggregate functions between the user query and
the materialized query table fullselect. Assume that an analyst submits
the following query to find the average value of the transaction items
for each year:
UserQ3
------
SELECT YEAR, AVG(QUANTITY * PRICE) AS AVGVAL
FROM TRANSITEM TI, TRANS T
WHERE TI.TRANSID = T.ID
GROUP BY YEAR;If you define the following materialized query table TRANSAVG, Db2 can rewrite UserQ3 as NewQ3:
TRANSAVG
--------
CREATE TABLE TRANSAVG AS
(SELECT T.YEAR, T.MONTH, SUM(QUANTITY * PRICE) AS TOTVAL, COUNT(*) AS CNT
FROM TRANSITEM TI, TRANS T
WHERE TI.TRANSID = T.ID
GROUP BY T.YEAR, T.MONTH )
DATA INITIALLY DEFERRED
REFRESH DEFERRED;
NewQ3
-----
SELECT YEAR, CASE WHEN SUM(CNT) = 0 THEN NULL
ELSE SUM(TOTVAL)/SUM(CNT)
END AS AVGVAL
FROM TRANSAVG
GROUP BY YEAR;- Db2 considers YEAR in the user query and YEAR in the materialized query table fullselect to match exactly.
- Db2 can derive the AVG function in the user query from the SUM function and the COUNT function in the materialized query table fullselect.
- The GROUP BY clause in the query NewQ3 requests data at a higher level than the level in the definition of TRANSAVG.
- Db2 can compute the yearly average in the user query by using the monthly sums and counts of transaction items in TRANSAVG. Db2 derives the yearly averages from the CNT and TOTVAL columns of the materialized query table by using a case expression.