Automatic query rewrite—complex examples

These examples can help you understand how Db2 applies automatic query rewrite to avoid costly aggregations and joins against large fact tables.

The following examples assume a scenario in which a data warehouse has a star schema. The star schema represents the data of a simplified credit card application, as shown in the following figure.
Figure 1. Multi-fact star schema. In this simplified credit card application, the fact tables TRANSITEM and TRANS form the hub of the star schema. The schema also contains four dimensions: product, location, account, and time.
Begin figure description. The tables of an example multi-fact star schema. Arrows represent key relationships. This concept is described in the following paragraph. End figure description.

The data warehouse records transactions that are made with credit cards. Each transaction consists of a set of items that are purchased together. At the center of the data warehouse are two large fact tables. TRANS records the set of credit card purchase transactions. TRANSITEM records the information about the items that are purchased. Together, these two fact tables are the hub of the star schema. The star schema is a multi-fact star schema because it contains these two fact tables. The fact tables are continuously updated for each new credit card transaction.

In addition to the two fact tables, the schema contains four dimensions that describe transactions: product, location, account, and time.

  • The product dimension consists of two normalized tables, PGROUP and PLINE, that represent the product group and product line.
  • The location dimension consists of a single, denormalized table, LOC, that contains city, state, and country.
  • The account dimension consists of two normalized tables, ACCT and CUST, that represent the account and the customer.
  • The time dimension consists of the TRANS table that contains day, month, and year.

Analysts of such a credit card application are often interested in the aggregation of the sales data. Their queries typically perform joins of one or more dimension tables with fact tables. The fact tables contain significantly more rows than the dimension tables, and complicated queries that involve large fact tables can be very costly. In many cases, you can use materialized query table to summarize and store information from the fact tables. Using materialized query tables can help you avoid costly aggregations and joins against large fact tables.End program-specific programming interface information.

Example 1

Begin general-use programming interface information. An analyst submits the following query to count the number of transactions that are made in the United States for each credit card. The analyst requests the results grouped by credit card account, state, and year:

UserQ1
------
SELECT T.ACCTID, L.STATE, T.YEAR, COUNT(*) AS CNT
  FROM TRANS T, LOC L
  WHERE T.LOCID = L.ID AND
        L.COUNTRY = 'USA'
  GROUP BY T.ACCTID, L.STATE, T.YEAR;

Assume that the following CREATE TABLE statement created a materialized query table named TRANSCNT:

CREATE TABLE TRANSCNT AS
  (SELECT ACCTID, LOCID, YEAR, COUNT(*) AS CNT
     FROM TRANS
     GROUP BY ACCTID, LOCID, YEAR )
     DATA INITIALLY DEFERRED
     REFRESH DEFERRED;

If you enable automatic query rewrite, Db2 can rewrite UserQ1 as NewQ1. NewQ1 accesses the TRANSCNT materialized query table instead of the TRANS fact table.

NewQ1
-----
SELECT A.ACCTID, L.STATE, A.YEAR, SUM(A.CNT) AS CNT
  FROM TRANSCNT A, LOC L
  WHERE A.LOCID = L.ID     AND
        L.COUNTRY = 'USA'
  GROUP BY A.ACCTID, L.STATE, A.YEAR;
Db2 can use query rewrite in this case because of the following reasons:
  • The TRANS table is common to both UserQ1 and TRANSCNT.
  • Db2 can derive the columns of the query result from TRANSCNT.
  • The GROUP BY in the query requests data that are grouped at a higher level than the level in the definition of TRANSCNT.
Because customers typically make several hundred transactions per year with most of them in the same city, TRANSCNT is about hundred times smaller than TRANS. Therefore, rewriting UserQ1 into a query that uses TRANSCNT instead of TRANS improves response time significantly.
End general-use programming interface information.

Example 2

Begin general-use programming interface information. Assume that an analyst wants to find the number of televisions, with a price over 100 and a discount greater than 0.1, that were purchased by each credit card account. The analyst submits the following query:
UserQ2
------
SELECT T.ID, TI.QUANTITY * TI.PRICE * (1 - TI.DISCOUNT) AS AMT
  FROM TRANSITEM TI, TRANS T, PGROUP PG
  WHERE TI.TRANSID = T.ID  AND
        TI.PGID = PG.ID    AND
        TI.PRICE > 100     AND
        TI.DISCOUNT > 0.1  AND
        PG.NAME = 'TV';

If you define the following materialized query table TRANSIAB, Db2 can rewrite UserQ2 as NewQ2:

TRANSIAB
--------
CREATE TABLE TRANSIAB AS
(SELECT TI.TRANSID, TI.PRICE, TI.DISCOUNT, TI.PGID,
        L.COUNTRY, TI.PRICE * TI.QUANTITY as VALUE
   FROM TRANSITEM TI, TRANS T, LOC L
   WHERE TI.TRANSID = T.ID  AND
         T.LOCID = L.ID     AND
         TI.PRICE > 1       AND
         TI.DISCOUNT > 0.1)
   DATA INITIALLY DEFERRED
   REFRESH DEFERRED;


NewQ2
-----
SELECT A.TRANSID, A.VALUE * (1 - A.DISCOUNT) as AM
  FROM TRANSIAB A, PGROUP PG
  WHERE A.PGID = PG.ID     AND
        A.PRICE > 100      AND
        PG.NAME = 'TV';

Db2 can rewrite UserQ2 as a new query that uses materialized query table TRANSIAB because of the following reasons:

  • Although the predicate T.LOCID = L.ID appears only in the materialized query table, it does not result in rows that Db2 might discard. The referential constraint between the TRANS.LOCID and LOC.ID columns makes the join between TRANS and LOC in the materialized query table definition lossless. The join is lossless only if the foreign key in the constraint is NOT NULL.
  • The predicates TI.TRANSID = T.ID and TI.DISCOUNT > 0.1 appear in both the user query and the TRANSIAB fullselect.
  • The fullselect predicate TI.PRICE >1 in TRANSIAB subsumes the user query predicate TI.PRICE > 100 in UserQ2. Because the fullselect predicate is more inclusive than the user query predicate, Db2 can compute the user query predicate from TRANSIAB.
  • The user query predicate PG.NAME = 'TV' refers to a table that is not in the TRANSIAB fullselect. However, Db2 can compute the predicate from the PGROUP table. A predicate like PG.NAME = 'TV' does not disqualify other predicates in a query from qualifying for automatic query rewrite. In this case PGROUP is a relatively small dimension table, so a predicate that refers to the table is not overly costly.
  • Db2 can derive the query result from the materialized query table definition, even when the derivation is not readily apparent:
    • Db2 derives T.ID in the query from T.TRANSID in the TRANSIAB fullselect. Although these two columns originate from different tables, they are equivalent because of the predicate T.TRANSID = T.ID. Db2 recognizes such column equivalency through join predicates. Thus, Db2 derives T.ID from T.TRANSID, and the query qualifies for automatic query rewrite.
    • Db2 derives AMT in the query UserQ2 from DISCOUNT and VALUE in the TRANSIAB fullselect.
End general-use programming interface information.

Example 3

Begin general-use programming interface information. This example shows how Db2 matches GROUP BY items and aggregate functions between the user query and the materialized query table fullselect. Assume that an analyst submits the following query to find the average value of the transaction items for each year:

UserQ3
------
SELECT YEAR, AVG(QUANTITY * PRICE) AS AVGVAL
  FROM TRANSITEM TI, TRANS T
  WHERE TI.TRANSID = T.ID
  GROUP BY YEAR;

If you define the following materialized query table TRANSAVG, Db2 can rewrite UserQ3 as NewQ3:

TRANSAVG
--------
CREATE TABLE TRANSAVG AS
  (SELECT T.YEAR, T.MONTH, SUM(QUANTITY * PRICE) AS TOTVAL, COUNT(*) AS CNT
     FROM TRANSITEM TI, TRANS T
     WHERE TI.TRANSID = T.ID
     GROUP BY T.YEAR, T.MONTH )
     DATA INITIALLY DEFERRED
     REFRESH DEFERRED;


NewQ3
-----
SELECT YEAR, CASE WHEN SUM(CNT) = 0 THEN NULL
                  ELSE SUM(TOTVAL)/SUM(CNT)
                  END AS AVGVAL
  FROM TRANSAVG
  GROUP BY YEAR;
Db2 can rewrite UserQ3 as a new query that uses materialized query table TRANSAVG because of the following reasons:
  • Db2 considers YEAR in the user query and YEAR in the materialized query table fullselect to match exactly.
  • Db2 can derive the AVG function in the user query from the SUM function and the COUNT function in the materialized query table fullselect.
  • The GROUP BY clause in the query NewQ3 requests data at a higher level than the level in the definition of TRANSAVG.
  • Db2 can compute the yearly average in the user query by using the monthly sums and counts of transaction items in TRANSAVG. Db2 derives the yearly averages from the CNT and TOTVAL columns of the materialized query table by using a case expression.
End general-use programming interface information.