The optimizer uses various methods to select an optimal join strategy for a query. Among these methods, which are determined by the optimization class of the query, are several search strategies, star-schema joins, early out joins, and composite tables.
The tables that are referenced in a query are almost always related by join predicates. If two tables are joined without a join predicate, the Cartesian product of the two tables is formed. In a Cartesian product, every qualifying row of the first table is joined with every qualifying row of the second table. This creates a result table that is usually very large, because its size is the cross product of the size of the two source tables. Because such a plan is unlikely to perform well, the optimizer avoids even determining the cost of this type of access plan.
The only exceptions occur when the optimization class is set to 9, or in the special case of star schemas. A star schema contains a central table called the fact table, and other tables called dimension tables. The dimension tables have only a single join that attaches them to the fact table, regardless of the query. Each dimension table contains additional values that expand information about a particular column in the fact table. A typical query consists of multiple local predicates that reference values in the dimension tables and contains join predicates connecting the dimension tables to the fact table. For these queries, it might be beneficial to compute the Cartesian product of multiple small dimension tables before accessing the large fact table. This technique is useful when multiple join predicates match a multicolumn index.
The DB2 data server can recognize queries against databases that were designed with star schemas having at least two dimension tables, and can increase the search space to include possible plans that compute the Cartesian product of dimension tables. A zigzag join is considered before the Cartesian product is even materialized. It probes the fact table using a multicolumn index, so that the fact table is filtered along two or more dimension tables simultaneously. When a zigzag join is used, it returns the next combination of values that is available from the fact table index. This next combination of values, known as feedback, is used to skip over probe values provided by the Cartesian product of dimension tables that will not find a match in the fact table. Filtering the fact table on two or more dimension tables simultaneously, and skipping probes that are known to be unproductive, together makes the zigzag join an efficient method for querying large fact tables.
This technique does not require multicolumn indexes. Explicit referential integrity constraints between the fact table and the dimension tables are not required, but are recommended.
The dynamic bitmaps that are created and used by star-schema join techniques require sort heap memory, the size of which is specified by the sortheap database configuration parameter.
The optimizer might select an early out join when it detects that each row from one of the tables only needs to be joined with at most one row from the other table.
select employee.name as employee_name,
manager.name as manager_name
from employee as employee, employee as manager
where employee.manager_id = manager.id
Assuming that
the ID column is a key in the EMPLOYEE table and that every employee
has at most one manager, this join avoids having to search for a subsequent
matching row in the MANAGER table. select distinct make.name
from make, model
where
make.make_id = model.make_id and
model.price > 30000
For each car maker, you only
need to determine whether any one of its manufactured models sells
for more than $30000. Joining a car maker with all of its manufactured
models selling for more than $30000 is unnecessary, because it does
not contribute towards the accuracy of the query result. select dailystockdata.symbol, max(dailystockdata.date) as date
from sp500, dailystockdata
where
sp500.symbol = dailystockdata.symbol and
dailystockdata.date < '01/01/2000' and
dailystockdata.close / dailystockdata.open >= 1.1
group by dailystockdata.symbol
The qualifying
set is the set of rows from the DAILYSTOCKDATA table that satisfies
the date and price requirements and joins with a particular stock
symbol from the SP500 table. If the qualifying set from the DAILYSTOCKDATA
table (for each stock symbol row from the SP500 table) is ordered
as descending on DATE, it is only necessary to return the first row
from the qualifying set for each symbol, because that first row represents
the most recent date for a particular symbol. The other rows in the
qualifying set are not required. select count(*)
from t1, t2, t3, t4
where
t1.a = t2.a and
t3.a = t4.a and
t2.z = t3.z
It might be beneficial to join table T1 and T2 (T1xT2), then join T3 and T4 (T3xT4), and finally, to select the first join result as the outer table and the second join result as the inner table. In the final plan ( (T1xT2) x (T3xT4) ), the join result (T3xT4) is known as a composite inner join. Depending on the query optimization class, the optimizer places different constraints on the maximum number of tables that can be the inner table of a join. Composite inner joins are allowed with optimization classes 5, 7, and 9.