Star join access (JOIN_TYPE='S')
In star join processing, Db2 joins dimension tables to the fact table according to a multi-column index that is defined on the fact table.
Having a well-defined, multi-column index on the fact table is critical for efficient star join processing. Star join processing consolidates the dimensions to form a Cartesian product for matching to the single index on the fact table. Star join applies only to queries that qualify for star schema processing. This access method requires a single index on the fact table that supports the filtering that is provided by the dimension tables of the star schema.
If Db2 does not consider star join to be a cost effective access path for processing the star schema query, Db2 might choose pair-wise join (JOIN_TYPE='P'), or more-traditional join methods, such as nested loop, hybrid, or merge-scan.
For star join access to be considered, all columns in the index on the fact table must have one of the following data types:
- CHARACTER
- DATE
- DECIMAL
- DOUBLE
- INTEGER
- SMALLINT
- REAL
- TIME
- TIMESTAMP
- VARCHAR
Example: star schema with three dimension tables
Suppose that you have a store
in San Jose and want information about sales of audio equipment from
that store in 2005. For this example, you want to join the following
tables:
- A fact table for SALES (S)
- A dimension table for TIME (T) with columns for an ID, month, quarter, and year
- A dimension table for geographic LOCATION (L) with columns for an ID, city, region, and country
- A dimension table for PRODUCT (P) with columns for an ID, product item, class, and inventory
You could write the following query to join the tables:
SELECT *
FROM SALES S, TIME T, PRODUCT P, LOCATION L
WHERE S.TIME = T.ID AND
S.PRODUCT = P.ID AND
S.LOCATION = L.ID AND
T.YEAR = 2005 AND
P.CLASS = 'AUDIO' AND
L.LOCATION = 'SAN JOSE';You would use the following statement to create an index:
CREATE INDEX XSALES_TPL ON SALES (TIME, PRODUCT, LOCATION);The following table shows the EXPLAIN output for this example.
QBLOCKNO
|
METHOD
|
TNAME
|
JOIN
TYPE |
SORTN
JOIN |
ACCESS
TYPE |
PRIMARY
ACCESS TYPE |
|---|---|---|---|---|---|---|
| 1 | 0 | TIME | S | Y | R | |
| 1 | 1 | PRODUCT | S | Y | R | T |
| 1 | 1 | LOCATION | S | Y | R | T |
| 1 | 1 | SALES | S | I |
In the example above, all dimensions were accessed before the fact table. However, Db2 might choose to access only certain filtering dimensions before accessing the fact table, and remaining dimensions might be joined later. This is a cost-based decision that is made by optimizer.
Example: star schema with two snowflakes
All snowflakes are processed before the central part of a star join, as individual query blocks, and they are materialized into work files. A work file exists for each snowflake. The EXPLAIN output identifies these work files by naming them DSN_DIM_TBLX(nn), where nn indicates the corresponding QBLOCKNO for the snowflake.
- The PRODUCT table has a new column MID that represents the manufacturer.
- The MANUFACTURER table has columns for MID and name to contain manufacturer information.
- The LOCATION table has a new column CID that represents the country.
- The COUNTRY table has columns for CID and name to contain country information.
SELECT *
FROM SALES S, TIME T, PRODUCT P, MANUFACTURER M,
LOCATION L, COUNTRY C
WHERE S.TIME = T.ID AND
S.PRODUCT = P.ID AND
P.MID = M.MID AND
S.LOCATION = L.ID AND
L.CID = C.CID AND
T.YEAR = 2005 AND
M.NAME = 'some_company';QBLOCKNO
|
METHOD
|
TNAME
|
JOIN
TYPE |
SORTN
JOIN |
ACCESS
TYPE |
PRIMARY
ACCESS TYPE |
|---|---|---|---|---|---|---|
| 1 | 0 | TIME | S | Y | R | |
| 1 | 1 | DSN_DIM_TBLX(02) | S | Y | R | T |
| 1 | 1 | SALES | S | I | ||
| 1 | 1 | DSN_DIM_TBLX(03) | Y | R | T | |
| 2 | 0 | PRODUCT | R | |||
| 2 | 1 | MANUFACTURER | I | |||
| 3 | 0 | LOCATION | R | |||
| 3 | 4 | COUNTRY | I |
The sample explain output above shows that this query consists of several query blocks:
- QBLOCKNO=1
- The main star join block
- QBLOCKNO=2
- A snowflake (PRODUCT, MANUFACTURER) that is materialized into work file DSN_DIM_TBLX(02)
- QBLOCKNO=3
- A snowflake (LOCATION, COUNTRY) that is materialized into work file DSN_DIM_TBLX(03)
The joins in the snowflakes are processed first, and each snowflake is materialized into a work file. Therefore, when the main star join block (QBLOCKNO=1) is processed, it contains four tables: SALES (the fact table), TIME (a base dimension table), and the two snowflake work file tables.
In this example, in the
main star join block, the star join method is used for the first three
tables (as indicated by S in the JOIN TYPE column of the plan table).
The remaining work file is joined by the nested loop join with sparse
index access on the work file (as indicated by T in the PRIMARY_ACCESSTYPE
column for DSN_DIM_TBLX(3)).