Star join access (JOIN_TYPE='S')

In star join processing, Db2 joins dimension tables to the fact table according to a multi-column index that is defined on the fact table.

Having a well-defined, multi-column index on the fact table is critical for efficient star join processing. Star join processing consolidates the dimensions to form a Cartesian product for matching to the single index on the fact table. Star join applies only to queries that qualify for star schema processing. This access method requires a single index on the fact table that supports the filtering that is provided by the dimension tables of the star schema.

If Db2 does not consider star join to be a cost effective access path for processing the star schema query, Db2 might choose pair-wise join (JOIN_TYPE='P'), or more-traditional join methods, such as nested loop, hybrid, or merge-scan.

For star join access to be considered, all columns in the index on the fact table must have one of the following data types:

  • CHARACTER
  • DATE
  • DECIMAL
  • DOUBLE
  • INTEGER
  • SMALLINT
  • REAL
  • TIME
  • TIMESTAMP
  • VARCHAR

Example: star schema with three dimension tables

Begin program-specific programming interface information. Suppose that you have a store in San Jose and want information about sales of audio equipment from that store in 2005. For this example, you want to join the following tables:

  • A fact table for SALES (S)
  • A dimension table for TIME (T) with columns for an ID, month, quarter, and year
  • A dimension table for geographic LOCATION (L) with columns for an ID, city, region, and country
  • A dimension table for PRODUCT (P) with columns for an ID, product item, class, and inventory

You could write the following query to join the tables:

SELECT * 
  FROM SALES S, TIME T, PRODUCT P, LOCATION L
  WHERE S.TIME = T.ID      AND
        S.PRODUCT = P.ID   AND
        S.LOCATION = L.ID  AND
        T.YEAR = 2005      AND
        P.CLASS = 'AUDIO'  AND
        L.LOCATION = 'SAN JOSE';

You would use the following statement to create an index:

CREATE INDEX XSALES_TPL ON SALES (TIME, PRODUCT, LOCATION);

The following table shows the EXPLAIN output for this example.

Table 1. Plan table output for a star join example with TIME, PRODUCT, and LOCATION
QBLOCKNO
METHOD
TNAME
JOIN
TYPE
SORTN
JOIN
ACCESS
TYPE
PRIMARY
ACCESS
TYPE
1 0 TIME S Y R  
1 1 PRODUCT S Y R T
1 1 LOCATION S Y R T
1 1 SALES S   I  

In the example above, all dimensions were accessed before the fact table. However, Db2 might choose to access only certain filtering dimensions before accessing the fact table, and remaining dimensions might be joined later. This is a cost-based decision that is made by optimizer.

Example: star schema with two snowflakes

All snowflakes are processed before the central part of a star join, as individual query blocks, and they are materialized into work files. A work file exists for each snowflake. The EXPLAIN output identifies these work files by naming them DSN_DIM_TBLX(nn), where nn indicates the corresponding QBLOCKNO for the snowflake.

Suppose that two new tables MANUFACTURER (M) and COUNTRY (C) are added to the tables in the previous example to break dimension tables PRODUCT (P) and LOCATION (L) into snowflakes:
  • The PRODUCT table has a new column MID that represents the manufacturer.
  • The MANUFACTURER table has columns for MID and name to contain manufacturer information.
  • The LOCATION table has a new column CID that represents the country.
  • The COUNTRY table has columns for CID and name to contain country information.
You could write the following query to join all the tables:
SELECT *
  FROM SALES S, TIME T, PRODUCT P, MANUFACTURER M, 
       LOCATION L, COUNTRY C
  WHERE S.TIME = T.ID      AND
        S.PRODUCT = P.ID   AND
        P.MID = M.MID      AND
        S.LOCATION = L.ID  AND
        L.CID = C.CID      AND
        T.YEAR = 2005      AND
        M.NAME = 'some_company';
The joined table pairs (PRODUCT, MANUFACTURER and LOCATION, COUNTRY) are snowflakes. The following table shows the EXPLAIN output for this example.
Table 2. Plan table output for a star join example with snowflakes
QBLOCKNO
METHOD
TNAME
JOIN
TYPE
SORTN
JOIN
ACCESS
TYPE
PRIMARY
ACCESS
TYPE
1 0 TIME S Y R  
1 1 DSN_DIM_TBLX(02) S Y R T
1 1 SALES S   I  
1 1 DSN_DIM_TBLX(03)   Y R T
2 0 PRODUCT     R  
2 1 MANUFACTURER     I  
3 0 LOCATION     R  
3 4 COUNTRY     I  

The sample explain output above shows that this query consists of several query blocks:

QBLOCKNO=1
The main star join block
QBLOCKNO=2
A snowflake (PRODUCT, MANUFACTURER) that is materialized into work file DSN_DIM_TBLX(02)
QBLOCKNO=3
A snowflake (LOCATION, COUNTRY) that is materialized into work file DSN_DIM_TBLX(03)

The joins in the snowflakes are processed first, and each snowflake is materialized into a work file. Therefore, when the main star join block (QBLOCKNO=1) is processed, it contains four tables: SALES (the fact table), TIME (a base dimension table), and the two snowflake work file tables.

In this example, in the main star join block, the star join method is used for the first three tables (as indicated by S in the JOIN TYPE column of the plan table). The remaining work file is joined by the nested loop join with sparse index access on the work file (as indicated by T in the PRIMARY_ACCESSTYPE column for DSN_DIM_TBLX(3)). End program-specific programming interface information.