Pair-wise join access (JOIN_TYPE='P')
In pair-wise join processing Db2 matches each dimension to fact table independently, according to a single-column index for each dimension table.
RID lists from each join to the fact table index are intersected in pairs. The first pair of RID lists are intersected, the result of that intersection is paired with the next RID list and so on, until all viable dimension filtering is intersected. Pair-wise join applies only to queries that qualify for star schema processing. This access method requires separate indexes on the fact table that support each individual filtering dimension tables of the star schema. Pair-wise join requires a separate fact table index for each filtering dimension.
If Db2 does not consider pair-wise join to be a cost effective access path for processing the star schema query, Db2 might choose star join (JOIN_TYPE='S') or more-traditional join methods, such nested loop join, hybrid join, or merge-scan join.
For star join access to be considered, all columns in the index on the fact table must have one of the following data types:
- CHARACTER
- DATE
- DECIMAL
- DOUBLE
- INTEGER
- SMALLINT
- REAL
- TIME
- TIMESTAMP
- VARCHAR
Example:
Suppose that you have a store
in San Jose and want information about sales of audio equipment from
that store in 2005. For this example, you want to join the following
tables:
- A fact table for SALES (S)
- A dimension table for TIME (T) with columns for an ID, month, quarter, and year
- A dimension table for geographic LOCATION (L) with columns for an ID, city, region, and country
- A dimension table for PRODUCT (P) with columns for an ID, product item, class, and inventory
You can write the following query to join the tables:
SELECT *
FROM SALES S, TIME T, PRODUCT P, LOCATION L
WHERE S.TIME = T.ID
AND S.PRODUCT = P.ID
AND S.LOCATION = L.ID
AND T.YEAR = 2005
AND P.CLASS = 'AUDIO'
AND L.LOCATION = 'SAN JOSE';Instead of creating a single index on the fact table that supports the various combinations of filtering, you can create separate indexes on the fact table for each dimension table. For this example you would use the following three statements to create the indexes:
CREATE INDEX XSALES_T ON SALES (TIME);
CREATE INDEX XSALES_P ON SALES (PRODUCT);
CREATE INDEX XSALES_L ON SALES (LOCATION);
The following table shows the EXPLAIN output for this example.
| QBLOCKNO | METHOD | TNAME | JOIN
TYPE |
ACCESS
NAME |
ACCESS
TYPE |
MIXOPSEQ |
|---|---|---|---|---|---|---|
| 1 | 0 | SALES | P | P | 0 | |
| 1 | 0 | SALES | P | DSNPWJ(06) | MX | 1 |
| 1 | 0 | SALES | P | DSNPWJ(08) | MX | 2 |
| 1 | 0 | SALES | P | MI | 3 | |
| 1 | 0 | SALES | P | DSNPWJ(10) | MX | 4 |
| 1 | 0 | SALES | P | MI | 5 | |
| 1 | 0 | DSN_DIM_TBLX(02) | R | 6 | ||
| 1 | 1 | SALES | XSALES_T | I | 7 | |
| 1 | 0 | DSN_DIM_TBLX(03) | R | 8 | ||
| 1 | 1 | SALES | XSALES_P | I | 9 | |
| 1 | 0 | DSN_DIM_TBLX(04) | R | 10 | ||
| 1 | 1 | SALES | XSALES_L | I | 11 | |
| 1 | 1 | DSN_DIM_TBLX(02) | R | |||
| 1 | 1 | DSN_DIM_TBLX(03) | R | |||
| 1 | 1 | DSN_DIM_TBLX(04) | R | |||
| 2 | 0 | TIME | R | |||
| 3 | 0 | PRODUCT | R | |||
| 4 | 0 | LOCATION | R |
Dimension tables chosen for pair-wise join before the fact table must be materialized into their own dimension tables similar to snowflakes in star join access method. Query block 2, 3 and 4 build the work files for TIME, PRODUCT and LOCATION dimensions respectively. These dimensions are accessed before the fact table, and are also required to be joined back after the fact table, to ensure data consistency.
The sample EXPLAIN output above shows the following multi-index access steps:
- MIXOPSEQ = 0
- Access the data pages for the fact table
- MIXOPSEQ = 1
- ACCESSNAME='DSNPWJ(06)' represents the pair-wise RID list built from multi-index access steps 6 and 7
- MIXOPSEQ = 2
- ACCESSNAME='DSNPWJ(08)' represents the pair-wise RID list built from multi-index access steps 8 and 9
- MIXOPSEQ = 3
- Intersection of RID lists from steps 1 and 2
- MIXOPSEQ = 4
- ACCESSNAME='DSNPWJ(10)' represents the pair-wise RID list built from multi-index access steps 10 and 11
- MIXOPSEQ = 5
- Intersection of RID lists from steps 3 and 4
- MIXOPSEQ = 6
- Materialized dimension table, DSN_DIM_TBLX(02), built from query block 2
- MIXOPSEQ = 7
- Join from step 6 to the fact table index XSALES_T to build RID list as input to step 1
- MIXOPSEQ = 8
- Materialized dimension table DSN_DIM_TBLX(03) built from query block 3
- MIXOPSEQ = 9
- Join from step 6 to the fact table index XSALES_P to build RID list as input to step 2
- MIXOPSEQ = 10
- Materialized dimension table DSN_DIM_TBLX(04) built from query block 4
- MIXOPSEQ = 11
- Join from step 6 to the fact table index XSALES_L to build RID list as input to step 4
Based upon cost, Db2 might choose to access one or more dimensions before the fact table because each dimension has a corresponding fact table index.