Ensuring that queries fit the required criteria for the star schema join

The optimizer considers three specialized join methods for queries based on star schema: star join, Cartesian hub join, and zigzag join. These join methods can help to significantly improve performance for such queries.

A query must meet the following criteria to be recognized as a star schema for the purposes of a zigzag join, star join, or Cartesian hub join plan.
  • It must be a star-shaped query with one fact table and at least two dimension tables. If the query includes more than one fact table with common associated dimension tables (a multiple fact table query), the query optimizer will split the query into a query with multiple stars in it. The common dimension tables that join with more than one fact table are then used multiple times in the query. The explain output will show multiple zigzag join operators for these multiple fact table queries.
  • The dimension tables must have a primary key, a unique constraint, or a unique index defined on them; the primary key can be a composite key. If the dimension tables do not have a primary key, a unique constraint, or a unique index, then an older star detection method is used to detect a star for Cartesian hub and star join methods. In that case, the Cartesian hub and star join must meet the criteria described in Alternative Cartesian hub join and star join criteria.
  • The dimension tables and the fact table must be joined using equijoin predicates on all columns that are part of the primary keys for the dimension tables.
  • For Cartesian hub joins and zigzag joins, there must be a multicolumn index on the fact table; columns that participate in the join are part of that index, which must have enough join columns from the fact table that at least two dimension tables are covered.
  • For Cartesian hub joins and star index ANDing joins, a dimension table or a snowflake must filter the fact table. (Filtering is based on the optimizer's estimates.) There are also cases where a star join will still occur if the dimension table is joined with the fact table not as a filter, but as a simple look-up type of join.
For example, suppose that there are three dimension tables D1, D2, and D3. Dimension table D1 has primary key A, and it joins with the fact table on column A; dimension table D2 has primary key (B,C), and it joins with the fact table on columns B and C; finally, dimension table D3 has primary key D, and it joins with the fact table on column D. Supported index usage is as follows:
  • Any one of the following indexes would suffice, because each of these indexes covers at least two dimension tables: (A,D), (A,B,C), or (C,B,D).
  • Index (A,B,C,D) is also suitable, because it covers three dimension tables.
  • Index (A,B) cannot be used, because it does not completely cover dimension table D2.
  • Index (B,A,C) cannot be used, because columns B and C, which join with the primary key of D2, do not appear in contiguous positions in the index.
A dimension table cannot participate in any of these joins methods (zigzag join, star join, or Cartesian hub join) if any of the following occurs:
  • Sampling
  • Non-deterministic functions
  • Functions with side effects
Additionally, a zigzag join within a snowflake that causes duplication of keys from the dimension table is not supported.

Dimension tables that do not have these characteristics will be considered for inclusion in a zigzag join or star join by the optimizer using cost analysis; dimension tables that do have one or more of these characteristics can be joined using other join methods.

Nested zigzag joins are not supported; this means that a zigzag join plan may not be a dimension/snowflake to another zigzag join plan. Similarly, nested star joins are not supported; this means that a star join plan may not be a dimension/snowflake to another star join plan. In addition, zigzag join plans and star join plans cannot be nested inside one another.

Federated nicknames are excluded from the zigzag joins and star joins. Nicknames can be joined using other join methods. If the entire star is pushed down to a remote Db2® data server, the remote data server can use the zigzag join or star join method to execute the query.

Alternative Cartesian hub join and star join criteria

When there is no primary key, unique constraint, or unique index defined on the dimension tables then to be recognized as a star schema for the purposes of a Cartesian hub join or star join, a query must meet the following criteria:
  • For each query block
    • At least three different tables must be joined
    • All join predicates must be equality predicates
    • No subqueries can exist
    • No correlations or dependencies can exist between tables or outside of the query block
    • A fact table
      • Is the largest table in the query block
      • Has at least 10 000 rows
      • Is a base table
      • Must be joined to at least two dimension tables or two groups called snowflakes
    • A dimension table
      • Is not the fact table
      • Can be joined individually to the fact table or in snowflakes
    • A dimension table or a snowflake
      • Must filter the fact table. (Filtering is based on the optimizer's estimates.)
      • Must have a join predicate to the fact table that uses a column in a fact table index. This criterion must be met in order for either star join or hub join to be considered, although a hub join will only need to use a single fact table index.

A query block representing a left or right outer join can reference only two tables, so a star-schema join does not qualify.

Explicitly declaring referential integrity is not required for the optimizer to recognize a star-schema join.