Grouping set optimization

The optimizer uses all the previously mentioned grouping optimizations for individual grouping sets specified in the query.

If multiple temporary result sets are needed to implement all the grouping sets, they can all be populated using one pass through the data. This one-pass population occurs even if different types of temporary result sets are used to implement various grouping sets.

A temporary result type called sorted distinct list is used specifically for ROLLUP implementations. This temporary result set is used to compute the aggregate rows: the grouping set that includes all expressions listed in the ROLLUP clause. Hash grouping is used internally to quickly find the current grouping value. The entries in the temporary result sets are also sorted. This sorting allows the aggregate results to be used to compute the super-aggregate rows in the rollup result set without creating additional temporary result sets.

ROLLUPs can also be implemented using a radix index over the columns in the rollup without creating a temporary result set.

The optimizer can compute all the grouping sets in a given ROLLUP using at most one temporary result set. Therefore, it is advantageous for the optimizer to look for the rollup pattern in grouping set queries.

The optimizer tries to find the ROLLUP pattern in a list of individual grouping sets. For example, the following GROUP BY clause:

  GROUP BY GROUPING SETS ((A, B, C), (B, D), (A, B), (A), ())

is rewritten to:

  GROUP BY GROUPING SETS ((ROLLUP(A, B, C)), (B, D))

This rewrite allows the query to be implemented using at most two temporary results sets rather than 4.

Queries containing a CUBE clause is broken down into a union of ROLLUPs and grouping sets. For example:

  CUBE(A, B, C)

is equivalent to:

  (ROLLUP(A, B, C)),  (ROLLUP'(B, C)),  (ROLLUP'(C, A))

The ROLLUP' notation is an internal representation of a ROLLUP operation that does not include a grand total row in its result set. So, ROLLUP'(B, C) is equivalent to GROUP BY GROUPING SETS ((B,C), (B)). This CUBE rewrite implements at most three temporary result sets, rather than the 8 that might be needed had the query not been rewritten.