Index grouping implementation

There are two primary ways to implement grouping using an index: Ordered grouping and pre-summarized processing.

Ordered grouping

This implementation uses the Radix Index Scan or the Radix Index Probe access methods to perform the grouping. An index is required that contains all the grouping columns as contiguous leftmost key columns. The database manager accesses the individual groups through the index and performs the requested summary functions.

Since the index, by definition, already has all the key values grouped, the first group result can be returned in less time than the hashing method. This index performance is faster because the hashing method requires a temporary result. This implementation can be beneficial if an application does not need to retrieve all the group results, or if an index exists that matches the grouping columns.

When the grouping is implemented with an index and a permanent index does not exist that satisfies grouping columns, a temporary index is created. The grouping columns specified within the query are used as the key columns for this index.

Pre-summarized processing

This SQE-only implementation uses an Encoded Vector Index to extract the summary information already in the symbol table of the index. The EVI symbol table contains the unique key values and a count of the number of table records that have that unique value. The grouping for the columns of the index key is already performed. If the query references a single table and performs simple aggregation, the EVI might be used for quick access to the grouping results. For example, consider the following query:

SELECT COUNT(*), col1 
	FROM t1 
	GROUP BY col1

If an EVI exists over t1 with a key of col1, the optimizer can rewrite the query to access the precomputed grouping answer in the EVI symbol table.

This rewrite can result in dramatic improvements when the number of table records is large and the number of resulting groups is small, relative to the size of the table.

This method is also possible with selection (WHERE clause), as long as the reference columns are in the key definition of the EVI.

For example, consider the following query:

SELECT COUNT(*), col1
 FROM t1
 WHERE col1 > 100
 GROUP BY col1

This query can be rewritten by the optimizer to use the EVI. This pre-summarized processing works for DISTINCT processing, GROUP BY and for column function COUNT. All columns of the table referenced in the query must also be in the key definition of the EVI.

So, for example, the following query can be made to use the EVI:

SELECT DISTINCT col1
	FROM t1

However, this query cannot:

SELECT DISTINCT col1
 FROM t1
 WHERE col2 > 1

This query cannot use the EVI because it references col2 of the table, which is not in the key definition of the EVI. If multiple columns are defined in the EVI key, for example, col1 and col2, it is important to use the left-most columns of the key. For example, if an EVI existed with a key definition of (col1, col2), but the query referenced only col2, it is unlikely the EVI is used.

EVI INCLUDE aggregates

A more powerful example of pre-summarized processing can be facilitated by the use of the INCLUDE keyword on the index create. By default, COUNT(*) is implied on the creation of an EVI. Additional numeric aggregates specified over non-key data can further facilitate pre-determined or ready-made aggregate results during query optimization.

For example, suppose the following query is a frequently requested result set, queried in whole or as part of a subquery comparison.

SELECT  AVG(col2) 
FROM t1 
GROUP BY col1  

Create the following EVI to predetermine the value of AVG(col2).

CREATE ENCODED VECTOR INDEX  eviT1 ON t1(col1) INCLUDE(AVG(col2))

eviT1 delivers distinct values for col1 and COUNT(*) specific to the group by of col1. eviT1 can be used to generate an asynchronous bitmap or RRN list for accessing the table rows for specific col1 values. In addition, eviT1 computes an additional aggregate, AVG(col2), over the same group by column (col1) by specifying the INCLUDE aggregate.

INCLUDE aggregates are limited to those aggregates that result in numeric values: SUM, COUNT, AVG, STDDEV, and so on. These values can be readily maintained as records are inserted, deleted, or updated in the base table.

MIN or MAX are two aggregates that are not supported as INCLUDE aggregates. Deleting the current row contributing to the MIN or MAX value would result in the need to recalculate, potentially accessing many rows, and reducing performance.

INCLUDE values can also contain aggregates over derivations. For example, if you have a couple of columns that contribute to an aggregate, that derivation can be specified, for example, as SUM(col1+col2+col3).

It is recommended that EVIs with INCLUDE aggregates only contain references to columns or column-specific derivations, for example, SUM(salary+bonus).

In many environments, queries that contain derivations using constants convert those constants to parameter markers. This conversion allows a much higher degree of ODP reuse. However, it can be more difficult to match the parameter value to a literal in the index definition.

The optimizer does attempt to match constants in the EVI with parameter markers or host variable values in the query. However, in some complex cases this support is limited and could result in the EVI not matching the query.

Pre-summarized processing can also take advantage of EVIs with INCLUDE in a JOIN situation.

For example, see the following aggregate query over the join of two tables.

EVI INCLUDE aggregate example

SELECT deptname, sum(salary)
FROM DEPARTMENT, EMPLOYEE
WHERE deptno=workdept
GROUP BY deptname  

By providing an EVI with INCLUDE index, as follows, and with optimizer support to push down aggregates to the table level when possible, the resulting implementation takes advantage of the ready-made aggregates already supplied by EVI employeeSumByDept. The implementation never needs to touch or aggregate rows in the Employee table.

CREATE ENCODED VECTOR INDEX employeeSumByDept ON employee(workdept) 
INCLUDE(sum(salary))

Aggregate pushdown results in a rewrite with EVI INCLUDE implementation, conceptually like the following query.

SELECT deptname, sum(sum(salary))
FROM department, 
     (SELECT  workdept, sum(salary) FROM employee group by workdept) employee_2
WHERE deptno=workdept

Instead of department joining to all the rows in the employee table, it now has the opportunity to join to the predetermined aggregates, the sum of salary by department number, in the EVI symbol table. This results in significant reduction in processing and IO.