Block indexes and query performance for MDC tables

Scans on any of the block indexes of an MDC table provide clustered data access, because each block identifier (BID) corresponds to a set of sequential pages in the table that is guaranteed to contain data having the specified dimension value. Moreover, dimensions or slices can be accessed independently from each other through their block indexes without compromising the cluster factor of any other dimension or slice. This provides the multidimensionality of multidimensional clustering.

Queries that take advantage of block index access can benefit from a number of factors that improve performance.
  • Because block indexes are so much smaller than regular indexes, a block index scan is very efficient.
  • Prefetching of data pages does not rely on sequential detection when block indexes are used. The Db2® database manager looks ahead in the index, prefetching blocks of data into memory using big-block I/O, and ensuring that the scan does not incur I/O costs when data pages are accessed in the table.
  • The data in the table is clustered on sequential pages, optimizing I/O and localizing the result set to a selected portion of the table.
  • If a block-based buffer pool is used, and the block size is equal to the extent size, MDC blocks are prefetched from sequential pages on disk into sequential pages in memory, further increasing the positive effect of clustering on performance.
  • The records from each block are retrieved using a mini-relational scan of its data pages, which is often faster than scanning data through RID-based retrieval.

Queries can use block indexes to narrow down a portion of the table having a particular dimension value or range of values. This provides a fine-grained form of database partition elimination, that is, block elimination. This can translate into better concurrency for the table, because other queries, loads, inserts, updates and deletes may access other blocks in the table without interacting with this query's data set.

If the Sales table is clustered on three dimensions, the individual dimension block indexes can also be used to find the set of blocks containing records which satisfy a query on a subset of all of the dimensions of the table. If the table has dimensions of YearAndMonth, Region and Product, this can be thought of as a logical cube, as illustrated in Figure 1.

Figure 1. Multidimensional table with dimensions of 'Region', 'YearAndMonth', and 'Product'
Multidimensional table with Region, YearAndMonth, and Product dimensions

Four block indexes will be created for the MDC table shown in Figure 1: one for each of the individual dimensions, YearAndMonth, Region, and Product; and another with all of these dimension columns as its key. To retrieve all records having a Product equal to ProductA and Region equal to Northeast, the database manager would first search for the ProductA key from the Product dimension block index. (See Figure 2.) The database manager then determines the blocks containing all records having Region equal to Northeast, by looking up the Northeast key in the Region dimension block index. (See Figure 3.)

Figure 2. Key from dimension block index on 'Product'
Key from dimension block index on Product.
Figure 3. Key from dimension block index on 'Region'
Key from dimension block index on Region.

Block index scans can be combined through the use of the logical AND and logical OR operators and the resulting list of blocks to scan also provides clustered data access.

Using the previous example, in order to find the set of blocks containing all records having both dimension values, you have to find the intersection of the two slices. This is done by using the logical AND operation on the BID lists from the two block index keys. The common BID values are 11, 20, 26, 45, 54, 51, 53, and 56.

The following example illustrates how to use the logical OR operation with block indexes to satisfy a query having predicates that involve two dimensions. Figure 4 assumes an MDC table where the two dimensions are Colour and Nation. The goal is to retrieve all those records in the MDC table that meet the conditions of having Colour of blue or having a Nation name USA.

Figure 4. How the logical OR operation can be used with block indexes
Two predicates, Colour and Nation, are used with logical OR operation to create a group of BIDs to be scanned.

This diagram shows how the result of two separate block index scans are combined to determine the range of values that meet the predicate restrictions. (The numbers indicate record identifiers (RIDs), slot fields.)

Based on the predicates from the SELECT statement, two separate dimension block index scans are done; one for the blue slice, and another for the USA slice. A logical OR operation is done in memory in order to find the union of the two slices, and determine the combined set of blocks found in both slices (including the removal of duplicate blocks).

Once the database manager has list of blocks to scan, the database manager can do a mini-relational scan of each block. Prefetching of the blocks can be done, and will involve just one I/O per block, as each block is stored as an extent on disk and can be read into the buffer pool as a unit. If predicates need to be applied to the data, dimension predicates need only be applied to one record in the block, because all records in the block are guaranteed to have the same dimension key values. If other predicates are present, the database manager only needs to check these on the remaining records in the block.

MDC tables also support regular RID-based indexes. Both RID and block indexes can be combined using a logical AND operation, or a logical OR operation, with the index. Block indexes provide the optimizer with additional access plans to choose from, and do not prevent the use of traditional access plans (RID scans, joins, table scans, and others). Block index plans will be costed by the optimizer along with all other possible access plans for a particular query, and the most inexpensive plan will be chosen.

The Db2 Design Advisor can help to recommend RID-based indexes on MDC tables, or to recommend MDC dimensions for a table.