Optimization strategies for MDC tables

If you create multidimensional clustering (MDC) tables, the performance of many queries might improve, because the optimizer can apply additional optimization strategies. These strategies are primarily based on the improved efficiency of block indexes, but the advantage of clustering on more than one dimension also permits faster data retrieval.

MDC table optimization strategies can also exploit the performance advantages of intrapartition parallelism and interpartition parallelism. Consider the following specific advantages of MDC tables:
  • Dimension block index lookups can identify the required portions of the table and quickly scan only the required blocks.
  • Because block indexes are smaller than record identifier (RID) indexes, lookups are faster.
  • Index ANDing and ORing can be performed at the block level and combined with RIDs.
  • Data is guaranteed to be clustered on extents, which makes retrieval faster.
  • Rows can be deleted faster when rollout can be used.
Consider the following simple example for an MDC table named SALES with dimensions defined on the REGION and MONTH columns:
   select * from sales
     where month = 'March' and region = 'SE'
For this query, the optimizer can perform a dimension block index lookup to find blocks in which the month of March and the SE region occur. Then it can scan only those blocks to quickly fetch the result set.

Rollout deletion

When conditions permit delete using rollout, this more efficient way to delete rows from MDC tables is used. The required conditions are:
  • The DELETE statement is a searched DELETE, not a positioned DELETE (the statement does not use the WHERE CURRENT OF clause).
  • There is no WHERE clause (all rows are to be deleted), or the only conditions in the WHERE clause apply to dimensions.
  • The table is not defined with the DATA CAPTURE CHANGES clause.
  • The table is not the parent in a referential integrity relationship.
  • The table does not have ON DELETE triggers defined.
  • The table is not used in any MQTs that are refreshed immediately.
  • A cascaded delete operation might qualify for rollout if its foreign key is a subset of the table's dimension columns.
  • The DELETE statement cannot appear in a SELECT statement executing against the temporary table that identifies the set of affected rows prior to a triggering SQL operation (specified by the OLD TABLE AS clause on the CREATE TRIGGER statement).

During a rollout deletion, the deleted records are not logged. Instead, the pages that contain the records are made to look empty by reformatting parts of the pages. The changes to the reformatted parts are logged, but the records themselves are not logged.

The default behavior, immediate cleanup rollout, is to clean up RID indexes at delete time. This mode can also be specified by setting the DB2_MDC_ROLLOUT registry variable to IMMEDIATE, or by specifying IMMEDIATE on the SET CURRENT MDC ROLLOUT MODE statement. There is no change in the logging of index updates, compared to a standard delete operation, so the performance improvement depends on how many RID indexes there are. The fewer RID indexes, the better the improvement, as a percentage of the total time and log space.

An estimate of the amount of log space that is saved can be made with the following formula:
   S + 38*N - 50*P
where N is the number of records deleted, S is total size of the records deleted, including overhead such as null indicators and VARCHAR lengths, and P is the number of pages in the blocks that contain the deleted records. This figure is the reduction in actual log data. The savings on active log space required is double that value, due to the saving of space that was reserved for rollback.

Alternatively, you can have the RID indexes updated after the transaction commits, using deferred cleanup rollout. This mode can also be specified by setting the DB2_MDC_ROLLOUT registry variable to DEFER, or by specifying DEFERRED on the SET CURRENT MDC ROLLOUT MODE statement. In a deferred rollout, RID indexes are cleaned up asynchronously in the background after the delete commits. This method of rollout can result in significantly faster deletion times for very large deletes, or when a number of RID indexes exist on the table. The speed of the overall cleanup operation is increased, because during a deferred index cleanup, the indexes are cleaned up in parallel, whereas in an immediate index cleanup, each row in the index is cleaned up one by one. Moreover, the transactional log space requirement for the DELETE statement is significantly reduced, because the asynchronous index cleanup logs the index updates by index page instead of by index key.

Note: Deferred cleanup rollout requires additional memory resources, which are taken from the database heap. If the database manager is unable to allocate the memory structures it requires, the deferred cleanup rollout fails, and a message is written to the administration notification log.

When to use a deferred cleanup rollout

If delete performance is the most important factor, and there are RID indexes defined on the table, use deferred cleanup rollout. Note that prior to index cleanup, index-based scans of the rolled-out blocks suffer a small performance penalty, depending on the amount of rolled-out data. The following issues should also be considered when deciding between immediate index cleanup and deferred index cleanup:
  • Size of the delete operation

    Choose deferred cleanup rollout for very large deletions. In cases where dimensional DELETE statements are frequently issued on many small MDC tables, the overhead to asynchronously clean index objects might outweigh the benefit of time saved during the delete operation.

  • Number and type of indexes

    If the table contains a number of RID indexes, which require row-level processing, use deferred cleanup rollout.

  • Block availability

    If you want the block space freed by the delete operation to be available immediately after the DELETE statement commits, use immediate cleanup rollout.

  • Log space

    If log space is limited, use deferred cleanup rollout for large deletions.

  • Memory constraints

    Deferred cleanup rollout consumes additional database heap space on all tables that have deferred cleanup pending.

To disable rollout behavior during deletions, set the DB2_MDC_ROLLOUT registry variable to OFF or specify NONE on the SET CURRENT MDC ROLLOUT MODE statement.

Note: In Db2 Version 9.7 and later releases, deferred cleanup rollout is not supported on a data partitioned MDC table with partitioned RID indexes. Only the NONE and IMMEDIATE modes are supported. The cleanup rollout type will be IMMEDIATE if the DB2_MDC_ROLLOUT registry variable is set to DEFER, or if the CURRENT MDC ROLLOUT MODE special register is set to DEFERRED to override the DB2_MDC_ROLLOUT setting.

If only nonpartitioned RID indexes exist on the MDC table, deferred index cleanup rollout is supported.