How clustering affects access path selection

Whether and how your data is clustered affects how Db2 chooses an access path. The value of the CLUSTERRATIOF column gives an indication of how closely the order of the index entries on the index leaf pages matches the actual ordering of the rows on the data pages.

In general, the closer that the value of the CLUSTERRATIOF column is to 100%, the more closely the ordering of the index entries matches the actual ordering of the rows on the data pages. The actual formula is quite complex and accounts for indexes with many duplicates; in general, for a given index, the more duplicates, the higher the CLUSTERRATIOF value.

Here are some things to remember about the effect of the CLUSTERRATIOF column on access paths:

  • CLUSTERRATIOF is an important input to the cost estimates that are used to determine whether an index is used for an access path, and, if so, which index to use.
  • If the access is INDEXONLY, then this value does not apply.
  • The higher the CLUSTERRATIOF value, the lower the cost of referencing data pages during an index scan is.
  • For an index that has a CLUSTERRATIOF less than 80%, sequential prefetch is not used to access the data pages.
  • A slight reduction in CLUSTERRATIOF for a table with a large number of rows can represent a much more significant number of unclustered rows than for a table with a small number of rows.For example, A CLUSTERRATIOF of 99% for a table with 100,000,000 rows represents 100,000 unclustered rows. Whereas, the CLUSTERRATIOF of 95% for a table with 100,000 rows represents 5000 unclustered rows.
  • For indexes that contain either many duplicate key values or key values that are highly clustered in reverse order, cost estimation that is based purely on CLUSTERRATIOF can lead to repetitive index scans. In the worst case, an entire page could be scanned one time for each row in the page. Db2 access path selection can avoid this performance problem by using a cost estimation formula based on the DATAREPEATFACTORF statistic to choose indexes.

The following figures below show the comparison between an index scan on an index with a high cluster ratio and an index with a lower cluster ratio.

Figure 1. A clustered index scan. This figure assumes that the index is 100% clustered.
Begin figure description. A four level index scan tree diagram shows root, intermediate, leaf, and data pages. Arrows that connect the leaf pages to data pages do not cross. End figure description.
Figure 2. A nonclustered index scan. In some cases, Db2 can access the data pages in order even when a nonclustered index is used.
Begin figure description. A four-level index scan tree diagram: root, intermediate, leaf, and data pages. Arrows that connect the leaf pages to data pages cross one another. End figure description.