Collecting histogram statistics

You can enable improved access path selection by collecting histogram statistics.

About this task

Db2 uses histogram statistics to estimate predicate selectivity from value-distribution statistics that are collected over the entire range of values in a data set.

The recommendation is to collect histogram statistics for columns that are specified in predicates, ORDER BY, GROUP BY, and HAVING clauses of SQL statements. Histogram statistics are most helpful for range, LIKE, and BETWEEN predicates, but they can also improve selectivity for equality, IS NULL, and IN predicates, and predicates that compare two columns.

The collection of histogram requires a sort operation. In the event that FREQVAL statistics are also collected, the same sort operation is used. For indexes with columns of mixed order, statistics can be collected only for the prefix columns of the same order.

Procedure

To collect histogram statistics:

  • Specify the HISTOGRAM option when you invoke one of the following utilities:
    • RUNSTATS
    • LOAD with the REPLACE, STATISTICS, and INDEX options.
    • REORG INDEX with the inline STATISTICS option.
    • REORG TABLESPACE with the inline STATISTICS option.
    • REBUILD INDEX
    Important: Histogram statistics that you collect through RUNSTATS are not the same as histogram statistics that you collect through other methods. Histogram statistics that you collect with LOAD, REORG, or REBUILD are only rough estimates. To obtain more exact statistics, use RUNSTATS.
  • Optional: Specify an integer value for the NUMQUANTILES option to set a guideline for number of intervals that Db2 uses for the histogram statistics.
    The value must be greater than or equal to one, and less than or equal to 100.

    The recommendation is to not specify the value of NUMQUANTILES in most cases. When the NUMQUANTILES value is not specified, Db2 uses as many as 100 intervals. However, Db2 might collect a smaller number of intervals, which is optimized for the number of rows and the number of distinct values in the table, and other factors.

    You might want to specify a specific value for NUMQUANTILES for certain applications situations that are well understood. For example, if queries frequently specify range intervals such as 0-10%, 10-20%, 20-30%, and so on, then a NUMQUANTILES value of 10 might be more appropriate.

    Even when you specify a value for NUMQUANTILES, the exact number of resulting intervals that Db2 creates is likely to vary from the value that you specify, depending on the number of rows, the number of distinct values, and other factors.