Improving filter factors by collecting cardinality and frequency statistics
You can improve the filter factors that Db2 uses for access path selection by collecting cardinality and frequency statistics for columns and columns groups that are used in predicates.
Before you begin
- Identify missing or conflicting statistics.
- Consider identifying appropriate statistics to collect for your query workload by using a query optimization tool.
About this task
Db2 needs an accurate estimate of the number of rows that qualify each predicate is applied to determine optimal access paths. When multiple tables are accessed, filtering also affects the join order, the join method, and the cost of the join.
The SYSIBM.SYSCOLUMNS and SYSIBM.SYSCOLDIST catalog tables are the main source of statistics for calculating predicate filter factors.
- Cardinality statistics
- The COLCARDF column of the SYSCOLUMNS catalog table indicates the cardinality of a
column. A positive value is an estimate of the number of distinct values in the column. When no
statistics are collected for the column, the value is -1, and a default filter factor might be
used.
The value of the COLCARDF column that the RUNSTATS TABLESPACE utility generates is an estimate that is determined by a sampling method. If you know a more accurate value, you can supply it by updating the catalog. If the column is the first column of a non-DPSI index, the value that the RUNSTATS INDEX utility generates is exact.
- Frequency or distribution statistics
- Columns in the SYSCOLDIST catalog table contain frequency
statistics (sometimes also called distribution
statistics) for the values in a single column.
When frequency statistics do not exist, Db2 assumes that the data is uniformly distributed and all values in the column occur with the same frequency. This assumption can lead to an inaccurate estimate of the number of qualifying rows if the data is skewed, which can result in performance problems.
For example, assume that a column (AGE_CATEGORY) contains five distinct values (COLCARDF), each of which occur with the following frequencies:AGE_CATEGORY FREQUENCY ------------ --------- INFANT 5% CHILD 15% ADOLESCENT 25% ADULT 40% SENIOR 15%Without this frequency information, Db2 must use a default filter factor of 1/5 (or
1/COLCARDF), or 20%, to estimate the number of rows that qualify for predicate AGE_CATEGORY=ADULT. However, the actual frequency of that age category is 40%. Thus, the number of qualifying rows is underestimated by 50%.
Procedure
To improve predicate filter factors by collecting frequency and statistics, use any of the following approaches: