Collecting accurate catalog statistics, including advanced statistics features

Accurate database statistics are critical for query optimization. Perform RUNSTATS command operations regularly on any tables that are critical to query performance.

You might also want to collect statistics on system catalog tables, if an application queries these tables directly and if there is significant catalog update activity, such as that resulting from the execution of data definition language (DDL) statements. Automatic statistics collection can be enabled to allow the Db2® data server to automatically perform a RUNSTATS command operation. Real-time statistics collection can be enabled to allow the Db2 data server to provide even more timely statistics by collecting them immediately before queries are optimized.

If you are collecting statistics manually by using the RUNSTATS command, use the following options at a minimum:
RUNSTATS ON TABLE DB2USER.DAILY_SALES
  WITH DISTRIBUTION AND SAMPLED DETAILED INDEXES ALL
Distribution statistics make the optimizer aware of data skew. Detailed index statistics provide more details about the I/O required to fetch data pages when the table is accessed by using a particular index. Collecting detailed index statistics uses considerable processing time and memory for large tables. The SAMPLED option provides detailed index statistics with nearly the same accuracy but requires a fraction of the CPU and memory. These options are used by automatic statistics collection when a statistical profile is not provided for a table.

To improve query performance, consider collecting more advanced statistics, such as column group statistics or LIKE statistics, or creating statistical views.

Statistical views are helpful when gathering statistics for complex relationships. Gathering statistics for statistical views can be automated through the automatic statistics collection feature in Db2. Enabling or disabling the automatic statistic collection of statistical views is done by using the auto_stats_views database configuration parameter. To enable this function, issue the following command:
update db cfg for dbname using auto_stats_views on
To disable this feature, issue the following command:
update db cfg for dbname using auto_stats_views off
This database configuration parameter is off by default. The command that is issued to automatically collect statistics on statistical views is equivalent to the following command:
runstats on view view_name with distribution
Collecting statistics for a large table or statistical view can be time consuming. Statistics of the same quality can often be collected by considering just a small sample of the overall data. Consider enabling automatic sampling for all background statistic collections; this may reduce the statistic collection time. To enable this function, issue the following command:
update db cfg for dbname using auto_sampling on

Collected statistics are not always exact. In addition to providing more efficient data access, an index can help provide more accurate statistics for columns which are often used in query predicates. When statistics are collected for a table and its indexes, index objects can provide accurate statistics for the leading index columns.

Enabling automatic collection of column group statistics

The optimizer uses column group statistics to account for statistical correlation when estimating the combined selectivity of multiple predicates and when computing the number of distinct groupings for operations that group data such as GROUP BY or DISTINCT.  Gathering column group statistics can be automated through the automatic statistics collection feature in Db2. Enabling or disabling the automatic collection of column group statistics is done by using the auto_cg_stats database configuration parameter. To enable this function, issue the following command:
update db cfg for dbname using auto_cg_stats on

The automatic collection of column group statistics will generate a profile describing the statistics that need to be collected. If a user profile does not exist, the background statistics collection will initially perform an automatic discovery of pair-wise column group statistics within the table and set a statistics profile. After the discovery is completed, statistics are gathered on the table using the existing statistics profile feature. The set of column groups discovered is preserved across subsequent discoveries.

If a statistics profile is already manually set, it will be used as is and the discovery is not performed. The automatically generated statistics profile can be used together with any PROFILE option of the RUNSTATS command. If the profile is updated using the UPDATE PROFILE option, any further discovery is blocked on the table, but the set of column group statistics already set in the profile will continue to be collected automatically as well as with a manual RUNSTATS that includes the USE PROFILE option.

The UNSET PROFILE command can be used to remove the statistics profile to restart the discovery process.

To disable this feature, issue the following command:
update db cfg for dbname using auto_cg_stats off

Disabling this feature will prevent any further discovery, but the statistic profiles will persist and will continue to be used.  If there is a need to remove the profile, use the UNSET PROFILE option of RUNSTATS.