Guidelines for collecting and updating statistics

The RUNSTATS utility collects statistics on tables, indexes, and statistical views to provide the optimizer with accurate information for access plan selection.

Use the RUNSTATS utility to collect statistics in the following situations:
  • After data is loaded into a table and appropriate indexes are created
  • After creating an index on a table
  • After a table is reorganized with the REORG utility
  • After a table and its indexes are significantly modified through update, insert, or delete operations
  • Before binding application programs whose performance is critical
  • When you want to compare current and previous statistics
  • When the prefetch value has changed
  • After executing the REDISTRIBUTE DATABASE PARTITION GROUP command
  • When you have XML columns. When RUNSTATS is used to collect statistics for XML columns only, existing statistics for non-XML columns that were collected during a load operation or a previous RUNSTATS operation are retained. If statistics on some XML columns were collected previously, those statistics are either replaced or dropped if the current RUNSTATS operation does not include those columns.

To improve RUNSTATS performance and save disk space used to store statistics, consider specifying only those columns for which data distribution statistics should be collected.

You should rebind application programs after executing RUNSTATS. The query optimizer might choose different access plans if new statistics are available.

If a full set of statistics cannot be collected at one time, use the RUNSTATS utility on subsets of the objects. If inconsistencies occur as a result of ongoing activity against those objects, a warning message (SQL0437W, reason code 6) is returned during query optimization. If this occurs, use RUNSTATS again to update the distribution statistics.

To ensure that index statistics are synchronized with the corresponding table, collect both table and index statistics at the same time. If a table was modified extensively since the last time that statistics were gathered, updating only the index statistics for that table leaves the two sets of statistics out of synchronization with each other.

Using the RUNSTATS utility on a production system might negatively affect workload performance. The utility now supports a throttling option that can be used to limit the performance impact of RUNSTATS execution during high levels of database activity.

When you collect statistics for a table in a partitioned database environment, RUNSTATS operates only on the database partition from which the utility is executed. The results from this database partition are extrapolated to the other database partitions. If this database partition does not contain a required portion of the table, the request is sent to the first database partition in the database partition group that contains the required data.

Statistics for a statistical view are collected on all database partitions containing base tables that are referenced by the view.

Consider the following tips to improve the efficiency of RUNSTATS and the usefulness of the statistics:
  • Collect statistics only for columns that are used to join tables or for columns that are referenced in the WHERE, GROUP BY, or similar clauses of queries. If the columns are indexed, you can specify these columns with the ONLY ON KEY COLUMNS clause on the RUNSTATS command.
  • Customize the values of the num_freqvalues and num_quantiles database configuration parameters for specific tables and columns.
  • When you create an index for a populated table, use the COLLECT STATISTICS clause to create statistics as the index is created.
  • When significant numbers of table rows are added or removed, or if data in columns for which you collect statistics is updated, use RUNSTATS again to update the statistics.
  • Because RUNSTATS collects statistics on only a single database partition, the statistics are less accurate if the data is not distributed consistently across all database partitions. If you suspect that there is skewed data distribution, consider redistributing the data across database partitions by using the REDISTRIBUTE DATABASE PARTITION GROUP command before using the RUNSTATS utility.
  • For Db2® V9.7 Fix Pack 1 and later releases, distribution statistics can be collected on an XML column. Distribution statistics are collected for each index over XML data specified on the XML column. By default, a maximum of 250 quantiles are used for distribution statistics for each index over XML data.

    When collecting distribution statistics on an XML column, you can change maximum number of quantiles. You can lower the maximum number of quantiles to reduce the space requirements for XML distribution statistics based on your particular data size, or you can increase the maximum number of quantiles if 250 quantiles are not sufficient to capture the distribution statistics of the data set for an index over XML data.