Minimizing RUNSTATS impact

There are several approaches available to improve RUNSTATS performance.

To minimize the performance impact of this utility:
  • Limit the columns for which statistics are collected by using the COLUMNS clause. Many columns are never referenced by predicates in the query workload, so they do not require statistics.
  • Limit the columns for which distribution statistics are collected if the data tends to be uniformly distributed. Collecting distribution statistics requires more CPU and memory than collecting basic column statistics. However, determining whether the values for a column are uniformly distributed requires either having existing statistics or querying the data. This approach also assumes that the data remains uniformly distributed as the table is modified.
  • Limit the number of pages and rows processed by using page- or row-level table sampling (by specifying the TABLESAMPLE SYSTEM or TABLESAMPLE BERNOULLI clause) and by using page- or row-level index sampling (by specifying INDEXSAMPLE SYSTEM or INDEXSAMPLE BERNOULLI clause). Start with a 10% page-level sample, by specifying TABLESAMPLE SYSTEM(10) and INDEXSAMPLE SYSTEM(10). Check the accuracy of the statistics and whether system performance has degraded due to changes in access plan. If it has degraded, try a 10% row-level sample instead, by specifying TABLESAMPLE BERNOULLI(10). Likewise, experiment with the INDEXSAMPLE parameter to get the right rate for index sampling. If the accuracy of the statistics is insufficient, increase the sampling amount. When using RUNSTATS page- or row-level sampling, use the same sampling rate for tables that are joined. This is important to ensure that the join column statistics have the same level of accuracy.
  • Collect index statistics during index creation by specifying the COLLECT STATISTICS option on the CREATE INDEX statement. This approach is faster than performing a separate RUNSTATS operation after the index is created. It also ensures that the new index has statistics generated immediately after creation, to allow the optimizer to accurately estimate the cost of using the index.
  • Collect statistics when executing the LOAD command with the REPLACE option. This approach is faster than performing a separate RUNSTATS operation after the load operation completes. It also ensures that the table has the most current statistics immediately after the data is loaded, to allow the optimizer to accurately estimate the cost of using the table.
In a partitioned database environment, the RUNSTATS utility collects statistics from a single database partition. If the RUNSTATS command is issued on a database partition on which the table resides, statistics are collected there. If not, statistics are collected on the first database partition in the database partition group for the table. For consistent statistics, ensure that statistics for joined tables are collected from the same database partition.

Statistics collection may be configured to run automatically in the background. This collection is throttled to allow for user application queries and commands to be prioritized for execution. As part of automatic statistics collection, table cardinality models may be built alongside statistics collection. Training of these cardinality models are also throttled. While the time for statistics collection for tables is generally determined by the size of the tables, training of the models is more so dependent on the number of columns and number of distinct values in these columns. These generally take under a minute to train and, in contrast to statistics collection, may be relatively small for large tables and relatively large for small tables. There is no user control to tweak the time for training models. If there is a RUNSTATS profile using the TABLESAMPLE option or if AUTO-SAMPLING is configured, it will not influence the model training time.