Minimizing RUNSTATS impact
There are several approaches available to improve RUNSTATS performance.
- Limit the columns for which statistics are collected by using the COLUMNS clause. Many columns are never referenced by predicates in the query workload, so they do not require statistics.
- Limit the columns for which distribution statistics are collected if the data tends to be uniformly distributed. Collecting distribution statistics requires more CPU and memory than collecting basic column statistics. However, determining whether the values for a column are uniformly distributed requires either having existing statistics or querying the data. This approach also assumes that the data remains uniformly distributed as the table is modified.
- Limit
the number of pages and rows processed by using page- or row-level
table sampling (by specifying the TABLESAMPLE SYSTEM or TABLESAMPLE
BERNOULLI clause) and by using page- or row-level index
sampling (by specifying INDEXSAMPLE SYSTEM or INDEXSAMPLE
BERNOULLI clause). Start with a 10% page-level sample,
by specifying
TABLESAMPLE SYSTEM(10)
andINDEXSAMPLE SYSTEM(10)
. Check the accuracy of the statistics and whether system performance has degraded due to changes in access plan. If it has degraded, try a 10% row-level sample instead, by specifyingTABLESAMPLE BERNOULLI(10)
. Likewise, experiment with the INDEXSAMPLE parameter to get the right rate for index sampling. If the accuracy of the statistics is insufficient, increase the sampling amount. When using RUNSTATS page- or row-level sampling, use the same sampling rate for tables that are joined. This is important to ensure that the join column statistics have the same level of accuracy. - Collect index statistics during index creation by specifying the COLLECT STATISTICS option on the CREATE INDEX statement. This approach is faster than performing a separate RUNSTATS operation after the index is created. It also ensures that the new index has statistics generated immediately after creation, to allow the optimizer to accurately estimate the cost of using the index.
- Collect statistics when executing the LOAD command with the REPLACE option. This approach is faster than performing a separate RUNSTATS operation after the load operation completes. It also ensures that the table has the most current statistics immediately after the data is loaded, to allow the optimizer to accurately estimate the cost of using the table.
Statistics collection may be configured to run automatically in the background. This collection
is throttled to allow for user application queries and commands to be prioritized for execution. As
part of automatic statistics collection, table cardinality models may be built alongside statistics
collection. Training of these cardinality models are also throttled. While the time for statistics
collection for tables is generally determined by the size of the tables, training of the models is
more so dependent on the number of columns and number of distinct values in these columns. These
generally take under a minute to train and, in contrast to statistics collection, may be relatively
small for large tables and relatively large for small tables. There is no user control to tweak the
time for training models. If there is a RUNSTATS profile using the
TABLESAMPLE
option or if AUTO-SAMPLING is configured, it will
not influence the model training time.