Minimizing RUNSTATS impact
There are several approaches available to improve RUNSTATS performance.
To minimize the performance impact of this utility:
- Limit the columns for which statistics are collected by using the COLUMNS clause. Many columns are never referenced by predicates in the query workload, so they do not require statistics.
- Limit the columns for which distribution statistics are collected if the data tends to be uniformly distributed. Collecting distribution statistics requires more CPU and memory than collecting basic column statistics. However, determining whether the values for a column are uniformly distributed requires either having existing statistics or querying the data. This approach also assumes that the data remains uniformly distributed as the table is modified.
- Limit
the number of pages and rows processed by using page- or row-level
table sampling (by specifying the TABLESAMPLE SYSTEM or TABLESAMPLE
BERNOULLI clause) and by using page- or row-level index
sampling (by specifying INDEXSAMPLE SYSTEM or INDEXSAMPLE
BERNOULLI clause). Start with a 10% page-level sample,
by specifying
TABLESAMPLE SYSTEM(10)
andINDEXSAMPLE SYSTEM(10)
. Check the accuracy of the statistics and whether system performance has degraded due to changes in access plan. If it has degraded, try a 10% row-level sample instead, by specifyingTABLESAMPLE BERNOULLI(10)
. Likewise, experiment with the INDEXSAMPLE parameter to get the right rate for index sampling. If the accuracy of the statistics is insufficient, increase the sampling amount. When using RUNSTATS page- or row-level sampling, use the same sampling rate for tables that are joined. This is important to ensure that the join column statistics have the same level of accuracy. - Collect index statistics during index creation by specifying the COLLECT STATISTICS option on the CREATE INDEX statement. This approach is faster than performing a separate RUNSTATS operation after the index is created. It also ensures that the new index has statistics generated immediately after creation, to allow the optimizer to accurately estimate the cost of using the index.
- Collect statistics when executing the LOAD command with the REPLACE option. This approach is faster than performing a separate RUNSTATS operation after the load operation completes. It also ensures that the table has the most current statistics immediately after the data is loaded, to allow the optimizer to accurately estimate the cost of using the table.