Automatic statistics collection

When the statistics manager prepares its responses to the optimizer, it tracks the responses that were generated using default filter factors. Default filter factors are used when column statistics or indexes are not available. The statistics manager uses this information to automatically generate a statistic collection request for the columns. This request occurs while the access plan is written to the plan cache. If system resources allow, statistics collections occur in real time for direct use by the current query, avoiding a default answer to the optimizer.

Otherwise, as system resources become available, the requested column statistics are collected in the background. The next time the query is executed, the missing column statistics are available to the statistics manager. This process allows the statistics manager to provide more accurate information to the optimizer at that time. More statistics make it easier for the optimizer to generate a better performing access plan.

If a query is canceled before or during execution, the requests for column statistics are still processed. These requests occur if the execution reaches the point where the generated access plan is written to the Plan Cache.

To minimize the number of passes through a table during statistics collection, the statistics manger groups multiple requests for the same table. For example, two queries are executed against table T1. The first query has selection criteria on column C1 and the second over column C2. If no statistics are available for the table, the statistics manager identifies both of these columns as good candidates for column statistics. When the statistics manager reviews requests, it looks for multiple requests for the same table and groups them into one request. This grouping allows both column statistics to be created with only one pass through table T1.

One thing to note is that column statistics are usually automatically created when the statistics manager must answer questions from the optimizer using default filter factors. However, when an index is available that might be used to generate the answer, then column statistics are not automatically generated. In this scenario, there might be cases where optimization time benefits from column statistics. Using column statistics to answer questions from the optimizer is more efficient than using the index data. So if query performance seems extended, you might want to verify that there are indexes over the relevant columns in your query. If so, try manually generating column statistics for these columns.

As stated before, statistics collection occurs as system resources become available. If you have a low priority job permanently active on your system that is supposed to use all spare CPU cycles for processing, your statistics collection is never active.