Manually collecting and refreshing statistics

Edit online

You can manually collect and refresh statistics using Schemas from within IBM® i Access Client Solutions (ACS) or by using statistics APIs.

To collect statistics using ACS, from Schemas right-click a table or alias and select Statistic Data. On the Statistic Data dialog, click New. Then select the columns that you want to collect statistics for. Once you have selected the columns, you can collect the statistics immediately or collect them in the background.

To refresh a statistic using ACS, right-click a table or alias and select Statistic Data. Click Update. Select the statistic that you want to refresh. You can collect the statistics immediately or collect them in the background.

There are several scenarios in which the manual management (create, remove, refresh, and so on) of column statistics could be beneficial and recommended.

High Availability (HA) solutions

High availability solutions replicate data to a secondary system by using journal entries. However, column statistics are not journaled. That means that, on your backup system, no column statistics are available when you first start using that system. To prevent the "warm up" effect, you might want to propagate the column statistics that were gathered on your production system. Recreate them on your backup system manually.

ISV (Independent Software Vendor) preparation

An ISV might want to deliver a customer solution that already includes column statistics frequently used in the application, rather than waiting for the automatic statistics collection to create them. Run the application on the development system for some time and examine which column statistics were created automatically. You can then generate a script file to execute on the customer system after the initial data load takes place. The script file can be shipped as part of the application

Business Intelligence environments

In a large Business Intelligence environment, it is common for large data load and update operations to occur overnight. Column statistics are marked stale only when they are touched by the statistics manager, and then refreshed after first touch. You might want to consider refreshing the column statistics manually after loading the data.

You can do this refresh easily by toggling the system value QDBFSTCCOL to *NONE and then back to *ALL. This process causes all stale column statistics to be refreshed. It also starts collection of any column statistics previously requested by the system but not yet available. Since this process relies on the access plans stored in the plan cache, avoid performing a system initial program load (IPL) before toggling QDBFSTCCOL. An IPL clears the plan cache.

This procedure works only if you do not delete (drop) the tables and recreate them in the process of loading your data. When deleting a table, access plans in the plan cache that refer to this table are deleted. Information about column statistics on that table is also lost. The process in this environment is either to add data to your tables or to clear the tables instead of deleting them.

Massive data updates

Updating rows in a column statistics-enabled table can significantly change the cardinality, add new ranges of values, or change the distribution of data values. These updates can affect query performance on the first query run against the new data. On the first run of such a query, the optimizer uses stale column statistics to determine the access plan. At that point, it starts a request to refresh the column statistics.

Prior to this data update, you might want to toggle the system value QDBFSTCCOL to *NONE and back to *ALL or *SYSTEM. This toggle causes an analysis of the plan cache. The analysis includes searching for column statistics used in access plan generation, analyzing them for staleness, and requesting updates for the stale statistics.

If you massively update or load data, and run queries against these tables at the same time, the automatic column statistics collection tries to refresh every time 15% of the data is changed. This processing can be redundant since you are still updating or loading the data. In this case, you might want to block automatic statistics collection for the tables and deblock it again after the data update or load finishes. An alternative is to turn off automatic statistics collection for the whole system before updating or loading the data. Switch it back on after the updating or loading has finished.

Backup and recovery

When thinking about backup and recovery strategies, keep in mind that creation of column statistics is not journaled. Column statistics that exist at the time a save operation occurs are saved as part of the table and restored with the table. Any column statistics created after the save took place are lost and cannot be recreated by using techniques such as applying journal entries. If you have a long interval between save operations and rely on journaling to restore your environment, consider tracking column statistics that are generated after the latest save operation.