Question & Answer
Are RUNSTATS needed on VOLATILE tables?
The short answer is yes, runstats is needed even for VOLATILE tables. However, this is something that is often misunderstood because many automated processes do not collect statistics against VOLATILE tables.
Understanding what VOLATILE means will help understand why collecting RUNSTATS on tables that have been marked as VOLATILE is important.
A VOLATILE table is one that is suppose to have widely fluctuating data and could, at any given moment, have no rows or may have millions of rows. Many times staging or temporary tables will fall into this category.
However, more than just staging and temporary tables are altered to be flagged as VOLATILE. This is because a VOLATILE table will disable LIST PREFETCH as well as tend to favor access plans that make use of indexes. Note, there is not a guarantee that an index plan will be used, just that they are more often favored by the optimizer.
A VOLATILE table will have certain adjustments made to its statistics. If there are no statistics, DB2 will first fabricate some statistics and then modify them slightly. If there are existing statistics, the VOLATILE attribute will ensure the statistics meet certain conditions, but will use the existing values as a basis for the adjustments that are made.
The adjusted statics are then used by the optimizer to formulate the access plan just like other tables. If it starts from a reasonably accurate estimation of the data in the table, the decisions the optimizer makes will be more accurate than if all attributes of the data are fabricated. Without reasonable statistics to start with, DB2 will not know if the predicates provided in the query will return 1 row or 1 million rows.
The reason why many automated statistics collection processes would skip a table marked as VOLATILE is simply that an automated process would not know if the data currently in the table is at the operational norm or if the current data is in a transitional state.
For temporary and/or staging tables, the tables may be empty twenty three hours out of the day and only populated just before the table is used. If an automated job ran during the wrong period of time, the resulting statistics would not be useful.
The recommendation would be to collect statistics on all tables when the table is at its normal operational volume. The options included on the RUNSTATS command would be the same as the options used for non-VOLATILE tables (i.e. include index, distribution, and column group stats as required).
For a temporary or staging table, this collection would be after it was populated and before it is used in queries. To achieve that, the statistics collection may need to occur in the job stream that populated and then used the table.
Tables that really are not "volatile" in nature, but have been marked as VOLATILE in order to influence the optimizer, could have their statistics collected during the normal maintenance windows, since the data is more "static" in nature and not subject to wide fluctuations.
If there are automated statistical collection processes in use, any truly "volatile" table altered to have the VOLATILE attribute or not, should be excluded from those processes to avoid collecting statistics during a transitional state.
16 June 2018