Identifying missing or conflicting statistics

During access path selection, Db2 identifies statistics values that are missing or conflicting. Db2 externalizes data about the missing or conflicting statistics in certain catalog and EXPLAIN tables.

Before you begin

About this task

Db2 externalizes data about the missing or conflicting statistics to the following tables:
  • The DSN_STAT_FEEDBACK table, when you capture access path information in EXPLAIN tables. Information is captured in this table only when Db2 uses the access path selection process to capture the EXPLAIN information. The access path selection process is not used when you issue an EXPLAIN statement with the STMTCACHE or PACKAGE options.
  • The SYSIBM.SYSSTATFEEDBACK catalog table, at the specified statistics interval.

Each statistics feedback row indicates that a statistic that Db2 uses during access path selection was not available or was in conflict with other statistics. Because the statistics values are not checked, a statistics feedback row does not indicate certainty that collecting the statistic can change or improve the access path.

You can analyze the feedback data and collect the missing or conflicting statistics. Optimization tools can also use the optimization feedback data to recommend the collection of such statistics.

Important: Statistics can become stale and adversely affect query performance. In particular, histogram statistics can become stale quickly on columns with values that always increase or decrease. Query optimization tools might help you to evaluate the benefit of such statistics. Avoid the collection of any statistic as a one-time effort. Instead, add statistics collection to regular RUNSTATS jobs to avoid creating stale statistics that might harm performance. You can also use RUNSTATS profiles to integrate new statistics into your regular RUNSTATS jobs.
Tip: Start of changeThe SYSIBM.SYSSTATFEEDBACK table might contain rows that recommend collection of statistics for DEFINE NO objects, or empty table space partitions. However, statistics cannot be collected for such objects until after data is inserted and the underlying data sets are created. For more information, see Effect of utilities on objects that have the DEFINE NO attribute.End of change

Procedure

To apply statistics feedback:

  1. Use the values of the following columns of either table to identify the context of the statistics feedback:
    • The TBCREATOR and TBNAME column identify the table.
    • For index statistics, the IXCREATOR and IXNAME columns identify the index.
    • For column-level statistics, the COLNAME column contains the name of the column.
    • For multi-column statistics, the NUMCOLUMNS column identifies the number of columns that are involved and the COLGROUPCOLNO column identifies the columns of the column group.
    • The DSN_STAT_FEEDBACK table contains more columns, such as QUERYNO, EXPLAIN_TIME, and others, that you can use to correlate the data to a particular SQL statement.
  2. Optional: Use the REASON column value to understand the reason why each statistic was identified.
    For example:
    • REASON='BASIC' indicates that no statistics were collected for the identified object.
    • REASON='CONFLICT' indicates that related statistics contain conflicting values. Such conflicts usually occur because statistics were collected for related objects at different times. The solution is to adjust the statistics collection process so that related statistics are captured at the same time.
    • REASON='LOWCARD', 'NULLABLE', or 'DEFAULT' indicate that frequency statistics are not available for columns that seem likely to contain skewed data. Examples include columns with low cardinality values, nullable columns, and predicates on values that might be default values for the column.
  3. Use the TYPE column value to determine the type of statistic to collect.
    The following table shows the RUNSTATS or inline statistics utility options to use to collect statistics for each TYPE value.
    Table 1. RUNSTATS options for optimization feedback TYPE values.
    TYPE value Statistics to collect RUNSTATS utility options to use
    'T' Basic table statistics Use the TABLE option:
    RUNSTATS TABLESPACE ...
     	    TABLE(table-name)
    'I' Basic index statistics Use the INDEX option:
    RUNSTATS INDEX
    'C' Cardinality statistics For a single column cardinality, use the COLUMN option:
    RUNSTATS TABLESPACE ...
    	   TABLE(table-name)
    	   COLUMN(column-name)
    For a multi-column cardinality, use the COLGROUP option:
    RUNSTATS TABLESPACE ...
    	   TABLE(table-name)
    	   COLGROUP(column-name1,column-name2)
    'F' Frequency statistics Use the FREQVAL option:
    RUNSTATS.  For example,
    	RUNSTATS TABLESPACE ...
    	   TABLE(table-name)
    	   COLGROUP(column-name) FREQVAL COUNT integer
    'H' Histogram statistics Use the HISTOGRAM option:
    RUNSTATS TABLESPACE ...
    	   TABLE(table-name)
    	   COLGROUP(column-name) HISTOGRAM 

What to do next

Run RUNSTATS, or another utility with inline statistics, to collect the identified statistics.