Filter factors for all distributions

RUNSTATS can generate additional statistics for a column or a group of columns. Db2 can use that information to calculate filter factors.

Begin program-specific programming interface information. Db2 collects two kinds of distribution statistics:

Frequency statistics
The percentage of rows in the table that contain a value for a column or set of columns
Cardinality statistics
The number of distinct values in a set of columns

The table that follows lists the types of predicates on which these statistics are used.

Table 1. Distribution statistics used, by predicate type
Predicate Single-column frequency Single-column cardinality Column-group frequency Column-group cardinality
COL=constant Yes Yes Yes Yes
COL IS NULL Yes Yes No No
COL IN (constant-list) Yes Yes No No
COL op constant (where op is one of these operators: <, <=, >, >=.) Yes Yes No No
COL BETWEEN constant AND constant Yes Yes No No
COL=host-variable Yes Yes No Yes
COL1=COL2 Yes Yes No Yes
T1.COL=T2.COL Yes Yes No No
COL IS NOT DISTINCT FROM Yes Yes Yes Yes

How Db2 uses frequency statistics

When you collect statistics with the RUNSTATS utility, or with inline statistics in another utility, you can include or omit FREQVAL option, in the correl-spec, with the FREQVAL option in the colgroup-spec, or in both, with the following results:

  • If you omit the FREQVAL keyword, Db2 collects distribution statistics for 10 most frequent values for the first column of the specified index.
  • When you specify the FREQVAL keyword, the COUNT keyword specifies the number of frequent values to collect. Start of changeIf you specify FREQVAL and omit the COUNT keyword, Db2 automatically determines the number of frequent values to collect, so that the data distribution is no longer skewed.End of change
  • If you specify the FREQVAL keyword in the correl-spec, RUNSTATS inserts rows for concatenated columns of an index. The NUMCOLS option specifies the number of concatenated index columns.
  • If you specify the FREQVAL keyword in the colgroup-spec, RUNSTATS inserts rows for the columns in the column group that you specify.
  • If you specify the FREQVAL, Db2 inserts rows for columns of the specified index and for columns in a column group.

In the SYSCOLDIST catalog table, the COLVALUE and FREQUENCYF columns contain the distribution statistics. Regardless of the number of values in those columns, collecting statistics again deletes the existing values and inserts new rows. Db2 uses the frequencies in column FREQUENCYF for predicates that use the values in column COLVALUE and assumes that the remaining data are uniformly distributed.

Example: Filter factor for a single column

Suppose that the predicate is C1 IN ('3','5') and that SYSCOLDIST contains these values for column C1:

  COLVALUE    FREQUENCYF
    '3'         .0153
    '5'         .0859
    '8'         .0627

The filter factor is .0153 + .0859 = .1012.

Example: Filter factor for correlated columns

Suppose that columns C1 and C2 are correlated. Suppose also that the predicate is C1='3' AND C2='5' and that SYSCOLDIST contains these values for columns C1 and C2:

COLVALUE    FREQUENCYF
'1' '1'       .1176
'2' '2'       .0588
'3' '3'       .0588
'3' '5'       .1176
'4' '4'       .0588
'5' '3'       .1764
'5' '5'       .3529
'6' '6'       .0588

The filter factor is .1176. End program-specific programming interface information.