Filter factors for all distributions
RUNSTATS can generate additional statistics for a column or a group of columns. Db2 can use that information to calculate filter factors.
Db2 collects
two kinds of distribution statistics:
- Frequency statistics
- The percentage of rows in the table that contain a value for a column or set of columns
- Cardinality statistics
- The number of distinct values in a set of columns
The table that follows lists the types of predicates on which these statistics are used.
Predicate | Single-column frequency | Single-column cardinality | Column-group frequency | Column-group cardinality |
---|---|---|---|---|
COL=constant | Yes | Yes | Yes | Yes |
COL IS NULL | Yes | Yes | No | No |
COL IN (constant-list) | Yes | Yes | No | No |
COL op constant (where op is one of these operators: <, <=, >, >=.) | Yes | Yes | No | No |
COL BETWEEN constant AND constant | Yes | Yes | No | No |
COL=host-variable | Yes | Yes | No | Yes |
COL1=COL2 | Yes | Yes | No | Yes |
T1.COL=T2.COL | Yes | Yes | No | No |
COL IS NOT DISTINCT FROM | Yes | Yes | Yes | Yes |
How Db2 uses frequency statistics
When you collect statistics with the RUNSTATS utility, or with inline statistics in another utility, you can include or omit FREQVAL option, in the correl-spec, with the FREQVAL option in the colgroup-spec, or in both, with the following results:
- If you omit the FREQVAL keyword, Db2 collects distribution statistics for 10 most frequent values for the first column of the specified index.
- When you specify the FREQVAL keyword, the COUNT keyword specifies the number of frequent values
to collect.
If you specify FREQVAL and omit the COUNT keyword, Db2 automatically determines the number of frequent values to collect, so that the data distribution is no longer skewed.
- If you specify the FREQVAL keyword in the correl-spec, RUNSTATS inserts rows for concatenated columns of an index. The NUMCOLS option specifies the number of concatenated index columns.
- If you specify the FREQVAL keyword in the colgroup-spec, RUNSTATS inserts rows for the columns in the column group that you specify.
- If you specify the FREQVAL, Db2 inserts rows for columns of the specified index and for columns in a column group.
In the SYSCOLDIST catalog table, the COLVALUE and FREQUENCYF columns contain the distribution statistics. Regardless of the number of values in those columns, collecting statistics again deletes the existing values and inserts new rows. Db2 uses the frequencies in column FREQUENCYF for predicates that use the values in column COLVALUE and assumes that the remaining data are uniformly distributed.
Example: Filter factor for a single column
Suppose that the predicate is C1 IN ('3','5') and that SYSCOLDIST contains these values for column C1:
COLVALUE FREQUENCYF
'3' .0153
'5' .0859
'8' .0627
The filter factor is .0153 + .0859 = .1012.
Example: Filter factor for correlated columns
Suppose that columns C1 and C2 are correlated. Suppose also that the predicate is C1='3' AND C2='5' and that SYSCOLDIST contains these values for columns C1 and C2:
COLVALUE FREQUENCYF
'1' '1' .1176
'2' '2' .0588
'3' '3' .0588
'3' '5' .1176
'4' '4' .0588
'5' '3' .1764
'5' '5' .3529
'6' '6' .0588
The filter factor is .1176.