You can collect two kinds of data distribution statistics: frequent-value statistics and quantile statistics.
To specify the number of "sections" (quantiles) into which the column data values should be grouped, set the num_quantiles database configuration parameter to a value between 2 and 32 767. The default value is 20, which ensures a maximum optimizer estimation error of plus or minus 2.5% for any equality, less-than, or greater-than predicate, and a maximum error of plus or minus 5% for any BETWEEN predicate. To disable the collection of quantile statistics, set num_quantiles to 0 or 1.
You can set num_quantiles for a specific table, statistical view, or column.
Distribution statistics are most useful for dynamic and static queries that do not use host variables. The optimizer makes limited use of distribution statistics when assessing queries that contain host variables.
where c1 = key;
where c1 in (key1, key2, key3);
where (c1 = key1) or (c1 = key2) or (c1 = key3);
where c1 <= key;
where c1 between key1 and key2;
0.0
5.1
6.3
7.1
8.2
8.4
8.5
9.1
93.6
100.0
Quantile statistics help the optimizer to deal with
this kind of data distribution. select c1, count(*) as occurrences
from t1
group by c1
order by occurrences desc
Data Value | Frequency |
---|---|
20 | 5 |
30 | 10 |
40 | 10 |
50 | 25 |
60 | 25 |
70 | 20 |
80 | 5 |
Use the num_quantiles and num_freqvalues database configuration parameters to specify the precision with which distribution statistics are stored. You can also specify the precision with corresponding RUNSTATS command options when you collect statistics for a table or for columns. The higher you set these values, the greater the precision that the runstats utility uses when it creates and updates distribution statistics. However, greater precision requires more resources, both during the runstats operation itself, and for storing more data in the catalog tables.
For most databases, specify between 10 and 100 as the value of the num_freqvalues database configuration parameter. Ideally, frequent-value statistics should be created in such a way that the frequencies of the remaining values are either approximately equal to one another or negligible when compared to the frequencies of the most frequent values. The database manager might collect fewer than this number, because these statistics will only be collected for data values that occur more than once. If you need to collect only quantile statistics, set the value of num_freqvalues to zero.
For example, 25 quantiles should result in a maximum estimate error of 4% for BETWEEN predicates and 2% for ">" predicates. In general, specify at least 10 quantiles. More than 50 quantiles should be necessary only for extremely nonuniform data. If you need only frequent-value statistics, set num_quantiles to 0. If you set this parameter to 1, because the entire range of values fits within one quantile, no quantile statistics are collected.