DB2 Version 9.7 for Linux, UNIX, and Windows

RUNSTATS command

Updates statistics in the system catalog about the characteristics of a table and/or associated indexes, or statistical views. These characteristics include number of records, number of pages, and average record length. The optimizer uses these statistics when determining access paths to the data.

For a table, this utility should be called when the table has had many updates, or after reorganizing the table. For a statistical view, this utility should be called when changes to underlying tables have substantially affected the rows returned by the view. The view must have been previously enabled for use in query optimization using the ALTER VIEW statement.

Scope

This command can be issued from any database partition in the db2nodes.cfg file. It can be used to update the catalogs on the catalog database partition.

For tables, this command collects statistics for a table on the database partition from which it is invoked. If the table does not exist on that database partition, the first database partition in the database partition group is selected.

For views, this command collects statistics using data from tables on all participating database partitions.

Authorization

For tables, one of the following:

SYSADM
SYSCTRL
SYSMAINT
DBADM
SQLADM
CONTROL privilege on the table
LOAD authority

You do not need any explicit privilege to use this command on any declared temporary table that exists within its connection.

For statistical views, one of the following:

SYSADM
SYSCTRL
SYSMAINT
DBADM
SQLADM
CONTROL privilege on the statistical view

Required connection

Database

Command syntax


>>-RUNSTATS--ON TABLE--object-name------------------------------>

>--+-+------------------------+--+------------------------------------+-+-><
   | '-| Statistics Options |-'  '-UTIL_IMPACT_PRIORITY--+----------+-' |   
   |                                                     '-priority-'   |   
   +-USE PROFILE--------------------------------------------------------+   
   '-UNSET PROFILE------------------------------------------------------'   

Statistics Options

                                 .-ALLOW WRITE ACCESS-.   
|--+--------------------------+--+--------------------+--------->
   '-| Table Object Options |-'  '-ALLOW READ ACCESS--'   

>--+----------------------------+--+---------------------+------|
   '-| Table Sampling Options |-'  '-| Profile Options |-'   

Table Object Options

|--+-FOR--| Index Clause |--+-----------------------+----------------------------------+--|
   |                        '-EXCLUDING XML COLUMNS-'                                  |   
   '-+-------------------------+--+-----------------------+--+-----------------------+-'   
     '-| Column Stats Clause |-'  '-EXCLUDING XML COLUMNS-'  '-AND--| Index Clause |-'     

Table Sampling Options

|--TABLESAMPLE--+-BERNOULLI-+--(--numeric-literal--)------------>
                '-SYSTEM----'                          

>--+-----------------------------------+------------------------|
   '-REPEATABLE--(--integer-literal--)-'   

Profile Options

   .-SET PROFILE NONE--------------.   
|--+-------------------------------+----------------------------|
   '-+-SET----+--PROFILE--+------+-'   
     '-UPDATE-'           '-ONLY-'     

Index Clause

                                             .-,----------.     
                                             V            |     
|--+-----------------------+--+-INDEXES-+--+---index-name-+-+---|
   '-+---------+--DETAILED-'  '-INDEX---'  '-ALL------------'   
     '-SAMPLED-'                                                

Column Stats Clause

|--+-| On Cols Clause |------------------------------+----------|
   '-+--------------------+--| Distribution Clause |-'   
     '-| On Cols Clause |-'                              

On Cols Clause

   .-ON ALL COLUMNS-----------------------------------------------------.   
   |                                           .-,-----------------.    |   
   |                                           V                   |    |   
|--+-ON--+----------------------+--COLUMNS--(----| Column Option |-+--)-+--|
   |     '-+-ALL-+--COLUMNS AND-'                                       |   
   |       '-KEY-'                                                      |   
   '-ON KEY COLUMNS-----------------------------------------------------'   

Distribution Clause

|--WITH DISTRIBUTION--+-------------------------+--------------->
                      '-| On Dist Cols Clause |-'   

>--+--------------------------+---------------------------------|
   '-| Default Dist Options |-'   

On Dist Cols Clause

   .-ON ALL COLUMNS-----------------------------------------------------------------------------------.   
   |                                           .-,-----------------------------------------------.    |   
   |                                           |                    .--------------------------. |    |   
   |                                           V                    V                          | |    |   
|--+-ON--+----------------------+--COLUMNS--(----| Column Option |----+----------------------+-+-+--)-+--|
   |     '-+-ALL-+--COLUMNS AND-'                                     +-| Frequency Option |-+        |   
   |       '-KEY-'                                                    '-| Quantile Option |--'        |   
   '-ON KEY COLUMNS-----------------------------------------------------------------------------------'   

Default Dist Option

            .--------------------------.   
            V                          |   
|--DEFAULT----+-| Frequency Option |-+-+------------------------|
              '-| Quantile Option |--'     

Frequency Option

|--NUM_FREQVALUES--integer--------------------------------------|

Quantile Option

|--NUM_QUANTILES--integer---------------------------------------|

Column Option

|--+-column-name--+-----------------+-+-------------------------|
   |              '-LIKE STATISTICS-' |   
   |    .-,-----------.               |   
   |    V             |               |   
   '-(----column-name-+--)------------'

Command parameters

object-name

Identifies the table or statistical view on which statistics are to be collected. It must not be a hierarchy table. For typed tables, object-name must be the name of the root table of the table hierarchy. The fully qualified name or alias in the form: schema.object-name must be used. The schema is the user name under which the table was created.

index-name

Identifies an existing index defined on the table. The fully qualified name in the form schema.index-name must be used. This option cannot be used for views.

USE PROFILE

This option allows RUNSTATS to employ a previously stored statistics profile to gather statistics for a table or statistical view. The statistics profile is created using the SET PROFILE options and is updated using the UPDATE PROFILE options.

UNSET PROFILE

Specify this option to remove an existing statistics profile. For example,

RUNSTATS ON tablemyschema.mytable UNSET PROFILE

FOR INDEXES

Collects and updates statistics for the indexes only. If no table statistics had been previously collected on the table, basic table statistics are also collected. Updates statistics for table cardinality (CARD), FPAGES, NPAGES, even when table statistics exists. These basic statistics do not include any distribution statistics. This option cannot be used for views. COLCARD of the leading column of the index might also be updated. End of change

AND INDEXES

Collects and updates statistics for both the table and the indexes. This option cannot be used for views.

DETAILED

Calculates extended index statistics. These are the CLUSTERFACTOR and PAGE_FETCH_PAIRS statistics that are gathered for relatively large indexes. This option cannot be used for views.

SAMPLED

This option, only used with the DETAILED option, directs RUNSTATS to use a sampling technique when compiling the extended index statistics. When the sampling option is not used, detailed index statistics collection can consume considerable CPU and memory for large tables. The SAMPLED option provides detailed index statistics with nearly the same accuracy but requires less CPU and memory.

ON ALL COLUMNS

To collect statistics on all eligible columns, use the ON ALL COLUMNS clause. Columns can be specified either for basic statistics collection (On Cols clause) or in conjunction with the WITH DISTRIBUTION clause (On Dist Cols clause). The ON ALL COLUMNS specification is the default option if neither of the column specific clauses are specified.

If it is specified in the On Cols clause, all columns will have only basic column statistics collected unless specific columns are chosen as part of the WITH DISTRIBUTION clause. Those columns specified as part of the WITH DISTRIBUTION clause will also have basic and distribution statistics collected.

If the WITH DISTRIBUTION ON ALL COLUMNS is specified both basic statistics and distribution statistics are collected for all eligible columns. Anything specified in the On Cols clause is redundant and therefore not necessary.

ON COLUMNS

This clause allows the user to specify a list of column options. The column options can be a list of single columns for which to collect statistics and/or one or more column groups. Column groups are specified by a parenthesized list of comma separated column names, within the parenthesized list of column options. If you specify a column group the number of distinct values for the group will be collected.

When you run RUNSTATS on a table without gathering index statistics, and specify a subset of columns for which statistics are to be gathered, then:

Statistics for columns not specified in the RUNSTATS command but which are the first column in an index are NOT reset.
Statistics for all other columns not specified in the RUNSTATS command are reset.

This clause can be used in the On Cols clause and the On Dist Cols clause. Collecting distribution statistics for a group of columns is not currently supported.

If XML type columns are specified in a column group, the XML type columns will be ignored for the purpose of collecting distinct values for the group. However, basic XML column statistics will be collected for the XML type columns in the column group.

EXCLUDING XML COLUMNS

This clause allows you to omit all XML type columns from statistics collection. This clause facilitates the collection of statistics on non-XML columns because the inclusion of XML data can require greater system resources. The EXCLUDING XML COLUMNS clause takes precedence over other clauses that specify XML columns for statistics collection. For example, if you use the EXCLUDING XML COLUMNS clause, and you also specify XML type columns with the ON COLUMNS clause or you use the ON ALL COLUMNS clause, all XML type columns will be ignored during statistics collection. For DB2® V9.7 Fix Pack 1 and later releases, distribution statistics over XML type columns are not collected when this clause is specified.

ON KEY COLUMNS

Instead of listing specific columns, you can choose to collect statistics on columns that make up all the indexes defined on the table. It is assumed here that critical columns in queries are also those used to create indexes on the table. If there are no indexes on the table, it is as good as an empty list and no column statistics will be collected. It can be used in the On Cols clause or the On Dist Cols clause. It is redundant in the On Cols clause if specified in both clauses since the WITH DISTRIBUTION clause is used to specify collection of both basic and distribution statistics. XML type columns are by definition not a key column and will not be included for statistics collection by the ON KEY COLUMNS clause. This option cannot be used for views.

column-name

Name of a column in the table or statistical view. If you specify the name of an ineligible column for statistics collection, such as a nonexistent column or a mistyped column name, error (-205) is returned. Two lists of columns can be specified, one without distribution and one with distribution. If the column is specified in the list that is not associated with the WITH DISTRIBUTION clause only basic column statistics will be collected. If the column appears in both lists, distribution statistics will be collected (unless NUM_FREQVALUES and NUM_QUANTILES are set to zero).

NUM_FREQVALUES

Defines the maximum number of frequency values to collect. It can be specified for an individual column in the ON COLUMNS clause. If the value is either not specified or is specified as '-1' for an individual column, the frequency limit value is picked up from that specified in the DEFAULT clause. If it is not specified there either, the maximum number of frequency values to be collected will be what is set in the num_freqvalues database configuration parameter. End of change

NUM_QUANTILES

Defines the maximum number of distribution quantile values to collect. It can be specified for an individual column in the ON COLUMNS clause. If the value is either not specified or is specified as '-1' for an individual column, the quantile limit value is picked up from that specified in the DEFAULT clause. If it is not specified there either, the maximum number of quantile values to be collected will be what is set in the num_quantiles database configuration parameter.

For DB2 V9.7 Fix Pack 1 and later releases, distribution statistics for each index over XML data uses a maximum of 250 quantiles as the default. The default can be changed by specifying the NUM_QUANTILES parameter in the ON COLUMNS or the DEFAULT clause. The num_quantiles database configuration parameter is ignored while collecting XML distribution statistics.

WITH DISTRIBUTION

This clause specifies that both basic statistics and distribution statistics are to be collected on the columns. If the ON COLUMNS clause is not specified, distribution statistics are collected on all the columns of the table or statistical view (excluding columns that are ineligible such as CLOB and LONG VARCHAR). If the ON COLUMNS clause is specified, distribution statistics are collected only on the column list provided (excluding those ineligible for statistics collection). If the clause is not specified, only basic statistics are collected.

Collection of frequent values and distribution statistics on column groups is currently not supported; distribution statistics will not be collected when column groups are specified in the WITH DISTRIBUTION ON COLUMNS clause.

DEFAULT

If NUM_FREQVALUES or NUM_QUANTILES are specified, these values will be used to determine the maximum number of frequency and quantile statistics to be collected for the columns, if these are not specified for individual columns in the ON COLUMNS clause. If the DEFAULT clause is not specified, the values used will be those in the corresponding database configuration parameters.

LIKE STATISTICS

When this option is specified additional column statistics might be collected for columns of type CHAR and VARCHAR with a code page attribute of single-byte character set (SBCS), FOR BIT DATA, or UTF-8. The statistics are collected if the runstats utility determines that such statistics are appropriate after analyzing column values. These statistics are the SUB_COUNT and the SUB_DELIM_LENGTH statistics in SYSSTAT.COLUMNS. They are used by the query optimizer to improve the selectivity estimates for predicates of the type "column LIKE '%xyz'"and "column LIKE '%xyz%'".

ALLOW WRITE ACCESS

Specifies that other users can read from and write to the tables while statistics are calculated. For statistical views, these are the base tables referenced in the view definition.

The ALLOW WRITE ACCESS option is not recommended for tables that will have a lot of inserts, updates or deletes occurring concurrently. The RUNSTATS command first performs table statistics and then performs index statistics. Changes in the table's state between the time that the table and index statistics are collected might result in inconsistencies. Although having up-to-date statistics is important for the optimization of queries, it is also important to have consistent statistics. Therefore, statistics should be collected at a time when inserts, updates or deletes are at a minimum.

ALLOW READ ACCESS

Specifies that other users can have read-only access to the tables while statistics are calculated. For statistical views, these are the base tables referenced in the view definition.

TABLESAMPLE BERNOULLI

This option allows RUNSTATS to collect statistics on a sample of the rows from the table or statistical view. Bernoulli sampling considers each row individually, including that row with probability P/100 (where P is the value of numeric-literal) and excluding it with probability 1-P/100. Thus, if the numeric-literal were evaluated to be the value 10, representing a 10 percent sample, each row would be included with probability 0.1 and be excluded with probability 0.9. Unless the optional REPEATABLE clause is specified, each execution of RUNSTATS will usually yield a different such sample of the table. All data pages will be retrieved through a table scan but only the percentage of rows as specified through the numeric-literal parameter will be used for the statistics collection.

TABLESAMPLE SYSTEM

This option allows RUNSTATS to collect statistics on a sample of the data pages from the tables. System sampling considers each page individually, including that page with probability P/100 (where P is the value of numeric-literal) and excluding it with probability 1-P/100. Unless the optional REPEATABLE clause is specified, each execution of RUNSTATS will usually yield a different such sample of the table. The size of the sample is controlled by the numeric-literal parameter in parentheses, representing an approximate percentage P of the table to be returned. Only a percentage of the data pages as specified through the numeric-literal parameter will be retrieved and used for the statistics collection.

On statistical views, system sampling is restricted to views whose definitions are a select over a single base table. If the view contains multiple tables, SYSTEM sampling is also possible if:

the tables are joined using equality predicates on all the primary key and foreign key columns included in a referential integrity constraint defined between the tables,
no search condition filters rows in any parent tables in the relationship, and
a single child table, that is also not a parent table, can be identified among all the tables.

If the statistical view does not meet those conditions, Bernoulli sampling will be used instead and a warning will be returned (SQL2317W).

REPEATABLE (integer-literal)

Adding the REPEATABLE clause to the TABLESAMPLE clause ensures that repeated executions of RUNSTATS return the same sample. The integer-literal parameter is a non-negative integer representing the seed to be used in sampling. Passing a negative seed will result in an error (SQL1197N). The sample set might still vary between repeatable RUNSTATS invocations if activity against the table or statistical view resulted in changes to the table or statistical view data since the last time TABLESAMPLE REPEATABLE was run. Also, the method by which the sample was obtained as specified by the BERNOULLI or SYSTEM keyword, must also be the same to ensure consistent results.

numeric-literal

The numeric-literal parameter specifies the size of the sample to be obtained, as a percentage P. This value must be a positive number that is less than or equal to 100, and can be between 1 and 0. For example, a value of 0.01 represents one one-hundredth of a percent, such that 1 row in 10,000 would be sampled, on average. A value of 0 or 100 will be treated by the DB2 database system as if sampling was not specified, regardless of whether TABLESAMPLE BERNOULLI or TABLESAMPLE SYSTEM is specified. A value greater than 100 or less than 0 will be treated as an error (SQL1197N) by the DB2 database system.

SET PROFILE NONE

Specifies that no statistics profile will be set for this RUNSTATS invocation.

SET PROFILE

Allows RUNSTATS to generate and store a specific statistics profile in the system catalog tables and executes the RUNSTATS command options to gather statistics.

SET PROFILE ONLY

Allows RUNSTATS to generate and store a specific statistics profile in the system catalog tables without running the RUNSTATS command options.

UPDATE PROFILE

Allows RUNSTATS to modify an existing statistics profile in the system catalog tables, and runs the RUNSTATS command options of the updated statistics profile to gather statistics. You cannot use the UPDATE PROFILE option to remove clauses that are in a statistics profile.

UPDATE PROFILE ONLY

Allows RUNSTATS to modify an existing statistics profile in the system catalog tables without running the RUNSTATS command options of the updated statistics profile. You cannot use the UPDATE PROFILE ONLY option to remove clauses that are in a statistics profile.

UTIL_IMPACT_PRIORITY priority

Specifies that RUNSTATS will be throttled at the level specified by priority. priority is a number in the range of 1 to 100, with 100 representing the highest priority and 1 representing the lowest. The priority specifies the amount of throttling to which the utility is subjected. All utilities at the same priority undergo the same amount of throttling, and utilities at lower priorities are throttled more than those at higher priorities. If priority is not specified, the RUNSTATS will have the default priority of 50. Omitting the UTIL_IMPACT_PRIORITY keyword will invoke the RUNSTATS utility without throttling support. If the UTIL_IMPACT_PRIORITY keyword is specified, but the util_impact_lim configuration parameter is set to 100, then the utility will run unthrottled. This option cannot be used for views.

In a partitioned database, when used on tables, the RUNSTATS command collects the statistics on only a single database partition. If the database partition from which the RUNSTATS command is executed has a partition of the table, then the command executes on that database partition. Otherwise, the command executes on the first database partition in the database partition group across which the table is partitioned.

Examples

Collect statistics on the table only, on all columns without distribution statistics:
```
RUNSTATS ON TABLE db2user.employee
```
Collect statistics on the table only, on columns empid and empname with distribution statistics:
```
RUNSTATS ON TABLE db2user.employee 
    WITH DISTRIBUTION ON COLUMNS (empid, empname) 
```
Collect statistics on the table only, on all columns with distribution statistics using a specified number of frequency limit for the table while picking the num_quantiles from the configuration setting:
```
RUNSTATS ON TABLE db2user.employee WITH DISTRIBUTION DEFAULT 
    NUM_FREQVALUES 50 
```
Collect statistics on the table only, on all columns with distribution statistics, and on column group JOB, WORKDEPT, and SEX:
```
RUNSTATS ON TABLE db2user.employee ON ALL COLUMNS AND COLUMNS ((JOB, WORKDEPT, SEX)) WITH DISTRIBUTION
```

Collect statistics on a set of indexes:

RUNSTATS ON TABLE db2user.employee for indexes 
    db2user.empl1, db2user.empl2

Collect basic statistics on all indexes only:

RUNSTATS ON TABLE db2user.employee FOR INDEXES ALL

Collect basic statistics on the table and all indexes using sampling for the detailed index statistics collection:
```
RUNSTATS ON TABLE db2user.employee AND SAMPLED DETAILED INDEXES ALL
```
Collect statistics on table, with distribution statistics on columns empid, empname and empdept and the two indexes Xempid and Xempname. Distribution statistics limits are set individually for empdept, while the other two columns use a common default:
```
RUNSTATS ON TABLE db2user.employee 
  WITH DISTRIBUTION ON COLUMNS (empid, empname, empdept NUM_FREQVALUES 
          50 NUM_QUANTILES 100)  
          DEFAULT NUM_FREQVALUES 5 NUM_QUANTILES 10
          AND INDEXES db2user.Xempid, db2user.Xempname
```
Collect statistics on all columns used in indexes and on all indexes:
```
RUNSTATS ON TABLE db2user.employee ON KEY COLUMNS AND INDEXES ALL
```

Collect statistics on all indexes and all columns without distribution except for one column. Consider T1 containing columns c1, c2, ...., c8

RUNSTATS ON TABLE db2user.T1 
  WITH DISTRIBUTION ON COLUMNS (c1, c2, c3 NUM_FREQVALUES 20 
  NUM_QUANTILES 40, c4, c5, c6, c7, c8)
  DEFAULT NUM_FREQVALUES 0, NUM_QUANTILES 0 AND INDEXES ALL 
                                                 
RUNSTATS ON TABLE db2user.T1 
  WITH DISTRIBUTION ON COLUMNS (c3 NUM_FREQVALUES 20 NUM_QUANTILES 40)
  AND INDEXES ALL

Collect statistics on table T1 for the individual columns c1 and c5 as well as on the column groups (c2, c3) and (c2, c4). Multicolumn cardinality is very useful to the query optimizer when it estimates filter factors for predicates on columns in which the data is correlated.
```
RUNSTATS ON TABLE db2user.T1 ON COLUMNS (c1, (c2, c3),
    (c2, c4), c5)
```
Collect statistics on table T1 for the individual columns c1 and c2. For column c1 also collect the LIKE predicate statistics.
```
RUNSTATS ON TABLE db2user.T1 ON COLUMNS (c1 LIKE STATISTICS, c2)
```
Register a statistics profile to collect statistics on the table only, on all columns with distribution statistics using a specified number of frequency limit for the table while picking the num_quantiles from the configuration setting. The command also updates the statistics as specified.
```
RUNSTATS ON TABLE db2user.employee WITH DISTRIBUTION DEFAULT 
   NUM_FREQVALUES 50 SET PROFILE
```
Register a statistics profile to collect statistics on the table only, on all columns with distribution statistics using a specified number of frequency limit for the table while picking the num_quantiles from the configuration setting. Statistics are not collected.
```
RUNSTATS ON TABLE db2user.employee WITH DISTRIBUTION 
    DEFAULT NUM_FREQVALUES 50 SET PROFILE ONLY
```
Modify the previously registered statistics profile by changing the NUM_FREQVALUES value from 50 to 30. The command also updates the statistics as specified.
```
RUNSTATS ON TABLE db2user.employee WITH DISTRIBUTION 
    DEFAULT NUM_FREQVALUES 30 UPDATE PROFILE
```
Modify the previously registered statistics profile by changing the NUM_FREQVALUES value from 50 to 30. Statistics are not collected.
```
RUNSTATS ON TABLE db2user.employee WITH DISTRIBUTION 
    DEFAULT NUM_FREQVALUES 30 UPDATE PROFILE ONLY
```
Modify the previously registered statistics profile by adding column empl_address and column group (empl_title, empl_salary) options. The command also updates the statistics as specified.
```
RUNSTATS ON TABLE db2user.employee 
 ON COLUMNS (empl_address, (empl_title, empl_salary))
 UPDATE PROFILE
```
Modify the previously registered statistics profile by adding column empl_address and column group (empl_title, empl_salary) options. Statistics are not collected.
```
RUNSTATS ON TABLE db2user.employee 
 ON COLUMNS (empl_address, (empl_title, empl_salary))
 UPDATE PROFILE ONLY
```
Collect statistics on a table using the options recorded in the statistics profile for that table:
```
RUNSTATS ON TABLE db2user.employee USE PROFILE
```
Query the RUNSTATS command options corresponding to the previously registered statistics profile stored in the catalogs of the table:
```
SELECT STATISTICS_PROFILE FROM SYSCAT.TABLES WHERE TABNAME = 
    'EMPLOYEE'
```

Collect statistics, including distribution statistics, on 30 percent of the rows:

RUNSTATS ON TABLE db2user.employee WITH DISTRIBUTION 
    TABLESAMPLE BERNOULLI(30)

To control the sample set on which statistics will be collected and to be able to repeatedly use the same sample set, you can do so as follows:
```
RUNSTATS ON TABLE db2user.employee WITH DISTRIBUTION 
    TABLESAMPLE BERNOULLI(30) REPEATABLE(4196)
```
Issuing the same statement as above will result in the same set of statistics as long as the data has not changed in the interim.
Collect index statistics as well as table statistics on 1.5 percent of the data pages. Only table data pages and not index pages are sampled. In this example 1.5 percent of table data pages are used for the collection of table statistics, while for index statistics all the index pages will be used:
```
RUNSTATS ON TABLE db2user.employee AND INDEXES ALL TABLESAMPLE SYSTEM(1.5)
```
Collect statistics for a statistical view, on all columns, without distribution statistics:
```
RUNSTATS ON TABLE salesdb.product_sales_view
```
Collect statistics for a statistical view, with distribution statistics on the columns category, type and product_key. Distribution statistics limits are set for the category column, while the other columns use a common default:
```
RUNSTATS ON TABLE salesdb.product_sales_view
 WITH DISTRIBUTION ON COLUMNS (category NUM_FREQVALUES 100 NUM_QUANTILES 100,
		 type, product_key) DEFAULT NUM_FREQVALUES 50 NUM_QUANTILES 50
```
Collect statistics, including distribution statistics, on 10 percent of the rows using row level sampling:
```
RUNSTATS ON TABLE db2user.daily_sales
 WITH DISTRIBUTION TABLESAMPLE BERNOULLI (10)
```
Collect statistics, including distribution statistics, on 2.5 percent of the rows using data page level sampling. Additionally, specify the repeated use of the same sample set. For this command to succeed, the query must be such that the DB2 database system can successfully push data page sampling down to one or more tables. Otherwise, an error (SQL 20288N) is issued.
```
RUNSTATS ON TABLE db2user.daily_sales
 WITH DISTRIBUTION TABLESAMPLE SYSTEM (2.5)
```
Register a statistics profile to collect statistics on the view and on all columns with distribution statistics as specified:
```
RUNSTATS ON TABLE salesdb.product_sales_view
 WITH DISTRIBUTION DEFAULT NUM_FREQVALUES 50 NUM_QUANTILES 50
 SET PROFILE
```

Modify the previously registered statistics profile. This command also updates the statistics as specified:

RUNSTATS ON TABLE salesdb.product_sales_view
 WITH DISTRIBUTION DEFAULT NUM_FREQVALUES 25 NUM_QUANTILES 25
 UPDATE PROFILE

Usage notes

When there are detached partitions on a partitioned table, index keys that still belong to detached data partitions which require cleanup will not be counted as part of the keys in the statistics. These keys are not counted because they are invisible and no longer part of the table. They will eventually get removed from the index by asynchronous index cleanup. As a result, statistics collected before asynchronous index cleanup is run will be misleading. If the RUNSTATS command is issued before asynchronous index cleanup completes, it will likely generate a false alarm for index reorganization or index cleanup based on the inaccurate statistics. Once asynchronous index cleanup is run, all the index keys that still belong to detached data partitions which require cleanup will be removed and this may eliminate the need for index reorganization.
For partitioned tables, you are encouraged to issue the RUNSTATS command after an asynchronous index cleanup has completed in order to generate accurate index statistics in the presence of detached data partitions. To determine whether or not there are detached data partitions in the table, you can check the status field in the SYSCAT.DATAPARTITIONS catalog view and look for the value L (logically detached), I (index cleanup), or D (detached with dependent MQT).

The RUNSTATS command collects statistics for all index partitions of a partitioned index. Statistics in the SYSTAT.INDEXES view for the partitioned index represent an index partition, except for FIRSTKEYCARD, FIRST2KEYCARD, FIRST3KEYCARD, FIRST4KEYCARD, and FULLKEYCARD statistics. Because these statistics are used in cardinality estimates, they are for the entire index and not for an index partition. Distribution statistics (frequent values and quantiles) are not collected for partitioned indexes, but are gathered if RUNSTATS is run on the table. Statistics on the leading columns of a partitioned index might not be as accurate as statistics on the leading columns of a nonpartitioned index.

Collection of frequent values and distribution statistics on column groups is currently not supported.
It is recommended to run the RUNSTATS command:
- On tables that have been modified considerably (for example, if a large number of updates have been made, or if a significant amount of data has been inserted or deleted or if LOAD has been done without the statistics option during LOAD).
- On tables that have been reorganized (using REORG, REDISTRIBUTE DATABASE PARTITION GROUP).
- On tables which have been row compressed.
- When a new index has been created.
- Before binding applications whose performance is critical.
- When the prefetch quantity is changed.
- On statistical views whose underlying tables have been modified substantially so as to change the rows that are returned by the view.
- After LOAD has been executed with the STATISTICS option, use the RUNSTATS utility to collect statistics on XML columns. Statistics for XML columns are never collected during LOAD, even when LOAD is executed with the STATISTICS option. When RUNSTATS is used to collect statistics for XML columns only, existing statistics for non-XML columns that have been collected by LOAD or a previous execution of the RUNSTATS utility are retained. In the case where statistics on some XML columns have been collected previously, the previously collected statistics for an XML column will either be dropped if no statistics on that XML column are collected by the current command, or be replaced if statistics on that XML column are collected by the current command.
The options chosen must depend on the specific table and the application. In general:
- If the table is a very critical table in critical queries, is relatively small, or does not change too much and there is not too much activity on the system itself, it might be worth spending the effort on collecting statistics in as much detail as possible.
- If the time to collect statistics is limited, if the table is relatively large, or if the table is updated frequently, it might be beneficial to execute RUNSTATS limited to the set of columns that are used in predicates. This way, you will be able to execute the RUNSTATS command more often.
- If time to collect statistics is very limited and the effort to tailor the RUNSTATS command on a table by table basis is a major issue, consider collecting statistics for the "KEY" columns only. It is assumed that the index contains the set of columns that are critical to the table and are most likely to appear in predicates.
- If time to collect statistics is very limited and table statistics are to be gathered, consider using the TABLESAMPLE option to collect statistics on a subset of the table data.
- If there are many indexes on the table and DETAILED (extended) information on the indexes might improve access plans, consider the SAMPLED option to reduce the time it takes to collect statistics.
- If there is skew in certain columns and predicates of the type "column = constant", it might be beneficial to specify a larger NUM_FREQVALUES value for that column
- Collect distribution statistics for all columns that are used in equality predicates and for which the distribution of values might be skewed.
- For columns that have range predicates (for example "column >= constant", "column BETWEEN constant1 AND constant2") or of the type "column LIKE '%xyz'", it might be beneficial to specify a larger NUM_QUANTILES value.
- If storage space is a concern and one cannot afford too much time on collecting statistics, do not specify high NUM_FREQVALUES or NUM_QUANTILES values for columns that are not used in predicates.
- If index statistics are requested, and statistics have never been run on the table containing the index, statistics on both the table and indexes are calculated.
- If statistics for XML columns in the table are not required, the EXCLUDING XML COLUMNS option can be used to exclude all XML columns. This option takes precedence over all other clauses that specify XML columns for statistics collection.
After the command is run, note the following:
- A COMMIT should be issued to release the locks.
- To allow new access plans to be generated, the packages that reference the target table must be rebound.
- Executing the command on portions of the table could result in inconsistencies as a result of activity on the table since the command was last issued. In this case a warning message is returned. Issuing RUNSTATS on the table only might make table and index level statistics inconsistent. For example, you might collect index level statistics on a table and later delete a significant number of rows from the table. If you then issue RUNSTATS on the table only, the table cardinality might be less than FIRSTKEYCARD, which is an inconsistency. In the same way, if you collect statistics on a new index when you create it, the table level statistics might be inconsistent.
The RUNSTATS command drops previously collected distribution statistics if table statistics are requested. For example, RUNSTATS ON TABLE, or RUNSTATS ON TABLE ... AND INDEXES ALL will cause previously collected distribution statistics to be dropped. If the command is run on indexes only then previously collected distribution statistics are retained. For example, RUNSTATS ON TABLE ... FOR INDEXES ALL causes the previously collected distribution statistics to be retained. If the RUNSTATS command is run on XML columns only, then previously collected basic column statistics and distribution statistics are retained. In the case where statistics on some XML columns have been collected previously, the previously collected statistics for an XML column are either be dropped if no statistics on that XML column are collected by the current command, or be replaced if statistics on that XML column are collected by the current command.
For DB2 V9.7 Fix Pack 1 and later releases, distribution statistics are collected on indexes over XML data defined on an XML column. When the RUNSTATS command is run on a table with the WITH DISTRIBUTION clause, the following apply to the collection of distribution statistics on a column of type XML:
- Distribution statistics are collected for each index over XML data specified on an XML column.
- The RUNSTATS command must collect both distribution statistics and table statistics to collect distribution statistics for indexes over XML data defined on an XML column. Table statistics must be gathered in order for distribution statistics to be collected since XML distribution statistics are stored with table statistics.
  An index clause is not required to collect XML distribution statistics. Specifying only an index clause does not collect XML distribution statistics
  
  By default, XML distribution statistics use a maximum of 250 quantiles for each index over XML data. When collecting distribution statistics on an XML column, you can change the maximum number of quantiles by specifying a value with NUM_QUANTILES parameter in the ON COLUMNS or the DEFAULT clause.
- Distribution statistics are collected for indexes over XML data of type VARCHAR, DOUBLE, TIMESTAMP, and DATE. Distribution statistics are not collected over indexes of type VARCHAR HASHED.
- Distribution statistics are not collected for partitioned indexes over XML data defined on a partitioned table.
For range-clustered tables, there is a special system-generated index in the catalog tables which represents the range ordering property of range-clustered tables. When statistics are collected on this type of table, if the table is to be included as part of the statistics collection, statistics will also be collected for the system-generated index. The statistics reflect the fast access of the range lookups by representing the index as a two-level index with as many pages as the base data table, and having the base data clustered perfectly along the index order.
In the On Dist Cols clause of the command syntax, the Frequency Option and Quantile Option parameters are currently not supported for column GROUPS. These options are supported for single columns.
There are three prefetch statistics that cannot be computed when working in DMS mode. When looking at the index statistics in the index catalogs, you will see a -1 value for the following statistics:
- AVERAGE_SEQUENCE_FETCH_PAGES
- AVERAGE_SEQUENCE_FETCH_GAP
- AVERAGE_RANDOM_FETCH_PAGES
RUNSTATS sampling through TABLESAMPLE only occurs with table data pages and not index pages. When index statistics as well as sampling is requested, all the index pages are scanned for statistics collection. It is only in the collection of table statistics where TABLESAMPLE is applicable. However, a more efficient collection of detailed index statistics is available through the SAMPLED DETAILED option. This is a different method of sampling than that employed by TABLESAMPLE and only applies to the detailed set of index statistics.
A statistics profile can be set or updated for the table or statistical view specified in the RUNSTATS command, by using the set profile or update profile options. The statistics profile is stored in a visible string format, which represents the RUNSTATS command, in the STATISTICS_PROFILE column of the SYSCAT.TABLES system catalog table.
Statistics collection on XML type columns is governed by two DB2 database system registry values: DB2_XML_RUNSTATS_PATHID_K and DB2_XML_RUNSTATS_PATHVALUE_K. These two parameters are similar to the NUM_FREQVALUES parameter in that they specify the number of frequency values to collect. If not set, a default of 200 will be used for both parameters.
RUNSTATS acquires an IX table lock on SYSTABLES and a U lock on the row for the table on which statistics are being gathered at the beginning of RUNSTATS. Operations can still read from SYSTABLES including the row with the U lock. Write operations are also possible, providing they do not occur against the row with the U lock. However, another reader or writer will not be able acquire an S lock on SYSTABLES because of RUNSTATS' IX lock.
Statistics are not collected for columns with structured types. If they are specified, columns with these data types are ignored.
Only AVGCOLLEN and NUMNULLS are collected for columns with LOB or LONG data types.
AVGCOLLEN represents the average space in bytes when the column is stored in database memory or a temporary table. This value represents the length of the data descriptor for LOB or LONG data types, except when LOB data is inlined on the data page.
Note: The average space required to store the column on disk may be different than the value represented by this statistic.
If you have modified the table since statistics were last collected on the table or its indexes, you should run RUNSTATS ON TABLE ... AND INDEXES ALL. If you use RUNSTATS ON TABLE ... FOR INDEXES ALL, the resulting statistics might be inconsistent.