RUNSTATS command

The RUNSTATS command updates statistics in the system catalog about the characteristics of a table, associated indexes, or statistical views. These characteristics include number of records, number of pages, and average record length. The optimizer uses these statistics to determine access paths to the data.

For a table, call the RUNSTATS command when the table had many updates, or after the table is reorganized. For a statistical view, call the RUNSTATS command when changes to underlying tables substantially affected the rows that are returned by the view. The view must be previously enabled for use in query optimization by using the ALTER VIEW statement.

Scope

You can issue the RUNSTATS command from any database partition in the db2nodes.cfg file. You can use the command to update the catalogs on the catalog database partition.

For tables, this command collects statistics for a table on the database partition from which it is started. If the table does not exist on that database partition, the first database partition in the database partition group is selected.

For views, this command collects statistics by using data from tables on all participating database partitions.

Authorization

For a table, you require one of the following authorities:

SYSADM
SYSCTRL
SYSMAINT
DBADM
LOAD
SQLADM
SCHEMAADM on the schema of the table
CONTROL privilege on the table

You do not need any explicit privilege to use this command on any declared temporary table that exists within its connection.

For statistical views, one of the following authorities is required:

SYSADM
SYSCTRL
SYSMAINT
DBADM
LOAD
SQLADM
SCHEMAADM on the schema of the view
CONTROL privilege on the statistical view

Required connection

Database

Command parameters

object-name

Identifies the table or statistical view on which statistics are to be collected. This parameter must not be a table hierarchy. For typed tables, the value of the object-name parameter must be the name of the root table of the table hierarchy. The fully qualified name or alias in the form: schema.object-name must be used. The schema is the username under which the table was created.

USE PROFILE

This option allows RUNSTATS to employ a previously stored statistics profile to gather statistics for a table or statistical view. The statistics profile is created by using the SET PROFILE options and is updated by using the UPDATE PROFILE options.

UNSET PROFILE

Removes a statistics profile. For example, the following command removes the profile for the tablemyschema.mytable table:

RUNSTATS ON tablemyschema.mytable UNSET PROFILE

FOR INDEXES

Collects and updates statistics for the indexes only. If no table statistics was previously collected on the table, basic table statistics are also collected. Updates statistics for table cardinality (CARD), FPAGES, NPAGES, even when table statistics exists. These basic statistics do not include any distribution statistics. This option cannot be used for views. COLCARD of the leading column of the index might also be updated.

SAMPLED

Used together only with the DETAILED parameter. Specifying this option does not change the default functionality from DETAILED. The SAMPLED parameter is left in for compatibility with previous versions of Db2®. This parameter cannot be used for views.

UNSAMPLED

This option, when used with the DETAILED option, forces RUNSTATS to examine every entry in the index to compute the extended index statistics. You cannot use the UNSAMPLED parameter for views and it cannot be used together with scan index sampling (INDEXSAMPLE keyword). This option significantly increases RUNSTATS resource consumption, while rarely providing significant improvement over the DETAILED or SAMPLED DETAILED options, which are equivalent.

DETAILED

Calculates extended index statistics. The extended index statistics are the CLUSTERFACTOR and PAGE_FETCH_PAIRS statistics that are gathered for relatively large indexes. Not all index entries are examined. A CPU sampling technique is employed instead to improve performance. You cannot use this parameter for views.

index-name

Identifies an existing index that is defined on the table. If you do not specify the fully qualified name in the form: schema.index-name, the default schema is assumed.

EXCLUDING XML COLUMNS

Omits all XML-type columns from statistics collection. Using this clause facilitates the collection of statistics on non-XML columns because the inclusion of XML data can require greater system resources. The EXCLUDING XML COLUMNS clause takes precedence over other clauses that specify XML columns for statistics collection. For example, if you use the EXCLUDING XML COLUMNS clause, and you also specify XML type columns with the ON COLUMNS clause or you use the ON ALL COLUMNS clause, all XML-type columns are ignored during statistics collection. For Db2 V9.7 Fix Pack 1 and later releases, distribution statistics over XML-type columns are not collected when you specify this parameter.

AND INDEXES

Collects and updates statistics for both the table and the indexes.

ON ALL COLUMNS

Collects statistics on all eligible columns. You can use this parameter with the On Cols clause or use this parameter with the On Dist Cols clause after the WITH DISTRIBUTION parameter. The ON ALL COLUMNS parameter is the default if you do not specify either of the column-specific clauses.

If it is specified in the On Cols clause, all columns will have only basic column statistics that are collected unless specific columns are chosen as part of the WITH DISTRIBUTION clause. Those columns that are specified as part of the WITH DISTRIBUTION clause also have basic and distribution statistics collected.

If the WITH DISTRIBUTION ON ALL COLUMNS is specified, both basic statistics and distribution statistics are collected for all eligible columns. Therefore, anything that is specified in the clause is redundant and not On Cols necessary.

ON COLUMNS

To collect statistics on specific columns, column groups, or both, use the ON COLUMNS. A column group is a parenthesized comma-separated list of columns for which you want to collect combined statistics.

The column and column groups are specified as a parenthesized comma-separated list.

Running the RUNSTATS command on a table without gathering index statistics but specifying a subset of columns for which statistics are to be gathered has the following effects:

Statistics are not reset for columns that you do not specify for the RUNSTATS command but are the first column in an index.
Statistics are reset for all other columns that you do not specify for the RUNSTATS command.

You can use the ON COLUMNS parameter in the On Cols clause and the On Dist Cols clause. Collecting distribution statistics for a group of columns is not currently supported.

If you specify XML-type columns in a column group, the XML-type columns are ignored for collecting distinct values for the group. However, basic XML column statistics are collected for the XML-type columns in the column group.

ON KEY COLUMNS

Collects statistics on columns that make up all the indexes that are defined on the table. It is assumed that critical columns in queries are also columns that are used to create indexes on the table. If no indexes on the table exist, no column statistics are collected.

You can use the ON KEY COLUMNS parameter in the On Cols clause or the On Dist Cols clause. Specifying the parameter in both clauses is redundant because if you specify the On Dist Cols clause (after the WITH DISTRIBUTION parameter), both basic and distribution statistics are collected.

XML-type columns are by definition not key columns and are not included for statistics collection by the ON KEY COLUMNS parameter. You cannot use this parameter for views.

column-name

Name of a column in the table or statistical view. If you specify the name of an ineligible column for statistics collection, such as a nonexistent column or a mistyped column name, error (-205) is returned. Two lists of columns can be specified, one without distribution and one with distribution. If the column is specified in the list that is not associated with the WITH DISTRIBUTION clause, only basic column statistics are collected. If the column appears in both lists, distribution statistics are collected (unless NUM_FREQVALUES and NUM_QUANTILES are set to zero).

LIKE STATISTICS

Collects additional column statistics for columns of type BINARY, VARBINARY, CHAR, or VARCHAR with a code page attribute of single-byte character set (SBCS), FOR BIT DATA, or UTF-8. The statistics are collected if the runstats utility determines that such statistics are appropriate after it analyzes column values. These statistics are shown in the SUB_COUNT and the SUB_DELIM_LENGTH columns in the SYSSTAT.COLUMNS views. The query optimizer uses these statistics to improve the selectivity estimates for predicates of the type "column LIKE '%xyz'" and "column LIKE '%xyz%'".

WITH DISTRIBUTION

Collects both basic statistics and distribution statistics on columns. If you do not specify the ON COLUMNS parameter, distribution statistics are collected on all the columns of the table or statistical view. However, distribution statistics are not be collected on columns that are ineligible such as columns of type CLOB and LONG VARCHAR. If you specify the ON COLUMNS parameter, distribution statistics are collected only on the column list that you provide, excluding those columns that are ineligible for statistics collection.

If you specify the WITH DISTRIBUTION parameter followed by the ON COLUMNS parameter with column groups, distribution statistics are not collected for the column groups.

DEFAULT

If you specify the NUM_FREQVALUES and NUM_QUANTILES parameters, the values of the parameters are used to determine the maximum number of frequency and quantile statistics to be collected for the columns. The NUM_FREQVALUES and NUM_QUANTILES parameters are used if you do not specify values for individual columns in the ON COLUMNS clause. If you do not specify the DEFAULT parameter, the values that are used are the values in the corresponding database configuration parameters.

NUM_FREQVALUES

Defines the DEFAULT parameter referred to frequency statistics. You can specify this parameter for an individual column after the ON COLUMNS parameter. If you instead specify the NUM_FREQVALUES parameter after the DEFAULT parameter, the value for the NUM_FREQVALUES parameter after the DEFAULT parameter is used. If the specified value is '-1' or you do not specify the NUM_FREQVALUES parameter after either the ON COLUMNS or DEFAULT parameter, the maximum number of frequency values is the value of the num_freqvalues database configuration parameter.

NUM_QUANTILES

Defines the maximum number of distribution quantile values to collect. It can be specified for an individual column in the ON COLUMNS clause. If the value is either not specified or is specified as '-1' for an individual column, the quantile limit value is picked up from the value that is specified in the DEFAULT clause. If it is not specified there either, the maximum number of quantile values to be collected are what is set in the num_quantiles database configuration parameter.

For Db2 V9.7 Fix Pack 1 and later releases, distribution statistics for each index over XML data uses a maximum of 250 quantiles as the default. The default can be changed by specifying the NUM_QUANTILES parameter in the ON COLUMNS or the DEFAULT clause. The num_quantiles database configuration parameter is ignored while it collects XML distribution statistics.

ALLOW WRITE ACCESS

Specifies that other users can read from and write to the tables while statistics are calculated. For statistical views, these tables are the base tables that are referenced in the view definition.

The ALLOW WRITE ACCESS option is not recommended for tables that have numerous inserts, updates, or deletes occurring concurrently. The RUNSTATS command first collects table statistics and then performs index statistics. Changes in the table's state between the time that the table and index statistics are collected might result in inconsistencies. Although having up-to-date statistics is important for the optimization of queries, it is also important to have consistent statistics. Therefore, statistics must be collected at a time when inserts, updates, or deletes are at a minimum.

ALLOW READ ACCESS

Specifies that other users can have read-only access to the tables while statistics are calculated. For statistical views, these tables are the base tables that are referenced in the view definition.

TABLESAMPLE BERNOULLI

Collects statistics on a sample of the rows from the table or statistical view. Bernoulli sampling considers each row individually, including the row with probability P/100 (where P is the value of the numeric-literal parameter) and excluding it with probability 1-P/100. Thus, if the value of the numeric-literal parameter is evaluated to be the value 10, representing a 10% sample, each row would be included with probability 0.1 and be excluded with probability 0.9. Unless you specify the optional REPEATABLE parameter, each execution of the RUNSTATS command usually yields a different sample of the table. All data pages are retrieved through a table scan, but only the percentage of rows that you specify by using the numeric-literal parameter is used for statistics collection.

TABLESAMPLE SYSTEM

Collects statistics on a sample of the data pages from the tables. System sampling considers each page individually, including the page with probability P/100 (where P is the value of the numeric-literal parameter) and excluding it with probability 1-P/100. Unless you specify the optional REPEATABLE parameter, each execution of the RUNSTATS command usually yields a different sample of the table. You control the size of the sample by specifying the numeric-literal parameter in parentheses, which represents an approximate percentage P of the table to return. Only a percentage of the data pages, as specified by the numeric-literal parameter, is retrieved and used for statistics collection.

For a statistical view, you can apply system sampling to only a single base table that is referenced in the view definition. If the view contains multiple tables, system sampling is possible if a single table in the statistical view can be identified as being joined with all primary keys or unique index columns of the other tables that are used in the view. If the statistical view does not meet those conditions, Bernoulli sampling is to be used instead, and a warning is returned.

numeric-literal

Specifies the size of the sample to be obtained, as a percentage P. This value must be a positive number that is less than or equal to 100 and can be from 0 - 1. For example, a value of 0.01 represents one 1/100 of a percent, such that one row in 10,000 is sampled, on average. A value of 0 or 100 is treated as if you did not specify sampling, regardless of whether you specified the TABLESAMPLE BERNOULLI or TABLESAMPLE SYSTEM parameter. A value greater than 100 or less than 0 is treated as an error (SQL1197N).

REPEATABLE (integer-literal)

When specified after the TABLESAMPLE parameter, ensures that repeated executions of the RUNSTATS command return the same sample. The integer-literal parameter specifies a non-negative integer that represents the seed to be used in sampling. Passing a negative seed results in an error (SQL1197N). The sample set might vary between repeatable RUNSTATS command invocations if activity against the table or statistical view that is resulted in changes to the table or statistical view data since the last time that you ran the command with the TABLESAMPLE REPEATABLE parameter. Also, to ensure consistent results, the method by which you obtained the sample (by using the BERNOULLI or SYSTEM parameter) must be the same.

INDEXSAMPLE BERNOULLI

Collects index statistics on a sample of the rows in the index. Bernoulli sampling considers each row individually, including the row with probability P/100 (where P is the value of the numeric-literal parameter) and excluding it with probability 1-P/100. Thus, if the numeric-literal parameter has the value 10, representing 10%, each row would be included with probability 0.1 and be excluded with probability 0.9. Each execution of the RUNSTATS command is likely to yield a different sample of the index. All index pages are retrieved through an index scan, but only the percentage of rows as specified through the numeric-literal parameter is used for statistics collection. The INDEXSAMPLE BERNOULLI parameter is not supported for statistical views.

INDEXSAMPLE SYSTEM

Collects statistics on a sample of the index pages. System sampling considers each page individually, including the page with probability P/100 (where P is the value of the numeric-literal parameter) and excluding it with probability 1-P/100. Each execution of the RUNSTATS command usually yields a different sample of the index. You control the size of the sample by specifying the numeric-literal parameter in parentheses, which represents an approximate percentage P of the index to return. Only a percentage of the index pages, as specified by the numeric-literal parameter, is retrieved and used for statistics collection. The INDEXSAMPLE SYSTEM parameter is not supported for statistical views.

SET PROFILE NONE

Specifies that no statistics profile is set for this RUNSTATS invocation.

SET PROFILE

Generates and stores a specific statistics profile in the system catalog tables and executes the RUNSTATS command options to gather statistics.

SET PROFILE ONLY

Generates and stores a specific statistics profile in the system catalog tables without running the RUNSTATS command options.

UPDATE PROFILE

Modifies a statistics profile in the system catalog tables and runs the RUNSTATS command options of the updated statistics profile to gather statistics. You cannot use the UPDATE PROFILE parameter to remove clauses that are in a statistics profile.

UPDATE PROFILE ONLY

Modifies a statistics profile in the system catalog tables without running the RUNSTATS command options of the updated statistics profile. You cannot use the UPDATE PROFILE ONLY parameter to remove clauses that are in a statistics profile.

UTIL_IMPACT_PRIORITY priority

Specifies that RUNSTATS is going to be throttled at the level that is specified by priority. priority is a number in the range of 1 to 100, with 100 representing the highest priority and 1 representing the lowest. The priority specifies the amount of throttling to which the utility is subjected. All utilities at the same priority undergo the same amount of throttling, and utilities at lower priorities are throttled more than utilities at higher priorities. If priority is not specified, the RUNSTATS has the default priority of 50. Omitting the UTIL_IMPACT_PRIORITY keyword invokes the RUNSTATS utility without throttling support. If the UTIL_IMPACT_PRIORITY keyword is specified, but the util_impact_lim configuration parameter is set to 100, then the utility runs unthrottled.

When you use the RUNSTATS command on tables in a partitioned database, statistics are collected on only a single database partition. If the database partition from which you ran the RUNSTATS command has a partition of the table, the command runs on that database partition. Otherwise, the command runs on the first database partition in the database partition group across which the table is partitioned.

Examples

The following command collects basic statistics (no distribution statistics) on the table only, on all columns:
```
RUNSTATS ON TABLE employee
```
The following command collects statistics on the table only, with distribution statistics on columns empid and empname:
```
RUNSTATS ON TABLE employee 
    WITH DISTRIBUTION ON COLUMNS (empid, empname) 
```
The following command collects basic and distribution statistics on the table only, on all columns. The command specifies a maximum number of frequency values and uses the maximum number of distribution quantile values from the configuration setting.
```
RUNSTATS ON TABLE employee WITH DISTRIBUTION DEFAULT 
    NUM_FREQVALUES 50 
```
The following command collects basic and distribution statistics on the table only, on all columns and on column group JOB, WORKDEPT, and SEX:
```
RUNSTATS ON TABLE employee ON ALL COLUMNS AND COLUMNS ((JOB, WORKDEPT, SEX)) 
    WITH DISTRIBUTION
```
The following command collects statistics on a set of indexes:
```
RUNSTATS ON TABLE employee for indexes empl1, empl2
```
The following command collects statistics on all indexes only:
```
RUNSTATS ON TABLE employee FOR INDEXES ALL
```
The following command collects basic statistics on the table and all indexes, by using sampling for the detailed index statistics collection:
```
RUNSTATS ON TABLE employee AND SAMPLED DETAILED INDEXES ALL
```
This command is equivalent to the following one:
```
RUNSTATS ON TABLE employee AND DETAILED INDEXES ALL
```
The following command collects basic statistics on the table, with distribution statistics on columns empid, empname, and empdept and the two indexes Xempid and Xempname. Distribution statistics limits are set individually for empdept, and the other two columns use a common default.
```
RUNSTATS ON TABLE employee 
  WITH DISTRIBUTION ON COLUMNS (empid, empname, empdept NUM_FREQVALUES 
          50 NUM_QUANTILES 100)  
          DEFAULT NUM_FREQVALUES 5 NUM_QUANTILES 10
          AND INDEXES Xempid, Xempname
```
The following command collects statistics on all columns that are used in indexes and on all indexes:
```
RUNSTATS ON TABLE employee ON KEY COLUMNS AND INDEXES ALL
```

The following command collects statistics on all indexes and all columns with distribution and distribution statistics set individually for one of the columns. In this example, T1 contains columns c1, c2,.... , c8

RUNSTATS ON TABLE T1 
  WITH DISTRIBUTION ON COLUMNS (c1, c2, c3 NUM_FREQVALUES 20 
  NUM_QUANTILES 40, c4, c5, c6, c7, c8)
  DEFAULT NUM_FREQVALUES 0, NUM_QUANTILES 0 AND INDEXES ALL 
                                                 
RUNSTATS ON TABLE T1 
  WITH DISTRIBUTION ON COLUMNS (c3 NUM_FREQVALUES 20 NUM_QUANTILES 40)
  AND INDEXES ALL

The following command collects statistics on table T1 for the individual columns c1 and c5 and for the column groups (c2, c3) and (c2, c4). Multicolumn cardinality is useful to the query optimizer when it estimates filter factors for predicates on columns in which the data is correlated.
```
RUNSTATS ON TABLE T1 ON COLUMNS (c1, (c2, c3),
    (c2, c4), c5)
```
The following command collects statistics on table T1 for the individual columns c1 and c2. For column c1, the command also collects LIKE predicate statistics.
```
RUNSTATS ON TABLE T1 ON COLUMNS (c1 LIKE STATISTICS, c2)
```
The following command registers a statistics profile to collect basic and distribution statistics on the table only, on all columns. The command specifies a maximum number of frequency values and uses the maximum number of distribution quantile values from the configuration setting.
```
RUNSTATS ON TABLE employee WITH DISTRIBUTION DEFAULT 
   NUM_FREQVALUES 50 SET PROFILE
```
The following command registers a statistics profile to collect basic and distribution statistics on the table only, on all columns. The command specifies a maximum number of frequency values and uses the maximum number of distribution quantile values from the configuration setting. Statistics are not collected.
```
RUNSTATS ON TABLE employee WITH DISTRIBUTION 
    DEFAULT NUM_FREQVALUES 50 SET PROFILE ONLY
```
The following command modifies the previously registered statistics profile by changing the maximum number of frequency values in the range 50 - 30. The command also updates the statistics as specified.
```
RUNSTATS ON TABLE employee WITH DISTRIBUTION 
    DEFAULT NUM_FREQVALUES 30 UPDATE PROFILE
```
The following command modifies the previously registered statistics profile by changing the maximum number of frequency values in the range 50 - 30. Statistics are not collected.
```
RUNSTATS ON TABLE employee WITH DISTRIBUTION 
    DEFAULT NUM_FREQVALUES 30 UPDATE PROFILE ONLY
```
The following command modifies the previously registered statistics profile by adding column empl_address and column group (empl_title, empl_salary) options. The command also updates the statistics as specified.
```
RUNSTATS ON TABLE employee 
 ON COLUMNS (empl_address, (empl_title, empl_salary))
 UPDATE PROFILE
```
The following command modifies the previously registered statistics profile by adding column empl_address and column group (empl_title, empl_salary) options. Statistics are not collected.
```
RUNSTATS ON TABLE employee 
 ON COLUMNS (empl_address, (empl_title, empl_salary))
 UPDATE PROFILE ONLY
```
The following command collects statistics on a table by using the options that are recorded in the statistics profile for that table:
```
RUNSTATS ON TABLE employee USE PROFILE
```
The following command queries the RUNSTATS command options corresponding to the previously registered statistics profile in the catalogs of the table:
```
SELECT STATISTICS_PROFILE FROM SYSCAT.TABLES WHERE TABNAME = 
    'EMPLOYEE'
```
The following command collects statistics, including distribution statistics, on 30% of the rows:
```
RUNSTATS ON TABLE employee WITH DISTRIBUTION 
    TABLESAMPLE BERNOULLI(30)
```
To control the sample set on which statistics are collected and to be able to repeatedly use the same sample set, you can do so as follows:
```
RUNSTATS ON TABLE employee WITH DISTRIBUTION 
    TABLESAMPLE BERNOULLI(30) REPEATABLE(4196)
```
The command results in the same set of statistics if the data is not changed in the interim.
The following command collects index statistics and table statistics on 1.5% of the data pages. Only table data pages, not index pages, are sampled. In this example, 1.5% of table data pages are used for the collection of table statistics, and all index pages are used for index statistics.
```
RUNSTATS ON TABLE employee AND INDEXES ALL TABLESAMPLE SYSTEM(1.5)
```
The following command collects table statistics on 1.5% of the data pages and index statistics on 2.5% of the index pages. Both table data pages and index pages are sampled.
```
RUNSTATS ON TABLE employee AND INDEXES ALL TAMBLESAMPLE SYSTEM(1.5)
    INDEXSAMPLE SYSTEM(2.5)
```
The following command collects statistics for a statistical view, on all columns, without distribution statistics:
```
RUNSTATS ON VIEW product_sales_view
```
The following command collects basic statistics for a statistical view, with distribution statistics on the category, type and product_key columns. Distribution statistics limits are set for the category column, and the other columns use a common default.
```
RUNSTATS ON VIEW product_sales_view
 WITH DISTRIBUTION ON COLUMNS (category NUM_FREQVALUES 100 NUM_QUANTILES 100,
		 type, product_key) DEFAULT NUM_FREQVALUES 50 NUM_QUANTILES 50
```
The following command collects basic and distribution statistics for a statistical view on 10% of the rows, by using row-level sampling:
```
RUNSTATS ON VIEW product_sales_view
 WITH DISTRIBUTION TABLESAMPLE BERNOULLI (10)
```
The following command collects basic and distribution statistics for a statistical view on 2.5% of the rows, by using data-page-level sampling. Additionally, this command specifies the repeated use of the same sample set. For this command to succeed, the query must be such that the Db2 database system can successfully push data-page sampling down to one or more tables. Otherwise, an error (SQL 20288N) is issued.
```
RUNSTATS ON VIEW product_sales_view
 WITH DISTRIBUTION TABLESAMPLE SYSTEM (2.5) REPEATABLE(4196)
```
The following command registers a statistics profile to collect statistics on the view and on all columns with distribution statistics as specified:
```
RUNSTATS ON VIEW product_sales_view
 WITH DISTRIBUTION DEFAULT NUM_FREQVALUES 50 NUM_QUANTILES 50
 SET PROFILE
```
The following command modifies the previously registered statistics profile. This command also updates the statistics as specified:
```
RUNSTATS ON VIEW salesdb.product_sales_view
 WITH DISTRIBUTION DEFAULT NUM_FREQVALUES 25 NUM_QUANTILES 25
 UPDATE PROFILE
```

Usage notes

You should run the RUNSTATS command in the following cases:
- On tables that were modified considerably: For example, if many updates were made, if a significant amount of data was inserted or deleted, or if you ran the LOAD command without the statistics option during LOAD.
- On tables that were reorganized by using the REORG or REDISTRIBUTE DATABASE PARTITION GROUP command.
- On tables that were row compressed.
- After you create a new index.
- Before binding applications whose performance is critical.
- When the prefetch quantity is changed.
- On statistical views whose underlying tables that were modified substantially to change the rows that are returned by the view.
- After you run the LOAD command with the STATISTICS option, if XML columns exist. Use the RUNSTATS utility to collect statistics on XML columns because the LOAD command does not collect statistics on these columns, even if you use the STATISTICS option. When you use the RUNSTATS command to collect statistics for XML columns only, statistics for non-XML columns that were collected by the LOAD command or a previous execution of the RUNSTATS utility are retained. If you previously collected statistics on an XML column, those statistics are either dropped if the current command does not collect statistics on that column or are replaced if the current command does collect statistics on that column.
The options that you choose depend on the specific table and the application. In general, the following guidelines apply:
- If the table is a critical table in critical queries, is relatively small, or does not change too much and there is not too much activity on the system itself, it might be worth spending the effort on collecting statistics in as much detail as possible.
- If the time to collect statistics is limited, if the table is relatively large, or if the table is updated frequently, it might be beneficial to run the RUNSTATS command on just the set of columns that are used in predicates. This way, you can run the RUNSTATS command more often.
- If time to collect statistics is limited and the effort to tailor the RUNSTATS command on a table-by-table basis is a major issue, consider collecting statistics for the KEY columns only. It is assumed that the index contains the set of columns that are critical to the table and are most likely to appear in predicates.
- If time to collect statistics is limited and table statistics are to be gathered, consider using the TABLESAMPLE option to collect statistics on a subset of the table data.
- If time to collect statistics is limited and index statistics are to be gathered, consider using the INDEXSAMPLE option to collect statistics on a subset of the index data.
- If skew exists in certain columns and predicates of the type "column = constant", it might be beneficial to specify a larger NUM_FREQVALUES value for the columns.
- Collect distribution statistics for all columns that you use in equality predicates and for which the distribution of values might be skewed.
- For columns that have range predicates (for example "column >= constant" or "column BETWEEN constant1 AND constant2") or are of the type "column LIKE '%xyz'", it might be beneficial to specify a larger NUM_QUANTILES value.
- If storage space is a concern and you do not have much time to collect statistics, do not specify high NUM_FREQVALUES or NUM_QUANTILES values for columns that you do not use in predicates.
- If you need index statistics, and statistics were never collected on the table that contains the index, statistics on both the table and indexes are calculated.
- If you do not require statistics for XML columns in the table, you can use the EXCLUDING XML COLUMNS parameter to exclude all XML columns. This parameter takes precedence over all other parameters that specify XML columns for statistics collection.
After the command is run, note the following:
- To release the locks, you must issue a COMMIT.
- To allow new access plans to be generated, you must rebind the packages that reference the target table.
- Running the command on portions of the table can result in inconsistencies as a result of activity on the table since you last issued the command. In this case, a warning message is returned.
  If you issue the RUNSTATS command on the table only, you might make table-level and index-level statistics inconsistent. For example, you might collect index-level statistics on a table and later delete a significant number of rows from the table. If you then issue the RUNSTATS command on the table only, the table cardinality might be less than the value of FIRSTKEYCARD, which is an inconsistency. Similarly, if you collect statistics on a new index when you create it, the table-level statistics might be inconsistent.
The RUNSTATS command drops previously collected distribution statistics if you request table statistics. For example, the RUNSTATS ON TABLE and RUNSTATS ON TABLE … AND INDEXES ALL commands cause previously collected distribution statistics to be dropped. If you run the command on indexes only, previously collected distribution statistics are retained. For example, the RUNSTATS ON TABLE … FOR INDEXES ALL command causes the previously collected distribution statistics to be retained. If you run the RUNSTATS command on XML columns only, previously collected basic column statistics and distribution statistics are retained. If you previously collected statistics on an XML column, those statistics are either dropped if the current command does not collect statistics on that column or are replaced if the current command does collects statistics on that column.
When detached partitions exist on a partitioned table, index keys that still belong to detached data partitions that require cleanup are not counted as part of the keys in the statistics. These keys are not counted because they are invisible and no longer part of the table. They will eventually get removed from the index by asynchronous index cleanup. As a result, statistics that are collected before asynchronous index cleanup is run are misleading. If the RUNSTATS command is issued before asynchronous index cleanup completes, it might generate a false alarm for index reorganization or index cleanup based on the inaccurate statistics. After asynchronous index cleanup is run, all the index keys that still belong to detached data partitions that require cleanup is removed and this might eliminate the need for index reorganization.
For partitioned tables, you are encouraged to issue the RUNSTATS command after an asynchronous index cleanup is completed to generate accurate index statistics in the presence of detached data partitions. To determine whether detached data partitions exist in the table, you can check the status field in the SYSCAT.DATAPARTITIONS catalog view and look for the value L (logically detached), I (index cleanup), or D (detached with dependent MQT).

The RUNSTATS command collects statistics for all index partitions of a partitioned index. Statistics on the leading columns of a partitioned index might not be as accurate as statistics on the leading columns of a nonpartitioned index.
When the RUNSTATS command is run on a table with an expression-based index, and the RUNSTATS command includes that index in the AND INDEXES or FOR INDEXES clause, statistics for expression-based key columns within that index is also collected and will be associated with the system-generated statistical view associated with the index. Expressions cannot be specified as columns in the RUNSTATS command (SQL0205N). To collect customized statistics on expression-based index key columns, you can define a statistics profile on the statistical view that is associated with the index by using the column names as they appear in the statistics view. You also need to define a statistics profile on the base table that includes the index in its INDEXES clause.
Note: When you define a statistics profile on the expression that is based index's statistical view, an automatically generated statistics profile is also associated with the base table, if one does not exist already. After the statistics profiles are defined, a RUNSTATS command on the base table with the USE PROFILE clause results in the customized statistics being gathered on the expression-based key columns in the index.
Distribution statistics are collected on indexes over XML data that is defined on an XML column. If you run the RUNSTATS command on a table with the WITH DISTRIBUTION parameter, collection of distribution statistics on a column of type XML occurs as follows:
- The RUNSTATS command must collect both distribution statistics and table statistics to collect distribution statistics for indexes over XML data that is defined on an XML column. Table statistics must be gathered in order for distribution statistics to be collected since XML distribution statistics are stored with table statistics.
  An index clause is not required to collect XML distribution statistics. Specifying only an index clause does not collect XML distribution statistics.
  
  By default, XML distribution statistics use a maximum of 250 quantiles for each index over XML data. When you collect distribution statistics on an XML column, you can change the maximum number of quantiles by specifying a value for the NUM_QUANTILES parameter in the ON COLUMNS or the DEFAULT clause.
- Distribution statistics are collected for indexes over XML data of type VARCHAR, DOUBLE, TIMESTAMP, and DATE. Distribution statistics are not collected over indexes of type VARCHAR HASHED.
- Distribution statistics are not collected for partitioned indexes over XML data that is defined on a partitioned table.
A special system-generated index in the catalog tables represents the range-ordering property of range-clustered tables. When you collect statistics on this type of table, statistics are also collected for the system-generated index. The statistics reflect the fast access of the range lookups by representing the index as a two-level index with as many pages as the base data table and by clustering the base data perfectly along the index order.
In the On Dist Cols clause of the command, the Frequency Option and Quantile Option parameters are not supported for column groups. These parameters are supported for single columns.
Three types of prefetch statistics cannot be computed in DMS mode. In the index statistics in the index catalogs, you see a -1 value for the following statistics:
- AVERAGE_SEQUENCE_FETCH_PAGES
- AVERAGE_SEQUENCE_FETCH_GAP
- AVERAGE_RANDOM_FETCH_PAGES
The statistics profile is stored in a visible string format, which represents the RUNSTATS command, in the STATISTICS_PROFILE column of the SYSCAT.TABLES system catalog table.
Statistics collection on XML-type columns is governed by two Db2 registry variables: DB2_XML_RUNSTATS_PATHID_K and DB2_XML_RUNSTATS_PATHVALUE_K. These two registry variables are similar to the NUM_FREQVALUES parameter in that they specify the number of frequency values to collect. If you do not set the parameters, a default of 200 is used.
When you start running the RUNSTATS command, it acquires an IX table lock on the SYSTABLES table and a U lock on the row of the table on which you are gathering statistics. Operations can still read from the SYSTABLES table, including the row in the table with the U lock. Write operations are also possible if they do not occur against the row with the U lock. However, another reader or writer cannot acquire an S lock on the SYSTABLES table because of the IX lock that the RUNSTATS command acquired.
Statistics are not collected for columns with structured types. If they are specified, columns with these data types are ignored.
Only AVGCOLLEN and NUMNULLS are collected for columns with LOB or LONG data types. AVGCOLLEN represents the average space in bytes when the column is stored in database memory or a temporary table. This value represents the length of the data descriptor for LOB or LONG data types, except when LOB data is inlined on the data page. The average space that is required to store the column on disk can be different than the value represented by this statistic.
The UNSAMPLED DETAILED option is available to change the way index statistics are collected, but it should be used only in cases where it is clear that the default or DETAILED doesn't work.
When you use the INDEXSAMPLE parameter, you cannot specify different index sampling rates for different indexes within a single command. For example, the following command is invalid:
```
runstats on table orders and index o_ck indexsample system(5), 
   index o_ok indexsample system(10)
```
You can use the following two RUNSTATS commands to achieve the needed result:
```
runstats on table orders and index o_ck indexsample system(5)
runstats on table orders for index o_ok indexsample system(10)
```
If you modified the table since statistics were last collected on the table or its indexes, you must run RUNSTATS ON TABLE … AND INDEXES ALL. If you use RUNSTATS ON TABLE … FOR INDEXES ALL, the resulting statistics might be inconsistent.