db2Runstats API - Update statistics for tables and indexes

Updates statistics about the characteristics of a table or any associated indexes or statistical views. This command can also update statistics about the characteristics of both a table and any associated indexes or statistical views. These characteristics include, among many others, number of records, number of pages, and average record length. The optimizer uses these statistics when determining access paths to the data.

When used on tables, this utility should be called when a table has had many updates, after reorganizing a table, or after creating a new index.

Statistics are based on the portion of the table that resides on the database partition where the API executes. Global table statistics are derived by multiplying the values obtained at a database partition by the number of database partitions on which the table is completely stored. The global statistics are stored in the catalog tables. The database partition from which the API is called does not have to contain a portion of the table:

If the API is called from a database partition that contains a portion of the table, the utility executes at this database partition.
If the API is called from a database partition that does not contain a portion of the table, the request is sent to the first database partition in the database partition group that contains a portion of the table. The utility then executes at this database partition. When you collect statistics for a statistical view, statistics are collected for all database partitions.

When used on statistical views, this utility should be called when changes to underlying tables have substantially affected the rows returned by a view. These views must have been enabled for use in query optimization using "ALTER VIEW ... ENABLE QUERY OPTIMIZATION."

Scope

This API can be called from any database partition server in the db2nodes.cfg file. It can be used to update the catalogs on the catalog database partition.

Authorization

When used on tables, one of the following authorities:

SYSADM
SYSCTRL
SYSMAINT
DBADM
SQLADM
CONTROL privilege on the table
LOAD

When used on statistical views, one of the following authorities:

SYSADM
SYSCTRL
SYSMAINT
DBADM
SQLADM
CONTROL privilege on the view

In addition, the user needs to have the appropriate authority or privilege to access rows from the view. Specifically, for each table, view or nickname referenced in the view definition, the user must have one of the following authorities or privileges:

DATAACCESS
CONTROL privilege
SELECT privilege

Required connection

Database

API include file

db2ApiDf.h

API and data structure syntax

SQL_API_RC SQL_API_FN
  db2Runstats (
                db2Uint32 versionNumber,
                void * data,
                struct sqlca * pSqlca);

typedef SQL_STRUCTURE db2RunstatsData
{
  double iSamplingOption;
  unsigned char *piTablename;
  struct db2ColumnData        **piColumnList;
  struct db2ColumnDistData    **piColumnDistributionList;
  struct db2ColumnGrpData     **piColumnGroupList;
  unsigned char               **piIndexList;
  db2Uint32 iRunstatsFlags;
  db2int16 iNumColumns;
  db2int16 iNumColdist;
  db2int16 iNumColGroups;
  db2int16 iNumIndexes;
  db2int16 iParallelismOption;
  db2int16 iTableDefaultFreqValues;
  db2int16 iTableDefaultQuantiles;
  db2Uint32 iSamplingRepeatable;
  db2Uint32 iUtilImpactPriority;
  double iIndexSamplingOption;
} db2RunstatsData;

typedef SQL_STRUCTURE db2ColumnData
{
  unsigned char *piColumnName;
  db2int16 iColumnFlags;
} db2ColumnData;

typedef SQL_STRUCTURE db2ColumnDistData
{
  unsigned char *piColumnName;
  db2int16 iNumFreqValues;
  db2int16 iNumQuantiles;
} db2ColumnDistData;

typedef SQL_STRUCTURE db2ColumnGrpData
{
  unsigned char               **piGroupColumnNames;
  db2int16 iGroupSize;
  db2int16 iNumFreqValues;
  db2int16 iNumQuantiles;
} db2ColumnGrpData;

SQL_API_RC SQL_API_FN
  db2gRunstats (
                db2Uint32 versionNumber,
                void * data,
                struct sqlca * pSqlca);

typedef SQL_STRUCTURE db2gRunstatsData
{
  double iSamplingOption;
  unsigned char *piTablename;
  struct db2gColumnData       **piColumnList;
  struct db2gColumnDistData   **piColumnDistributionList;
  struct db2gColumnGrpData    **piColumnGroupList;
  unsigned char               **piIndexList;
  db2Uint16 *piIndexNamesLen;
  db2Uint32 iRunstatsFlags;
  db2Uint16 iTablenameLen;
  db2int16 iNumColumns;
  db2int16 iNumColdist;
  db2int16 iNumColGroups;
  db2int16 iNumIndexes;
  db2int16 iParallelismOption;
  db2int16 iTableDefaultFreqValues;
  db2int16 iTableDefaultQuantiles;
  db2Uint32 iSamplingRepeatable;
  db2Uint32 iUtilImpactPriority;
} db2gRunstatsData;

typedef SQL_STRUCTURE db2gColumnData
{
  unsigned char *piColumnName;
  db2Uint16 iColumnNameLen;
  db2int16 iColumnFlags;
} db2gColumnData;

typedef SQL_STRUCTURE db2gColumnDistData
{
  unsigned char *piColumnName;
  db2Uint16 iColumnNameLen;
  db2int16 iNumFreqValues;
  db2int16 iNumQuantiles;
} db2gColumnDistData;

typedef SQL_STRUCTURE db2gColumnGrpData
{
  unsigned char               **piGroupColumnNames;
  db2Uint16 *piGroupColumnNamesLen;
  db2int16 iGroupSize;
  db2int16 iNumFreqValues;
  db2int16 iNumQuantiles;
} db2gColumnGrpData;

db2Runstats API parameters

versionNumber: Input. Specifies the version and release level of the structure passed as the second parameter data.
data: Input. A pointer to the db2RunstatsData structure.
pSqlca: Output. A pointer to the sqlca structure.

db2RunstatsData data structure parameters

iSamplingOption

Input. Indicates that statistics are to be collected on a sample of table or view data. iSamplingOption represents the size of the sample as a percentage P. This value must be a positive number that is less than or equal to 100, but may be between 1 and 0. For example, a value of 0.01 represents one one-hundredth of a percent, such that 1 row in 10 000 would be sampled, on average. A value of 0 or 100 will be treated by Db2® as if sampling was not specified, regardless of whether DB2RUNSTATS_SAMPLING_SYSTEM has been specified. A value greater than 100 or less than 0 will be treated by Db2 as an error (SQL1197N). The two possible types of sampling are BERNOULLI and SYSTEM. The sampling type specification is controlled by the indicated setting of DB2RUNSTATS_SAMPLING_SYSTEM in the iRunstatsFlags.

piTablename

Input. A pointer to the fully qualified name of the table or statistical view on which statistics are to be gathered. The name can be an alias. For row types, piTablename must be the name of the hierarchy's root table.

piColumnList

Input. An array of db2ColumnData elements. Each element of this array is made up of two sub-elements:

a string that represents the name of the column on which to collect statistics
a flags field indicating statistic options for the column

If iNumColumns is zero then piColumnList is ignored if provided.

piColumnDistributionList

Input. An array of db2ColumnDistData elements. These elements are provided when collecting distribution statistics on a particular column or columns is required. Each element of this array is made up of three sub-elements:

a string that represents the name of the column on which to collect distribution statistics
the number of frequent values to collect.
the number of quantiles to collect

Any columns which appear in the piColumnDistributionList that do NOT appear in the piColumnList, will have basic column statistics collected on them. This would be the same effect as having included these columns in the piColumnList in the first place. If iNumColdist is zero then piColumnDistributionList is ignored.

piColumnGroupList

Input. An array of db2ColumnGrpData elements. These elements are provided when collecting column statistics on a group of columns. That is, the values in each column of the group for each row will be concatenated together and treated as a single value. Each db2ColumnGrpData is made up of 3 integer fields and an array of strings. The first integer field represents the number of strings in the array of strings piGroupColumns. Each string in this array contains one column name. For example, if column combinations statistics are to be collected on column groups (c1,c2) and on (c3,c4,c5) then there are 2 db2ColumnGrpData elements in piGroupColumns.

The first db2ColumnGrpData element is as follows: piGroupSize = 2 and the array of strings contains 2 elements, namely, c1 and c2.

The second db2ColumnGrpData element is as follows: piGroupSize = 3 and the array of strings contains 3 elements, namely, c3, c4 and c5.

The second and the third integer fields represent the number of frequent values and the number of quantiles when collecting distribution statistics on column groups. This is not currently supported.

Any columns which appear in the piColumnGroupList that do NOT appear in the piColumnList, will have basic column statistics collected on them. This would be the same effect as having included these columns in the piColumnList in the first place. If iNumColGroups is zero then piColumnGroupList is ignored.

piIndexList

Input. An array of strings. Each string contains one fully qualified index name. If NumIndexes is zero then piIndexList is ignored.

iRunstatsFlags

Input. A bit mask field used to specify statistics options. Valid values (defined in db2ApiDf header file, located in the include directory) are:

DB2RUNSTATS_ALL_COLUMNS

Collect statistics on all columns of the table or statistical view. This option can be specified in combination with column, column distribution, column group or index structure lists. This is useful if you would like to collect statistics on all columns of the table or view but would like to provide statistics options for specific columns.

DB2RUNSTATS_KEY_COLUMNS

Collect statistics only on the columns that make up all the indexes defined on the table. This option cannot be used for statistical views. On tables, it can be specified in combination with column, column distribution, column group or index structure lists. This is useful if you would like to collect statistics on all key columns of the table but would also like to gather statistics for some non-key columns or would like to provide statistics options for specific key columns. XML type columns are, by definition, not key columns and will not be included for statistics collection when the iRunstatsFlags parameter is set to the value DB2RUNSTATS_KEY_COLUMNS.

DB2RUNSTATS_DISTRIBUTION

Collect distribution statistics. This option can only be used with DB2RUNSTATS_ALL_COLUMNS and DB2RUNSTATS_KEY_COLUMNS. When used with DB2RUNSTATS_ALL_COLUMNS, distribution statistics are gathered for all columns of the table or statistical view. When used with DB2RUNSTATS_KEY_COLUMNS, distribution statistics are gathered for all columns that make up all the indexes defined on the table. When used with both DB2RUNSTATS_ALL_COLUMNS and DB2RUNSTATS_KEY_COLUMNS, basic statistics are gathered for all columns of the table and distribution statistics are gathered for only columns that make up all the indexes defined on the table.

DB2RUNSTATS_ALL_INDEXES

Collect statistics on all indexes defined on the table. This option cannot be used for statistical views.

DB2RUNSTATS_EXT_INDEX

Collect detailed index statistics. The option must be specified with either DB2RUNSTATS_ALL_INDEXES or an explicit list of index names (piIndexList and iNumIndexes > 0). This option cannot be used for statistical views.

DB2RUNSTATS_EXT_INDEX_SAMPLED

Collect detailed index statistics using sampling methods. The option must be specified with either DB2RUNSTATS_ALL_INDEXES or an explicit list of index names (piIndexList and iNumIndexes > 0). DB2RUNSTATS_EXT_INDEX will be ignored if specified at the same time. This option cannot be used for statistical views.

DB2RUNSTATS_ALLOW_READ

Allows others to have read-only access while the statistics are being gathered. The default is to allow read and write access.

DB2RUNSTATS_SAMPLING_SYSTEM

Collect statistics on a percentage of the data pages as specified by the user via the iSamplingOption parameter. SYSTEM sampling considers each page individually, including that page with probability P/100 (where P is the value of iSamplingOption) and excluding it with probability 1-P/100. Thus, if iSamplingOption is the value 10, representing a 10 percent sample, each page would be included with probability 0.1 and be excluded with probability 0.9.

For statistical views, SYSTEM sampling can only be applied to a single base table referenced in the view definition. If the view contains multiple tables, SYSTEM sampling is possible if a single table among all the tables in the statistical view can be identified as being joined with all primary keys or unique index columns of the other tables used in the view. If the statistical view does not meet those conditions, BERNOULLI sampling will be used instead and a warning will be returned.

If DB2RUNSTATS_SAMPLING_SYSTEM is not specified, Db2 will assume that BERNOULLI sampling is to be used as the sampling method. BERNOULLI sampling considers each row individually, including that row with probability P/100 (where P is the value of iSamplingOption) and excluding it with probability 1-P/100.

In both SYSTEM and BERNOULLI sampling, unless the DB2RUNSTATS_SAMPLING_REPEAT flag is specified, each execution of statistics collection will usually yield a different sample of the table or statistical view.

DB2RUNSTATS_SAMPLING_REPEAT

Specifies that a seed has been passed through the iSamplingRepeatable parameter. The iSamplingRepeatable value will be used as the seed to generate the data sample. The iSamplingOption parameter must also be specified to indicate the sampling rate.

DB2RUNSTATS_USE_PROFILE

Collect statistics for a table or statistical view by using a statistics profile already registered in the catalogs of the table or view. If the USE PROFILE option is specified by this flag set in iRunstatsFlags bit mask, all other options in db2RunstatsData will be ignored.

DB2RUNSTATS_SET_PROFILE

Generate and store a profile in the catalogs recording the statistics options specified and collect statistics using those same options.

DB2RUNSTATS_SET_PROFILE_ONLY

Generate and store a profile in the catalogs recording the statistics options specified without actually collecting statistics for the table or view.

DB2RUNSTATS_UNSET_PROFILE

Unsetting a statistics profile will remove the statistics profile from the system catalogs by setting the SYSCAT.STATISTICS_PROFILE to NULL. If a statistics profile does not exist, attempting to unset it will result in an error (SQLCODE -2315).

DB2RUNSTATS_UPDATE_PROFILE

Modify an existing statistics profile in the catalogs and collect statistics using the options from the updated profile.

DB2RUNSTATS_UPDA_PROFILE_ONLY

Modify an existing statistics profile in the catalogs without actually collecting statistics for the table or view.

DB2RUNSTATS_EXCLUDING_XML

Do not collect statistics on XML type columns. Statistics will still be collected on all specified columns that have non-XML type. This option takes precedence over all other methods that specify XML columns.

DB2RUNSTATS_INDEX_SYSTEM

Collects the index statistics on a percentage of the index pages as specified by the user via the iIndexSamplingOption parameter. SYSTEM sampling considers each page individually, including the index pages with probability P/100 (where P is the value of iIndexSamplingOption) and excluding it with probability 1-P/100. Thus, if iIndexSamplingOption is the value 10, representing a 10 percent sample, each page would be included with probability 0.1 and be excluded with probability 0.9. If DB2RUNSTATS_INDEX_SYSTEM is not specified, Db2 will assume that BERNOULLI sampling is to be used as the sampling method. BERNOULLI sampling considers each row individually, including that row with probability P/100 (where P is the value of iIndexSamplingOption) and excluding it with probability 1-P/100.

iNumColumns

Input. The number of items specified in the piColumnList list.

iNumColdist

Input. The number of items specified in the piColumnDistributionList list.

iNumColGroups

Input. The number of items specified in the piColumnGroupList list.

iNumIndexes

Input. The number of items specified in the piIndexList list.

iParallelismOption

Input. Reserved for future use. Valid value is 0.

iTableDefaultFreqValues

Input. Specifies the default number of frequent values to collect for the table or view. Valid values are:

n: n frequent values will be collected unless otherwise specified at the column level.
0: No frequent values will be collected unless otherwise specified at the column level.
-1: Use the default database configuration parameter num_freqvalues for the number of frequent values to collect.

iTableDefaultQuantiles

Input. Specifies the default number of quantiles to collect for the table or view. Valid values are:

n: n quantiles will be collected unless otherwise specified at the column level.
0: No quantiles will be collected unless otherwise specified at the column level.
-1: Use the default database configuration parameter num_quantiles for the number of quantiles to collect.

iSamplingRepeatable

Input. A non-negative integer representing the seed to be used in table or view sampling. Passing a negative seed will result in an error (SQL1197N).

The DB2RUNSTATS_SAMPLING_REPEAT flag must be set to use this seed. This option is used in conjunction with the iSamplingOption parameter to generate the same sample of data in subsequent statistics collection. The sample set may still vary between repeatable requests if activity against the table or view resulted in changes to the table or view data since the last time a repeatable request was run. Also, the method by which the sample was obtained (BERNOULLI or SYSTEM) must also be the same to ensure consistent results.

iUtilImpactPriority

Input. Priority for the RUNSTATS invocation. Valid values must fall in the range 0-100, with 70 representing unthrottled and 100 representing the highest possible priority. This option cannot be used for statistical views.

iIndexSamplingOption

Input. Indicates that statistics are to be collected on a sample of the index data. iIndexSamplingOption represents the size of the sample as a percentage P. This value must be a positive number that is less than or equal to 100, but can be between 0 and 1. For example, a value of 0.01 represents one hundredth of one percent, such that one index page in 10 000 is sampled on average. A value of 0 or 100 is treated by Db2 as if sampling was not specified, regardless of whether DB2RUNSTATS_INDEX_SYSTEM has been specified. A value greater than 100 or less than 0 is treated by Db2 as an error (SQL1197N). The two possible types of sampling are BERNOULLI and SYSTEM. The sampling type specification is controlled by the indicated setting of DB2RUNSTATS_INDEX_SYSTEM in iRunstatsFlags. If the flag is not set, BERNOULLI sampling is assumed.

db2ColumnData data structure parameters

piColumnName

Input. Pointer to a string representing a column name.

iColumnFlags

Input. A bit mask field used to specify statistics options for the column. Valid values are:

DB2RUNSTATS_COLUMN_LIKE_STATS: Collect LIKE statistics on the column.

db2ColumnDistData data structure parameters

piColumnName

Input. Pointer to a string representing a column name.

iNumFreqValues

Input. The number of frequent values to collect on the column. Valid values are:

n: Collect n frequent values on the column.
-1: Use the table default number of frequent values, such as iTableDefaultFreqValues if set, or the database configuration parameter num_freqvalues.

iNumQuantiles

Input. The number of quantiles to collect on the column. Valid values are:

n: Collect n quantiles on the column.
-1: Use the table default number of quantiles, iTableDefaultQuantiles if set, or the database configuration parameter num_quantiles.

db2ColumnGrpData data structure parameters

piGroupColumnNames

Input. An array of strings. Each string represents a column name that is part of the column group on which to collect statistics.

iGroupSize

Input. Number of columns in the column group. Valid values are:

n: The column group is made up of n columns.

iNumFreqValues

Input. Reserved for future use.

iNumQuantiles

Input. Reserved for future use.

db2gRunstatsData data structure specific parameters

piIndexNamesLen: Input. An array of values representing the length in bytes of each of the index names in the index list. If NumIndexes is zero then piIndexNamesLen is ignored.
iTablenameLen: Input. A value representing the length in bytes of the table or view name.

db2gColumnData data structure specific parameters

iColumnNameLen: Input. A value representing the length in bytes of the column name.

db2gColumnDistData data structure specific parameters

iColumnNameLen: Input. A value representing the length in bytes of the column name.

db2gColumnGrpData data structure specific parameters

piGroupColumnNamesLen: Input. An array of values representing the length in bytes of each of the column names in the column names list.

Usage notes

Use db2Runstats to update statistics:

On tables that have been modified many times (for example, if a large number of updates have been made, or if a significant amount of data has been inserted or deleted)
On tables that have been reorganized
When a new index has been created.
On views whose underlying tables have been modified substantially so as to change the rows that are returned by the view.

After statistics have been updated, new access paths to the table can be created by rebinding the packages using sqlabndx - Bind.

If index statistics are requested, and statistics have never been run on the table containing the index, statistics on both the table and indexes are calculated.

If the db2Runstats API is collecting statistics on indexes only then previously collected distribution statistics are retained. Otherwise, the API will drop previously collected distribution statistics. If the db2Runstats API is collecting statistics on XML columns only, then previously collected basic column statistics and distribution statistics are retained. In the case where statistics on some XML columns have been collected previously, the previously collected statistics for an XML column will either be dropped if no statistics on that XML column are collected by the current call to the db2Runstats API, or be replaced if statistics on that XML column are collected by the current call to the db2Runstats API. Otherwise, the API will drop previously collected distribution statistics.

If the iRunstatsFlags parameter is set to the value DB2RUNSTATS_EXCLUDING_XML, statistics will not be collected on XML columns. This value takes precedence over all other methods that specify XML columns.

For Db2 V9.7 Fix Pack 1 and later releases, the following items apply for the collection of distribution statistics for a column of type XML:

Distribution statistics are collected for each index over XML data specified on an XML column.
To collect distribution statistics for an index over XML data, both distribution statistics and table statistics must be collected. Table statistics must be gathered in order for distribution statistics to be collected because XML distribution statistics are stored with table statistics.
Collecting index statistics is not required to collect XML distribution statistics. Collecting index statistics without collecting distribution statistics does not collect XML distribution statistics.

By default, XML distribution statistics use a maximum of 250 quantiles.
Distribution statistics are collected for indexes over XML data of type VARCHAR, DOUBLE, TIMESTAMP, and DATE. Distribution statistics are not collected for indexes over XML data of type VARCHAR HASHED.
Distribution statistics are not collected for partitioned indexes over XML data defined on a partitioned table.

After calling this API, the application should issue a COMMIT to release the locks.

To allow new access plans to be generated, the packages that reference the target table must be rebound after calling this API. Packages that contain queries that can take advantage of statistical views must also be rebound after updating statistics on such views.

When statistics are collected for statistical views, an SQL query is run internally. The EXPLAIN facility can be used to examine the access plan selected for this query to investigate any performance problems with the statistics collection. To save the query access plan in the EXPLAIN tables, set the CURRENT EXPLAIN MODE special register to YES.

Running this API on the table only may result in a situation where the table level statistics are inconsistent with the already existing index level statistics. For example, if index level statistics are collected on a particular table and later a significant number of rows is deleted from this table, issuing this API on the table only may end up with the table cardinality less than FIRSTKEYCARD (FIRSTKEYCARD is a catalog statistics field in SYSCAT.INDEXES and SYSSTAT.INDEXES catalog views) which is an inconsistent state. Likewise, issuing this API for indexes only may leave the already existing table level statistics in an inconsistent state. For example, if table level statistics are collected on a particular table and later a significant number of rows is deleted from this table, issuing the db2Runstats API for the indexes only may end up with some columns having a COLCARD (COLCARD is a catalog statistics field in SYSCAT.COLUMNS and SYSSTAT.COLUMNS catalog views) greater than the table cardinality. A warning will be returned if such an inconsistency is detected.

Statistics are not collected for columns with structured types. If they are specified, columns with these data types are ignored.

Only AVGCOLLEN and NUMNULLS are collected for columns with LOB or LONG data types.

AVGCOLLEN represents the average space in bytes when the column is stored in database memory or a temporary table. This value represents the length of the data descriptor for LOB or LONG data types, except when LOB data is inlined on the data page.

Note: The average space required to store the column on disk may be different than the value represented by this statistic.