db2Runstats API - Update statistics for tables and indexes
When used on tables, this utility should be called when a table has had many updates, after reorganizing a table, or after creating a new index.
- If the API is called from a database partition that contains a portion of the table, the utility executes at this database partition.
- If the API is called from a database partition that does not contain a portion of the table, the request is sent to the first database partition in the database partition group that contains a portion of the table. The utility then executes at this database partition. When you collect statistics for a statistical view, statistics are collected for all database partitions.
When used on statistical views, this utility should be called when changes to underlying tables have substantially affected the rows returned by a view. These views must have been enabled for use in query optimization using "ALTER VIEW ... ENABLE QUERY OPTIMIZATION."
Scope
This API can be called from any database partition server in the db2nodes.cfg file. It can be used to update the catalogs on the catalog database partition.
Authorization
- SYSADM
- SYSCTRL
- SYSMAINT
- DBADM
- SQLADM
- CONTROL privilege on the table
- LOAD
- SYSADM
- SYSCTRL
- SYSMAINT
- DBADM
- SQLADM
- CONTROL privilege on the view
- DATAACCESS
- CONTROL privilege
- SELECT privilege
Required connection
Database
API include file
db2ApiDf.h
API and data structure syntax
SQL_API_RC SQL_API_FN
db2Runstats (
db2Uint32 versionNumber,
void * data,
struct sqlca * pSqlca);
typedef SQL_STRUCTURE db2RunstatsData
{
double iSamplingOption;
unsigned char *piTablename;
struct db2ColumnData **piColumnList;
struct db2ColumnDistData **piColumnDistributionList;
struct db2ColumnGrpData **piColumnGroupList;
unsigned char **piIndexList;
db2Uint32 iRunstatsFlags;
db2int16 iNumColumns;
db2int16 iNumColdist;
db2int16 iNumColGroups;
db2int16 iNumIndexes;
db2int16 iParallelismOption;
db2int16 iTableDefaultFreqValues;
db2int16 iTableDefaultQuantiles;
db2Uint32 iSamplingRepeatable;
db2Uint32 iUtilImpactPriority;
double iIndexSamplingOption;
} db2RunstatsData;
typedef SQL_STRUCTURE db2ColumnData
{
unsigned char *piColumnName;
db2int16 iColumnFlags;
} db2ColumnData;
typedef SQL_STRUCTURE db2ColumnDistData
{
unsigned char *piColumnName;
db2int16 iNumFreqValues;
db2int16 iNumQuantiles;
} db2ColumnDistData;
typedef SQL_STRUCTURE db2ColumnGrpData
{
unsigned char **piGroupColumnNames;
db2int16 iGroupSize;
db2int16 iNumFreqValues;
db2int16 iNumQuantiles;
} db2ColumnGrpData;
SQL_API_RC SQL_API_FN
db2gRunstats (
db2Uint32 versionNumber,
void * data,
struct sqlca * pSqlca);
typedef SQL_STRUCTURE db2gRunstatsData
{
double iSamplingOption;
unsigned char *piTablename;
struct db2gColumnData **piColumnList;
struct db2gColumnDistData **piColumnDistributionList;
struct db2gColumnGrpData **piColumnGroupList;
unsigned char **piIndexList;
db2Uint16 *piIndexNamesLen;
db2Uint32 iRunstatsFlags;
db2Uint16 iTablenameLen;
db2int16 iNumColumns;
db2int16 iNumColdist;
db2int16 iNumColGroups;
db2int16 iNumIndexes;
db2int16 iParallelismOption;
db2int16 iTableDefaultFreqValues;
db2int16 iTableDefaultQuantiles;
db2Uint32 iSamplingRepeatable;
db2Uint32 iUtilImpactPriority;
} db2gRunstatsData;
typedef SQL_STRUCTURE db2gColumnData
{
unsigned char *piColumnName;
db2Uint16 iColumnNameLen;
db2int16 iColumnFlags;
} db2gColumnData;
typedef SQL_STRUCTURE db2gColumnDistData
{
unsigned char *piColumnName;
db2Uint16 iColumnNameLen;
db2int16 iNumFreqValues;
db2int16 iNumQuantiles;
} db2gColumnDistData;
typedef SQL_STRUCTURE db2gColumnGrpData
{
unsigned char **piGroupColumnNames;
db2Uint16 *piGroupColumnNamesLen;
db2int16 iGroupSize;
db2int16 iNumFreqValues;
db2int16 iNumQuantiles;
} db2gColumnGrpData;
db2Runstats API parameters
- versionNumber
- Input. Specifies the version and release level of the structure passed as the second parameter data.
- data
- Input. A pointer to the db2RunstatsData structure.
- pSqlca
- Output. A pointer to the sqlca structure.
db2RunstatsData data structure parameters
- iSamplingOption
- Input. Indicates that statistics are to be collected on a sample of table or view data. iSamplingOption represents the size of the sample as a percentage P. This value must be a positive number that is less than or equal to 100, but may be between 1 and 0. For example, a value of 0.01 represents one one-hundredth of a percent, such that 1 row in 10 000 would be sampled, on average. A value of 0 or 100 will be treated by Db2® as if sampling was not specified, regardless of whether DB2RUNSTATS_SAMPLING_SYSTEM has been specified. A value greater than 100 or less than 0 will be treated by Db2 as an error (SQL1197N). The two possible types of sampling are BERNOULLI and SYSTEM. The sampling type specification is controlled by the indicated setting of DB2RUNSTATS_SAMPLING_SYSTEM in the iRunstatsFlags.
- piTablename
- Input. A pointer to the fully qualified name of the table or statistical view on which statistics are to be gathered. The name can be an alias. For row types, piTablename must be the name of the hierarchy's root table.
- piColumnList
- Input. An array of db2ColumnData elements.
Each element of this array is made up of two sub-elements:
- a string that represents the name of the column on which to collect statistics
- a flags field indicating statistic options for the column
- piColumnDistributionList
- Input. An array of db2ColumnDistData elements.
These elements are provided when collecting distribution statistics
on a particular column or columns is required. Each element of this
array is made up of three sub-elements:
- a string that represents the name of the column on which to collect distribution statistics
- the number of frequent values to collect.
- the number of quantiles to collect
Any columns which appear in the piColumnDistributionList that do NOT appear in the piColumnList, will have basic column statistics collected on them. This would be the same effect as having included these columns in the piColumnList in the first place. If iNumColdist is zero then piColumnDistributionList is ignored.
- piColumnGroupList
- Input. An array of db2ColumnGrpData elements.
These elements are provided when collecting column statistics on a
group of columns. That is, the values in each column of the group
for each row will be concatenated together and treated as a single
value. Each db2ColumnGrpData is made up of 3 integer
fields and an array of strings. The first integer field represents
the number of strings in the array of strings piGroupColumns.
Each string in this array contains one column name. For example, if
column combinations statistics are to be collected on column groups
(c1,c2) and on (c3,c4,c5) then there are 2 db2ColumnGrpData elements
in piGroupColumns.
The first db2ColumnGrpData element is as follows: piGroupSize = 2 and the array of strings contains 2 elements, namely, c1 and c2.
The second db2ColumnGrpData element is as follows: piGroupSize = 3 and the array of strings contains 3 elements, namely, c3, c4 and c5.
The second and the third integer fields represent the number of frequent values and the number of quantiles when collecting distribution statistics on column groups. This is not currently supported.
Any columns which appear in the piColumnGroupList that do NOT appear in the piColumnList, will have basic column statistics collected on them. This would be the same effect as having included these columns in the piColumnList in the first place. If iNumColGroups is zero then piColumnGroupList is ignored.
- piIndexList
- Input. An array of strings. Each string contains one fully qualified index name. If NumIndexes is zero then piIndexList is ignored.
- iRunstatsFlags
- Input. A bit mask field used to specify statistics options. Valid
values (defined in db2ApiDf header file, located
in the include directory) are:
- DB2RUNSTATS_ALL_COLUMNS
- Collect statistics on all columns of the table or statistical view. This option can be specified in combination with column, column distribution, column group or index structure lists. This is useful if you would like to collect statistics on all columns of the table or view but would like to provide statistics options for specific columns.
- DB2RUNSTATS_KEY_COLUMNS
- Collect statistics only on the columns that make up all the indexes defined on the table. This option cannot be used for statistical views. On tables, it can be specified in combination with column, column distribution, column group or index structure lists. This is useful if you would like to collect statistics on all key columns of the table but would also like to gather statistics for some non-key columns or would like to provide statistics options for specific key columns. XML type columns are, by definition, not key columns and will not be included for statistics collection when the iRunstatsFlags parameter is set to the value DB2RUNSTATS_KEY_COLUMNS.
- DB2RUNSTATS_DISTRIBUTION
- Collect distribution statistics. This option can only be used with DB2RUNSTATS_ALL_COLUMNS and DB2RUNSTATS_KEY_COLUMNS. When used with DB2RUNSTATS_ALL_COLUMNS, distribution statistics are gathered for all columns of the table or statistical view. When used with DB2RUNSTATS_KEY_COLUMNS, distribution statistics are gathered for all columns that make up all the indexes defined on the table. When used with both DB2RUNSTATS_ALL_COLUMNS and DB2RUNSTATS_KEY_COLUMNS, basic statistics are gathered for all columns of the table and distribution statistics are gathered for only columns that make up all the indexes defined on the table.
- DB2RUNSTATS_ALL_INDEXES
- Collect statistics on all indexes defined on the table. This option cannot be used for statistical views.
- DB2RUNSTATS_EXT_INDEX
- Collect detailed index statistics. The option must be specified with either DB2RUNSTATS_ALL_INDEXES or an explicit list of index names (piIndexList and iNumIndexes > 0). This option cannot be used for statistical views.
- DB2RUNSTATS_EXT_INDEX_SAMPLED
- Collect detailed index statistics using sampling methods. The option must be specified with either DB2RUNSTATS_ALL_INDEXES or an explicit list of index names (piIndexList and iNumIndexes > 0). DB2RUNSTATS_EXT_INDEX will be ignored if specified at the same time. This option cannot be used for statistical views.
- DB2RUNSTATS_ALLOW_READ
- Allows others to have read-only access while the statistics are being gathered. The default is to allow read and write access.
- DB2RUNSTATS_SAMPLING_SYSTEM
- Collect statistics on a percentage of the data pages as specified by the user via the
iSamplingOption parameter. SYSTEM sampling considers each page individually,
including that page with probability P/100 (where P is the
value of iSamplingOption) and excluding it with probability
1-P/100. Thus, if iSamplingOption is the value
10, representing a 10 percent sample, each page would be included with probability
0.1 and be excluded with probability 0.9.
For statistical views, SYSTEM sampling can only be applied to a single base table referenced in the view definition. If the view contains multiple tables, SYSTEM sampling is possible if a single table among all the tables in the statistical view can be identified as being joined with all primary keys or unique index columns of the other tables used in the view. If the statistical view does not meet those conditions, BERNOULLI sampling will be used instead and a warning will be returned.
If DB2RUNSTATS_SAMPLING_SYSTEM is not specified, Db2 will assume that BERNOULLI sampling is to be used as the sampling method. BERNOULLI sampling considers each row individually, including that row with probability P/100 (where P is the value of iSamplingOption) and excluding it with probability 1-P/100.
In both SYSTEM and BERNOULLI sampling, unless the DB2RUNSTATS_SAMPLING_REPEAT flag is specified, each execution of statistics collection will usually yield a different sample of the table or statistical view.
- DB2RUNSTATS_SAMPLING_REPEAT
- Specifies that a seed has been passed through the iSamplingRepeatable parameter. The iSamplingRepeatable value will be used as the seed to generate the data sample. The iSamplingOption parameter must also be specified to indicate the sampling rate.
- DB2RUNSTATS_USE_PROFILE
- Collect statistics for a table or statistical view by using a statistics profile already registered in the catalogs of the table or view. If the USE PROFILE option is specified by this flag set in iRunstatsFlags bit mask, all other options in db2RunstatsData will be ignored.
- DB2RUNSTATS_SET_PROFILE
- Generate and store a profile in the catalogs recording the statistics options specified and collect statistics using those same options.
- DB2RUNSTATS_SET_PROFILE_ONLY
- Generate and store a profile in the catalogs recording the statistics options specified without actually collecting statistics for the table or view.
- DB2RUNSTATS_UNSET_PROFILE
- Unsetting a statistics profile will remove the statistics profile from the system catalogs by setting the SYSCAT.STATISTICS_PROFILE to NULL. If a statistics profile does not exist, attempting to unset it will result in an error (SQLCODE -2315).
- DB2RUNSTATS_UPDATE_PROFILE
- Modify an existing statistics profile in the catalogs and collect statistics using the options from the updated profile.
- DB2RUNSTATS_UPDA_PROFILE_ONLY
- Modify an existing statistics profile in the catalogs without actually collecting statistics for the table or view.
- DB2RUNSTATS_EXCLUDING_XML
- Do not collect statistics on XML type columns. Statistics will still be collected on all specified columns that have non-XML type. This option takes precedence over all other methods that specify XML columns.
- DB2RUNSTATS_INDEX_SYSTEM
- Collects the index statistics on a percentage of the index pages as specified by the user via the iIndexSamplingOption parameter. SYSTEM sampling considers each page individually, including the index pages with probability P/100 (where P is the value of iIndexSamplingOption) and excluding it with probability 1-P/100. Thus, if iIndexSamplingOption is the value 10, representing a 10 percent sample, each page would be included with probability 0.1 and be excluded with probability 0.9. If DB2RUNSTATS_INDEX_SYSTEM is not specified, Db2 will assume that BERNOULLI sampling is to be used as the sampling method. BERNOULLI sampling considers each row individually, including that row with probability P/100 (where P is the value of iIndexSamplingOption) and excluding it with probability 1-P/100.
- iNumColumns
- Input. The number of items specified in the piColumnList list.
- iNumColdist
- Input. The number of items specified in the piColumnDistributionList list.
- iNumColGroups
- Input. The number of items specified in the piColumnGroupList list.
- iNumIndexes
- Input. The number of items specified in the piIndexList list.
- iParallelismOption
- Input. Reserved for future use. Valid value is 0.
- iTableDefaultFreqValues
- Input. Specifies the default number of frequent values to collect
for the table or view. Valid values are:
- n
- n frequent values will be collected unless otherwise specified at the column level.
- 0
- No frequent values will be collected unless otherwise specified at the column level.
- -1
- Use the default database configuration parameter num_freqvalues for the number of frequent values to collect.
- iTableDefaultQuantiles
- Input. Specifies the default number of quantiles to collect for
the table or view. Valid values are:
- n
- n quantiles will be collected unless otherwise specified at the column level.
- 0
- No quantiles will be collected unless otherwise specified at the column level.
- -1
- Use the default database configuration parameter num_quantiles for the number of quantiles to collect.
- iSamplingRepeatable
- Input. A non-negative integer representing the seed to be used
in table or view sampling. Passing a negative seed will result in
an error (SQL1197N).
The DB2RUNSTATS_SAMPLING_REPEAT flag must be set to use this seed. This option is used in conjunction with the iSamplingOption parameter to generate the same sample of data in subsequent statistics collection. The sample set may still vary between repeatable requests if activity against the table or view resulted in changes to the table or view data since the last time a repeatable request was run. Also, the method by which the sample was obtained (BERNOULLI or SYSTEM) must also be the same to ensure consistent results.
- iUtilImpactPriority
- Input. Priority for the RUNSTATS invocation. Valid values must fall in the range 0-100, with 70 representing unthrottled and 100 representing the highest possible priority. This option cannot be used for statistical views.
- iIndexSamplingOption
- Input. Indicates that statistics are to be collected on a sample of the index data. iIndexSamplingOption represents the size of the sample as a percentage P. This value must be a positive number that is less than or equal to 100, but can be between 0 and 1. For example, a value of 0.01 represents one hundredth of one percent, such that one index page in 10 000 is sampled on average. A value of 0 or 100 is treated by Db2 as if sampling was not specified, regardless of whether DB2RUNSTATS_INDEX_SYSTEM has been specified. A value greater than 100 or less than 0 is treated by Db2 as an error (SQL1197N). The two possible types of sampling are BERNOULLI and SYSTEM. The sampling type specification is controlled by the indicated setting of DB2RUNSTATS_INDEX_SYSTEM in iRunstatsFlags. If the flag is not set, BERNOULLI sampling is assumed.
db2ColumnData data structure parameters
- piColumnName
- Input. Pointer to a string representing a column name.
- iColumnFlags
- Input. A bit mask field used to specify statistics options for
the column. Valid values are:
- DB2RUNSTATS_COLUMN_LIKE_STATS
- Collect LIKE statistics on the column.
db2ColumnDistData data structure parameters
- piColumnName
- Input. Pointer to a string representing a column name.
- iNumFreqValues
- Input. The number of frequent values to collect on the column.
Valid values are:
- n
- Collect n frequent values on the column.
- -1
- Use the table default number of frequent values, such as iTableDefaultFreqValues if set, or the database configuration parameter num_freqvalues.
- iNumQuantiles
- Input. The number of quantiles to collect on the column. Valid
values are:
- n
- Collect n quantiles on the column.
- -1
- Use the table default number of quantiles, iTableDefaultQuantiles if set, or the database configuration parameter num_quantiles.
db2ColumnGrpData data structure parameters
- piGroupColumnNames
- Input. An array of strings. Each string represents a column name that is part of the column group on which to collect statistics.
- iGroupSize
- Input. Number of columns in the column group. Valid values are:
- n
- The column group is made up of n columns.
- iNumFreqValues
- Input. Reserved for future use.
- iNumQuantiles
- Input. Reserved for future use.
db2gRunstatsData data structure specific parameters
- piIndexNamesLen
- Input. An array of values representing the length in bytes of each of the index names in the index list. If NumIndexes is zero then piIndexNamesLen is ignored.
- iTablenameLen
- Input. A value representing the length in bytes of the table or view name.
db2gColumnData data structure specific parameters
- iColumnNameLen
- Input. A value representing the length in bytes of the column name.
db2gColumnDistData data structure specific parameters
- iColumnNameLen
- Input. A value representing the length in bytes of the column name.
db2gColumnGrpData data structure specific parameters
- piGroupColumnNamesLen
- Input. An array of values representing the length in bytes of each of the column names in the column names list.
Usage notes
- On tables that have been modified many times (for example, if a large number of updates have been made, or if a significant amount of data has been inserted or deleted)
- On tables that have been reorganized
- When a new index has been created.
- On views whose underlying tables have been modified substantially so as to change the rows that are returned by the view.
After statistics have been updated, new access paths to the table can be created by rebinding the packages using sqlabndx - Bind.
If index statistics are requested, and statistics have never been run on the table containing the index, statistics on both the table and indexes are calculated.
If the db2Runstats API is collecting statistics on indexes only then previously collected distribution statistics are retained. Otherwise, the API will drop previously collected distribution statistics. If the db2Runstats API is collecting statistics on XML columns only, then previously collected basic column statistics and distribution statistics are retained. In the case where statistics on some XML columns have been collected previously, the previously collected statistics for an XML column will either be dropped if no statistics on that XML column are collected by the current call to the db2Runstats API, or be replaced if statistics on that XML column are collected by the current call to the db2Runstats API. Otherwise, the API will drop previously collected distribution statistics.
If the iRunstatsFlags parameter is set to the value DB2RUNSTATS_EXCLUDING_XML, statistics will not be collected on XML columns. This value takes precedence over all other methods that specify XML columns.
- Distribution statistics are collected for each index over XML data specified on an XML column.
- To collect distribution statistics for an index over XML data, both distribution statistics and
table statistics must be collected. Table statistics must be gathered in order for distribution
statistics to be collected because XML distribution statistics are stored with table statistics.
Collecting index statistics is not required to collect XML distribution statistics. Collecting index statistics without collecting distribution statistics does not collect XML distribution statistics.
By default, XML distribution statistics use a maximum of 250 quantiles.
- Distribution statistics are collected for indexes over XML data of type VARCHAR, DOUBLE, TIMESTAMP, and DATE. Distribution statistics are not collected for indexes over XML data of type VARCHAR HASHED.
- Distribution statistics are not collected for partitioned indexes over XML data defined on a partitioned table.
After calling this API, the application should issue a COMMIT to release the locks.
To allow new access plans to be generated, the packages that reference the target table must be rebound after calling this API. Packages that contain queries that can take advantage of statistical views must also be rebound after updating statistics on such views.
When statistics are collected for statistical views, an SQL query is run internally. The EXPLAIN facility can be used to examine the access plan selected for this query to investigate any performance problems with the statistics collection. To save the query access plan in the EXPLAIN tables, set the CURRENT EXPLAIN MODE special register to YES.
Running this API on the table only may result in a situation where the table level statistics are inconsistent with the already existing index level statistics. For example, if index level statistics are collected on a particular table and later a significant number of rows is deleted from this table, issuing this API on the table only may end up with the table cardinality less than FIRSTKEYCARD (FIRSTKEYCARD is a catalog statistics field in SYSCAT.INDEXES and SYSSTAT.INDEXES catalog views) which is an inconsistent state. Likewise, issuing this API for indexes only may leave the already existing table level statistics in an inconsistent state. For example, if table level statistics are collected on a particular table and later a significant number of rows is deleted from this table, issuing the db2Runstats API for the indexes only may end up with some columns having a COLCARD (COLCARD is a catalog statistics field in SYSCAT.COLUMNS and SYSSTAT.COLUMNS catalog views) greater than the table cardinality. A warning will be returned if such an inconsistency is detected.
Statistics are not collected for columns with structured types. If they are specified, columns with these data types are ignored.
Only AVGCOLLEN and NUMNULLS are collected for columns with LOB or LONG data types.