Updates statistics about the characteristics
of a table or any associated indexes or statistical views. This command
can also update statistics about the characteristics of both a table
and any associated indexes or statistical views.
These
characteristics include, among many others, number of records, number
of pages, and average record length. The optimizer uses these statistics
when determining access paths to the data.
When used on tables, this utility should be called when
a table has had many updates, after reorganizing a table, or after
creating a new index.
Statistics are based on the portion of the table that
resides on the database partition where the API executes. Global table
statistics are derived by multiplying the values obtained at a database
partition by the number of database partitions on which the table
is completely stored. The global statistics are stored in the catalog
tables. The database partition from which the API is called does not
have to contain a portion of the table:
- If the API is called from a database partition that contains a
portion of the table, the utility executes at this database partition.
- If the API is called from a database partition that does not contain
a portion of the table, the request is sent to the first database
partition in the database partition group that contains a portion
of the table. The utility then executes at this database partition.
When you collect statistics for a statistical view, statistics are
collected for all database partitions.
When used on statistical views, this utility should be
called when changes to underlying tables have substantially affected
the rows returned by a view. These views must have been enabled for
use in query optimization using "ALTER VIEW ... ENABLE QUERY OPTIMIZATION."
Scope
This API can be called from any database
partition server in the db2nodes.cfg file. It
can be used to update the catalogs on the catalog database partition.
Authorization
When used on tables, one of
the following authorities:
- SYSADM
- SYSCTRL
- SYSMAINT
- DBADM
- SQLADM
- CONTROL privilege on the table
- LOAD
When used on statistical views, one of the following authorities:
- SYSADM
- SYSCTRL
- SYSMAINT
- DBADM
- SQLADM
- CONTROL privilege on the view
In addition, the user needs to have the appropriate authority
or privilege to access rows from the view. Specifically, for each
table, view or nickname referenced in the view definition, the user
must have one of the following authorities or privileges:
- DATAACCESS
- CONTROL privilege
- SELECT privilege
Required connection
Database
API include file
db2ApiDf.h
API and data structure syntax
SQL_API_RC SQL_API_FN
db2Runstats (
db2Uint32 versionNumber,
void * data,
struct sqlca * pSqlca);
typedef SQL_STRUCTURE db2RunstatsData
{
double iSamplingOption;
unsigned char *piTablename;
struct db2ColumnData **piColumnList;
struct db2ColumnDistData **piColumnDistributionList;
struct db2ColumnGrpData **piColumnGroupList;
unsigned char **piIndexList;
db2Uint32 iRunstatsFlags;
db2int16 iNumColumns;
db2int16 iNumColdist;
db2int16 iNumColGroups;
db2int16 iNumIndexes;
db2int16 iParallelismOption;
db2int16 iTableDefaultFreqValues;
db2int16 iTableDefaultQuantiles;
db2Uint32 iSamplingRepeatable;
db2Uint32 iUtilImpactPriority;
double iIndexSamplingOption;
} db2RunstatsData;
typedef SQL_STRUCTURE db2ColumnData
{
unsigned char *piColumnName;
db2int16 iColumnFlags;
} db2ColumnData;
typedef SQL_STRUCTURE db2ColumnDistData
{
unsigned char *piColumnName;
db2int16 iNumFreqValues;
db2int16 iNumQuantiles;
} db2ColumnDistData;
typedef SQL_STRUCTURE db2ColumnGrpData
{
unsigned char **piGroupColumnNames;
db2int16 iGroupSize;
db2int16 iNumFreqValues;
db2int16 iNumQuantiles;
} db2ColumnGrpData;
SQL_API_RC SQL_API_FN
db2gRunstats (
db2Uint32 versionNumber,
void * data,
struct sqlca * pSqlca);
typedef SQL_STRUCTURE db2gRunstatsData
{
double iSamplingOption;
unsigned char *piTablename;
struct db2gColumnData **piColumnList;
struct db2gColumnDistData **piColumnDistributionList;
struct db2gColumnGrpData **piColumnGroupList;
unsigned char **piIndexList;
db2Uint16 *piIndexNamesLen;
db2Uint32 iRunstatsFlags;
db2Uint16 iTablenameLen;
db2int16 iNumColumns;
db2int16 iNumColdist;
db2int16 iNumColGroups;
db2int16 iNumIndexes;
db2int16 iParallelismOption;
db2int16 iTableDefaultFreqValues;
db2int16 iTableDefaultQuantiles;
db2Uint32 iSamplingRepeatable;
db2Uint32 iUtilImpactPriority;
} db2gRunstatsData;
typedef SQL_STRUCTURE db2gColumnData
{
unsigned char *piColumnName;
db2Uint16 iColumnNameLen;
db2int16 iColumnFlags;
} db2gColumnData;
typedef SQL_STRUCTURE db2gColumnDistData
{
unsigned char *piColumnName;
db2Uint16 iColumnNameLen;
db2int16 iNumFreqValues;
db2int16 iNumQuantiles;
} db2gColumnDistData;
typedef SQL_STRUCTURE db2gColumnGrpData
{
unsigned char **piGroupColumnNames;
db2Uint16 *piGroupColumnNamesLen;
db2int16 iGroupSize;
db2int16 iNumFreqValues;
db2int16 iNumQuantiles;
} db2gColumnGrpData;
db2Runstats API parameters
- versionNumber
- Input. Specifies the version and release level of the structure
passed as the second parameter data.
- data
- Input. A pointer to the db2RunstatsData structure.
- pSqlca
- Output. A pointer to the sqlca structure.
db2RunstatsData data structure parameters
- iSamplingOption
- Input. Indicates that statistics are to be collected on a sample
of table or view data. iSamplingOption represents
the size of the sample as a percentage P. This value must be a positive
number that is less than or equal to 100, but may
be between 1 and 0. For example,
a value of 0.01 represents one one-hundredth of a
percent, such that 1 row in 10 000 would be sampled, on average. A
value of 0 or 100 will be treated
by DB2® as if sampling was not
specified, regardless of whether DB2RUNSTATS_SAMPLING_SYSTEM has been
specified. A value greater than 100 or less than 0 will
be treated by DB2 as an error
(SQL1197N). The two possible types of sampling are
BERNOULLI and SYSTEM. The sampling type specification is controlled
by the indicated setting of DB2RUNSTATS_SAMPLING_SYSTEM in
the iRunstatsFlags.
- piTablename
- Input. A pointer to the fully qualified name of the table or statistical
view on which statistics are to be gathered. The name can be an alias.
For row types, piTablename must be the name of
the hierarchy's root table.
- piColumnList
- Input. An array of db2ColumnData elements.
Each element of this array is made up of two sub-elements:
- a string that represents the name of the column on which to collect
statistics
- a flags field indicating statistic options for the column
If iNumColumns is zero then piColumnList is
ignored if provided.
- piColumnDistributionList
- Input. An array of db2ColumnDistData elements.
These elements are provided when collecting distribution statistics
on a particular column or columns is required. Each element of this
array is made up of three sub-elements:
- a string that represents the name of the column on which to collect
distribution statistics
- the number of frequent values to collect.
- the number of quantiles to collect
Any columns which appear in the piColumnDistributionList that
do NOT appear in the piColumnList, will have
basic column statistics collected on them. This would be the same
effect as having included these columns in the piColumnList in
the first place. If iNumColdist is zero then piColumnDistributionList is
ignored.
- piColumnGroupList
- Input. An array of db2ColumnGrpData elements.
These elements are provided when collecting column statistics on a
group of columns. That is, the values in each column of the group
for each row will be concatenated together and treated as a single
value. Each db2ColumnGrpData is made up of 3 integer
fields and an array of strings. The first integer field represents
the number of strings in the array of strings piGroupColumns.
Each string in this array contains one column name. For example, if
column combinations statistics are to be collected on column groups
(c1,c2) and on (c3,c4,c5) then there are 2 db2ColumnGrpData elements
in piGroupColumns.
The first db2ColumnGrpData element
is as follows: piGroupSize = 2 and
the array of strings contains 2 elements, namely, c1 and c2.
The
second db2ColumnGrpData element is as follows: piGroupSize = 3 and
the array of strings contains 3 elements, namely, c3, c4 and c5.
The
second and the third integer fields represent the number of frequent
values and the number of quantiles when collecting distribution statistics
on column groups. This is not currently supported.
Any columns
which appear in the piColumnGroupList that do
NOT appear in the piColumnList, will have basic
column statistics collected on them. This would be the same effect
as having included these columns in the piColumnList in
the first place. If iNumColGroups is zero then piColumnGroupList is
ignored.
- piIndexList
- Input. An array of strings. Each string contains one fully qualified
index name. If NumIndexes is zero then piIndexList is
ignored.
- iRunstatsFlags
- Input. A bit mask field used to specify statistics options. Valid
values (defined in db2ApiDf header file, located
in the include directory) are:
- DB2RUNSTATS_ALL_COLUMNS
- Collect statistics on all columns of the table or statistical
view. This option can be specified in combination with column, column
distribution, column group or index structure lists. This is useful
if you would like to collect statistics on all columns of the table
or view but would like to provide statistics options for specific
columns.
- DB2RUNSTATS_KEY_COLUMNS
- Collect statistics only on the columns that make up all the indexes
defined on the table. This option cannot be used for statistical views.
On tables, it can be specified in combination with column, column
distribution, column group or index structure lists. This is useful
if you would like to collect statistics on all key columns of the
table but would also like to gather statistics for some non-key columns
or would like to provide statistics options for specific key columns.
XML type columns are, by definition, not key columns and will not
be included for statistics collection when the iRunstatsFlags parameter
is set to the value DB2RUNSTATS_KEY_COLUMNS.
- DB2RUNSTATS_DISTRIBUTION
- Collect distribution statistics. This option can only be used
with DB2RUNSTATS_ALL_COLUMNS and DB2RUNSTATS_KEY_COLUMNS.
When used with DB2RUNSTATS_ALL_COLUMNS, distribution
statistics are gathered for all columns of the table or statistical
view. When used with DB2RUNSTATS_KEY_COLUMNS, distribution
statistics are gathered for all columns that make up all the indexes
defined on the table. When used with both DB2RUNSTATS_ALL_COLUMNS and DB2RUNSTATS_KEY_COLUMNS,
basic statistics are gathered for all columns of the table and distribution
statistics are gathered for only columns that make up all the indexes
defined on the table.
- DB2RUNSTATS_ALL_INDEXES
- Collect statistics on all indexes defined on the table. This option
cannot be used for statistical views.
- DB2RUNSTATS_EXT_INDEX
- Collect detailed index statistics. The option must be specified
with either DB2RUNSTATS_ALL_INDEXES or an explicit
list of index names (piIndexList and iNumIndexes > 0).
This option cannot be used for statistical views.
- DB2RUNSTATS_EXT_INDEX_SAMPLED
- Collect detailed index statistics using sampling methods. The
option must be specified with either DB2RUNSTATS_ALL_INDEXES or
an explicit list of index names (piIndexList and iNumIndexes > 0). DB2RUNSTATS_EXT_INDEX will
be ignored if specified at the same time. This option cannot be used
for statistical views.
- DB2RUNSTATS_ALLOW_READ
- Allows others to have read-only access while the statistics are
being gathered. The default is to allow read and write access.
- DB2RUNSTATS_SAMPLING_SYSTEM
- Collect statistics on a percentage of the data pages as specified
by the user via the iSamplingOption parameter.
SYSTEM sampling considers each page individually, including that page
with probability P/100 (where P is
the value of iSamplingOption) and excluding it
with probability 1-P/100. Thus, if iSamplingOption is
the value 10, representing a 10 percent sample, each
page would be included with probability 0.1 and be excluded with probability
0.9.
For statistical views, SYSTEM
sampling can only be applied to a single base table referenced in
the view definition. If the view contains multiple tables, SYSTEM
sampling is possible if a single table among all the tables in the
statistical view can be identified as being joined with all primary
keys or unique index columns of the other tables used in the view.
If the statistical view does not meet those conditions, BERNOULLI
sampling will be used instead and a warning will be returned.
If DB2RUNSTATS_SAMPLING_SYSTEM is
not specified, DB2 will assume
that BERNOULLI sampling is to be used as the sampling method. BERNOULLI
sampling considers each row individually, including that row with
probability P/100 (where P is
the value of iSamplingOption) and excluding it
with probability 1-P/100.
In both SYSTEM
and BERNOULLI sampling, unless the DB2RUNSTATS_SAMPLING_REPEAT flag
is specified, each execution of statistics collection will usually
yield a different sample of the table or statistical view.
- DB2RUNSTATS_SAMPLING_REPEAT
- Specifies that a seed has been passed through the iSamplingRepeatable parameter.
The iSamplingRepeatable value will be used as
the seed to generate the data sample. The iSamplingOption parameter
must also be specified to indicate the sampling rate.
- DB2RUNSTATS_USE_PROFILE
- Collect statistics for a table or statistical view by using a
statistics profile already registered in the catalogs of the table
or view. If the USE PROFILE option is specified by this flag set in iRunstatsFlags bit
mask, all other options in db2RunstatsData will
be ignored.
- DB2RUNSTATS_SET_PROFILE
- Generate and store a profile in the catalogs recording the statistics
options specified and collect statistics using those same options.
- DB2RUNSTATS_SET_PROFILE_ONLY
- Generate and store a profile in the catalogs recording the statistics
options specified without actually collecting statistics for the table
or view.
- DB2RUNSTATS_UNSET_PROFILE
- Unsetting a statistics profile will remove the statistics profile
from the system catalogs by setting the SYSCAT.STATISTICS_PROFILE
to NULL. If a statistics profile does not exist, attempting to unset
it will result in an error (SQLCODE -2315).
- DB2RUNSTATS_UPDATE_PROFILE
- Modify an existing statistics profile in the catalogs and collect
statistics using the options from the updated profile.
- DB2RUNSTATS_UPDA_PROFILE_ONLY
- Modify an existing statistics profile in the catalogs without
actually collecting statistics for the table or view.
- DB2RUNSTATS_EXCLUDING_XML
- Do not collect statistics on XML type columns. Statistics will
still be collected on all specified columns that have non-XML type.
This option takes precedence over all other methods that specify
XML columns.
- DB2RUNSTATS_INDEX_SYSTEM
- Collects the index statistics on a percentage of the index pages
as specified by the user via the iIndexSamplingOption parameter.
SYSTEM sampling considers each page individually, including the index
pages with probability P/100 (where P is
the value of iIndexSamplingOption) and excluding
it with probability 1-P/100. Thus, if iIndexSamplingOption is
the value 10, representing a 10 percent sample, each
page would be included with probability 0.1 and be excluded with probability
0.9. If DB2RUNSTATS_INDEX_SYSTEM is not specified, DB2 will assume that BERNOULLI sampling
is to be used as the sampling method. BERNOULLI sampling considers
each row individually, including that row with probability P/100
(where P is the value of iIndexSamplingOption)
and excluding it with probability 1-P/100.
- iNumColumns
- Input. The number of items specified in the piColumnList list.
- iNumColdist
- Input. The number of items specified in the piColumnDistributionList list.
- iNumColGroups
- Input. The number of items specified in the piColumnGroupList list.
- iNumIndexes
- Input. The number of items specified in the piIndexList list.
- iParallelismOption
- Input. Reserved for future use. Valid value is 0.
- iTableDefaultFreqValues
- Input. Specifies the default number of frequent values to collect
for the table or view. Valid values are:
- n
- n frequent values will be collected unless otherwise specified
at the column level.
- 0
- No frequent values will be collected unless otherwise specified
at the column level.
- -1
- Use the default database configuration parameter num_freqvalues for
the number of frequent values to collect.
- iTableDefaultQuantiles
- Input. Specifies the default number of quantiles to collect for
the table or view. Valid values are:
- n
- n quantiles will be collected unless otherwise
specified at the column level.
- 0
- No quantiles will be collected unless otherwise specified at the
column level.
- -1
- Use the default database configuration parameter num_quantiles for
the number of quantiles to collect.
- iSamplingRepeatable
- Input. A non-negative integer representing the seed to be used
in table or view sampling. Passing a negative seed will result in
an error (SQL1197N).
The DB2RUNSTATS_SAMPLING_REPEAT flag
must be set to use this seed. This option is used in conjunction with
the iSamplingOption parameter to generate the
same sample of data in subsequent statistics collection. The sample
set may still vary between repeatable requests if activity against
the table or view resulted in changes to the table or view data since
the last time a repeatable request was run. Also, the method by which
the sample was obtained (BERNOULLI or SYSTEM) must also be the same
to ensure consistent results.
- iUtilImpactPriority
- Input. Priority for the RUNSTATS invocation.
Valid values must fall in the range 0-100,
with 70 representing unthrottled and 100 representing
the highest possible priority. This option cannot be used for statistical
views.
- iIndexSamplingOption
- Input. Indicates that statistics are to be collected on a sample
of the index data. iIndexSamplingOption represents
the size of the sample as a percentage P. This
value must be a positive number that is less than or equal to 100,
but can be between 0 and 1. For
example, a value of 0.01 represents one hundredth
of one percent, such that one index page in 10 000 is sampled on average.
A value of 0 or 100 is treated by DB2 as if sampling was not specified,
regardless of whether DB2RUNSTATS_INDEX_SYSTEM has
been specified. A value greater than 100 or less than 0 is treated
by DB2 as an error (SQL1197N).
The two possible types of sampling are BERNOULLI and SYSTEM. The sampling
type specification is controlled by the indicated setting of DB2RUNSTATS_INDEX_SYSTEM in iRunstatsFlags.
If the flag is not set, BERNOULLI sampling is assumed.
db2ColumnData data structure parameters
- piColumnName
- Input. Pointer to a string representing a column name.
- iColumnFlags
- Input. A bit mask field used to specify statistics options for
the column. Valid values are:
- DB2RUNSTATS_COLUMN_LIKE_STATS
- Collect LIKE statistics on the column.
db2ColumnDistData data structure parameters
- piColumnName
- Input. Pointer to a string representing a column name.
- iNumFreqValues
- Input. The number of frequent values to collect on the column.
Valid values are:
- n
- Collect n frequent values on the column.
- -1
- Use the table default number of frequent values, such as iTableDefaultFreqValues if
set, or the database configuration parameter num_freqvalues.
- iNumQuantiles
- Input. The number of quantiles to collect on the column. Valid
values are:
- n
- Collect n quantiles on the column.
- -1
- Use the table default number of quantiles, iTableDefaultQuantiles if
set, or the database configuration parameter num_quantiles.
db2ColumnGrpData data structure parameters
- piGroupColumnNames
- Input. An array of strings. Each string represents a column name
that is part of the column group on which to collect statistics.
- iGroupSize
- Input. Number of columns in the column group. Valid values are:
- n
- The column group is made up of n columns.
- iNumFreqValues
- Input. Reserved for future use.
- iNumQuantiles
- Input. Reserved for future use.
db2gRunstatsData data structure specific parameters
- piIndexNamesLen
- Input. An array of values representing the length in bytes of
each of the index names in the index list. If NumIndexes is
zero then piIndexNamesLen is ignored.
- iTablenameLen
- Input. A value representing the length in bytes of the table or
view name.
db2gColumnData data structure specific parameters
- iColumnNameLen
- Input. A value representing the length in bytes of the column
name.
db2gColumnDistData data structure specific parameters
- iColumnNameLen
- Input. A value representing the length in bytes of the column
name.
db2gColumnGrpData data structure specific parameters
- piGroupColumnNamesLen
- Input. An array of values representing the length in bytes of
each of the column names in the column names list.
Usage notes
Use
db2Runstats to
update statistics:
- On tables that have been modified many times (for example, if
a large number of updates have been made, or if a significant amount
of data has been inserted or deleted)
- On tables that have been reorganized
- When a new index has been created.
- On views whose underlying tables have been modified substantially
so as to change the rows that are returned by the view.
After statistics have been updated, new access paths to
the table can be created by rebinding the packages using sqlabndx -
Bind.
If index statistics are requested, and statistics have
never been run on the table containing the index, statistics on both
the table and indexes are calculated.
If the db2Runstats API
is collecting statistics on indexes only then previously collected
distribution statistics are retained. Otherwise, the API will drop
previously collected distribution statistics. If the db2Runstats API
is collecting statistics on XML columns only, then previously collected
basic column statistics and distribution statistics are retained.
In the case where statistics on some XML columns have been collected
previously, the previously collected statistics for an XML column
will either be dropped if no statistics on that XML column are collected
by the current call to the db2Runstats API, or
be replaced if statistics on that XML column are collected by the
current call to the db2Runstats API. Otherwise,
the API will drop previously collected distribution statistics.
If
the iRunstatsFlags parameter is set to the value DB2RUNSTATS_EXCLUDING_XML,
statistics will not be collected on XML columns. This value takes
precedence over all other methods that specify XML columns.
For DB2 V9.7
Fix Pack 1 and later releases, the following items apply for the collection
of distribution statistics for a column of type XML:
- Distribution statistics are collected for each index over XML
data specified on an XML column.
- To collect distribution statistics for an index over XML data,
both distribution statistics and table statistics must be collected.
Table statistics must be gathered in order for distribution statistics
to be collected because XML distribution statistics are stored with
table statistics.
Collecting index statistics is not required to
collect XML distribution statistics. Collecting index statistics without
collecting distribution statistics does not collect XML distribution
statistics.
By default, XML distribution statistics use a maximum
of 250 quantiles.
- Distribution statistics are collected for indexes over XML data
of type VARCHAR, DOUBLE, TIMESTAMP, and DATE. Distribution statistics
are not collected for indexes over XML data of type VARCHAR HASHED.
- Distribution statistics are not collected for partitioned indexes
over XML data defined on a partitioned table.
After calling
this API, the application should issue a COMMIT to release the locks.
To
allow new access plans to be generated, the packages that reference
the target table must be rebound after calling this API. Packages
that contain queries that can take advantage of statistical views
must also be rebound after updating statistics on such views.
When
statistics are collected for statistical views, an SQL query is run
internally. The EXPLAIN facility can be used to examine the access
plan selected for this query to investigate any performance problems
with the statistics collection. To save the query access plan in the
EXPLAIN tables, set the CURRENT EXPLAIN MODE special register to YES.
Running
this API on the table only may result in a situation where the table
level statistics are inconsistent with the already existing index
level statistics. For example, if index level statistics are collected
on a particular table and later a significant number of rows is deleted
from this table, issuing this API on the table only may end up with
the table cardinality less than FIRSTKEYCARD (FIRSTKEYCARD is a catalog
statistics field in SYSCAT.INDEXES and SYSSTAT.INDEXES catalog views)
which is an inconsistent state. Likewise, issuing this API for indexes
only may leave the already existing table level statistics in an inconsistent
state. For example, if table level statistics are collected on a particular
table and later a significant number of rows is deleted from this
table, issuing the db2Runstats API for the indexes
only may end up with some columns having a COLCARD (COLCARD is a catalog
statistics field in SYSCAT.COLUMNS and SYSSTAT.COLUMNS catalog views)
greater than the table cardinality. A warning will be returned if
such an inconsistency is detected.
Statistics
are not collected for columns with structured types. If they are specified,
columns with these data types are ignored.
Only
AVGCOLLEN and NUMNULLS are collected for columns with LOB or LONG
data types.
AVGCOLLEN represents the average
space in bytes when the column is stored in database memory or a temporary
table. This value represents the length of the data descriptor for
LOB or LONG data types, except when LOB data is inlined on the data
page.
Note: The average space required to store the column on disk
may be different than the value represented by this statistic.