The DB2® optimizer
uses catalog statistics to determine the most efficient access plan
for a query. With automatic statistics collection, part of the DB2 automated table maintenance
feature, you can let the database manager determine whether statistics
must be updated.
Instead of using automatic statistic collection, you can
collect statistics manually. However, deciding which statistics to
collect for a workload is complex, and keeping these statistics up-to-date
is time consuming. Out-of-date or incomplete table or index statistics
might lead the optimizer to select a suboptimal plan, which slows
down query execution.
Automatic statistics collection can occur
in two ways:
- For synchronous collection at statement compilation time, you
can use the real-time statistics (RTS) feature. The auto_stmt_stats database
configuration parameter is used for RTS statistics collection.
- For asynchronous collection, you can enable the RUNSTATS command
to run in the background. The auto_runstats database
configuration parameter is used for automatic background statistics
collection.
Both of these parameters are enabled by default when you create
a database. Although background statistics collection can be enabled
while real-time statistics collection is disabled, background statistics
collection is always enabled when real-time statistics collection
occurs.
You can use the Configuration Advisor
to determine the initial configuration for databases, including the
appropriate settings for various database configuration parameter.
In IBM® Data Studio Version
3.1 or later, you can use the task assistant for configuring
automatic statistics collection. Task assistants can guide you through the
process of setting options, reviewing the automatically generated
commands to perform the task, and running these commands. For more
details, see Administering databases with task assistants.In IBM Data Studio Version 3.1 or later,
you can use the task assistant to configure automatic statistics collection.
Task assistants can guide you through the process of setting options,
reviewing the automatically generated commands to perform the task,
and running these commands. For more details, see Administering databases
with task assistants.
Understanding
asynchronous and real-time statistics collection
The
query optimizer determines how synchronous or asynchronous statistics
are collected, based on the needs of the query and the amount of table
update activity (the number of update, insert, or delete operations).
You
can enable real-time statistics collection, so that statistics can
be fabricated by using certain metadata. Fabrication means
deriving or creating statistics, rather than collecting them as part
of normal RUNSTATS command activity. For example,
the number of rows in a table can be derived from knowing the number
of pages in the table, the page size, and the average row width. In
some cases, statistics are not derived but are maintained by the index
and data manager and can be stored in the catalog. For example, the
index manager maintains a count of the number of leaf pages and levels
in each index.
Real-time
statistics collection provides more timely and more accurate statistics
than asynchronous statistics collection. Accurate statistics can result
in better query execution plans and improved performance. Regardless
of whether you enable real-time statistics collection, asynchronous
statistics collection occurs at 2-hour intervals. This interval might
not be frequent enough to provide accurate statistics for some applications.
Real-time or synchronous statistics collection also initiates asynchronous
collection requests in the following cases:
- Synchronous statistics collection is used for sampling because
the table is large.
- Synchronous statistics were fabricated.
- Synchronous statistics collection failed because the collection
time is exceeded.
In addition, table activity might be high enough to require asynchronous
collection, but might not be high enough to require synchronous statistics
collection.
At
most, two asynchronous requests can be processed at the same time,
and only for different tables. One request must have been initiated
by real-time statistics collection, and the other must have been initiated
by asynchronous statistics collection.
The performance impact
of automatic statistics collection is minimized in several ways:
- Asynchronous statistics collection is performed by using a throttled RUNSTATS utility.
Throttling controls the amount of resource that is consumed by the RUNSTATS utility,
based on current database activity. As database activity increases,
the utility runs more slowly, reducing its resource demands.
- Synchronous statistics collection is limited to 5 seconds per
query. The RTS optimization guideline determines the amount of time.
If synchronous collection exceeds the time limit, an asynchronous
collection request is submitted.
- Synchronous statistics collection does not store the statistics
in the system catalog. Instead, the statistics are stored in a statistics
cache and are later stored in the system catalog by an asynchronous
operation. This storage sequence avoids the memory usage and possible
lock contention that are involved in updating the system catalog.
Statistics in the statistics cache are available for subsequent SQL
compilation requests.
- Only one synchronous statistics collection operation occurs per
table. Other requests requiring synchronous statistics collection
fabricate statistics, if possible, and continue with statement compilation.
This behavior is also enforced in a partitioned database environment,
where operations on different database partitions might require synchronous
statistics.
- Only tables with missing statistics or high levels of activity
(as measured by the number of update, insert, or delete operations)
are considered for statistics collection. Even if a table meets the
statistics collection criteria, statistics are not collected synchronously
unless query optimization requires them. In some cases, the query
optimizer can choose an access plan without statistics. To check if
asynchronous statistics collection is required, tables with more than
4000 pages are sampled to determine whether high table activity changed
the statistics. Statistics for such large tables are collected only
if warranted.
- Statistics collection during an online maintenance window depends
on whether the statistics are asynchronous or synchronous:
- For asynchronous statistics collection, the RUNSTATS utility
is automatically scheduled to run during the online maintenance window
that you specify in your maintenance policy. This policy also specifies
the set of tables that are within the scope of automatic statistics
collection, minimizing unnecessary resource consumption.
- Synchronous statistics collection and fabrication do not use the
online maintenance window that you specify in your maintenance policy,
because synchronous requests must occur immediately and have limited
collection time. Instead, synchronous statistics collection and fabrication
uses to the policy that specifies the set of tables that are within
the scope of automatic statistics collection.
- While automatic statistics collection is being performed, the
affected tables are still available for regular database activity
(update, insert, or delete operations).
- Synchronous
are not collected for the following objects:
- Real-time statistics are not collected
for statistical views.
- Real-time statistics are not collected for nicknames. To refresh
nickname statistics in the system catalog for synchronous statistics
collection, call the SYSPROC.NNSTAT procedure. For asynchronous statistics
collection, DB2 for Linux, UNIX, and Windows software
automatically calls the SYSPROC.NNSAT procedure to refresh the nickname
statistics in the system catalog.
- Declared
temporary tables (DGTTs) can have only real-time statistics collected.
Although real-time statistics collection is designed to
minimize statistics collection memory usage, try it in a test environment
first to ensure that there is no negative performance impact. There
might be a negative performance impact in some online transaction
processing (OLTP) scenarios, especially if there is a limit on how
long a query can run.
Real-time synchronous statistics collection is performed
for regular tables, materialized query tables (MQTs), and global temporary
tables. Asynchronous
statistics are not collected for global temporary tables. Global temporary
tables cannot be excluded from real-time statistics via the automatic
maintenance policy facility.
Automatic statistics collection
(synchronous or asynchronous) does not occur for the following objects:
- Tables that are marked VOLATILE (tables that have the VOLATILE
field set in the SYSCAT.TABLES catalog view).
- Created temporary tables (CGTTs).
- Tables
for which you manually updated statistics by issuing UPDATE statements
against SYSSTAT catalog views, including manual updates of expression-based
key column statistics for any expression-based indexes that the table
has, even though those statistics are in a separate statistical view.
When
you modify table statistics manually, the database manager assumes
that you are now responsible for maintaining the statistics. To induce
the database manager to maintain statistics for a table with manually
updated statistics, collect the statistics by using the RUNSTATS command,
or specify statistics collection when using the LOAD command.
Tables that you created before Version 9.5 and for which you manually
updated statistics before upgrading are not affected by this limitation.
Their statistics are automatically maintained by the database manager
until you manually update them.
Statistics fabrication does not occur for the following
objects:
- Statistical views
- Tables for which you manually updated statistics by issuing UPDATE
statements
against SYSSTAT catalog views. If real-time
statistics collection is not enabled, some statistics fabrication
still occurs for tables for which you manually updated statistics.
In a partitioned database environment, statistics are
collected on a single database partition and then extrapolated. The
database manager always collects statistics (both RTS and automatic
background statistics) on the first database partition of the database
partition group.
No real-time statistics collection occurs until
at least 5 minutes after database activation.
Real-time statistics
processing occurs for both static and dynamic SQL.
A table
that you truncated, either by using the TRUNCATE statement or by using
the IMPORT command, is automatically recognized
as having out-of-date statistics.
Automatic statistics collection,
both synchronous and asynchronous, invalidates cached dynamic statements
that reference tables for which statistics were collected. This invalidation
is done so that cached dynamic statements can be re-optimized with
the latest statistics.
Asynchronous automatic
statistics collection operations might be interrupted when the database
is deactivated. If you did not explicitly activate the database by
using the ACTIVATE DATABASE command or the sqle_activate_db API,
the database is deactivated when the last user disconnects from the
database. If operations are interrupted, error messages might be recorded
in the DB2 diagnostic log file.
To avoid interrupting asynchronous automatic statistics collection
operations, explicitly activate the database.
If
a table has expression-based indexes, statistics for the expression-based
key columns are collected and cached as part of statistics collection
for the table. Statistics are not fabricated for expression-based
key columns
Real-time statistics and explain processing
There
is no real-time processing
for a query that is only explained (not executed) by the EXPLAIN facility.
The following table summarizes the behavior under different values
of the CURRENT EXPLAIN MODE special register.
Table 1. Real-time statistics collection as a function of the value
of the CURRENT EXPLAIN MODE special registerCURRENT EXPLAIN MODE special register value |
Real-time statistics collection considered |
YES |
Yes |
EXPLAIN |
No |
NO |
Yes |
REOPT |
Yes |
RECOMMEND INDEXES |
No |
EVALUATE INDEXES |
No |
Automatic statistics collection and the statistics
cache
You
can use the statistics cached to make synchronously collected statistics
available to all queries. This cache is part of the catalog cache.
In a partitioned database environment, the statistics cache is on
only the catalog database partition even though each database partition
has a catalog cache. If you enable real-time statistics collection,
catalog cache requirements are higher. Consider tuning the value of
the catalogcache_sz database configuration parameter
if you enable real-time statistics collection.
Automatic statistics collection and statistical profiles
You
can customize the type of statistics that are collected by creating
your own statistics profile for a particular table. For details, see
collecting statistics using a statistics profile.
RTS and automatic
background statistics are collected according to a statistical profile
that is in effect for a table, with the following exceptions:
- To minimize the memory usage of synchronous statistics collection,
the database manager might collect statistics by using sampling. In
this case, the sampling rate and method might be different from those
rates and methods that you specified in the statistical profile.
- RTS collection might fabricate statistics, but it might not be
possible to fabricate all types of statistics that you specified in
the statistical profile. For example, column statistics such as COLCARD,
HIGH2KEY, and LOW2KEY cannot be fabricated unless the column is the
primary column in some index.
If RTS statistics collection cannot gather all the statistics
that you specified in the statistical profile, an asynchronous collection
request is submitted.
The
following sections explain different operating characteristics of
automatic statistics collection.