Automatic table maintenance in DB2, Part 1: Automatic statistics collection in DB2 for Linux, UNIX, and Windows

How does auto-runstats actually work?

DB2's autonomic capabilities lighten the load of the DBA and enable you to maximize performance. This article demonstrates how to enable and configure automatic statistics collection, monitor automatic statistics collection progress, and detect errors. It also provides an overview of automatic statistics collection processing to help you understand how and when automatic statistics collection determines when to collect statistics on a table. This article is the first of a two-part series on autonomic table maintenance in DB2.

Ivan Popivanov (ivannp@ca.ibm.com), Software Developer, EMC

Ivan PopivanovIvan Popivanov is a Software Engineer in the DB2 Optimizer and RUNSTATS team, having previously worked in the Autonomic Computing team where was the lead developer of Auto-Runstats. Ivan has five years of experience with IBM. He has previously co-authored two conference papers and several patent applications. Ivan holds degrees in Computer Science from the University of Toronto and from Sofia University (Bulgaria).



Scott Walkty (swalkty@ca.ibm.com), Software Developer, EMC

Scott WalktyScott Walkty is a Software Developer on the DB2 Workload Management Team. He previously worked for five years on the DB2 Tools Team ,where he was one of the developers of Automatic Table Reorganization. Scott holds a Masters degree in Computer Science from the University of Manitoba.



Angela Yang (ayyang@ca.ibm.com), Software Developer, EMC

Angela YangAngela Yang is a Software Developer at the IBM Toronto Lab. For the past 3 years she has worked in several areas of DB2 development including administration tools and runstats. She is currently working on DB2 Query Optimizer development team.



Beck Tang (becktang@ca.ibm.com), Software Developer, IBM

Beck TangBeck Tang has six years of experience as a member of the IBM SAP Integration and Support Center at the IBM Toronto Lab. His current role includes certification testing of SAP NetWeaver with IBM DB2 UDB and assisting customers with problem analysis and troubleshooting. Mr. Tang is also a customer advocate providing custom-tailored support for large customer accounts running SAP and DB2 UDB. In addition, he is a DB2 Certified Solution Expert. He is a co-author of the red book, "SAP Solutions on IBM DB2 UDB V8.2.2 Handbook" and white paper, "SAP/DB2 and High Availability on Sun Cluster 3.X". Mr. Tang can be reached at becktang@ca.ibm.com.



07 June 2007

Also available in Russian

Introduction

IBM DB2 e-kit for Database Professionals

Learn how easy it is to get trained and certified for DB2 for Linux, UNIX, and Windows with the IBM DB2 e-kit for Database Professionals. Register now, and expand your skills portfolio, or extend your DBMS vendor support to include DB2.

The DB2 optimizer uses catalog statistics to determine the most efficient access plan for any given query. Having out-of-date or incomplete statistics for a table or an index can lead the optimizer to select a plan that is not optimal, slowing down query execution. However, deciding which statistics to collect for a given workload is complex, and keeping these statistics up-to-date is time-consuming.

Automatic statistics collection, also known as auto-runstats, was introduced in DB2 Version 8.2 as part of DB2's automated table maintenance feature. With automatic statistics collection, you can let DB2 determine whether database statistics need to be updated. DB2 will automatically run the RUNSTATS utility in the background to ensure the most current database statistics are available.

This article provides an in-depth look into the automatic statistics collection feature. After reading this article, you will be able to:

  • Set up and configure automatic statistics collection
  • Understand in depth how automatic statistics collection works, and in particular, how automatic statistics collection determines that table statistics need to be updated
  • Monitor automatic statistics collection progress and detect any errors that occur during the automation.

Read on, or link directly to the section that interests you:


Enabling automatic statistics collection

Automatic statistics collection is enabled and disabled using the AUTO_RUNSTATS database configuration parameter, which is part of a hierarchy of automatic maintenance database configuration parameters. These parameters can be viewed using the GET DB CONFIG command line processor (CLP) command. For example:

Listing 1. Database configuration output
GET DB CONFIG
...
 Automatic maintenance                      (AUTO_MAINT) = ON
   Automatic database backup            (AUTO_DB_BACKUP) = OFF
   Automatic table maintenance          (AUTO_TBL_MAINT) = ON
   Automatic runstats              (AUTO_RUNSTATS)= ON
     Automatic statistics profiling    (AUTO_STATS_PROF) = OFF
       Automatic profile updates         (AUTO_PROF_UPD) = OFF
     Automatic reorganization               (AUTO_REORG) = OFF

This hierarchy (shown by the indentation of the parameters in the output from this command) allows a DBA to enable or disable automatic table maintenance features selectively or as a group. For example, if AUTO_MAINT is set to OFF, all other automatic maintenance parameters are considered to have an effective value of OFF, regardless their actual value. Similarly, if AUTO_TBL_MAINT is set to OFF, the AUTO_RUNSTATS, AUTOSTATS_PROF, AUTO_PROF_UPD and AUTO_REORG parameters are considered to have an effective value of OFF. To enable automatic statistics collection, the AUTO_MAINT, AUTO_TBL_MAINT and AUTO_RUNSTATS parameters must all have a value of ON.

You can enable automatic statistics collection by issuing the following command:

CONNECT TO <db alias>
UPDATE DB CONFIG USING AUTO_MAINT ON AUTO_TBL_MAINT ON AUTO_RUNSTATS ON

Again, if the database partitioning feature is enabled, this command must be issued on the catalog partition.

Starting in DB2 9, automatic statistics collection is enabled by default when a new database is created.

To check whether or not AUTO_RUNSTATS is enabled, use the GET DB CONFIG command as shown above. It is important that you always check the values of each of the AUTO_MAINT, AUTO_TBL_MAINT and AUTO_RUNSTATS configuration parameters. Looking at the value of AUTO_RUNSTATS alone is not sufficient.

Configuring automatic statistics collection using the DB2 Control Center

Automatic statistics collection behavior is controlled through a maintenance policy. This maintenance policy specifies:

  • An on-line maintenance window. This is a window of time during which automatic statistics collection will perform statistics collection on the qualified tables.
  • The set of tables that will be considered for automatic statistics collection. Tables can be selected by name, schema or comment.

The maintenance policy can only be updated using the DB2 Control Center graphical interface. There is no command line or programmatic interface.

By default, there is a 24x7 online maintenance window, and all the tables are considered. To change the default maintenance policy, perform the following steps.

  1. Launch the DB2 Control Center.

    From the command line, type the following command:

    db2cc
  2. Open the Configure automatic maintenance wizard.

    In the Control Center main window, expand the navigation tree on the left hand side and open the databases folder. Right-click on the database for which you are configuring automatic maintenance, and select the Configure Automatic Maintenance menu option from the pop up menu as shown in Figure 1.

    Figure 1. Launching configure automatic maintenance wizard
    Launching Configure automatic maintenance wizard
  3. The first panel in the configure automatic maintenance wizard provides a brief overview of automatic maintenance. Click the Next button at the bottom of the window to continue.
    Figure 2. Configure automatic maintenance wizard: Configure automatic maintenance
    Configure automatic maintenance Wizard: Configure automatic maintenance
  4. The operation type panel allows you to choose whether you are using the wizard to configure automatic maintenance, or disable automatic maintenance. For configuration, leave the radio button set to Change automation settings and click on the Next button. Note that disabling automation settings will simply update the database configuration parameters discussed in the previous section, setting the values to OFF.
    Figure 3. Configure automatic maintenance wizard: Select automatic maintenance type
    Configure automatic maintenance wizard: Select automatic maintenance type
  5. Configure the maintenance window panel.

    Use the configure maintenance window panel to specify windows of time for automatic maintenance operations. A DB2 database can have at most two windows specified: An online window (for maintenance operations that leave the object of the maintenance accessible) and an offline window (during which time the object undergoing maintenance will be inaccessible to users). It is not necessary to define an offline maintenance window for automatic statistics collection. Automatic statistics collection will occur only during the online maintenance window.

    Note that maintenance windows are used to specify the start time of automatic maintenance operations. Any operations that run past the end of a window are allowed to proceed to completion. For example, if a maintenance window is 7:00 PM to 5:00 AM every weekday, and a table runstats is started at 4:30 AM, the runstats will not be terminated if it is still executing at 5:00 AM. It will be allowed to finish.

    By default, all DB2 databases have an online window defined that is 24x7 (that is, all hours of all days). This is usually sufficient for automatic statistics collection because the table remains accessible to users while runstats takes place and runstats can be throttled, so the impact on overall system performance can be controlled. To modify the online window, select the Change button next to the Online maintenance window description.

    Figure 4. Configure automatic maintenance wizard: Specify when automatic maintenance activities can run
    Configure automatic maintenance wizard: Wizard: Specify when automatic maintenance activities can run

    This will launch the maintenance window specification dialog. Use this dialog to define a maintenance window that meets your requirements. Typically the default 24x7 online window will be sufficient.

    Figure 5. Configure automatic maintenance wizard: Change maintenance window specification - Online activities
    Configure automatic maintenance wizard: Change maintenance window specification - Online Activities

    When you have completed defining the maintenance window, close the specification dialog and click the Next button on the maintenance window panel.

  6. Configure the notification list panel.

    The notification list panel is used to specify a set of e-mail addresses for users who should receive e-mail notification whenever any health indicators (including the health indicators that monitor statistics collection behavior) enter an alert-able state. This step is optional. You can select Next immediately to skip this panel.

    Before you configure e-mail notification, perform the following steps:

    1. Install the DB2 Administration Server. This topic is outside the scope of this article. Refer to the DB2 documentation for details.
    2. Update the DB2 Administration Server SMTP_SERVER configuration parameter. This can be done using the UPDATE ADMIN CONFIG CLP command. For example:
      UPDATE ADMIN CONFIG USING SMTP_SERVER myserver.domain.com

      The SMTP_SERVER configuration parameter must point to an unauthenticated SMTP server (authenticated SMTP is not supported by the DB2 Administration Server).

    Assuming the DB2 Administration Server has been installed and the SMTP_SERVER configuration parameter has been configured, you can add e-mail addresses on the notification list panel. Select the Manage Contacts button to create new contacts, which are added to a master contact list stored by the DB2 Administration Server. Then, choose addresses from this contact list and select the right arrow button to add these contacts to the list of users that will receive health notification e-mails from the health monitor. When you are finished, click the Next button.

    Figure 5. Configure automatic maintenance wizard: Manage your notification list
    Configure automatic maintenance wizard: Manage your notification list
  7. Configure the maintenance activities panel.

    The maintenance activities panel allows you to enable or disable the automatic statistics collection health indicator and configure the behavior of Automatic statistics collection.

    The automate column corresponds to the AUTO_RUNSTATS configuration parameter. If this column has a check mark, AUTO_MAINT, AUTO_TBL_MAINT and AUTO_RUNSTATS are all set to ON (that is, the effective value of AUTO_RUNSTATS is ON). If this column is not checked, check it to enable automatic statistics collection.

    The Notify column corresponds to the db.tb_runstats_req health indicator. This health indicator monitors the progress of automatic statistics collection. If there is a check mark, the health indicator is enabled. If there is no check mark, the health indicator is disabled. This column has nothing to do with e-mail notifications. It simply controls whether or not statistics collection will be monitored. If you are enabling automatic statistics collection, it is strongly encouraged that you enable the health indicator (that is, leave the Notify column checked). The health indicator is the primary mechanism for monitoring behavior, detecting problems and performing problem determination. The Monitoring section describes the health indicator in more detail.

    Figure 7. Configure automatic maintenance wizard: Select a maintenance activity to configure
    Configure automatic maintenance wizard: Select a maintenance activity to configure

    The lower portion of the panel describes the settings for automatic statistics collection. Click the Configure Settings button to launch the settings dialog.

    The settings dialog defines the table scope. Table scope identifies which tables will be considered for automatic statistics collection. By default, all tables are considered for automatic statistics collection.

    Figure 8. Configure automatic maintenance wizard: Optimize data access 1
    Configure automatic maintenance wizard: Optimize data access 1

    You can uncheck include system table. This will prevent automatic statistics collection from considering system tables.

    Figure 9. Configure automatic maintenance wizard: Optimize data access 2
    Configure automatic maintenance wizard: Optimize Data Access 2

    After completing all the configurations for automatic statistics collection on the activities panel, click the Next button to advance to the summary panel.

  8. Review the summary panel.

    The summary panel provides an overview of all the configuration changes that are being made. Review the changes and, if you are satisfied, click the Finish button. The changes are applied to the database. You have successfully enabled and configured automatic statistics collection.

    Figure 10. Configure automatic maintenance wizard: Review the automatic maintenance settings
    Configure automatic maintenance wizard: Review the automatic maintenance settings
  9. Perform advanced configuration.

    In the automatic statistics collection settings dialog, it is possible to filter tables by defining a WHERE clause on a select statement applied to SYSCAT.TABLES.

    Figure 11. Configure automatic maintenance wizard: Optimizer data access 3
    Configure automatic maintenance wizard: Optimizer Data Access 3

    This WHERE clause permits a great deal of flexibility in specifying which tables should be considered.

    One approach that can be useful is the following. Create a control table containing the names of all the tables that you want considered for automatic statistics collection:

    CREATE TABLE AUTO_RUNSTATS_CTL( TABSCHEMA VARCHAR(128), TABNAME VARCHAR(128))

    Populate this table with the names of all tables you wish to be considered for automatic statistics collection. Then define the following WHERE clause in the automatic statistics collection policy.

    (TABSCHEMA, TABNAME) IN 
        ( SELECT CTL.TABSCHEMA, CTL.TABNAME FROM AUTO_RUNSTATS_CTL AS CTL)

    Now only tables listed in the control table will be considered for automatic statistics collection. If you wish to omit a particular table for some period of time, simply remove it from your control table.

Configuring the statistics heap STAT_HEAP_SZ

The database configuration parameter STAT_HEAP_SZ determines the maximum amount of memory that will be used when RUNSTATS is performed. This limit applies whether RUNSTATS is issued manually or driven by automatic statistics collection. Although the default value (set when a database is created) is sufficient in many cases, it is usually too low for wide tables (tables with many columns). Whenever a RUNSTATS command fails to complete due to a low setting of STAT_HEAP_SZ, sqlcode SQL0973N is generated. When automatic statistics collection fails to perform RUNSTATS on a table due to insufficient statistics heap memory, the following logpoints will be written to the db2diag.log.

Listing 2. Logpoints in db2diag.log showing insufficient memory for RUNSTATS
2007-03-23-23.49.33.244659-240 I1689777A368       LEVEL: Event
PID     : 1413486              TID  : 1286        PROC : db2acd
INSTANCE: popivan              NODE : 000
APPID   : *LOCAL.popivan.070324034939
FUNCTION: DB2 UDB, Automatic Table Maintenance, AtmTable::runstats, probe:500
START   : Automatic Runstats: runstats has started on table "SYSIBM  "."SYSTABLES"

2007-03-23-23.49.33.246227-240 I1690146A941       LEVEL: Error
PID     : 807008               TID  : 1           PROC : db2agent (A)
INSTANCE: popivan              NODE : 000         DB   : A
APPHDL  : 0-18                 APPID: *LOCAL.popivan.070324034939
AUTHID  : POPIVAN 
FUNCTION: DB2 UDB, relation data serv, sqlr_init_tstat, probe:160
MESSAGE : ZRC=0x8B120006=-1961754618=SQLR_STATS_HEAP_TOO_SMALL
          "Statistics heap size too small to begin with"
          DIA8328C No memory available in the statistics heap.
DATA #1 : SQLCA, PD_DB2_TYPE_SQLCA, 136 bytes
 sqlcaid : SQLCA     sqlcabc: 136   sqlcode: 0   sqlerrml: 0
 sqlerrmc: 
 sqlerrp : SQL09010
 sqlerrd : (1) 0x00000000      (2) 0x00000000      (3) 0x00000000
           (4) 0x00000000      (5) 0x00000000      (6) 0x00000000
 sqlwarn : (1)      (2)      (3)      (4)        (5)       (6)    
           (7)      (8)      (9)      (10)        (11)     
 sqlstate:      

2007-03-23-23.49.33.251552-240 I1691088A492       LEVEL: Error
PID     : 1413486              TID  : 1286        PROC : db2acd
INSTANCE: popivan              NODE : 000
APPID   : *LOCAL.popivan.070324034939
FUNCTION: DB2 UDB, Automatic Table Maintenance, AtmTable::runstats, probe:600
MESSAGE : ZRC=0xFFFFFC33=-973
DATA #1 : <preformatted>
AutoStats: Non zero code from Runstats
SQL0973N  Not enough storage is available in the "STAT_HEAP_SZ" heap to process 
the statement.  SQLSTATE=57011

2007-03-23-23.49.33.252357-240 I1691581A378       LEVEL: Event
PID     : 1413486              TID  : 1286        PROC : db2acd
INSTANCE: popivan              NODE : 000
APPID   : *LOCAL.popivan.070324034939
FUNCTION: DB2 UDB, Automatic Table Maintenance, AtmTable::runstats, probe:900

It is a good practice to increase STAT_HEAP_SZ preventively. From our own experience, a value around 15,000 will provide enough memory for almost all cases. Alternatively, you can adjust STAT_HEAP_SZ database configuration parameter by running a manual RUNSTATS against the widest table (table with the most number of columns). The error messages returned by the manual RUNSTATS contains the minimum value of STAT_HEAP_SZ which will allow RUNSTATS to complete on this table.


How does auto-runstats actually work (V8/V9)?

The UDI counter

When should table statistics be updated? In general, this is a complex question, but there are a few simple scenarios. Consider a scenario where, for a particular table, all relevant statistics (for example distribution statistics) have been collected last week, and there have been no changes to the table data since this collection took place. Do you think we need to update the statistics? Obviously the answer is "no." Running RUNSTATS will only lead to collecting the same statistics again.

Now let's say that for the same table we know that all rows were deleted since the last statistics collection a week ago. Do you think we need to update statistics? This time the answer is probably "yes," because otherwise DB2 will have a completely wrong picture of the table -- it will think the table is as large as it was a week ago, while in fact it is empty.

As these two cases illustrate, an important factor to consider is how much the table data has changed since the last time the statistics were updated. To quantify this data change, DB2 maintains a counter for each table that is increased every time a row is updated, deleted or inserted. This counter is known as the UDI counter. The term UDI is also used when an update or a delete or an insert was performed, but it is not important which of the three.

DB2 resets the UDI counter at the end of each RUNSTATS. So the UDI counter is a reliable measure how much data has been changed since the last RUNSTATS.

The current value of the UDI counter can be checked by using the db2pd tool.

db2pd -db <dbName> -tcbstats

Sample output from this command is listed below. From the second section, TCB Table Stats, you can see UDI counter on tables T1 and T2:

Listing 3. Output from db2pd command
Database Partition 0 -- Database JITSDB -- Active -- Up 0 days 05:10:19

TCB Table Information:
Address            TbspaceID TableID PartID MasterTbs MasterTab TableName
0x0000002AD24FCED8 3         2       n/a    3         2         T1 
0x0000002AD24FBA58 3         3       n/a    3         3         T2

TCB Table Stats:
Address            TableName          Scans      UDI        
0x0000002AD24FCED8 T1                 1          3      
0x0000002AD24FBA58 T2                 1          3

Persistence of UDI Counter Value

The UDI counter value is hardened to disk under a few specific circumstances as described below.

Prior to DB2 Version 9.5, the conditions are as follows:

  • A new page is added to the table. New pages may be added as rows are inserted or added with the IMPORT command. Load will also update the on disk UDI counter; as can a table reorg, classic reorg, and certain inplace table reorgs.
  • The automatic statistics collection process requests the UDI counter.
  • After a statistics collection resulting from either the RUNSTATS command or the automatic statistics collection process.

Since DB2 Version 9.5, there is one more condition:

  • If a table has a ROW CHANGE TIMESTAMP column, meta data for the row change time column will get periodically written to disk. When this occurs, the UDI counter value is written to disk also.

When automatic statistics collection is enabled, DB2 evaluates whether new statistics are required at 2 hour intervals. When this happens, the automatic statistics collection process makes a request for the UDI counter values to make its decision. At this point, the UDI counter value is written to disk. The periodic persistence of the UDI counter to disk ensures that automatic statistics collection makes accurate decisions across database activations and de-activations.

How often does automatic statistics collection process run?

Automatic statistics collection is a background process which runs periodically. We will use the term evaluation to refer to the processing that takes place when automatic statistics collection checks whether or not tables require statistics to be updated, and schedules runstats for tables whose statistics are out of date. The first evaluation occurs within two hours of database activation. Subsequent evaluations occur approximately every two hours after that, as long as the database remains active.

Determining which tables to collect statistics on

DB2 V8.2

The easiest way to understand automatic statistics collection is to think of it as a series of tests applied sequentially to each table. A test is performed only if the results of all previous tests were positive.

  1. Check if the table is due for evaluation. No further action performed if the table is not due for evaluation. An internal table is used to track if tables are due for evaluation.
  2. Check if the table has been accessed by the current workload.
  3. Check if table has statistics. If statistics hare never been collected for this table, issue RUNSTATS on the table. No further checks performed.
  4. Check whether UDI counter is greater than 10% of the rows. If not, no action on the table.
  5. Check whether UDI counter is greater than 50% of the rows, issue RUNSTATS on the table if UDI counter is greater than 50% of the rows.
  6. Issue RUNSTATS if the table is small.
  7. if table is large (more than 4000 pages), sample the table to decide whether or not to perform RUNSTATS.
Figure 12. Flowchart on determining which tables to collect statistics on DB2 V8.2
Flowchart on determining which tables to collect statistics on DB2 V8.2

** If this is the first time statistics have been collected for the table, sampling may be considered based on the table size. Otherwise, RUNSTATS in the above chart means a full RUNSTATS without sampling.

For each table, automatic statistics collection keeps a history based on the changes in the statistics. Notice that data changes and changes in the statistics are two different things -- data changes are necessary for statistics to be different. However, data changes do not guarantee that RUNSTATS will collect statistics much different from the current statistics. If automatic statistics collection detects a table has a lot statistics changes, it will schedule the table to be evaluated more frequent.

On the contrary, if a table has no statistics change, the table will be scheduled to be evaluated less frequently. In DB2 V8.2 this historical information maintained by automatic statistics collection is the initial factor to determine whether a particular table should be considered during an evaluation. If a table is not due for maintenance according to the historic information, automatic statistics collection will not attempt any further actions on this table.

Automatic statistics collection only collects statistics for tables that have been accessed at least once by the workload running on the database. In order for automatic statistics collection to consider a table, there must be at least one query (or some other database operation) executed against the table since the database was activated. For example, if you connect to a database which has automatic statistics collection enabled, but you don't run any queries, automatic statistics collection will not update the statistics for any table.

If the table has been accessed at least once, automatic statistics collection then checks to see if the table has any statistics. If the table has no statistics, automatic statistics collection updates the table statistics (by running RUNSTATS) and moves to the next table.

If the table has statistics, the next step is to check the UDI counter. Alone, the UDI counter has little meaning, hence, automatic statistics collection considers the ratio of UDI / Table Cardinality. This ratio is meaningful because by definition the UDI counter is the number of UDI operations since the last runstats, the moment in time when the Table Cardinality was computed. There are two important thresholds -- 10% and 50% (these thresholds may change in the future). If the higher threshold is exceeded, the table statistics are updated. On the other hand, if the amount of changes is less than the lower threshold (10%), the decision is that RUNSTATS is not required and automatic statistics collection moves on to the next table.

The behavior in the grey area (between 10% and 50%) depends on the size of the table. Automatic statistics collection distinguishes large and small tables, defined by the current number of physical pages allocated for the table (not the value from SYSCAT.TABLES.FPAGES, which is updated whenever RUNSTATS is run). RUNSTATS is cheap for small tables, thus, if a table is small and there are more than 10% changes, RUNSTATS is run. For large tables, automatic statistics collection needs further proof that a statistics update is required. It samples a small part of the table and collects statistics over the sample. RUNSTATS is only run if the sampled statistics are much different from the table statistics.

DB2 9

The algorithm is the same with one exception. The V8.2 approach reacts too slowly to surprising (with respect to the historical information) changes in the tables. This is due to the fact that it looks first at the historical information. If a table hasn't changed for a very long time, the historical information suggests that automatic statistics collection needs to look at the table infrequently (every few weeks for example). Thus, it will take some time before automatic statistics collection looks at the table and realizes there was abundance of changes.

  1. Check if the table has been accessed by the current workload.
  2. Check if table has statistics. If statistics hare never been collected for this table, issue RUNSTATS on the table. No further checks performed.
  3. Check whether UDI counter is greater than 10% of the rows. If not, no action on the table.
  4. Check whether UDI counter is greater than 50% of the rows, issue RUNSTATS on the table if UDI counter is greater than 50% of the rows.
  5. Check if the table is due for evaluation. No further action performed if the table is not due for evaluation. An internal table is used to track if tables are due for evaluation.
  6. RUNSTATS if the table is small.
  7. if table is large (more than 4000 pages), sample the table to decide whether or not to perform RUNSTATS.
Figure 13. Flowchart on determining which tables to collect statistics on DB2 9
Flowchart on determining which tables to collect statistics on DB2 9

** If this is the first time statistics have been collected for the table, sampling may be considered based on the table size. Otherwise, RUNSTATS in the above chart means a full RUNSTATS without sampling.

Starting with DB2 9, automatic statistics collection tests whether UDI / Table Cardinality > 50% before considering the historical information. Thus, it is able to react quicker to massive table changes. RUNSTATS will be triggered within 2 hours after the UDI ratio exceeds this threshold.

Automatic statistics collection process model

The automatic statistics collection evaluation engine is a thread running in the Health Monitor process on the catalog node. The Health Monitor process was called db2hmon in DB2 V8.2, but was renamed to db2acd in DB2 9. There is one automatic statistics collection thread per database, and it maintains a database connection for the duration of the evaluation.

Automatic statistics collection in data partitioning feature (DPF) environments

RUNSTATS in a DPF environment:
For partitioned tables, RUNSTATS collects statistics only on one of the partitions. At the end, the collected statistics are extrapolated, assuming uniformity across partitions. For example, if there are 1000 pages with data (NPAGES) on the partition where RUNSTATS is performed, and if the table is spread across five partitions, then RUNSTATS will assume that the table has 5000 pages in total, and this will be the value stored in SYSCAT.TABLES.NPAGES column.

When RUNSTATS is issued, if the table exists on the coordinator partition (the partition to which the user is connected) then RUNSTATS will run on the coordinator partition. Otherwise, statistics will be collected on the first table partition.

Consider a DPF system with five partitions and a table distributed on three of them: partitions 2, 3 and 4. If the user connects to partition 3 and issues RUNSTATS on this table, then RUNSTATS will collect statistics using the table data on partition 3. However, if the user connects to partition 0, then RUNSTATS will collect statistics using the table data on partition 2.

Automatic statistics collection in a DPF environment:
The evaluation process for automatic statistics collection takes place on the catalog node. For every table in the database, the evaluation process needs to load the table UDI counter. In a DPF environment, the UDI counter is partitioned together with the table; that is, there is one UDI counter per table partition. So automatic statistics collection needs to read the UDI counter from every partition where the table exists. This is only possible if the database is active on these partitions. Automatic statistics collection will never activate a database.

Databases can be activated either explicitly (using the ACTIVATE DATABASE command, which activates the database on all partitions in a DPF environment) or implicitly (database activated on first connection, the database is only active on partitions to which applications have connected). To maximize the benefit from automatic statistics collection in a DPF environment, the database should be explicitly activated. This can be done using the following command:

ACTIVATE DATABASE <db alias>

DB2 V8.2

In DB2 V8.2 automatic statistics collection will log the following error whenever it attempts to check the UDI counter for a table that is defined on many partitions and there is a partition on which the database is not active. These logpoints can fill the db2diag.log and waste a lot of space (particularly on a system with many tables). If you find these logpoints are filling your db2diag.log, consider activating your database explicitly.

Listing 4. Logpoints in db2diag.log after check for UDI counter showing partition where database is not active
2006-12-27-11.11.47.529451-360 I90873A383         LEVEL: Error
PID     : 3285062              TID  : 1           PROC : db2pdbc 3
INSTANCE: diedwi1              NODE : 003
FUNCTION: DB2 UDB, base sys utilities, sqleRunPdbSysCtlr, probe:67
MESSAGE : Request discarded by Controller at =
DATA #1 : Hexdump, 4 bytes
0x0FFFFFFFFFFF7F78 : 0000 000E                                  ....

DB2 9

No errors will be reported in db2diag.log if some of the table partitions are not activated.

RUNSTATS Profile

Setting up the RUNSTATS profile

The RUNSTATS utility provides an option to register and use a statistics profile. A statistics profile allows you to store the options that you specify when you issue the RUNSTATS command so that you can collect the same statistics repeatedly on a table without having to re-type the command options. The statistics profile is saved in the SYSCAT.TABLES catalog table.

For an example, you can use the following command to setup the profile.

RUNSTATS ON TABLE DB2USER.EMPLOYEE WITH DISTRIBUTION DEFAULT NUM_FREQVALUES 50 
SET PROFILE

To use the profile:

RUNSTATS ON TABLE DB2USER.EMPLOYEE USE PROFILE

You can query the profile associated with table EMPLOYEE:

SELECT SUBSTR(STATISTICS_PROFILE, 1, 150) AS STATISTICS_PROFILE FROM SYSCAT.TABLES 
WHERE TABNAME = 'EMPLOYEE'

The above query will return the following result:

STATISTICS_PROFILE
-----------------------------------------------------------------------
RUNSTATS ON TABLE "DB2USER"."EMPLOYEE" ON ALL COLUMNS WITH DISTRIBUTION 
ON ALL COLUMNS DEFAULT NUM_FREQVALUES 50

How automatic statistics collection applies the profile

Automatic statistics collection respects the profile a user has specified by using the registered profile option in the SYSCAT.TABLES catalog table. If no profile is specified, automatic statistics collection uses the following options when collecting statistics:

RUNSTATS ON TABLE DB2USER.EMPLOYEE WITH DISTRIBUTION AND SAMPLED DETAILED INDEXES ALL

Throttling in RUNSTATS

DB2 utilities like BACKUP, REBALANCE and RUNSTATS are resource intensive. Running these utilities at the same time as your workload can severely impact overall database performance. The DB2 throttling facility limits the amount of resources consumed by utilities.

There are two steps involved to setup and apply throttling for utilities:

  • Step 1: configure instance impact policy configuration parameter:
    UPDATE DBM CFG USING UTIL_IMPACT_LIM 10

    This command specifies that throttled utilities will not impact the workload by more than 10 percent.

  • Step 2: invoke a utility with a non-zero priority. For an example:
    RUNSTATS ON TABLE DB2USER.EMPLOYEE WITH DISTRIBUTION DEFAULT NUM_FREQVALUES 50 
    UTIL_IMPACT_PRIORITY 60

    The above command specifies the RUNSTATS command will be throttled with a priority of 60 (relative to other throttled utilities).

Throttling in automatic statistics collection

Throttling in automatic statistics collection is different than RUNSTATS throttling. Automatic statistics collection uses a fixed impact rate of 7 percent no matter how much UTIL_IMPACT_LIM is set. This way ensures automatic statistics collection does not have a significant impact on your workload even if it runs during production hours.

Figure 14. Throttling in automatic statistics collection
Throttling in Automatic Statistics Collection

Automatic statistics collection and volatile tables

SYSCAT.TABLES has a character column called volatile. The default value for this column is space. Currently, the only other supported value is 'C', which is a hint to DB2 that the table has volatile cardinality. In other words, DB2 should not rely on the cardinality statistics when choosing the best table access. For the same reason, automatic statistics collection does not collect statistics on volatile tables.


Monitoring

Automatic statistics collection looks for tables with stale statistics and either updates the statistics or only reports the table status to you through the status of db.tb_runstats_req health indicator. This section will explain how to check the progress of automatic statistics collection and how to detect any problems that occur.

Health snapshots and the statistics collection required health indicator

The primary interface for monitoring automatic statistics collection is with the health monitor and the db.tb_runstats_req health indicator. Health indicators measure the health of particular aspects of DB2. The db.tb_runstats_req health indicator enters an attention (non-normal) state whenever a table requires statistics collection. In the case where automatic statistics collection is enabled, this means that automatic statistics collection of a table has failed. The health indicator can also be useful when automatic statistics collection is not enabled, for detecting when tables need to be collected statistics.

The current value of a health indicator can be obtained using a health snapshot. For the purpose of this document, we'll consider health snapshots from the command line.

To take a health snapshot for a database, issue the following command from the db2 command line:

GET HEALTH SNAPSHOT FOR DATABASE ON <DB ALIAS>

Note that database health snapshot information will only be returned if the database is active. If the database is inactive, you will see warning SQL1611W returned.

Health snapshot output looks something like the following (note that only the db.tb_runstats_req health indicator is displayed here for clarity, other health indicators have been omitted).

Listing 5. Database health snapshot output
GET HEALTH SNAPSHOT FOR DB ON SAMPLE

                          Database Health Snapshot

Snapshot timestamp                         = 04/10/2007 12:18:00.918375

Database name                              = SAMPLE
Database path                              = /home/ivannp/ivannp/NODE0000/SQL00004/
Input database alias                       = SAMPLE
Operating system running at database server= AIX 64BIT
Location of the database                   = Local
Database highest severity alert state      = Attention

Health Indicators:

    Indicator Name                             = db.tb_runstats_req
       Value                                   = 1
       Evaluation timestamp                    = 04/10/2007 12:17:44.662893
       Alert state                             = Attention

          Collection:

             Name                              = "DBUSER"."EMPLOYEE"
             Detail                            = RUNSTATS
             State                             = Automation failed
             Evaluation timestamp              = 04/10/2007 12:17:45.000000

The snapshot header describes when the snapshot was taken, and for which database. The highest severity alert state is an aggregate across all health indicators being evaluated for the database.

Listing 6. Database health snapshot output
Snapshot timestamp                         = 04/10/2007 12:18:00.918375

Database name                              = SAMPLE
Database path                              = /home/ivannp/ivannp/NODE0000/SQL00004/
Input database alias                       = SAMPLE
Operating system running at database server= AIX 64BIT
Location of the database                   = Local
Database highest severity alert state      = Attention

The health indicators section displays information for all health indicators being evaluated on the database. For clarity, only the db.tb_runstats_req health indicator is shown here. The top level db.tb_runstats_req health indicator information describes the health of automatic statistics collection as a whole. The evaluation timestamp indicates the last time a statistics collection evaluation took place. Recall that evaluation occurs roughly every 2 hours (as long as the database remains active). The alert state for db.tb_runstats_req will take on one of two values: Normal or Attention. If the value is Normal, automatic statistics collection is functioning correctly. No tables need statistics collected. If the value is Attention, at least one table requires RUNSTATS.

Listing 7. Health snapshot output showing alert state "Attention"
Health Indicators:

    Indicator Name                             = db.tb_runstats_req
       Value                                   = 1
       Evaluation timestamp                    = 04/10/2007 12:17:44.662893
       Alert state                             = Attention

Following the high level health indicator information, there is one collection entry listed for each table that requires RUNSTATS. In the example shown here, automatic statistics collection failed on the table DBUSER.EMPLOYEE.

Listing 8. Health snapshot - showing failed statistics collection
Name                              = "DBUSER"."EMPLOYEE"
Detail                            = RUNSTATS
State                             = Automation failed
Evaluation timestamp              = 04/10/2007 12:17:45.000000

The state shows the current state for the table. For a regular health snapshot, this state will be automation failed or attention. Automation failed means that an error was encountered when RUNSTATS is running on the table. Attention means that the table requires statistics collection and automatic statistics collection did not attempt to collect statistics on the table (for example, the database configuration parameter, AUTO_RUNSTATS, is OFF).

The evaluation timestamp shows the last time this table was considered for statistics collection. Note that this timestamp will be different than the evaluation timestamp.

By default the health snapshot information for the db.tb_runstats_req health indicator only shows information for tables that require RUNSTATS (and are either scheduled for automatic statistics collection, or require manual RUNSTATS). Tables which have been checked and found to not require RUNSTATS are not displayed. Sometimes it is useful to gather information about these tables too. You can modify the health snapshot command to request that information be returned for all tables by adding the WITH FULL COLLECTION clause. For example:

GET HEALTH SNAPSHOT FOR DATABASE ON <DB ALIAS> WITH FULL COLLECTION

The output for the db.tb_runstats_req health indicator now displays one entry for each table that is considered as part of the automatic statistics collection scope. Similar to the entries for tables which require runstats, you can tell from this output the last time that a table was evaluated for RUNSTATS.

For illustrative purposes, it's simplest to consider a number of examples of the most common things you will see when taking a health snapshot.

Listing 9. Example 1: Database not active, no health snapshot information is returned
GET HEALTH SNAPSHOT FOR DB ON SAMPLE
SQL1611W  No data was returned by Database System Monitor

Example 2: Database is active, automatic statistics collection has not yet evaluated (i.e. database has not yet been active).

Listing 10. Example 2: Database is active, automatic statistics collection has not yet evaluated (for example, database has not yet been active)
GET HEALTH SNAPSHOT FOR DB ON SAMPLE

             Database Health Snapshot

Snapshot timestamp                         = 04/10/2007 11:39:30.733561

Database name                              = SAMPLE
Database path                              = /home/ivannp/ivannp/NODE0000/SQL00004/
Input database alias                       = SAMPLE
Operating system running at database server= AIX 64BIT
Location of the database                   = Local
Database highest severity alert state      = Not yet evaluated

Health Indicators:

   Not yet evaluated
Listing 11. Example 3: Health snapshot, automatic statistics collection working properly, no manual RUNSTATS is required.
GET HEALTH SNAPSHOT FOR DB ON SAMPLE

             Database Health Snapshot

Snapshot timestamp                         = 04/10/2007 11:53:30.741181

Database name                              = SAMPLE
Database path                              = /home/ivannp/ivannp/NODE0000/SQL00004/
Input database alias                       = SAMPLE
Operating system running at database server= AIX 64BIT
Location of the database                   = Local
Database highest severity alert state      = Normal

Health Indicators:

    Indicator Name                             = db.tb_runstats_req
       Value                                   = 0
       Evaluation timestamp                    = 04/10/2007 11:52:43.336809
       Alert state                             = Normal
Listing 12. Example 4: Health snapshot, automatic statistics collection failed
GET HEALTH SNAPSHOT FOR DB ON SAMPLE

                          Database Health Snapshot

Snapshot timestamp                         = 04/10/2007 12:18:00.918375

Database name                              = SAMPLE
Database path                              = /home/ivannp/ivannp/NODE0000/SQL00004/
Input database alias                       = SAMPLE
Operating system running at database server= AIX 64BIT
Location of the database                   = Local
Database highest severity alert state      = Attention

Health Indicators:

    Indicator Name                             = db.tb_runstats_req
       Value                                   = 1
       Evaluation timestamp                    = 04/10/2007 12:17:44.662893
       Alert state                             = Attention

          Collection:

             Name                              = "DBUSER"."EMPLOYEE"
             Detail                            = RUNSTATS
             State                             = Automation failed
             Evaluation timestamp              = 04/10/2007 12:17:45.000000

As mentioned earlier, there are a number of other interfaces to snapshot information. These include the db2GetSnapshot C API, the Health Center graphical interface and the following set of SQL interfaces:

  • HEALTH_DB_HI - Information on health indicators.
  • HEALTH_DB_HIC - Information on individual tables.

Note that there is in SQL interface equivalent to the WITH FULL COLLECTION option.

To summarize, using the health snapshot, you can tell at a glance:

  • The last time automatic statistics collection evaluation took place.
  • The set of tables that require RUNSTATS.
  • The set of tables which do not require RUNSTATS (if the FULL COLLECTION option is specified).

Configuring the db.tb_runstats_req health indicator

By default, the health monitor and the db.tb_runstats_req health indicators are both enabled. You can check if the health monitor is enabled using the GET DBM CONFIG command.

Listing 13. Check dbm configuration to see if health monitor enabled
GET DBM CONFIG 

          Database Manager Configuration

     Node type = Database Server with local clients

 Database manager configuration release level            = 0x0c00
...

   Statement                              (DFT_MON_STMT) = OFF
   Table                                 (DFT_MON_TABLE) = OFF
   Timestamp                         (DFT_MON_TIMESTAMP) = ON
   Unit of work                            (DFT_MON_UOW) = OFF
Monitor health of instance and databases   (HEALTH_MON) = ON

 SYSADM group name                        (SYSADM_GROUP) =

If health monitor is disabled, you can enable it using the following command:

UPDATE DBM CONFIG USING HEALTH_MON ON

You can check if the db.tb_runstats_req health indicator is enabled using the GET ALERT CONFIG command.

Listing 14. Check the alert configuration to see if health indicator enabled
GET ALERT CONFIG FOR DATABASE ON <DB ALIAS>

            Alert Configuration

  Indicator Name                     = db.db_op_status
      Default                        = Yes     
      Type                           = State-based
      Sensitivity                    = 0
      Formula                        = db.db_status;
      Actions                        = Disabled
      Threshold or State checking    = Enabled
...

  Indicator Name                     = db.tb_runstats_req
      Default                        = Yes     
      Type                           = Collection state-based
      Sensitivity                    = 0
      Actions                        = Disabled
      Threshold or State checking    = Enabled

If the db.tb_runstats_req health indicator is disabled, you can enable it using the following command:

UPDATE ALERT CONFIG FOR DATABASE ON <DB ALIAS> USING db.tb_runstats_req 
SET THRESHOLDSCHECKED YES

Diagnostic logging

In DB2 V8.2 automatic statistics collection will output a start and a stop message to indicate it start time and end time, respectively. These messages are marked as events and show in db2diag.log regardless of the logging level (defined by the value of DIAGLEVEL database manager configuration parameter). These messages however do not appear in db2diag.log in DB2 v9 (more precisely they appear at DIAGLEVEL 4, which is rarely used).

Automatic statistics collection outputs a start and a stop message whenever a RUNSTATS is run against a table. Thus, an implicit way to see whether automatic statistics collection is ON is to look for any message in db2diag.log.

Listing 15. Check db2diag.log for RUNSTATS messages
2007-04-21-10.03.29.160856-240 I351795A369        LEVEL: Event
PID     : 332048               TID  : 1326        PROC : db2acd
INSTANCE: ayyang               NODE : 000
APPID   : *LOCAL.ayyang.070421140401
FUNCTION: DB2 UDB, Automatic Table Maintenance, AtmTable::runstats, probe:500
START   : Automatic Runstats: runstats has started on table "SYSIBM  "."SYSCONSTDEP"

2007-04-21-10.03.30.156007-240 I352165A380        LEVEL: Event
PID     : 332048               TID  : 1326        PROC : db2acd
INSTANCE: ayyang               NODE : 000
APPID   : *LOCAL.ayyang.070421140401
FUNCTION: DB2 UDB, Automatic Table Maintenance, AtmTable::runstats, probe:900
STOP    : Automatic Runstats: runstats completed successfully on table "SYSIBM  
"."SYSCONSTDEP"

Additional information

Common error and warning messages

Automatic statistics collection logs messages in db2diag.log when errors occur. These are generally not critical errors. If automatic statistics collection fails to collect statistics on a table for any reason, you can wait until the next evaluation and the collection will be retried. Alternatively you can issue the RUNSTATS command manually if you need the statistics for a table updated immediately.

The common errors/warnings include SQL2314W, SQL0911N and SQL0973N.

SQL2314W

Some statistics are in an inconsistent state. The newly collected "<object1>" statistics are inconsistent with the existing "<object2>" statistics.

The RUNSTATS command will log this warning if there is any inconsistence between index stats and table stats. You can refer to the message documentation for proper response. Or you can ignore it as evaluation is performed every 2 hours. The warning may not occur in the subsequent evaluations.

User response:

  1. Issue a RUNSTATS to collect both table level and index level statistics.
  2. Collect statistics when inserts, updates, and deletes are minimal or not occurring at all. Or, if concurrent inserts, updates, and deletes are not essential, issue a RUNSTATS with the ALLOW READ ACCESS option.
  3. Increase the sample size, or, if RUNSTATS was issued with TABLESAMPLE SYSTEM, use the TABLESAMPLE BERNOULLI option instead.

SQL0911N:

Listing 16. SQL0911N error message
2007-04-10-12.00.48.223887-240 I20011A505         LEVEL: Error
PID     : 1175872              TID  : 1060        PROC : db2acd
INSTANCE: ivannp               NODE : 000
APPID   : *LOCAL.ivannp.070410160045
FUNCTION: DB2 UDB, Automatic Table Maintenance, atmRefreshInfoTable, probe:300
MESSAGE : ZRC=0xFFFFFC71=-911
DATA #1 : <preformatted>
AutoStats: [IBM][CLI Driver][DB2/AIX64] SQL0911N  The current transaction has been rolled
 back because of a deadlock or timeout.  Reason code "68".  SQLSTATE=40001

This is new in DB2 9. While processing tables, automatic statistics collection may end up waiting for a lock. Automatic statistics collection evaluation runs with a lock timeout value of 5 seconds. If evaluation waits for more than 5 seconds to obtain a lock, it will encounter a -911 error. This error can be ignored as evaluation is performed every 2 hours, so a subsequent evaluation will perform the statistics collection.

SQL0973N:

Listing 17. SQL0973N error message
2007-03-23-23.49.33.251552-240 I1691088A492       LEVEL: Error
PID     : 1413486              TID  : 1286        PROC : db2acd
INSTANCE: popivan              NODE : 000
APPID   : *LOCAL.popivan.070324034939
FUNCTION: DB2 UDB, Automatic Table Maintenance, AtmTable::runstats, probe:600
MESSAGE : ZRC=0xFFFFFC33=-973
DATA #1 : <preformatted>
AutoStats: Non zero code from Runstats
SQL0973N  Not enough storage is available in the "STAT_HEAP_SZ" heap to process the 
statement.  SQLSTATE=57011

If you see this error, the statistics heap may be configured incorrectly. Refer to the section on Configuring the statistics heap for details.

Handling DDL statements

When RUNSTATS is issued from the command line, the target table cannot be dropped or altered until the RUNSTATS command completes. This behavior is different for automatic statistics collection. When automatic statistics is collecting statistics, the target table can still be dropped or altered. In this case, the automatic statistics collection connection to the database is forced so that the DDL statement can proceed.


Conclusion

This article has demonstrated how to enable and configure automatic statistics collection, monitor automatic statistics collection progress and detect errors. It has also provided an overview of the automatic statistics collection processing, to help you understand how and when automatic statistics collection determines a table to collect statistics.


Acknowledgment

We would like to acknowledge and thank Joachim Pfefferle for reviewing this article.

Resources

Learn

Get products and technologies

  • Download a free trial version of DB2 Enterprise 9.
  • Now you can use DB2 for free. Download DB2 Express-C, a no-charge version of DB2 Express Edition for the community that offers the same core data features as DB2 Express Edtion and provides a solid base to build and deploy applications.
  • Download IBM product evaluation versions and get your hands on application development tools and middleware products from DB2®, Lotus®, Rational®, Tivoli®, and WebSphere®.

Discuss

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into Information management on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management
ArticleID=229103
ArticleTitle=Automatic table maintenance in DB2, Part 1: Automatic statistics collection in DB2 for Linux, UNIX, and Windows
publish-date=06072007