DB2 maintenance utilities
DB2 utilizes a sophisticated cost-based optimizer to determine how data is being accessed. Its decisions are heavily influenced by statistical information about the size of the database tables and indices. Therefore, it is important to keep the database statistics up to date so an efficient data access plan can be chosen. The RUNSTATS utility is used to update statistics about the physical characteristics of a table and the associated indices. Characteristics include number of records (cardinality), number of pages, average record length, etc.
Let's use some examples to illustrate the usage of this utility. The
following command collects statistics on the table db2user.employee.
Readers and writers are allowed to access the table while the
statistics are being calculated:
RUNSTATS ON TABLE db2user.employee
ALLOW WRITE ACCESS.
The following command collects statistics on the table db2user.employee, as well as on the columns empid and empname with distribution statistics. While the command is running, the table is only available for read-only requests:
RUNSTATS ON TABLE db2user.employee WITH DISTRIBUTION ON COLUMNS ( empid, empname ) ALLOW READ ACCESS
The following command collects statistics on table db2user.employee and
detailed statistics on all its indices:
RUNSTATS ON TABLE
db2user.employee AND DETAILED INDEXES ALL.
You can be very specific when it comes to collecting statistics on the
database objects. Different combinations of the
RUNSTATS options can be used to collect
table statistics, index statistics, distribution statistics, sampling
information, etc. To simplify statistics collection, you can save the
options you specify when you issue the
RUNSTATS command in a statistics profile.
If you want to collect the same statistics repeatedly on a table and
do not want to retype the command options:
RUNSTATS ON TABLE
db2user.employee USE PROFILE.
This command collects statistics on db2user.employee using the options
recorded in the statistics profile for that table. So, how do you set
a statistics profile? It is as easy as using the
RUNSTATS ON TABLE db2user.employee WITH DISTRIBUTION ON COLUMNS ( empid, empname ) SET PROFILE ONLY
Notice that the option will only set the profile, the
RUNSTATS command will not run. If you need
to modify a previously registered statistics profile, use the
UPDATE PROFILE ONLY option. Similarly, this
option will only update the profile without running the
RUNSTATS command. If you want to update the
profile as well as the statistics, use
UPDATE PROFILE instead.
RUNSTATS ON TABLE db2user.employee WITH DISTRIBUTION DEFAULT NUM_FREQVALUES 50 NUM_QUANTILES 50 UPDATE PROFILE
RUNSTATS is a resource-intensive utility.
However, in order to maintain efficient database operation, statistics
must be collected regularly. You should find regular windows of
reduced database activity so database statistics can be collected
without affecting database performance. In some environments, there is
no such window. Throttling of
be considered to limit the amount of resources consumed by the
utility. When database activity is low, the utility runs more
aggressively. On the other hand, when database activity increases, the
resources allocated to executing
are reduced. Following is how to specify the level of throttling.
RUNSTATS ON TABLE db2user.employee WITH DISTRIBUTION DEFAULT NUM_FREQVALUES 50 NUM_QUANTILES 50 UTIL_IMPACT_PRIORITY 30
The acceptable priority value ranges from 1 to 100, 100 representing the highest priority (meaning un-throttled) and 1 representing the lowest; 50 is the default priority level.
Note that automatic statistics collection is enabled by default when
the database is created. It can be turned off by setting the database
Data being added and removed from the database might not be physically placed in a sequential order. In such a case, DB2 must perform additional read operations to access data. This usually means that more disk I/O operations are required, and we all know such operations are costly. In such a case, you should consider physically reorganizing the table to the index so related data is located close to one other, minimizing I/O operations.
REORG is a utility to reorganize data for a table or index. Although
data is physically rearranged, DB2 provides the option of performing
this online or offline. Offline REORG by default allows other users to
read the table. You may restrict table access by specifying the
ACCESS option. Online REORG (also
called in-place REORG) supports read and write access to the table.
Since data pages are rearranged, concurrent applications might have to
wait for REORG to complete with the current pages. You can easily
stop, pause, or resume the process with the appropriate options.
The following examples are fairly self-explanatory:
REORG TABLE db2user.employee INDEX db2user.idxemp INPLACE ALLOW WRITE ACCESS REORG TABLE db2user.employee INDEX db2user.idxemp INPLACE PAUSE
You can also reorganize an index. If the
CLEANUP clause is used as shown in one of
the examples below, a cleanup will be done instead of a
REORG INDEX db2user.idxemp FOR TABLE db2user.employee ALLOW WRITE ACCESS REORG INDEX db2user.idxemp FOR TABLE db2user.employee CLEANUP ONLY
REORGCHK is another data maintenance utility that has an option to
retrieve current database statistics or update the database
statistics. It will also generate a report on the statistics with
REORG indicators. Using the statistics formulae, REORGCHK marks the
tables or indices with asterisks (
there is a need to REORG.
Let's consider some examples. The following command generates a report
of the current statistics on all tables owned by the runtime
REORGCHK CURRENT STATISTICS ON TABLE
The following command updates the statistics and generates a report on
all the tables created under the schema smith:
STATISTICS ON SCHEMA smith.
And here's some REORGCHK sample output:
Listing 40. REORGCHK sample output
Table statistics: F1: ... < 5 F2: ... > 70 F3: ... > 80 SCHEMA.NAME ... F1 F2 F3 REORG --------------------------------------------- Table: DB2INST1.XMLFILES ... 0 89 98 --- --------------------------------------------- Index statistics: F4: ... > 80 F5: ... > MIN(50, (100 - PCTFREE)) F6: ... < 100 F7: ... < 20 F8: ... < 20 SCHEMA.NAME ... PCT_PAGES_SAVED F4 F5 F6 F7 F8 REORG ---------------------------- ... ----------------------------------------- Table: DB2INST1.XMLFILES Index: DB2INST1.IDX1 ... 50 98 82 - 0 0 ----- Index: DB2INST1.IDX2 ... 0 71 - - 0 0 *---- Index: DB2INST1.IDX3 ... 33 98 220 - 0 0 ----- --------------------------------------------------------------------------
Notice that DB2INST1.IDX2 is marked by an asterisk in the REORG column as its formula F4 calculation is 71, which does not meet the evaluation criteria: F4: ... > 80.
The above REORGCHK output indicates that only the DB2INST1.IDX2 index
requires reorganization. So, you will only need to run the
REORG command on that index instead of the
Before a database application program or any SQL statement can be executed, it is precompiled by DB2, and a package is produced. A package is a database object that contains compiled SQL statements used in the application source file. DB2 uses the packages to access data referenced in the SQL statements. So, how does the DB2 optimizer choose the data access plan for these packages? It relies on database statistics at the time the packages are created.
For static SQL statements, packages are created and bound to the
database at compile time. If statistics are updated to reflect the
physical database characteristics, existing packages should also be
updated. The REBIND utility allows you to recreate a package so that
the current database statistics can be used. The command is very
In many cases, SQL statements contain host variables, parameter
markers, and special registers. The values of these variables are not
known until runtime. With the
REBIND command, you can specify
whether to have DB2 optimize an access path using real values for host
variables, parameter markers, and special registers. There are three
NONE— Real values of the host variables, parameter markers, and special registers used in the SQL statement will not be used to optimize an access path. The default estimates for these variable will be used instead.
ONCE— The access path for a given SQL statement will be optimized using the real values of the host variables, parameter markers, or special registers when the query is first executed.
ALWAYS— The access path for a give SQL statement will always be compiled and reoptimized using the values of the host variables, parameter markers, or special registers.
REBIND PACKAGE ACCTPKG REOPT ONCE
However, if you are going to change the application source, the existing associated package needs to be explicitly dropped and recreated. The REBIND utility is not used for this purpose. We bring this to your attention here because DBAs often misunderstand the usage of REBIND.
As for dynamic SQL statements, they are precompiled at runtime and
stored in the package cache. If statistics are updated, you may flush
the cache so that dynamic SQL statements are compiled again to pick up
the updated statistics. The command looks like this:
FLUSH PACKAGE CACHE
Now that you understand
RUNSTATS, REORG, REORGCHK,
FLUSH PACKAGE CACHE, let's review the data
maintenance process that should be performed regularly against your
database. The process is illustrated in the following diagram.
Figure 4. Database maintenance process