Working as a DB2 UDB Support Analyst, I frequently hear the following question from customers who are dealing with an optimizer or query planning problem:
"How do I recreate the same query access plan on my test environment that I am getting on my production environment?"
Many times, there is a need to replicate a production environment to a test environment, including recreating the same access plan for query analysis purposes.
For example, in production, you could experience performance problems caused by a query that is using a poor access plan and want to replicate the access plan on a test system in order try some different strategies, such as manipulating the statistics, changing the optimization level, trying different settings for DB2 registry variables, and so on, in order to improve performance.
In an ideal world, you would want to have the test environment as closely matched to production as possible. That is, you would want to use exactly the same hardware, operating system maintenance level and configuration, DB2 level, and configuration in both environments, as well as using the same data in test as in production. However, this ideal cannot always be achieved. If the production environment has a very large amount of data, you may simply not have the capacity to keep a test copy of the production system.
The db2look utility can be used to achieve your
goal, even though you can't replicate all the details of production.
This article will explain how you can mimic a production system on a test system without the need for actual data in order to recreate a query planning problem. This ability will help you to debug queries and understand access plan issues without interrupting work in the production environment. Note, however, that if you want to test the execution of the resulting access plan, you will still need to load data from production onto test (as much data as possible). And there is always the possibility that the differences between the test and production systems are still enough that the execution characteristics on test do not match those on production. This part of analysis (performance tuning) is as much an art as science.
Other problems in the optimizer or query compiler area, such as SQL0901N errors or instance crashes, can also be recreated using the methods explained in this article. You can try various strategies, such as testing the most recent fix pack (if the system is at an older fix level), different optimization levels, different registry variables, and so on, in order to see if these changes will correct the problem.
Let's look at the options to use with db2look in
order to achieve this goal.
The
db2look command and its options
Here are the commands you use to capture the needed information from your production system:
Listing 1. Commands to recreate an optimizer problem
db2look -d <dbname> -l -o storage.out db2look -d <dbname> -f -fd -o config.out db2look -d <dbname> -e -a -m -o db2look.out db2look -d <dbname> -e -a -m -t table1 table2 .... tableX -o table.ddl |
Now let's look at these db2look command options
in more detail.
Generate buffer pool, tablespace, and database partition group information
db2look -d <dbname> -l -o storage.out |
Here is a description of the options used in the
db2look command above:
-d: Database name -- This must be specified.-l: Generates database layout. This is the layout for database partition groups, buffer pools and tablespaces.-o: Redirects the output to the given file name. If the-ooption is not specified, then output goes to standard output (stdout), generally the screen.
The -l option is important to mimicking your
production environment. Ideally, you want to have the same buffer pools,
database partition groups (if you're in a multi-partition environment),
and tablespace information (including temporary tablespaces). However, if
you are constrained by memory and cannot allocate the large buffer pools
that you have in production, then use the
db2fopt command. I'll discuss this command in
more detail later in this section.
It is not always possible to have the same tablespaces set up in test that you have in production. For example, you may have devices set up with large sizes, and you may not have the flexibility to create the same device sizes in test. Or, you may not have a separate tablespace device available at all in the test environment. In addition, you might not have the same paths set up in test that you have in production. You would need to alter the paths, devices, and files appropriately to fit your test environment.
The important information used by the optimizer for a tablespace is the following. This is what you would want to make sure are the same on both test and production. (Note: The numbers shown here are an example. You should use the same settings on test as you do on production.)
PREFETCHSIZE 16 EXTENTSIZE 16 OVERHEAD 12.670000 TRANSFERRATE 0.180000 |
If a tablespace is "managed by database" on production, it should also be "managed by database" on test. If it is "managed by system" on production, it should also be that way on test.
Note: If this is a system with multiple physical partitions (MPP), the number of partitions in the database partition group must be the same on test. However, the number of physical machines does not have to be the same. The number of logical partitions in the whole MPP environment must be the same on both test and production.
Generate configuration parameters and registry variables
db2look -d <dbname> -f -fd -o config.out |
Here, I've used the following parameters:
-f: Extracts configuration parameters and registry variables. If this option is specified,-wrapperand-serveroptions will be ignored.-fd: Generates db2fopt statements foropt_buffpageandopt_sortheap, along with other configuration and registry settings.
The output of the command looks like this:
Listing 2. Sample output from db2look command
$ db2look -d sample -f -fd -- No userid was specified, db2look tries to use Environment variable USER -- USER is: SKAPOOR -- This CLP file was created using DB2LOOK Version 8.2 -- Timestamp: Sat Mar 26 00:13:36 EST 2005 -- Database Name: SAMPLE -- Database Manager Version: DB2/6000 Version 8.2.2 -- Database Codepage: 819 -- Database Collating Sequence is: UNIQUE CONNECT TO SAMPLE; -------------------------------------------------------- -- Database and Database Manager configuration parameters -------------------------------------------------------- UPDATE DBM CFG USING cpuspeed 6.523521e-07; UPDATE DBM CFG USING intra_parallel NO; UPDATE DBM CFG USING federated NO; UPDATE DBM CFG USING fed_noauth NO; !db2fopt SAMPLE update opt_buffpage 50000; !db2fopt SAMPLE update opt_sortheap 10000; UPDATE DB CFG FOR SAMPLE USING locklist 1000; UPDATE DB CFG FOR SAMPLE USING dft_degree 1; UPDATE DB CFG FOR SAMPLE USING maxlocks 10; UPDATE DB CFG FOR SAMPLE USING avg_appls 1; UPDATE DB CFG FOR SAMPLE USING stmtheap 2048; UPDATE DB CFG FOR SAMPLE USING dft_queryopt 5; --------------------------------- -- Environment Variables settings --------------------------------- !db2set DB2_ANTIJOIN=yes; !db2set DB2_INLIST_TO_NLJN=yes; COMMIT WORK; CONNECT RESET; TERMINATE; |
The -f and -fd
options are key options to use in order to extract configuration parameter
and environment variables, which the optimizer uses during access plan
phase. In Listing 2, above, note the following
output that resulted from the -fd option:
!db2fopt SAMPLE update opt_buffpage 50000; !db2fopt SAMPLE update opt_sortheap 10000; |
The db2fopt command tells the optimizer to use the
specified value for "Buffer pool size," rather than adding up the pages of
the buffer pools available. (Buffer pool size in the
db2exfmt output is discussed further in the
buffer pool size section below.). For example,
say that you cannot afford to have large buffer pools due to memory
constraints on the test system and would like to configure the size the
same without actually having them in reality. Use the
-fd option, which would generate the
db2fopt commands required to tell the optimizer
to use the specified size rather than calculating based on the buffer
pools available for this database.
It works the same way for sort heap, as we'll see in the sort heap section below.
Here is the usage for the db2fopt command.
Note, that the -fd option with db2look picks
the values for you but you should be aware of the usage and as well aware
of how to reset the values so that we go back to using syscat.bufferpools
for bufferpool pages and sortheap in the database configuration.
C:\>db2fopt Usage: db2fopt <database-alias> update [opt_buffpage <value>] [opt_sortheap <value>f] or db2fopt <database-alias> get [opt_buffpage] [opt_sortheap]] |
If you would like to set the values for opt_buffpage and opt_sortheap, issue:
db2fopt <dbname> update opt_buffpage <value> opt_sortheap <value> |
For example:
C:\>db2fopt sample update opt_buffpage 50000 opt_sortheap 10000 Update succeeded |
Make sure to terminate and reconnect to the database.
If you would like to view the values, issue:
C:\>db2fopt sample get opt_buffpage opt_sortheap
opt_buffpage value is 50000
opt_sortheap value is 10000
And if you would like to reset the values so that we do not use these two parameters, and go back to using syscat.bufferpools for bufferpool pages estimate, and database configuration for sortheap size, issue:
C:\>db2fopt sample update opt_buffpage -1 opt_sortheap -1 Update succeeded |
Make sure to terminate and reconnect to the database.
To make sure that they are reset, use the get option in
db2fopt again:
C:\>db2fopt sample get opt_buffpage opt_sortheap opt_buffpage value is -1 opt_sortheap value is -1 |
If you're a DBA, you will probably be using the DB2 SQL Explain Tool
(db2exfmt) to gain an understanding of your SQL
access plan. The db2exfmt tool is used to
format the contents of the explain tables. If you look at the output of
one of the access plans using db2exfmt from
production, you will notice the following at the top of the plan.
(Note: For the most part, these parameters are picked up by the
-f and -fd option in
the db2look output, with the exception of the
dbheap setting).
Listing 3. Sample output from db2exfmt
Database Context:
----------------
Parallelism: None
CPU Speed: 6.523521e-07
Comm Speed: 100
Buffer Pool size: 50000
Sort Heap size: 10000
Database Heap size: 5120
Lock List size: 1000
Maximum Lock List: 10
Average Applications: 1
Locks Available: 7849
Package Context:
---------------
SQL Type: Dynamic
Optimization Level: 5
Blocking: Block All Cursors
Isolation Level: Cursor Stability
---------------- STATEMENT 1 SECTION 201 ----------------
QUERYNO: 1
QUERYTAG: CLP
Statement Type: Select
Updatable: No
Deletable: No
Query Degree: 1
|
If you go a bit further down in the db2exfmt
output, right after the access plan, you will see if you have any registry
settings that affect the optimizer plan.
Note: Again, unfortunately not all the relevant registry variables
are listed by db2look -f. You will need to add
the ones that are missing. In general, your registry variable settings on
the test system should be identical, or as close as possible, to the
settings on production.
Listing 4. Registry settings that affect access plan
1) RETURN: (Return Result)
Cumulative Total Cost: 57.6764
Cumulative CPU Cost: 191909
Cumulative I/O Cost: 2
Cumulative Re-Total Cost: 5.37264
Cumulative Re-CPU Cost: 134316
Cumulative Re-I/O Cost: 0
Cumulative First Row Cost: 26.9726
Estimated Buffer pool Buffers: 2
Arguments:
---------
BLDLEVEL: (Build level)
DB2 v8.1.0.80 : s041221
ENVVAR : (Environment Variable)
DB2_ANTIJOIN=yes
DB2_INLIST_TO_NLJN = yes
STMTHEAP: (Statement heap size)
2048
|
Create data definition language (DDL)
The following db2look command creates the DDL to
duplicate all database objects, along with the configuration and
statistical information.
db2look -d <dbname> -e -a -m -o db2look.out |
Here we've used the following parameters:
-a: Generate statistics for all creators. If this option is specified, then the-uoption will be ignored.-e: Extract DDL file needed to duplicate database. This option generates a script containing DDL statements. The script can be run against another database to recreate database objects.-m: Run thedb2lookutility in mimic mode. This option generates a script containing SQL UPDATE statements. TheseSQL UPDATEstatements capture all the statistics. This script can be run against another database to replicate the original one. When the-moption is specified, the-p,-g, and-soptions are ignored .
Gather statistics and DDL for a database subset
To gather statistics and ddl for only certain tables and related objects, use the following command:
db2look -d <dbname> -e -a -m -t <table1> <table2> .. <tableX> -o table.ddl |
Here, I've used the following additional parameter:
-t: Generate statistics for the specified tables. The maximum number of tables that can be specified is 30.
In addition, if you do not use the -a option,
you could use the -z option:
-z: Schema name. If-zand-aare both specified, then-zwill be ignored. Schema name is ignored for the federated section.
Database manager level configuration parameters
Note: Use the command db2 "get dbm cfg",
in order to look at these parameters and
db2 "update dbm cfg using <parameter> <value>"
to update a database manager configuration parameter.
Parallelism:
This parameter indicates whether inter- or intra-partition
parallelism is enabled. If this is DPF with multiple partitions, then you
will see Inter Partition Parallelism. If this is just SMP
(intra_parallel enabled) single node environment, then you will see
Intra Partition Parallelism. If both intra_parallel enabled and
multiple partition environment, you will see both Inter and Intra
partitions parallelism for this parameter. And finally, if there
is no inter- or intra-parallelism, this parameter will show
NONE.
CPU Speed (cpuspeed):
The CPU speed (in milliseconds per instruction) is used by the
SQL optimizer to estimate the cost of performing certain operations.
Communications speed: (comm_bandwidth)
The value specified for the communications bandwidth (in
megabytes per second) is used by the SQL optimizer to estimate the cost of
performing certain operations between partition servers of a partitioned
database system.
Database level configuration parameters
Note: Use the command
db2 "get db cfg for <dbname>" in
order to look at these parameters and
db2 "update db cfg for <dbname> using <parameter> <value>")
to update a database configuration parameter.
Buffer pool size:
The buffer pool size shown in db2exfmt output is determined by
the buffpage parameter, if using buffpage as default for one buffer pool,
or a calculation based on the contents of syscat.bufferpools. The number
shown is the total number of buffer pool pages that are allocated for the
database. For example, let's say we have the following buffer pools:
Table 1. Buffer pool setup
| BUFFERPOOLNAME | SIZE |
|---|---|
| IBMDEFAULTBP | 1000 |
| BP1 | 1000 |
| BP2 | 4000 |
| BPIND1 | 1000 |
| BPIND2 | 1000 |
| BPLONG | 1000 |
| BPTEMP | 1000 |
| Total: | 10,000 |
The db2exfmt output would show the total size as the sum of the number of pages in all the bufferpools. In our example above, it is 10,000. Note: Pagesize does not matter, just the number of pages.
You could use the -fd option in
db2look to use the
db2fopt alternative if you cannot afford to
have the same amount of buffer pool allocated in test as in
production.
In MPP, the opt_buffpage is calculated per node,
as the optimizer uses the total buffer pool information for the node on
which the query is running. Hence, this change will only apply to the node
on which this tool is being run.
Sort heap size (SORTHEAP)
This parameter defines the maximum number of private memory
pages to be used for private sorts or the maximum number of shared memory
pages to be used for shared sorts.
You should set this to the same value as in production. Again, using the
-fd option in
db2look, you will notice the following:
!db2fopt SAMPLE update opt_sortheap 256; |
This will override the sortheap configuration parameter, and this is what
the optimizer will use as the sortheap value. Again, in reality, the
actual sortheap allocated at runtime will be determined by the sortheap
setting in the database configuration. As with
opt_buffpage, you can use
opt_sortheap if you cannot afford to allocate
the same size of sortheap on your test system as you can on production.
Database heap size: (DBHEAP)
There is one database heap per database, and the database
manager uses it on behalf of all applications connected to the database.
It contains control block information for tables, indexes, table spaces,
and buffer pools.
Lock list size: (LOCKLIST)
This parameter indicates the amount of storage that is allocated
to the lock list.
Maximum lock list: (MAXLOCKS)
This parameter defines a percentage of the lock list held by an
application that must be filled before the database manager performs
escalation.
The locklist and maxlocks would help determine the type of locks that will be held during a certain scan (index scan or table scan), along with the isolation level. For example, in the plan you will notice, say, an index scan operation:
IXSCAN: (Index Scan)
TABLOCK : (Table Lock intent)
INTENT SHARE
|
Note: Do not be concerned if Locks Available differs in your
db2exfmt output from your test system, compared
to production -- the difference has no effect on query planning.
Average applications: (AVG_APPLS)
This parameter is used by the SQL optimizer to help estimate how
much buffer pool will be available at run-time for the access plan chosen
(since the buffer pool is shared by all active applications connected to
the database).
Optimization Level: (DFT_QUERYOPT)
The query optimization class is used to direct the optimizer to
use different degrees of optimization when compiling SQL queries
Query Degree: (DFT_DEGREE)
The degree of intra-partition parallelism for an SQL statement.
if set to ANY, the optimizer is sensitive to
the actual number of cpus that are online. if you use
ANY, then the number of cpus on test and
production should be configured the same, unless intra_parallel is
disabled.
In addition to the above changes, there are some others that you must make sure are the same.
Number of frequent values retained: (NUM_FREQVALUES)
This parameter allows you to specify the number of "most
frequent values" that will be collected when the
WITH DISTRIBUTION option is specified on the
RUNSTATS command.
Number of quantiles retained: (NUM_QUANTILES)
This parameter controls the number of quantiles that will be
collected when the WITH DISTRIBUTION option is
specified on the RUNSTATS command.
The above two, NUM_FREQVALUES and
NUM_QUANTILES must be the same on the test
system as in production in order to make sure you collect the same number
of frequent and quantile values on test system as in production.
SQL statement heap (4KB): (STMTHEAP)
The statement heap is used as a workspace for the SQL compiler
during compilation of an SQL statement. This parameter specifies the size
of this workspace. If this parameter is smaller on test than that in
production, you may start seeing SQL0101N message due to lack of space in
statement heap needed to compile the query. As well, you may see SQL0437W
RC=1 , dropping down to greedy join enumeration if there is not enough
statement heap for dynamic join enumeration.
Examples of recreating the optimizer/query plan issue on test system
OS: Windows 2000
DB2LEVEL: V8.2 Fixpack 8 ESE Single partition
TEST and PRODUCTION same OS and db2level.
Databases:
Production Database: SAMPLE
Test Database: DUMMYDB
Sample database created using:
db2sampl
Dummy database created
using:
db2 create db DUMMYDB
PRODUCTION ENVIRONMENT:
-------------------------------------------------------- -- Database SAMPLE and Database Manager configuration parameters -------------------------------------------------------- UPDATE DBM CFG USING cpuspeed 9.446886e-007; UPDATE DBM CFG USING intra_parallel NO; UPDATE DBM CFG USING federated NO; UPDATE DBM CFG USING fed_noauth NO; !db2fopt SAMPLE update opt_buffpage 250; !db2fopt SAMPLE update opt_sortheap 256; UPDATE DB CFG FOR SAMPLE USING locklist 50; UPDATE DB CFG FOR SAMPLE USING dft_degree 1; UPDATE DB CFG FOR SAMPLE USING maxlocks 22; UPDATE DB CFG FOR SAMPLE USING avg_appls 1; UPDATE DB CFG FOR SAMPLE USING stmtheap 2048; UPDATE DB CFG FOR SAMPLE USING dft_queryopt 5; --------------------------------- -- Environment Variables settings --------------------------------- !db2set DB2_INLIST_TO_NLJN=yes; !db2set DB2_HASH_JOIN=yes; |
In addtion to the above, the following configuration should also be noted in the database configuration:
db2 get db cfg for sample > dbcfg_sample.out Database heap (4KB) (DBHEAP) = 600 SQL statement heap (4KB) (STMTHEAP) = 2048 Number of frequent values retained (NUM_FREQVALUES) = 10 Number of quantiles retained (NUM_QUANTILES) = 20 |
Make sure that you stop and start the instance after making the database
manager configuration (dbm cfg) changes. For the sample database, run
runstats as follows for the ORG and the SALES
table:
db2 connect to sample db2 runstats on table <schema>.org with distribution and indexes all db2 runstats on table <schema>.sales with distribution and indexes all db2 terminate |
Now, generate the EXPLAIN tables, by executing the EXPLAIN.DDL file, which is under the <install directory>\sqllib\misc directory:
db2 connect to sample db2 -tvf <intall path>\EXPLAIN.DDL db2 terminate |
Save the following commands in a file called query.sql:
connect to sample set current explain mode explain select * from org a, staff b where a.deptnumb=b.dept and b.dept=15 set current explain mode no terminate |
Now, execute the file as follows:
db2 -tvf query.sql |
The above will only compile the query in explain mode. You should see the following on the screen:
C:\>db2 -tvf query.sql connect to sample Database Connection Information Database server = DB2/NT 8.2.1 SQL authorization ID = SKAPOOR Local database alias = SAMPLE set current explain mode explain DB20000I The SQL command completed successfully. select * from org a, staff b where a.deptnumb=b.dept and b.dept=15 SQL0217W The statement was not executed as only Explain information requests are being processed. SQLSTATE=01604 set current explain mode no DB20000I The SQL command completed successfully. C:\>db2 terminate DB20000I The TERMINATE command completed successfully. |
Generate the access plan using db2exfmt as
follows:
db2exfmt -d SAMPLE -g TIC -w -1 -n % -s % -# 0 -o prod_sample_exfmt.txt |
Examine the contents of prod_sample_exfmt.txt file. You should see the following access plan generated:
Access Plan:
-----------
Total Cost: 25.8823
Query Degree: 1
Rows
RETURN
( 1)
Cost
I/O
|
4
HSJOIN
( 2)
25.8823
2
/-----+-----\
4 1
TBSCAN TBSCAN
( 3) ( 4)
12.9682 12.913
1 1
| |
35 8
TABLE: SKAPOOR TABLE: SKAPOOR
STAFF ORG
|
Now, this is the plan you are getting on your production sample database. You would like to mimic this plan on your test environment.
Gather the following information from the production sample database:
db2look -d SAMPLE -l -o storage.out db2look -d SAMPLE -f -fd -o config.out db2look -d SAMPLE -e -a -m -t ORG SALES -o table.ddl |
TEST ENVIRONMENT:
Change the database name in each of the files
gathered above where we connect to the database from SAMPLE to
DUMMYDB.
For example, if you view the contents of the 3 files, you will notice:
CONNECT TO SAMPLE; |
Change this to:
CONNECT TO DUMMYDB; |
Take the files over to the test environment. In this example, all the tables are created under the default tablespace, USERSPACE1. So they should also be created in the same SMS tablespace on the test system with same configuration as dumped out in storage.out, including PREFETCHSIZE, EXTENTSIZE, and so on.
Make one little change in the config.out file. Change the
following:
UPDATE DB CFG FOR SAMPLE USING dft_queryopt 5; |
to,
UPDATE DB CFG FOR SAMPLE USING dft_queryopt 3; |
and now save the config.out file.
Now, execute storage.out, config.out, and table.ddl as follows:
db2 -tvf storage.out > storage_output.out db2 -tvf config.out > config_output.out db2 -tvf table.ddl > table.out |
Examine the output files to make sure that all the commands ran
successfully. Also, make changes to DBHEAP,
STMTHEAP,
NUM_FREQVALUES,
NUM_QUANTILES for DUMMYDB as it is for the
SAMPLE DB shown in the production environment setup. Also, check that your
registry variable settings are as identical as possible.
Stop and start the instance, using db2stop and
db2start. Create the explain tables again for
this DUMMYDB database:
db2 connect to dummydb; <install path>\sqllib\misc\db2 -tvf EXPLAIN.DDL db2 terminate; |
Now, run the query against the DUMMYDB database, modifying the database name from SAMPLE to DUMMYDB in the query.sql file you generated earlier when running against the SAMPLE database.
C:\>db2 -tvf query.sql connect to dummydb Database Connection Information Database server = DB2/NT 8.2.1 SQL authorization ID = SKAPOOR Local database alias = DUMMYDB set current explain mode explain DB20000I The SQL command completed successfully. select * from org a, staff b where a.deptnumb=b.dept and b.dept=15 SQL0217W The statement was not executed as only Explain information requests are being processed. SQLSTATE=01604 set current explain mode no DB20000I The SQL command completed successfully. C:\>db2 terminate DB20000I The TERMINATE command completed successfully. |
Generate the db2exfmt output:
db2exfmt -d DUMMYDB -g TIC -w -1 -n % -s % -# 0 -o test_dummydb_exfmt.txt |
Examine the contents of test_dummydb_exfmt.txt and look at the access plan:
Access Plan:
-----------
Total Cost: 25.8843
Query Degree: 1
Rows
RETURN
( 1)
Cost
I/O
|
4
MSJOIN
( 2)
25.8843
2
/-----+-----\
1 4
TBSCAN TBSCAN
( 3) ( 5)
12.913 12.9682
1 1
| |
8 35
TABLE: SKAPOOR TABLE: SKAPOOR
ORG STAFF
|
You get a different access plan in test than production. In this case, it is obvious, as we changed the DFT_QUERYOPT (default query optimization) on test system from 5 to 3. So, you see a Merge Join plan rather then the Hash Join plan, and the Total Cost differs a tiny bit, as well.
Since the plans don't match (let's say you are not sure why), check the configuration in the db2exfmt output. See Table 2.
As you can see, between the TEST and PRODUCTION, the only difference is the Optimization Level, which we intentionally changed from 5 to 3 just to show why one would not be successful in replicating the production access plan on a testing environment.
In this case, you should update the
DFT_QUERYOPT, using the following
UPDATE statement, to 5:
UPDATE DB CFG FOR SAMPLE USING dft_queryopt 5 |
Then terminate and reconnect to the database. Issue query.sql again against
the DUMMYDB, and generate the access plan using the
db2exfmt command. This time you should see the
same access plan. If not, further make sure that all the optimizer-related
parameters discussed in this article are identical.
This example shows the importance of the -m option in the db2look command. The statistics are gathered with the -m option as mentioned earlier, and should be the same in both test and production. In this example, we'll see how the plan changes when the statistics are not updated properly.
The database manager configuration, database configuration and db2set registry variables are the same as in Example 1 above. The schema name here is SKAPOOR. Replace it with the schema for your table. The databases are the same, SAMPLE and DUMMY as in Example 1. The platform and db2level used here were AIX 5.1 with DB2 UDB ESE V8.2, Fix pack 8, single partition.
Perform the following commands on the sample database:
db2 "connect to sample" db2 "create index name_ind on staff (name,id)" db2 "runstats on table skapoor.staff with distribution and indexes all" db2 "set current explain mode explain" db2 "select name from staff where id=10 order by name" db2 "set current explain mode no" db2 "terminate" |
Generate the access plan using db2exfmt. You will see the following access plan:
Access Plan:
-----------
Total Cost: 0.111065
Query Degree: 1
Rows
RETURN
( 1)
Cost
I/O
|
1
IXSCAN
( 2)
0.111065
0
|
35
INDEX: SKAPOOR
NAME_IND
|
Gather the db2look information from the sample database:
db2look -d sample -l -o storage.out db2look -d sample -e -a -m -t STAFF -o db2look.out db2look -d sample -f -fd -o config.out |
Modify the files so that you are connecting to the dummy database instead of sample as we did previously in Example 1 above.
Modify one of the statistics manually. Search for the following statement in the db2look.out file (Note that the schema name, TABSCHEMA and INDSCHEMA will differ in your case):
UPDATE SYSSTAT.INDEXES
SET NLEAF=1,
NLEVELS=1,
FIRSTKEYCARD=35,
FIRST2KEYCARD=35,
FIRST3KEYCARD=-1,
FIRST4KEYCARD=-1,
FULLKEYCARD=35,
CLUSTERFACTOR=-1.000000,
CLUSTERRATIO=100,
SEQUENTIAL_PAGES=0,
DENSITY=0,
AVERAGE_SEQUENCE_GAP=0.000000,
AVERAGE_SEQUENCE_FETCH_GAP=0.000000,
AVERAGE_SEQUENCE_PAGES=0.000000,
AVERAGE_SEQUENCE_FETCH_PAGES=0.000000,
AVERAGE_RANDOM_PAGES=1.000000,
AVERAGE_RANDOM_FETCH_PAGES=0.000000,
NUMRIDS=35,
NUMRIDS_DELETED=0,
NUM_EMPTY_LEAFS=0
WHERE INDNAME = 'NAME_IND' AND INDSCHEMA = 'SKAPOOR '
AND TABNAME = 'STAFF' AND TABSCHEMA = 'SKAPOOR ';
|
Now, change the FIRSTKEYCARD, FIRST2KEYCARD, FULLKEYCARD AND NUMRIDS from 35 to 37. Save the db2look.out file and now run the 3 files:
db2 -tvf config.out > config_output.out db2 -tvf storage.out > storage_output.out db2 terminate db2stop db2start db2 -tvf db2look.out > db2look_output.out |
Examine the contents of the first two files, config_output.out and storage_output.out to make sure they ran successfully. Now, examine the contents of db2look_output.out file. You will see that the following update statement below failed:
UPDATE SYSSTAT.INDEXES SET NLEAF=1, NLEVELS=1, FIRSTKEYCARD=37, FIRST2KEYCARD=37 , FIRST3KEYCARD=-1, FIRST4KEYCARD=-1, FULLKEYCARD=37, CLUSTERFACTOR=-1.000000, C LUSTERRATIO=100, SEQUENTIAL_PAGES=0, DENSITY=0, AVERAGE_SEQUENCE_GAP=0.000000, A VERAGE_SEQUENCE_FETCH_GAP=0.000000, AVERAGE_SEQUENCE_PAGES=0.000000, AVERAGE_SEQ UENCE_FETCH_PAGES=0.000000, AVERAGE_RANDOM_PAGES=1.000000, AVERAGE_RANDOM_FETCH_ PAGES=0.000000, NUMRIDS=37, NUMRIDS_DELETED=0, NUM_EMPTY_LEAFS=0 WHERE INDNAME = 'NAME_IND' AND INDSCHEMA = 'SKAPOOR ' AND TABNAME = 'STAFF' AND TABSCHEMA = 'SK APOOR ' DB21034E The command was processed as an SQL statement because it was not a valid Command Line Processor command. During SQL processing it returned: SQL1227N The catalog statistic "37" for column "FULLKEYCARD" is out of range for its target column, has an invalid format, or is inconsistent in relation to some other statistic. Reason Code = "8". SQLSTATE=23521 |
As you can see, the above UPDATE statement for the index, NAME_IND, failed because the FULLKEYCARD is greater then the cardinality (CARD) of the table. The CARD is 35 as can be seen by the following update statement in the db2look.out file:
UPDATE SYSSTAT.TABLES
SET CARD=35,
NPAGES=1,
FPAGES=1,
OVERFLOW=0,
ACTIVE_BLOCKS=0
WHERE TABNAME = 'STAFF' AND TABSCHEMA = 'SKAPOOR ';
|
Now again, run the same query:
db2 "select name from staff where id=10 order by name" |
in explain mode and generate the access plan. You will see it is different:
Access Plan:
-----------
Total Cost: 12.972
Query Degree: 1
Rows
RETURN
( 1)
Cost
I/O
|
1
TBSCAN
( 2)
12.972
1
|
1
SORT
( 3)
12.9708
1
|
1
TBSCAN
( 4)
12.9682
1
|
35
TABLE: SKAPOOR
STAFF
|
This example shows that if RUNSTATS is run while there is WRITE activity happening on the table, the statistics can be inconsistent as in this example. Hence, the UPDATE statements to update the statistics can fail with the SQL1227N error message. It is very important that all the UPDATE statements run successfully, and if there are any inconsistencies, they should be fixed and re-run. In this case, solution would be to change back the KEYCARDS and NUMRIDS from 37 to 35.
You want to mimic the whole database in production to test, in a single partition environment.
Note: You may need to change the database name in each output of
db2look, if the database names are
different on test compared to production.
- Step1: Collect the
db2lookwith the-loption to gather the tablespace/bufferpoo/database node group information.db2look -d <dbname> -l -o storage.out
Modify the tablespace information to fit your test environment. Say, for example, in production you have the following tablespace:------------------------------------ -- DDL Statements for TABLESPACES -- ------------------------------------ CREATE REGULAR TABLESPACE DMS1 IN DATABASE PARTITION GROUP IBMDEFAULTGROUP PAGESIZE 4096 MANAGED BY DATABASE USING ( FILE '/data/dms1'20000, FILE '/data/dms2'20000, FILE '/data/dms3'20000) EXTENTSIZE 32 PREFETCHSIZE 32 BUFFERPOOL IBMDEFAULTBP OVERHEAD 12.670000 TRANSFERRATE 0.180000 DROPPED TABLE RECOVERY ON;If you do not have the same paths set up on test, then change the location above. If you are only planning to mimic the environment, not copy the data over, then make the file sizes smaller, and use fewer containers if needed. If you do not have the same buffer pools created, then you can also change the buffer pool name. Buffer pool must have same pagesize. Don't change tablespace pagesize Once you are done with this and have created the databsae, run the storage.out file:
db2 -tvf storage.out
Redirect the output if needed to make sure it all went through successfully. For example:
db2 -tvf storage.out > storage_results.out
- Step 2: Collect the configuration and environment variable
information from production and run it on
test:
db2look -d sample -f -fd -o config.out
Remember, in MPP environments, this information will be collected for the node where you ran this. If your DB2 registry and database and database manager configuration are different on different database partitions, you would want to gather the information from each node separately. However, if you cannot have the same partitions in test as in production, then use this information from the node where you execute the query in production and then use this information on test.
Note that if you have different number of partitions on test, your simulation will be imperfect.
On test, run the config.out file as follows:
db2 -tvf config.out
The above is taking under consideration that the optimizer will use the
db2foptinformation to look at the total buffer pools allocated and sortheap, which will be the setting now in your test environment. Again, this is a technique used when you don't have the same buffer pools due to memory constraints and sortheap in test as in production Also, other configuration parameters along with environment variables discussed earlier in this article will also be updated. - Step 3: Collect the DDL information for all objects from
production, as we are mimicking the whole database, and run the
db2lookin test.On production:
db2look -d sample -e -a -m -o db2look.out
On test:
db2 -tvf db2look.out
To see the output result, issue:
db2look -tvf db2look.out > db2look.results
Once the above steps are complete, ensure that the dbheap database configuration parameter is set to the same value in test as in production.
- Step 4: Get the access plan using
db2exfmtfrom both test and production, and make sure that the following is the same as it is in production:Database Context: ---------------- Parallelism: None CPU Speed: 4.762804e-07 Comm Speed: 100 Buffer Pool size: 128500 Sort Heap size: 128 Database Heap size: 5120 Lock List size: 12250 Maximum Lock List: 10 Average Applications: 4 Locks Available: 78400 Package Context: --------------- SQL Type: Dynamic Optimization Level: 3 Blocking: Block All Cursors Isolation Level: Cursor Stability ---------------- STATEMENT 1 SECTION 201 ---------------- QUERYNO: 1 QUERYTAG: CLP Statement Type: Select Updatable: No Deletable: No Query Degree: 1Now, look at the access plan. If they are the same, then you are successful in recreating the access plan. Note also that you should also look towards the bottom of the db2exfmt output to verify that the tablespace configuration match.
Production: MPP with 16 physical / 4 logical per partition.
Test: MPP
with only 4 physical machines available with 4 logical per
partition.
Tables, views/MQTs involved in the query.
In this example, the simulation may not work accurately. You must have the name number of partitions in both test and production. However, you can still try to recreate, but it will not be right.
Therefore, you must add 16*4=64 partitions to the test environment in order to get a proper recreation. You do not need 16 physical machines in the test environment; that is, you can have four physical machines with 16 logical partitions per physical. It's up to you, but you must have 64 logical partitions overall, the same as production.
So now, after making the changes to add the same number of logical partitions to test environment, the test should look like the original production setup, shown in the table below.
Table 3. Production setup
| DBPARTITIONS |
| ALLNODES (on nodes 1 to 64) |
| NODE1 (db partition defined on node 1) |
| NODE2 (db partition defined on node 5) |
| TABLESPACES |
| TABSPACE1 (DMS using devices defined in dbpartition ALLNODES) |
| TABSPACE2 (DMS using SMS defined in dbpartition NODE1) |
| TABSPACE3 (DMS using DMS defined in dbpartition NODE2) |
| TABLES |
| TAB1 in TABSPACE1 |
| TAB2 in TABSPACE2 |
| TAB3 in TABSPACE3 |
| MQT: |
| MQT defined on TAB3 |
| VIEW: |
| VIEW1 defined that include two tables TAB1 and TAB2 |
Make sure you gather the db2look using the
-f and -fd from the
node from which the query is being issued to make sure that the buffer
pool info, as discussed earlier, is picked from this node and the registry
setting, along with db cfg and dbm cfg, is picked up from the node where
the query is being ran. In my experience, customers usually have same
configuration on all the nodes, except for buffer pools, which is a very
important setting.
Steps to follow:
- Step1: Gather the storage information from
production:
db2look -d <dbname> -l -o storage.out
- Step2: Modify the tablespace/buffer pool information to fit the
test environment. If you do not have devices available, then use DMS
file containers. Also, if you do not wish to use the same number of
containers in test that are in production, shorten the list and use
fewer containers. But again, you will have to make sure that if a
tablespace is of type DMS or SMS on production, it must remain the
same type in test.
- Step3: Gather the configuration information using the following
command:
db2look -d <dbname> -f -fd -o config.out
- Step4: Now, gather the
db2lookinformation for just the objects that we are interested in. In our example, we want all related information, including table DDL, views, and MQTs associated with the tables:db2look -d <dbname> -e -a -m -t TAB1 TAB2 TAB3 -o db2look.out
Once you have gathered all of this information and modified the tablespace/buffer pool information, execute the
db2lookoutput files on test environment, and, again, get thedb2exfmtoutput from both production and test and compare them.
This is a classic example of an issue where you have collected RUNSTATS information during activity on the table. You're getting SQL1227N error messages, and you will not be able to recreate the issue unless you modify the statistics manually.
Say for example, the table has one million rows and there is a primary key
defined on an integer column. You ran RUNSTATS
with distribution and indexes all option, allowing write access to the
table. By the time we got the table statictics, there were 100,000
additional records inserted into the table. So for the table statistics,
the CARD will show as 1,100,000. However, by the time we started
collecting index statistics, for example, for the primary key defined on
the integer column, there were 10,000 additional records inserted, hence,
the number of rows in the table is 1,110,000, and hence the primary index
FIRSTKEYCARD would be 1,110,000. So you can see the inconsistency. The
CARD for the table statistics says that there should be 1,100,000 records
in the table, whereas the FIRSTKEYCARD for the primary index statistics
says that there are 1,110,000 records in the table. The update for the
index statistics would fail with a SQL1227N rc=8 error message (in this
case) as the FIRSTKEYCARD of the index is greater than CARD of the table.
You will have to manually fix this inconsistency and, for this example,
make the FIRSTKEYCARD equal to CARD, which is 1,100,000, or the opposite
-- that is, increase CARD to equal FIRSTKEYCARD, which is 1,110,000.
There are many other inconsistencies you could encounter. Make sure that
when you run the db2look with the
-m option on test that you save the output to a
file, review all the inconsistencies, and fix them. I have only given an
example of one inconsistency here; there are many others that you can run
into, and it is left to the user to fix all these inconsistencies and then
rerun db2look, redirecting the output to a file
to make sure all the update statistics ran fine without any problems.
This is an example where in production you get a SQL0437W rc=1 warning
message and you are not seeing it test. In this case, follow the above
examples to recreate the issue. Make sure that
STMTHEAP is the same. If it is different (for
example, higher in test for some reason then in production), then you may
not see the same warning. Again, the other parameters that we've already
discussed matter as well.
SQL0437W rc=2 and other return codes can be recreated as well, following the same method.
Other errors messages, such as SQL0101N and SQL0901N, can also be recreated using the same method. Even traps in the compiler/optimizer area can be recreated. This is very useful when you are at an older fix pack level and want to try the latest fix pack level to see if that avoids the problem or when you want to play around with different optimization levels to see if that would overcome the problem temporarily.
The db2look utility is quite a powerful utility
that you can use to recreate access plan problems along with compiler
issues like those we have talked about in this article. Once the problem
is recreated, you can test out many variables that can affect performance,
such as changing optimization levels, playing with registry variables, and
updating the statistics without affecting the production, as well as
testing new fix pack levels. You'll find this a handy utility for
debugging problems and improving query performance.
- Learn more:
- developerWorks DB2 zone: Learn more about DB2. You'll find technical documentation, how-to articles, education, downloads, product information, and more.
- Discuss:
- developerWorks blogs: Get involved in the developerWorks community.







