Part 1 of my column covered IBM DB2 data warehouse performance management from a system and database perspective. In this follow-on, I'll provide recommendations for SQL statement-level performance tuning.
Priority one: Target selection
When it comes to query tuning, it's important for you to do not only good work, but the right work. Your first impulse might be to work on queries with long run times, but these aren't necessarily the "problem" queries. A query that runs for 10 seconds can cause a lot of dissatisfaction if it is executed frequently and is part of a process that users expect to complete in a couple of seconds.
Often, the best indicator of where tuning work is most needed is "the voice of the user." Absent specific complaints, you'll want to spend your time on the queries that stand out in terms of both run time (and/or CPU time) and frequency of execution.
There are several tools on the market that can facilitate query tuning target selection (IBM's offerings include DB2 Query Monitor for z/OS and DB2 Performance Expert for Linux, UNIX, and Windows). However, you can get useful query target selection information from DB2 itself. Mainframers should look at the STMTCACHE ALL option of the EXPLAIN statement (introduced via DB2 for z/OS V9, and delivered for V8 by way of the fix for APAR PQ88073). EXPLAIN STMTCACHE ALL will insert into the DSN_STATEMENT_CACHE_TABLE a row for each SQL statement in the dynamic statement cache. Among the 40-plus columns in the DSN_STATEMENT_CACHE_TABLE are those in which a query's statement text, accumulated elapsed time, accumulated CPU time, and number of executions are recorded. That should help you to zero in on statements that are likely to provide a good return on your investment of performance-tuning time.
Over on the Linux, UNIX, and Windows (LUW) side, DB2 folks should take advantage of the administrative views introduced with DB2 9 (the high-level qualifier for these views is SYSIBMADM; you can read about them in the DB2 9 for LUW System Monitoring Guide and Reference). One of the views that can be particularly useful is SYSIBMADM.TOP_DYNAMIC_SQL, which includes columns named STMT_TEXT, AVERAGE_EXECUTION_TIME_S, and NUM_ EXECUTIONS.
Access path selection: What did DB2 decide, and why?
Statement-level tuning usually involves finding a better way to access the data needed for generating a query result set. Will a table be scanned in its entirety? Will an index be used? If tables are to be joined, what join method should be used? Answering these questions generates the query's access path; your tuning aim is to get a better path that will reduce statement execution time.
But first, you need to understand the path selected by the DB2 optimizer. That information is obtained via the DB2 EXPLAIN statement (EXPLAIN output for a statement can be examined in text form by querying EXPLAIN tables, or in a graphical form using the Optimization Service Center for DB2 for z/OS or the Visual Explain functionality of DB2 for LUW). Once you have the EXPLAIN output, look at how the query's result set is generated, and then consider alternative access path possibilities.
To properly assess these access path alternatives, you'll need information from the DB2 catalog about the tables accessed by the query and the indexes defined on those tables. Keep in mind that this catalog data is what the optimizer uses to make access path decisions. If the catalog statistics are inaccurate, get them up to date with RUNSTATS. That simple step could lead to a new access path that might dramatically improve query performance.
When you've looked at the access path selected by DB2 for your target query, ask yourself this question: Should DB2 have selected a different access path for the query? If a path other than the one selected by DB2 looks better to you, check to see if the tables targeted by the query-and their associated indexes-are well organized. If the catalog statistics are up to date and DB2 knows that a particular index is poorly organized, DB2 may decide not to use it. In that case, reorganizing the index might lead to its selection as part of a better-performing query access path.
A more likely explanation for a difference of opinion between you and DB2 is that you know something that DB2 doesn't. Perhaps you know that data values in a certain column are not evenly distributed. Without column data-value distribution information in the catalog, DB2 will assume an even distribution of values, which may lead the optimizer to a sub-optimal access path.
If a significant data value skew exists for a column referenced in a query predicate, make sure that DB2 knows this, preferably by generating column-value histogram statistics using RUNSTATS. (Histogram statistics, available in DB2 for LUW for quite some time, can be generated with DB2 for z/OS V9 RUNSTATS. If you are using DB2 for z/OS V8, the next best thing is the FREQVAL option of RUNSTATS.) Note that advice for enriching DB2 for z/OS catalog statistics for the benefit of a particular query can be obtained from the Optimization Service Center for DB2 for z/OS-a free version of which can be downloaded from IBM.com.
When you and DB2 agree If you and DB2 agree on the best available access path but that path does not deliver satisfactory performance, you need to provide DB2 with a new access path choice. There are several ways to do this, including adjusting your indexes, using a materialized query table, and table re-clustering.
Index-related actions can reduce the number of pages that DB2 has to examine to generate the query's result set. You might define an index on a predicate-referenced column that is not currently indexed. You might define an index on a set of columns that is already indexed, but with a different column order specified to increase the number of index key columns on which DB2 can perform predicate-related value matches. (For example: given a compound predicate of the form COL_A > y AND COL_B = x, and an existing index on COL_A | COL_B, a new index on COL_B | COL_A would enable DB2 to match on both columns of the index key instead of just matching on the first key column.) You might add a column to an existing index to get index-only access for a portion of the query's processing.
Another way to provide a new and better access path for a query is through the use of DB2's materialized query table (MQT) functionality. An MQT is defined by way of a SELECT statement-usually a result set that would otherwise have to be dynamically built when a query executes. Here are three great things about an MQT:
- DB2 can automatically rewrite a query to take advantage of an MQT.
- Because the result set in the MQT is already there, the run time for a query that is rewritten by DB2 to access the MQT can be reduced dramatically, saving time that would otherwise be spent dynamically building that result set at query execution time.
- An MQT can be indexed to provide an even greater boost to query performance. Table re-clustering is another means of providing DB2 with a better access path by reducing the number of pages that DB2 has to examine in executing the query. Data clustering can have an especially large impact on data warehouse query performance, because rows are often retrieved in large numbers. If a table was originally defined to be clustered on an account number column, and it subsequently turned out that users were retrieving rows from the table on a date-range basis, it might be worth considering a change in the table's clustering sequence to get retrieved rows into close proximity to each other in the table. Also, the performance of join operations may be improved significantly if joined tables are clustered in join-column sequence (e.g., by CUSTOMER_ID when the join predicate is TABLE_A.CUSTOMER_ID = TABLE_B. CUSTOMER_ID).
If you do decide to re-cluster a table, keep in mind that DB2 for z/OS V9 makes this much easier through the new CLUSTER and NOT CLUSTER options of the ALTER INDEX statement. If re-clustering is on the table, make sure that you give some thought to exploiting clustering-related enhancements delivered in the last couple of DB2 versions, such as multidimensional clustering for DB2 for LUW, and the ability to partition a table on one key and cluster within partitions on another key in a DB2 for z/OS environment.
Here's hoping that your data warehouse query-tuning efforts will deliver breakthrough results. Business intelligence is as hot an application area as you can find these days, and getting into it (if you're not already there) will give you all kinds of opportunities to deliver real value to your organization. Tune on!