DB2 indexes and query performance: Part 2
No new indexes? No problem.
This content is part # of # in the series: Data architect
This content is part of the series:Data architect
Stay tuned for additional content in this series.
In the first installment of this two-part column, I focused on the pros and cons of reducing query times by adding indexes to a DB2 database. We discovered that there is a point where aggregate index costs—more-expensive
DELETE operations, for example—outweigh a new index's query speed-up benefit. But if you don't want to increase the number of indexes defined on the table, are you at a performance tuning dead end? Of course not. You have plenty of options for accelerating queries in a DB2 environment while holding the line on indexes.
Better indexes versus more indexes
Start with an index effectiveness upgrade. Can you replace useless indexes with indexes that enhance query performance? This could be a challenge with earlier versions of DB2 if a lot of dynamic SQL statements were executed, because the
SYSCAT.PACKAGEDEP catalog view (DB2 for Linux, UNIX, and Windows [LUW]) and the
SYSIBM.SYSPACKDEP catalog table (DB2 for z/OS) show index dependencies only for static SQL statements.
However, DB2 9.7 for LUW and DB2 9 for z/OS introduced a handy catalog column called
LASTUSED. In the
SYSCAT.INDEXES catalog view (LUW) and the
SYSIBM.SYSINDEXSPACESTATS catalog table (z/OS),
LASTUSED shows the date on which an index was last used either to speed up a DML statement (static or dynamic) or to enforce a referential integrity constraint. If the
LASTUSED column contains its default value ('0001-01-01' on LUW or
NULL on the mainframe), the associated index is a good candidate for being dropped. Of course you'll want to do some additional checking, just to make sure that the index truly isn't needed.
Another way to boost index effectiveness is to add columns to an existing index, so as to increase the number of columns that can be matched with query predicates or to get index-only access for more queries. But be careful with this tactic; you do not want to replicate a table in an index by including all of the table's columns in the index key.
Consider adding a column to an existing index if the additional column is relatively short and if its inclusion in the index key would provide a major performance boost for one or more high-cost queries. With DB2 for LUW, you'd drop the existing index and re-create it with the additional column; on a DB2 for z/OS system, you can add a column to an index via an
ALTER INDEX ADD COLUMN statement. DB2 10 for z/OS introduces a nifty new way to add a column to an existing index: the
ADD INCLUDE option of
ALTER INDEX. Check out the Resources sidebar to learn more.
No-new-index tuning options
Rewriting long-running queries is another way to boost performance without adding indexes. One technique often used is rewriting to eliminate predicates linked by
OR, which are not indexable (aside from the possibility of DB2 using a multi-index access mechanism, called index
ORing, that involves the union of row ID lists from two different indexes).
UNION or an
are two common ways to eliminate an
OR in a predicate
(see Table 1).
Rewriting to eliminate predicates linked by
Eliminate OR with UNION
SELECT * FROM TABLE_A A, TABLE_B B WHERE A.COL3 = B.COL3 AND A.COL1 = 2 OR B.COL2 = 'CAT'
SELECT * FROM TABLE_A A, TABLE_B B WHERE A.COL3 = B.COL3 AND A.COL1 = 2
SELECT * FROM TABLE_A A, TABLE_B B WHERE A.COL3 = B.COL3 AND B.COL2 = 'CAT'
Eliminate OR with an IN-list
SELECT * FROM TABLE_A
WHERE (COL1 = 2 AND COL2 = 'CAT') OR (COL1 = 2 AND COL2 = 'DOG')
SELECT * FROM TABLE_A
WHERE COL1 = 2 AND COL2 IN ('CAT',
UNION to replace an
OR can be especially effective if the query involves a join and the two
ORed predicates reference columns in two different tables. In such a case, the
OR prevents the two predicates from being applied until after the respective tables have been joined. Note also that in the
IN-list example, both of the predicates in the rewritten query are stage 1 and indexable, leading to better index key column matching and potentially much-reduced query elapsed time.
Also in the query rewrite vein, modifying a
SELECT to get rid of a range-type predicate can increase performance by improving index key column matching. As we saw in the last issue, the matching of predicates to index key columns stops after a range predicate—such as
BETWEEN—is encountered. In some cases, you can use an
IN-list to replace a range predicate. For example, if
COL1 is a character column of length one, a predicate like
COL1 BETWEEN '2' AND '4' could be rewritten as
COL1 IN ('2', '3', '4').
Here's one more query rewrite idea: if possible, remove column expressions in predicates to make the predicates indexable. Let's say that
:V2 are host variables, with
:V2 containing a date value. Now, consider this predicate:
(START_DATE - :V1 DAYS) < :V2
The arithmetic expression involving the
START_DATE column makes the predicate non-indexable. If you rewrite the predicate to get the arithmetic operation on the right side of the operator ("operator" being =, >, <, etc.), it becomes indexable and you could see a big reduction in query run time. The rewritten example predicate would look like this:
START_DATE < (DATE(:V2) + :V1 DAYS)
No query mods needed
There are also no-new-index-needed tuning actions that don't require query modification. Do you have a high-cost query with a predicate that references a column that has many duplicate values and for which the distribution of those values is very non-uniform (for example, a column of a 1-million-row table that has 1,000 distinct values, one of which appears in 900,000 of the 1 million rows)?
Let DB2 know about that uneven distribution of duplicate values by gathering value-frequency distribution and/or histogram statistics for the column via the
RUNSTATS utility. Absent these statistics, DB2 assumes a uniform spread of duplicate column values across a table's rows, so in the aforementioned example it would be assumed that each of the 1,000 distinct column values appears in 1,000 of the table's 1 million rows. For DB2 for LUW, column value distribution statistics are collected via the
WITH DISTRIBUTION clause of
RUNSTATS. For DB2 for z/OS, value-frequency stats are generated using the
FREQVAL option of
RUNSTATS, and histogram statistics—available on the mainframe platform starting with DB2 9—are obtained via the
HISTOGRAM option (histogram statistics have been available on DB2 for LUW systems for quite some time).
Once the catalog has been populated with value-frequency and/or histogram statistics, rebind the program (or just submit the query again, if it's dynamic SQL), and you may see a big-time performance improvement. By the way, you can get recommendations on enriching catalog statistics to potentially improve the performance of a query by running the query through the Statistics Advisor component of IBM's Data Studio tool.
The right tool for the job
When the task at hand is query performance tuning, of course you should keep the new-index option in mind. Just remember that it's a high-value card that should be played at the right time. And don't forget the other techniques available to you. Indexes aren't the only tool in the query-tuning shed, pardner.
- Matching predicates and index key columns ("DB2 Indexes and Query Performance: Part 1," IBM Data Management, Issue 4, 2010)
- Add columns to existing indexes with the DB2 10
ALTER INDEXstatement in the DB2 10 for z/OS SQL reference