Data architect: DB2 indexes and query performance: Part 2

No new indexes? No problem.

The conclusion of Robert Catterall's two-part series on best practices for reducing query times using DB2.

This article was originally published in IBM Data magazine.

Share:

Robert Catterall (rfcatter@us.ibm.com), IBM DB2 specialist, IBM

Robert Catterall is an IBM DB2 specialist.



14 January 2011

Also available in Chinese

In the first installment of this two-part column, I focused on the pros and cons of reducing query times by adding indexes to a DB2 database. We discovered that there is a point where aggregate index costs—more-expensive INSERT and DELETE operations, for example—outweigh a new index's query speed-up benefit. But if you don't want to increase the number of indexes defined on the table, are you at a performance tuning dead end? Of course not. You have plenty of options for accelerating queries in a DB2 environment while holding the line on indexes.

Better indexes versus more indexes

Start with an index effectiveness upgrade. Can you replace useless indexes with indexes that enhance query performance? This could be a challenge with earlier versions of DB2 if a lot of dynamic SQL statements were executed, because the SYSCAT.PACKAGEDEP catalog view (DB2 for Linux, UNIX, and Windows [LUW]) and the SYSIBM.SYSPACKDEP catalog table (DB2 for z/OS) show index dependencies only for static SQL statements.

However, DB2 9.7 for LUW and DB2 9 for z/OS introduced a handy catalog column called LASTUSED. In the SYSCAT.INDEXES catalog view (LUW) and the SYSIBM.SYSINDEXSPACESTATS catalog table (z/OS), LASTUSED shows the date on which an index was last used either to speed up a DML statement (static or dynamic) or to enforce a referential integrity constraint. If the LASTUSED column contains its default value ('0001-01-01' on LUW or NULL on the mainframe), the associated index is a good candidate for being dropped. Of course you'll want to do some additional checking, just to make sure that the index truly isn't needed.

Another way to boost index effectiveness is to add columns to an existing index, so as to increase the number of columns that can be matched with query predicates or to get index-only access for more queries. But be careful with this tactic; you do not want to replicate a table in an index by including all of the table's columns in the index key.

Consider adding a column to an existing index if the additional column is relatively short and if its inclusion in the index key would provide a major performance boost for one or more high-cost queries. With DB2 for LUW, you'd drop the existing index and re-create it with the additional column; on a DB2 for z/OS system, you can add a column to an index via an ALTER INDEX ADD COLUMN statement. DB2 10 for z/OS introduces a nifty new way to add a column to an existing index: the ADD INCLUDE option of ALTER INDEX. Check out the Resources sidebar to learn more.


No-new-index tuning options

Rewriting long-running queries is another way to boost performance without adding indexes. One technique often used is rewriting to eliminate predicates linked by OR, which are not indexable (aside from the possibility of DB2 using a multi-index access mechanism, called index ORing, that involves the union of row ID lists from two different indexes).

Using a UNION or an IN-list are two common ways to eliminate an OR in a predicate (see Table 1).

Rewriting to eliminate predicates linked by OR

Eliminate OR with UNION

SELECT * FROM TABLE_A A, TABLE_B B WHERE A.COL3 = B.COL3 AND A.COL1 = 2 OR B.COL2 = 'CAT'

Becomes

SELECT * FROM TABLE_A A, TABLE_B B WHERE A.COL3 = B.COL3 AND A.COL1 = 2

UNION

SELECT * FROM TABLE_A A, TABLE_B B WHERE A.COL3 = B.COL3 AND B.COL2 = 'CAT'

Eliminate OR with an IN-list

SELECT * FROM TABLE_A

WHERE (COL1 = 2 AND COL2 = 'CAT') OR (COL1 = 2 AND COL2 = 'DOG')

Becomes

SELECT * FROM TABLE_A

WHERE COL1 = 2 AND COL2 IN ('CAT', 'DOG')

Using UNION to replace an OR can be especially effective if the query involves a join and the two ORed predicates reference columns in two different tables. In such a case, the OR prevents the two predicates from being applied until after the respective tables have been joined. Note also that in the IN-list example, both of the predicates in the rewritten query are stage 1 and indexable, leading to better index key column matching and potentially much-reduced query elapsed time.

Also in the query rewrite vein, modifying a SELECT to get rid of a range-type predicate can increase performance by improving index key column matching. As we saw in the last issue, the matching of predicates to index key columns stops after a range predicate—such as >, <, or BETWEEN—is encountered. In some cases, you can use an IN-list to replace a range predicate. For example, if COL1 is a character column of length one, a predicate like COL1 BETWEEN '2' AND '4' could be rewritten as COL1 IN ('2', '3', '4').

Here's one more query rewrite idea: if possible, remove column expressions in predicates to make the predicates indexable. Let's say that :V1 and :V2 are host variables, with :V2 containing a date value. Now, consider this predicate:

  (START_DATE - :V1 DAYS) < :V2

The arithmetic expression involving the START_DATE column makes the predicate non-indexable. If you rewrite the predicate to get the arithmetic operation on the right side of the operator ("operator" being =, >, <, etc.), it becomes indexable and you could see a big reduction in query run time. The rewritten example predicate would look like this:

  START_DATE < (DATE(:V2) + :V1 DAYS)

No query mods needed

There are also no-new-index-needed tuning actions that don't require query modification. Do you have a high-cost query with a predicate that references a column that has many duplicate values and for which the distribution of those values is very non-uniform (for example, a column of a 1-million-row table that has 1,000 distinct values, one of which appears in 900,000 of the 1 million rows)?

Let DB2 know about that uneven distribution of duplicate values by gathering value-frequency distribution and/or histogram statistics for the column via the RUNSTATS utility. Absent these statistics, DB2 assumes a uniform spread of duplicate column values across a table's rows, so in the aforementioned example it would be assumed that each of the 1,000 distinct column values appears in 1,000 of the table's 1 million rows. For DB2 for LUW, column value distribution statistics are collected via the WITH DISTRIBUTION clause of RUNSTATS. For DB2 for z/OS, value-frequency stats are generated using the FREQVAL option of RUNSTATS, and histogram statistics—available on the mainframe platform starting with DB2 9—are obtained via the HISTOGRAM option (histogram statistics have been available on DB2 for LUW systems for quite some time).

Once the catalog has been populated with value-frequency and/or histogram statistics, rebind the program (or just submit the query again, if it's dynamic SQL), and you may see a big-time performance improvement. By the way, you can get recommendations on enriching catalog statistics to potentially improve the performance of a query by running the query through the Statistics Advisor component of IBM's Data Studio tool.


The right tool for the job

When the task at hand is query performance tuning, of course you should keep the new-index option in mind. Just remember that it's a high-value card that should be played at the right time. And don't forget the other techniques available to you. Indexes aren't the only tool in the query-tuning shed, pardner.

Resources

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into Information management on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management
ArticleID=625897
ArticleTitle=Data architect: DB2 indexes and query performance: Part 2
publish-date=01142011