Skip to main content

Data Architect: DB2 Indexes and Query Performance: Part 1

Weigh costs and benefits when considering additional indexes

Robert Catterall (rfcatter@us.ibm.com), IBM DB2 specialist, IBM
Robert Catterall is an IBM DB2 specialist.

Summary:  Adding an index to a DB2 database is often the best way to improve query performance, but the addition can have trade-offs. In this first of two articles, Robert Catterall looks at the performance costs of adding indexes as well as the proper way to specify them.

IBM Data Management magazine table of contents

View more content in this series

Date:  10 Oct 2010
Level:  Intermediate

Comments:  

- Read this article in our interactive digital edition format!
- Subscribe to IBM Data Management magazine

For plenty of you, indexes are the duct tape of DB2 performance fixes. Got a response-time issue? Add an index to the target table. In truth, reducing the elapsed time of a DB2-accessing query or batch job frequently involves creating a new index; however, while an additional index is very often a solution to a DB2 performance problem, it is not always the right solution. In this first part of a two-part series, I'll point out the costs that go along with the benefits of DB2 indexes, sketch out some rules for when to consider other solutions, and talk about ways to create more effective indexes. Next issue, I'll describe some query performance-tuning actions that do not involve new-index creation and that you can implement when a table is already heavily indexed and you are leery of adding more.

There's no free lunch

Indexes are definitely not without cost. Every additional index that you define on a table adds to the costs associated with your database. Start with CPU overhead: every index on a table multiplies the CPU cost of each insert or delete operation, or the CPU cost of any update that changes the value of an indexed column. Insert a row into a table with three indexes, and DB2 must add a corresponding entry to an index three times. Delete a row from a table with 10 indexes, and DB2 must remove all index entries associated with the deleted rows.

All of this index maintenance activity involves DB2 page requests (called GETPAGEs in a DB2 for z/OS system and logical reads in a DB2 for Linux, UNIX, and Windows [LUW] environment), and these tend to be the primary determinant of the CPU cost of SQL statement execution.

Extra indexes also add to the CPU consumption of certain DB2 utilities, such as the REORG utility. Indexes defined on keys that are not continuously ascending will naturally become less well-organized over time. New entries must be placed in particular index leaf pages (because index-key sequencing is strictly enforced, whereas data-row clustering in a table is not), and if the page into which an entry must be placed is already full (not uncommon when the key is not continuously ascending), the page will be split and some of the entries will be relocated to a formerly empty page that might be a long way away in the index structure. REORG will restore a disorganized index to a well-organized state, at a cost of some CPU consumption.

Other utilities that get more expensive as more indexes are defined on a table include LOAD and RUNSTATS (index statistics in the DB2 catalog must be kept up-to-date), and the utilities that perform index-to-data consistency checking (that would be CHECK INDEX for DB2 for z/OS, and INSPECT with INDEXDATA for DB2 for LUW).

Indexes also have a disk space cost, although that doesn't need to be as brutal as it would have been a few years ago, thanks to the index compression features of DB2 9 for z/OS and DB2 9.7 for LUW. On both platforms, you can now get large space savings (often 50 percent or more) with little in the way of additional CPU overhead.


How many indexes?

As the number of indexes on a table grows, eventually the aggregate CPU cost of maintaining the indexes can outweigh whatever performance gains a new index might deliver. But where is that tipping point? The answer depends on the specific situation, but I generally don't like to see more than 4 or 5 indexes on a table in an online transaction processing (OLTP) environment, or more than 8 to 10 indexes on a table in a data warehouse system. I'm usually OK with a larger number of indexes per table in a data warehouse, because the emphasis there is typically on optimizing data retrieval performance, and query search arguments tend to be less predictable in a business intelligence setting.

Now, these index-limitation guidelines are more rules of thumb than rules to live by. Suppose a table in an OLTP database already has 6 or 7 indexes. Would I give a thumbs-up to adding another? Maybe, if the potential performance payoff is really big—and when I say big, I mean that the new index is expected to reduce the CPU cost of one (or several) of the more expensive queries in the system by 90 percent or more. Even if the proposed new index looks like a performance winner, before going ahead with implementation I might want to see if there are any good tuning alternatives that do not involve creating a new index (that will be the topic of Part 2 of this column).


If you're going to do it, do it right

When you do add an index to a table to cut the CPU cost and run time of a query, you want to get the most out of that new index as practically possible. That means, among other things, getting the order of the key columns right if it's a multi-index key. Why is that important? Because a query will run faster—sometimes much faster—if the query's predicates (its search arguments) match more columns of an index key; and when it comes to DB2's rules for matching predicates with index-key columns, order matters.

Here's a simplified (and therefore not all-inclusive) version of the rules for the most common predicate types, which include equal-type, range (for example, > or <=), and in-list (for example, COL1 IN ('DOG', 'CAT')):

  1. For a multicolumn index key, predicate matching starts with the high-order column and proceeds as far as possible with other key columns, one at a time, in the order of their sequence within the index key. Put another way, matching starts at the far left of the index key and proceeds rightward.
  2. Getting a match on the column in position n+1 in the index key requires matching on the column in position n with an equal-type predicate or an in-list predicate. In other words, once a key column has been matched with a range predicate, there won't be any matching on columns to the right of that column.
  3. Matching stops when a column in the key is skipped. That is, if you get a match on the column in position n of the key, and there is not a matching predicate for the column in position n+1, there can't be a match on the column in position n+2.

So, for example, if you have a query with predicates COL1 > 2 AND COL2 = 'BRICK', an index defined on COL2, COL1 will give you two predicate/key-column matches, while an index on COL1, COL2 will give you one match (refer to rule 2 in the preceding list: matching stops once a key column has been matched with a range predicate).


Use indexes, but choose wisely

Indexes can really turbocharge DB2 query performance, but if you put too many on a table, you could end up with response time and throughput going in the wrong direction. Start out conservatively, with just a few—maybe two or three—on each table in your database. That way, you'll have some reserve index-add capacity, and you'll probably be able to define additional indexes that will look really good from a cost-benefit perspective. Just don't overdo it, and don't waste precious index-add capital on indexes that deliver only marginal query performance benefits—you're looking for the big wins here.

In my next column, I'll show you that there are plenty of things you can do to improve query response times without creating new indexes. Combining those techniques with judicious use of additional indexes is the two-pronged performance-tuning approach that is most likely to get you where you want to go.


Partner Resources
Fourth Millennium Technologies IBM Client Reference Program
Melissa Data Responsive Systems

Resources

About the author

Robert Catterall is an IBM DB2 specialist.

Comments



Trademarks  |  My developerWorks terms and conditions

Help: Update or add to My dW interests

What's this?

This little timesaver lets you update your My developerWorks profile with just one click! The general subject of this content (AIX and UNIX, Information Management, Lotus, Rational, Tivoli, WebSphere, Java, Linux, Open source, SOA and Web services, Web development, or XML) will be added to the interests section of your profile, if it's not there already. You only need to be logged in to My developerWorks.

And what's the point of adding your interests to your profile? That's how you find other users with the same interests as yours, and see what they're reading and contributing to the community. Your interests also help us recommend relevant developerWorks content to you.

View your My developerWorks profile

Return from help

Help: Remove from My dW interests

What's this?

Removing this interest does not alter your profile, but rather removes this piece of content from a list of all content for which you've indicated interest. In a future enhancement to My developerWorks, you'll be able to see a record of that content.

View your My developerWorks profile

Return from help

static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management
ArticleID=550024
ArticleTitle=Data Architect: DB2 Indexes and Query Performance: Part 1
publish-date=10102010
author1-email=rfcatter@us.ibm.com
author1-email-cc=

Tags

Help
Use the search field to find all types of content in My developerWorks with that tag.

Use the slider bar to see more or fewer tags.

For articles in technology zones (such as Java technology, Linux, Open source, XML), Popular tags shows the top tags for all technology zones. For articles in product zones (such as Info Mgmt, Rational, WebSphere), Popular tags shows the top tags for just that product zone.

For articles in technology zones (such as Java technology, Linux, Open source, XML), My tags shows your tags for all technology zones. For articles in product zones (such as Info Mgmt, Rational, WebSphere), My tags shows your tags for just that product zone.

Use the search field to find all types of content in My developerWorks with that tag. Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere). My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).