Skip to main content

By clicking Submit, you agree to the developerWorks terms of use.

The first time you sign into developerWorks, a profile is created for you. Select information in your developerWorks profile is displayed to the public, but you may edit the information at any time. Your first name, last name (unless you choose to hide them), and display name will accompany the content that you post.

All information submitted is secure.

  • Close [x]

The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerworks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

By clicking Submit, you agree to the developerWorks terms of use.

All information submitted is secure.

  • Close [x]

Programmers Only: Does the Order of SQL Predicates Matter?

Rearranging WHERE clause predicates may be an exercise in futility

Bonnie Baker, Owner, Bonnie Baker Corporation
Bonnie Baker specializes in teaching on-site classes for corporations, agencies, and DB2 user groups. She is an IBM DB2 Gold Consultant, an IBM Information Champion, a five-time winner of the IDUG Best Speaker award, and a member of the IDUG Speakers' Hall of Fame. She is best known for her ability to demystify complex concepts through analogies and war stories.

Summary:  Bonnie Baker explores the use of SQL predicates—how they may improve performance, and how DB2 works with predicate statements in parsing.

IBM Data Management magazine table of contents

View more content in this series

Date:  10 Oct 2010
Level:  Intermediate
Also available in:   Chinese

Activity:  4160 views
Comments:  

- Read this article in our interactive digital edition format!
- Subscribe to IBM Data Management magazine

In this column I like to answer questions that I am asked most often via e-mail, at conferences, and by my class and seminar attendees. One of the top 10 questions is: "Does the order in which I code my predicates matter?"

Sometimes this question is followed by the proclamation that the questioner is required to adhere to a "shop standard" that details the order in which predicates should be coded. Two examples of these standards are:

  1. Code join predicates first, followed by local predicates (predicates on a single table) in the same order as the named tables appear in the FROM clause.
  2. The most-filtering predicates should be coded before the least-filtering predicates. (This is the standard that I hear most frequently.)

My response to this top 10 question is: Rearranging WHERE clause predicates may be an exercise in futility. You may occasionally be able to fine-tune some (but certainly not all) SQL to improve performance and reduce CPU usage by rearranging the predicates. In these rare situations, the results will not be noticeable if the SQL is executed infrequently and addresses very few rows. But, if the SQL is popular and is executed millions of times a day, addressing hundreds of thousands of rows each time, the cumulative total could be significant.

In case you're already wondering, in some SQL neither of the standards dictated previously will result in performance gains. That said, there is a standard you can use to ensure that your predicates are coded in an order that will never be detrimental and may actually be beneficial to performance.

So, let's look at how DB2 feels about the order in which your predicates are coded. For the purposes of discussion, our chosen index is a three-column index called IX1 created on (COLA, COLB, COLC).

Our popular, important SQL with its 10 predicates is:

    Select … 
      from big_table
    Where colc = :hvc
      and colb > :hvb
      and cola = :hva
      and cold = :hvd
      and cole > :hve
      and colf in (:hvf1, :hvf2, :hvf3)
      and colg between :hvgbegin and :hvgend
      and colh + coli = :hvtotal
      and colj = :hv1
      and colj like :hvj

Now, let's get down to business.

In what order should your predicates be coded?

Sometimes it just doesn't make a difference. Why? Because DB2 is going to rearrange your predicates before applying them, and that new order may be exactly right for your SQL.

Most of the time you can take advantage of any order that makes the SQL more readable and easier to maintain without affecting performance. An example of such a "readability" order is mentioned in shop standard example 1 (code join predicates before local predicates).


When does the order make a difference?

At static and dynamic BIND time, the DB2 Optimizer parses your SQL and—regardless of the order in which you coded them—rearranges the predicates in a predetermined order. This order is based upon the filter-factor-driven approach, which says that predicates that filter out (that is, eliminate) the most rows the soonest should be applied before those that filter out the least (that is, qualify the most) rows.

In our earlier SQL, the parsing would result in our predicates being applied in the following order:

    Where cola = :hva
      and colb > :hvb
      and colc = :hvc
      and cold = :hvd
      and colj = :hv1
      and cole > :hve
      and colg between :hvgbegin and :hvgend
      and colf in (:hvf1, :hvf2, :hvf3)
      and colj like :hvj
      and colh + coli = :hvtotal

How did we end up with that arrangement? To start, DB2 always applies index predicates first, following the order in which the index is created. Our chosen index is on (COLA, COLB, COLC), so those predicates—the first three lines in the preceding example—will be applied in that order. If you want those three predicates applied in the exact order in which you code them, you must drop and re-create the index (or create a new index) on (COLA, COLC, COLB).

Second, regardless of the order in which they are coded, Stage 1 non-index predicates are applied in the following order:

1. Equal predicates

2. Range predicates

3. In-list and like predicates

DB2 took our Stage 1 non-index predicates and put the two equal predicates before our two range predicates (> and between), which are placed before our two in-list and like predicates.

Finally our Stage 2 predicate (and colh + coli = :hvtotal) is applied. Why last? Because now those higher-CPU, mathematical predicates will be applied to fewer rows; that is, only the rows that are left after applying the other nine predicates.

You may have noticed that in the parsed SQL, the two equal predicates will be applied in the order in which they were originally coded. Likewise, the two range predicates will be applied in their coded order, and the in-list and like predicates were left in the order originally coded.


Besides creating your index in a different order, what can you do?

The like-kind non-index predicates are applied in the order coded. Therefore, if you think the COLI predicate should be applied before the COLD predicate (to filter out more rows sooner), you must code the two predicates in COLI, COLD order. If you want the between predicate applied before the > (greater than) predicate, you must code the predicates in COLG, COLE order. And if you want the like predicate applied before the in-list predicate, your coding order must reflect that.

What I'm saying here is that for non-index predicates, DB2 does not consider the COLCARD or the COLUMN DISTRIBUTION statistics. Those statistics are used to aid in index selection, not for predicate rearrangement.


What standard should you use?

Based on all of this discussion, it would appear that the standard that advocates coding the most-filtering predicates before the least-filtering predicates is desirable. However, that standard may, in some situations, degrade performance for our popular SQL. What if we added the following non-index predicates to our SQL:

    Where … our first 10 predicates
      and colm = :hvm
      and colw = :hvw

In this case, coding the most-filtering predicate (the one with the higher column cardinality) before the least-filtering predicate works just fine because we want to disqualify as many rows as possible as soon as possible. But what if our non-index predicates look like this:

    Where … our first 10 predicates
      and ( colm = :hvm
       or colw = :hvw )

In that example, coding the most-filtering predicate before the least-filtering predicate will work to our disadvantage. Some would think that with OR logic, the order of the predicates wouldn't make a difference since they both must be applied anyway. But the truth is that it does matter, because both do not always need to be applied.

You see, as soon as the row qualifies, predicate application within the OR list stops. The second predicate is applied only when the row is disqualified by the first predicate. Therefore, unlike AND logic, with OR logic we want to code the least-filtering predicate before the most-filtering predicate. To say it another way: we want to code the more-qualifying predicate before the least-qualifying predicate.


Is there a standard that is appropriate for all SQL?

Yes. Create your indexes in the order in which you want those index column predicates applied. Then code your SQL with AND predicates coded with most-filtering predicates before least-filtering predicates and OR predicates coded the opposite way, with least-filtering predicates coded before most-filtering predicates.

Keep something else in mind: if you really know your data and your search criteria, you may know that some predicates filter out more than others regardless of COLCARD, because the search values are highly distributed. With host variables or literals, DB2 does not consider this factor for non-index predicates; only you can take this fact into account and code accordingly.


Stay on top of your standards

Coding standards should be readdressed with each release of DB2 and with each learning curve. A new release or a newly learned fact can make you rethink those old rules.

With that in mind, my next column will be a rewrite of a very popular past column in which I explained how predicting the order of your result rows is not as simple it sounds. The latest releases of DB2 have altered and added to my opinions on this subject, and I want to share those thoughts with you.


Partner Resources
IBM Information On Demand Virtual 2010 IBM Server and Storage
Quest Software Safari Books Online

Resources

About the author

Bonnie Baker specializes in teaching on-site classes for corporations, agencies, and DB2 user groups. She is an IBM DB2 Gold Consultant, an IBM Information Champion, a five-time winner of the IDUG Best Speaker award, and a member of the IDUG Speakers' Hall of Fame. She is best known for her ability to demystify complex concepts through analogies and war stories.

Report abuse help

Report abuse

Thank you. This entry has been flagged for moderator attention.


Report abuse help

Report abuse

Report abuse submission failed. Please try again later.


developerWorks: Sign in


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Select information in your developerWorks profile is displayed to the public, but you may edit the information at any time. Your first name, last name (unless you choose to hide them), and display name will accompany the content that you post.

Choose your display name

The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


Rate this article

Comments

Help: Update or add to My dW interests

What's this?

This little timesaver lets you update your My developerWorks profile with just one click! The general subject of this content (AIX and UNIX, Information Management, Lotus, Rational, Tivoli, WebSphere, Java, Linux, Open source, SOA and Web services, Web development, or XML) will be added to the interests section of your profile, if it's not there already. You only need to be logged in to My developerWorks.

And what's the point of adding your interests to your profile? That's how you find other users with the same interests as yours, and see what they're reading and contributing to the community. Your interests also help us recommend relevant developerWorks content to you.

View your My developerWorks profile

Return from help

Help: Remove from My dW interests

What's this?

Removing this interest does not alter your profile, but rather removes this piece of content from a list of all content for which you've indicated interest. In a future enhancement to My developerWorks, you'll be able to see a record of that content.

View your My developerWorks profile

Return from help

static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management
ArticleID=550190
ArticleTitle=Programmers Only: Does the Order of SQL Predicates Matter?
publish-date=10102010
author1-email=bkbaker@bonniebaker.com
author1-email-cc=

Tags

Help
Use the search field to find all types of content in My developerWorks with that tag.

Use the slider bar to see more or fewer tags.

For articles in technology zones (such as Java technology, Linux, Open source, XML), Popular tags shows the top tags for all technology zones. For articles in product zones (such as Info Mgmt, Rational, WebSphere), Popular tags shows the top tags for just that product zone.

For articles in technology zones (such as Java technology, Linux, Open source, XML), My tags shows your tags for all technology zones. For articles in product zones (such as Info Mgmt, Rational, WebSphere), My tags shows your tags for just that product zone.

Use the search field to find all types of content in My developerWorks with that tag. Popular tags shows the top tags for this particular content zone (for example, Java technology, Linux, WebSphere). My tags shows your tags for this particular content zone (for example, Java technology, Linux, WebSphere).

Try IBM PureSystems. No charge.

Special offers