Contents


Programmers Only

Does the Order of SQL Predicates Matter?

Rearranging WHERE clause predicates may be an exercise in futility

Comments

Content series:

This content is part # of # in the series: Programmers Only

Stay tuned for additional content in this series.

This content is part of the series:Programmers Only

Stay tuned for additional content in this series.

In this column I like to answer questions that I am asked most often via e-mail, at conferences, and by my class and seminar attendees. One of the top 10 questions is: "Does the order in which I code my predicates matter?"

Sometimes this question is followed by the proclamation that the questioner is required to adhere to a "shop standard" that details the order in which predicates should be coded. Two examples of these standards are:

  1. Code join predicates first, followed by local predicates (predicates on a single table) in the same order as the named tables appear in the FROM clause.
  2. The most-filtering predicates should be coded before the least-filtering predicates. (This is the standard that I hear most frequently.)

My response to this top 10 question is: Rearranging WHERE clause predicates may be an exercise in futility. You may occasionally be able to fine-tune some (but certainly not all) SQL to improve performance and reduce CPU usage by rearranging the predicates. In these rare situations, the results will not be noticeable if the SQL is executed infrequently and addresses very few rows. But, if the SQL is popular and is executed millions of times a day, addressing hundreds of thousands of rows each time, the cumulative total could be significant.

In case you're already wondering, in some SQL neither of the standards dictated previously will result in performance gains. That said, there is a standard you can use to ensure that your predicates are coded in an order that will never be detrimental and may actually be beneficial to performance.

So, let's look at how DB2 feels about the order in which your predicates are coded. For the purposes of discussion, our chosen index is a three-column index called IX1 created on (COLA, COLB, COLC).

Our popular, important SQL with its 10 predicates is:

    Select … 
      from big_table
    Where colc = :hvc
      and colb > :hvb
      and cola = :hva
      and cold = :hvd
      and cole > :hve
      and colf in (:hvf1, :hvf2, :hvf3)
      and colg between :hvgbegin and :hvgend
      and colh + coli = :hvtotal
      and colj = :hv1
      and colj like :hvj

Now, let's get down to business.

In what order should your predicates be coded?

Sometimes it just doesn't make a difference. Why? Because DB2 is going to rearrange your predicates before applying them, and that new order may be exactly right for your SQL.

Most of the time you can take advantage of any order that makes the SQL more readable and easier to maintain without affecting performance. An example of such a "readability" order is mentioned in shop standard example 1 (code join predicates before local predicates).

When does the order make a difference?

At static and dynamic BIND time, the DB2 Optimizer parses your SQL and—regardless of the order in which you coded them—rearranges the predicates in a predetermined order. This order is based upon the filter-factor-driven approach, which says that predicates that filter out (that is, eliminate) the most rows the soonest should be applied before those that filter out the least (that is, qualify the most) rows.

In our earlier SQL, the parsing would result in our predicates being applied in the following order:

    Where cola = :hva
      and colb > :hvb
      and colc = :hvc
      and cold = :hvd
      and colj = :hv1
      and cole > :hve
      and colg between :hvgbegin and :hvgend
      and colf in (:hvf1, :hvf2, :hvf3)
      and colj like :hvj
      and colh + coli = :hvtotal

How did we end up with that arrangement? To start, DB2 always applies index predicates first, following the order in which the index is created. Our chosen index is on (COLA, COLB, COLC), so those predicates—the first three lines in the preceding example—will be applied in that order. If you want those three predicates applied in the exact order in which you code them, you must drop and re-create the index (or create a new index) on (COLA, COLC, COLB).

Second, regardless of the order in which they are coded, Stage 1 non-index predicates are applied in the following order:

1. Equal predicates

2. Range predicates

3. In-list and like predicates

DB2 took our Stage 1 non-index predicates and put the two equal predicates before our two range predicates (> and between), which are placed before our two in-list and like predicates.

Finally our Stage 2 predicate (and colh + coli = :hvtotal) is applied. Why last? Because now those higher-CPU, mathematical predicates will be applied to fewer rows; that is, only the rows that are left after applying the other nine predicates.

You may have noticed that in the parsed SQL, the two equal predicates will be applied in the order in which they were originally coded. Likewise, the two range predicates will be applied in their coded order, and the in-list and like predicates were left in the order originally coded.

Besides creating your index in a different order, what can you do?

The like-kind non-index predicates are applied in the order coded. Therefore, if you think the COLI predicate should be applied before the COLD predicate (to filter out more rows sooner), you must code the two predicates in COLI, COLD order. If you want the between predicate applied before the > (greater than) predicate, you must code the predicates in COLG, COLE order. And if you want the like predicate applied before the in-list predicate, your coding order must reflect that.

What I'm saying here is that for non-index predicates, DB2 does not consider the COLCARD or the COLUMN DISTRIBUTION statistics. Those statistics are used to aid in index selection, not for predicate rearrangement.

What standard should you use?

Based on all of this discussion, it would appear that the standard that advocates coding the most-filtering predicates before the least-filtering predicates is desirable. However, that standard may, in some situations, degrade performance for our popular SQL. What if we added the following non-index predicates to our SQL:

    Where … our first 10 predicates
      and colm = :hvm
      and colw = :hvw

In this case, coding the most-filtering predicate (the one with the higher column cardinality) before the least-filtering predicate works just fine because we want to disqualify as many rows as possible as soon as possible. But what if our non-index predicates look like this:

    Where … our first 10 predicates
      and ( colm = :hvm
       or colw = :hvw )

In that example, coding the most-filtering predicate before the least-filtering predicate will work to our disadvantage. Some would think that with OR logic, the order of the predicates wouldn't make a difference since they both must be applied anyway. But the truth is that it does matter, because both do not always need to be applied.

You see, as soon as the row qualifies, predicate application within the OR list stops. The second predicate is applied only when the row is disqualified by the first predicate. Therefore, unlike AND logic, with OR logic we want to code the least-filtering predicate before the most-filtering predicate. To say it another way: we want to code the more-qualifying predicate before the least-qualifying predicate.

Is there a standard that is appropriate for all SQL?

Yes. Create your indexes in the order in which you want those index column predicates applied. Then code your SQL with AND predicates coded with most-filtering predicates before least-filtering predicates and OR predicates coded the opposite way, with least-filtering predicates coded before most-filtering predicates.

Keep something else in mind: if you really know your data and your search criteria, you may know that some predicates filter out more than others regardless of COLCARD, because the search values are highly distributed. With host variables or literals, DB2 does not consider this factor for non-index predicates; only you can take this fact into account and code accordingly.

Stay on top of your standards

Coding standards should be readdressed with each release of DB2 and with each learning curve. A new release or a newly learned fact can make you rethink those old rules.

With that in mind, my next column will be a rewrite of a very popular past column in which I explained how predicting the order of your result rows is not as simple it sounds. The latest releases of DB2 have altered and added to my opinions on this subject, and I want to share those thoughts with you.


Downloadable resources


Related topic


Comments

Sign in or register to add and subscribe to comments.

static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management
ArticleID=550190
ArticleTitle=Programmers Only: Does the Order of SQL Predicates Matter?
publish-date=10102010