- In what order should your predicates be coded?
- When does the order make a difference?
- Besides creating your index in a different order, what can you do?
- What standard should you use?
- Is there a standard that is appropriate for all SQL?
- Stay on top of your standards
- Downloadable resources
- Related topic
Does the Order of SQL Predicates Matter?
Rearranging WHERE clause predicates may be an exercise in futility
This content is part # of # in the series: Programmers Only
This content is part of the series:Programmers Only
Stay tuned for additional content in this series.
In this column I like to answer questions that I am asked most often via e-mail, at conferences, and by my class and seminar attendees. One of the top 10 questions is: "Does the order in which I code my predicates matter?"
Sometimes this question is followed by the proclamation that the questioner is required to adhere to a "shop standard" that details the order in which predicates should be coded. Two examples of these standards are:
- Code join predicates first, followed by local predicates (predicates on a single table) in the same order as the named tables appear in the
- The most-filtering predicates should be coded before the least-filtering predicates. (This is the standard that I hear most frequently.)
My response to this top 10 question is: Rearranging
WHERE clause predicates may be an exercise in futility. You may occasionally be able to fine-tune some (but certainly not all) SQL to improve performance and reduce CPU usage by rearranging the predicates. In these rare situations, the results will not be noticeable if the SQL is executed infrequently and addresses very few rows. But, if the SQL is popular and is executed millions of times a day, addressing hundreds of thousands of rows each time, the cumulative total could be significant.
In case you're already wondering, in some SQL neither of the standards dictated previously will result in performance gains. That said, there is a standard you can use to ensure that your predicates are coded in an order that will never be detrimental and may actually be beneficial to performance.
So, let's look at how DB2 feels about the order in which your predicates are coded. For the purposes of discussion, our chosen index is a three-column index called IX1 created on (
COLA, COLB, COLC).
Our popular, important SQL with its 10 predicates is:
Select … from big_table Where colc = :hvc and colb > :hvb and cola = :hva and cold = :hvd and cole > :hve and colf in (:hvf1, :hvf2, :hvf3) and colg between :hvgbegin and :hvgend and colh + coli = :hvtotal and colj = :hv1 and colj like :hvj
Now, let's get down to business.
In what order should your predicates be coded?
Sometimes it just doesn't make a difference. Why? Because DB2 is going to rearrange your predicates before applying them, and that new order may be exactly right for your SQL.
Most of the time you can take advantage of any order that makes the SQL more readable and easier to maintain without affecting performance. An example of such a "readability" order is mentioned in shop standard example 1 (code join predicates before local predicates).
When does the order make a difference?
At static and dynamic
BIND time, the DB2 Optimizer parses your SQL and—regardless of the order in which you coded them—rearranges the predicates in a predetermined order. This order is based upon the filter-factor-driven approach, which says that predicates that filter out (that is, eliminate) the most rows the soonest should be applied before those that filter out the least (that is, qualify the most) rows.
In our earlier SQL, the parsing would result in our predicates being applied in the following order:
Where cola = :hva and colb > :hvb and colc = :hvc and cold = :hvd and colj = :hv1 and cole > :hve and colg between :hvgbegin and :hvgend and colf in (:hvf1, :hvf2, :hvf3) and colj like :hvj and colh + coli = :hvtotal
How did we end up with that arrangement? To start, DB2 always applies index predicates first, following the order in which the index is created. Our chosen index is on (
COLA, COLB, COLC), so those predicates—the first three lines in the preceding example—will be applied in that order. If you want those three predicates applied in the exact order in which you code them, you must drop and re-create the index (or create a new index) on (
COLA, COLC, COLB).
Second, regardless of the order in which they are coded, Stage 1 non-index predicates are applied in the following order:
1. Equal predicates
2. Range predicates
3. In-list and like predicates
DB2 took our Stage 1 non-index predicates and put the two equal predicates before our two range predicates (
between), which are placed before our two in-list and like predicates.
Finally our Stage 2 predicate (
and colh + coli = :hvtotal) is applied. Why last? Because now those higher-CPU, mathematical predicates will be applied to fewer rows; that is, only the rows that are left after applying the other nine predicates.
You may have noticed that in the parsed SQL, the two equal predicates will be applied in the order in which they were originally coded. Likewise, the two range predicates will be applied in their coded order, and the in-list and like predicates were left in the order originally coded.
Besides creating your index in a different order, what can you do?
The like-kind non-index predicates are applied in the order coded. Therefore, if you think the
COLI predicate should be applied before the
COLD predicate (to filter out more rows sooner), you must code the two predicates in
COLI, COLD order. If you want the
between predicate applied before the
> (greater than) predicate, you must code the predicates in
COLG, COLE order. And if you want the like predicate applied before the in-list predicate, your coding order must reflect that.
What I'm saying here is that for non-index predicates, DB2 does not consider the
COLCARD or the
COLUMN DISTRIBUTION statistics. Those statistics are used to aid in index selection, not for predicate rearrangement.
What standard should you use?
Based on all of this discussion, it would appear that the standard that advocates coding the most-filtering predicates before the least-filtering predicates is desirable. However, that standard may, in some situations, degrade performance for our popular SQL. What if we added the following non-index predicates to our SQL:
Where … our first 10 predicates and colm = :hvm and colw = :hvw
In this case, coding the most-filtering predicate (the one with the higher column cardinality) before the least-filtering predicate works just fine because we want to disqualify as many rows as possible as soon as possible. But what if our non-index predicates look like this:
Where … our first 10 predicates and ( colm = :hvm or colw = :hvw )
In that example, coding the most-filtering predicate before the least-filtering
predicate will work to our disadvantage. Some would think that with
OR logic, the order of the predicates wouldn't make a difference
since they both must be applied anyway. But the truth is that it does matter,
because both do not always need to be applied.
You see, as soon as the row qualifies, predicate application within the
OR list stops. The second predicate is applied only when the row is disqualified by the first predicate. Therefore, unlike
AND logic, with
OR logic we want to code the least-filtering predicate before the most-filtering predicate. To say it another way: we want to code the more-qualifying predicate before the least-qualifying predicate.
Is there a standard that is appropriate for all SQL?
Yes. Create your indexes in the order in which you want those index column predicates applied. Then code your SQL with
AND predicates coded with most-filtering predicates before least-filtering predicates and
OR predicates coded the opposite way, with least-filtering predicates coded before most-filtering predicates.
Keep something else in mind: if you really know your data and your search criteria, you may know that some predicates filter out more than others regardless of
COLCARD, because the search values are highly distributed. With host variables or literals, DB2 does not consider this factor for non-index predicates; only you can take this fact into account and code accordingly.
Stay on top of your standards
Coding standards should be readdressed with each release of DB2 and with each learning curve. A new release or a newly learned fact can make you rethink those old rules.
With that in mind, my next column will be a rewrite of a very popular past column in which I explained how predicting the order of your result rows is not as simple it sounds. The latest releases of DB2 have altered and added to my opinions on this subject, and I want to share those thoughts with you.
- Learn more about DB2 for z/OS.