DB2 10: Paging through result sets by value using row value comparisons
Comments (6) Visits (16750)
I have have been engaged in discussions on how to use DB2 for LUW for as long as I have been employed with IBM.
Until not too long ago, before I started blogging, the primary platform for such discussion was usenet.
And one of the very oldest debates I recall was with Bernard D.
For at least a decade he has been tirelessly trying to convince us to extend DB2 for LUW to support comparisons of "row value constructors" (rvc).
In fact the oldest record I can find on Google on this debate dates back to October 21, 2000.
Well, Bernard, this post is for you.
Now we need to find something else to talk about I guess.
In the past I have described various ways how to page through result sets using constructs such as LIMIT and OFFSET.
All of these methods have one thing in common: They are based on the absolute position of rows within a result set.
For example OFFSET 100 LIMIT 20 will retrieve 20 rows starting with the 100th row.
So, in order for DB2 to find the 100th row DB2 needs to count of the first 99 rows and discard them.
The farther down a result set you page the bigger the waste in CPU and I/O.
Let's have a look:
A look at the optimizer plan shows that there is little DB2 can do to minimize the rows read.
Rows RETURN ( 1) Cost I/O | 1 FILTER ( 2) 6.80181 1 | 10 FETCH ( 3) 6.79467 1 /-----+------\ 10 10 IXSCAN TABLE: ADMINISTRATOR ( 4) EMP 0.0211523 Q1 0 | 10 INDEX: ADMINISTRATOR EMPNAME Q1
Value based paging
A better way to do paging is to use the index not only to derive the order, but also the start key.
VARIABLE name VARCHAR(20); BEGIN SET :name = ' '; END; / SELECT * FROM emp
We can use the last retrieved value as a starting point for the next page:
BEGIN SET :name = 'Kruft'; END; / SELECT * FROM emp
The optimizer plan is much better now.
It allows us to find the first row efficiently:
Rows RETURN ( 1) Cost I/O | 3.33333 FETCH ( 2) 6.78901 1 /-----+------\ 3.33333 10 IXSCAN TABLE: ADMINISTRATOR ( 3) EMP 0.0177397 Q1 0 | 10 INDEX: ADMINISTRATOR EMPNAME Q1 ... 3) IXSCAN: (Index Scan) ... Predicates: ---------- 2) Start Key Predicate, Comparison Operator: Less Than (<) Subquery Input Required: No Filter Factor: 0.333333 Predicate Text: -------------- (? < Q1.NAME)
Now, I have been cheating here since the index is on two columns "NAME" and "FIRSTNAME".
If there were two or more employees named 'Kruft' we could have skipped one.
Traditional row wise paging
Here is how one can page using a multiple column start key in DB2 no matter what version:
VARIABLE firstname VARCHAR(20); BEGIN SET (:name, :firstname) = (' ', ' '); END; /
How would the predicate look like if we had three key parts? Lenghty ...
SELECT * FROM emp
The problem is that in older versions of DB2 the optimizer cannot figure out that this predicate maps to an index start key.
Therefore DB2 needed to fall back to a scan from the beginning of the index.
Since DB2 9.7.3 the optimizer is smart enough:
Rows RETURN ( 1) Cost I/O | 3.55556 TBSCAN ( 2) 6.79745 1 | 3.55556 SORT ( 3) 6.79697 1 | 3.55556 TBSCAN ( 4) 6.79574 1 | 10 TABLE: ADMINISTRATOR EMP Q1
This is not what we expected because I use dynamic SQL here!
In dynamic SQL DB2 does not see that the two references to :name are in fact the same values.
The following trick allows DB2 to see what we are up to.
It is generally useful whenever you must use the same parameter marker multiple times in a query.
This is better! However the language we need to use to model the start key is quite ugly.
Row value comparison
Reaching back to high school math you may recall that the predicates we have used above to impose order are commonly used to compose order on tuples.
A tuple A is an ordered list of elements of the form (a1, a2, ... an).
A tuple A (a1, a2, ..an) is equal to a tuple B (b1, b2, bn) if and only if all the elements ai in A a re equal to the matching bi in B.
A is considered greater than B if a1 > b1 or (a1 = b1 and a2 > b2) or ... and so forth.
It is not surprising that this is also the ordering used in a multi column index or that imposed by an ORDER BY clause.
In math you can directly use comparison operators against tuples with no need to decompose them into the individual components.
You simply write: (a1, a2) > (b1, b2).
In SQL terms a set of elements compose not tuples but rows of course.
Thus the syntax composing a tuple is called a row value constructor (rvc).
You use it every time you insert a row into a table using the values clause.
Just at the beginning of this post we used a values clause with 10 row value constructors.
DB2 10 , finally, introduces the syntax to compare rows value constructors:
SELECT * FROM emp
So far we have always used a fixed number of rows to page forward.
But you can also use row value comparisons operators to set an upper bound or even both upper and lower.
VARIABLE startname VARCHAR(20); VARIABLE endname VARCHAR(20); VARIABLE startfirst VARCHAR(20); VARIABLE endfirst VARCHAR(20); BEGIN SET (:startname, :startfirst) = ('M', 'A'); SET (:endname , :endfirst ) = ('T', 'Z'); END; / SELECT * FROM emp
We are explicitly driving the index start/stop key here.
You can also use this syntax to write denser key lookups:
SELECT * FROM emp
There is no limit to the usage of the syntax.
It can even be used in JOIN On clauses
A cursor loop without a cursor
As I have been writing this article I have been wondering about an extreme case.
That is if I page through a result set one row at a time, then I should be able to write cursor that really isn't a cursor.
I'm not sure if it's good for anything, but here goes anyway:
VARIABLE sumsalary INTEGER;
So this anonymous block pages through the entire table one row at a time always getting the "next row" all without actually using a cursor.