- Read this article in our interactive digital edition format!
- Subscribe to IBM Data Management magazine
In the last issue, I began a series of columns concerning new aspects of
ORDER BY. This column—Part 2—covers the impact
of using the
CREATE/ALTER
INDEX RANDOM order
option on sort avoidance. It also covers the issue of coding only
GROUP BY COL1, COL2 versus coding both
GROUP BY COL1, COL2 and a follow-up
ORDER BY COL1, COL2 in a SQL statement.
A brief review of sort avoidance
Sort syntax such as ORDER BY and
GROUP BY does not necessarily cause a data
sort. With the appropriate access path, the
ORDER BY or GROUP BY
requirement can be met without sorting. (An appropriate access path means
that an index is available and used so that the index drives the access to
the data. DB2 can then return qualified rows to the program in the desired
order, the order of the index.)
But what happens when the order of the index column is random? Before DB2 9
for z/OS, we could CREATE indexes with columns
that had ascending key values and/or descending key values. As of DB2 9,
we can also CREATE indexes with columns that we
designate as RANDOM. When we use this option,
key values are encoded so that the values are stored randomly. Note that
identical key values will encode the same and will therefore be
contiguous. Let’s look at an example of index data where one of the
columns is RANDOM. Assume an index on
DEPTNO ASCENDING, LASTNAME RANDOM,
EMPNO ASCENDING
in which the decoded values are as follows:
| ASC DEPTNO | RANDOM LASTNAME | ASC EMPNO |
|---|---|---|
| A01 | Yothers | 12345 |
| A01 | Josten | 21234 |
| A01 | Josten | 21235 |
| A01 | Miller | 14567 |
| A01 | Miller | 14568 |
| A01 | Miller | 14569 |
| B01 | Zagelow | 16657 |
| B01 | Bossman | 15678 |
| B01 | Bossman | 15679 |
Notice that DEPTNO is in ascending order.
LASTNAME is in random order within each
DEPTNO, but like values are contiguous.
EMPNO is in ascending order within the like
values of LASTNAME.
The costs and benefits of RANDOM
Why would anyone want to keep index data in random order? Currently, online
transaction processing (OLTP) workloads in a data sharing environment can
experience contention on index pages, especially during
INSERTs, when an application program
INSERTs into indexes with columns that contain
current timestamps or ever-increasing values. These column types often
create insertion hot spots on index pages. Then applications must wait to
acquire busy index pages.
We can use randomized index key columns to reduce contention, but at what
cost? There may be more CPU usage and getpage
operations, as well as more index page read and write I/Os. The
RANDOM option is useful when ascending
insertions or hot spots cause this contention but the resulting cost is
not prohibitive.
Another possible issue is that while each distinct column value is stored randomly, like-kind values are contiguous. Therefore a randomized index column can relieve contention problems for sets of similar or sequential values, but it's no help with identical values. Identical values encode the same, and each is inserted at the same place on the index tree.
Here are some interesting facts (some beneficial and some not) about the
RANDOM column option.
It allows equality predicate lookups, such as
LASTNAME = :LN, but it does not support
matching range lookups, such as LASTNAME > :LN.
The option can be used in non matching index scans in a screening fashion
and as part of index-only access. Even though values are stored as random
encoded values, we can retrieve the original, decoded value of the column.
The option causes RUNSTATS to populate
HIGH2KEY and LOW2KEY
with the original, decoded value of the column. Finally, the
RANDOM column option cannot be specified in the
following cases:
- For an index key column that is of variable length
- If the index is created with the NOT PADDED option
- As part of the GENERATE KEY USING clause
- If the RANDOM column is used to determine the partition location of a table row
Also, DB2 cannot use random order index columns as part of a sort merge join. If a join is between one table that has an ascending index on the join column and a second table that has a randomized index column, the indexes are not in the same order and cannot be merged.
Sort avoidance with RANDOM columns
Now, what happens if the index that would have been ideal for DB2 to use to avoid a data sort is not an ordered list, but rather random?
For the examples that follow, we will use our three-column index on DEPTNO, LASTNAME RANDOM, EMPNO to determine how DB2 might (or might not) use an index for sort avoidance.
1)SELECT DEPTNO, LASTNAME, … FROM BIGTABLE
WHERE DEPTNO in ('A01', 'B01')
ORDER BY DEPTNO
For this ORDER BY, DB2 has no problem with the
RANDOM column. The
RANDOM column is not part of our
ORDER BY. We can match on one column, and if we
use the index the traditional way, the data will come back in
DEPTNO order.
A01 Yothers … A01 Josten … A01 Josten … A01 Miller … A01 Miller … A01 Miller … B01 Zagelow … B01 Bossman … B01 Bossman … |
2)SELECT EMPNO, … FROM BIGTABLE
WHERE DEPTNO = 'A01'
AND LASTNAME = 'MILLER'
ORDER BY EMPNO
Again, there is no problem here with the RANDOM
column. Because like-kind values of the RANDOM
column are stored together, DB2 can do an equal lookup, matching on two
columns of our index. And, if DB2 uses the index the traditional way, the
data will come back in EMPNO order.
14567 …
14568 …
14569 …
|
3)SELECT LASTNAME, EMPNO, … FROM BIGTABLE
WHERE DEPTNO = 'A01'
ORDER BY LASTNAME, EMPNO
Here DB2 can match on only one column, and since
LASTNAME is random, DB2 must do a data sort to
bring our rows back in LASTNAME,
EMPNO order.
Josten 21234 … Josten 21235 … Miller 14567 … Miller 14568 … Miller 14569 … Yothers 12345 … |
4)SELECT LASTNAME, EMPNO, … FROM BIGTABLE
WHERE DEPTNO = 'A01'
AND LASTNAME BETWEEN 'MILLER' AND 'YOTHERS'
ORDER BY LASTNAME, EMPNO
In this example, DB2 can match only on DEPTNO
and must do an index scan, screening on
LASTNAME within that one value of
DEPTNO to find all of the index entries where
LASTNAME is between our low value and our high
value. The index rows will not be in LASTNAME
order within DEPTNO A01. Therefore, DB2 must
sort to satisfy the ORDER BY.
Miller 14567 … Miller 14568 … Miller 14569 … Yothers 12345 … |
5)SELECT LASTNAME
FROM BIGTABLE
WHERE DEPTNO = 'A01'
In this example, there is no ORDER BY. In what
order will our rows be returned? Since both
LASTNAME and DEPTNO
are part of the index, we will get index-only access. We have one matching
predicate on DEPTNO. Since DB2 will only use
the index the “traditional” way for index-only access, the rows will be
returned just like our index data, in groups of identical
LASTNAMEs in random order.
Yothers Josten Josten Miller Miller Miller |
6)SELECT DEPTNO, LASTNAME, COUNT(*)
FROM BIGTABLE
GROUP BY DEPTNO, LASTNAME
In this example, DB2 will not have to sort. Our index is in
DEPTNO order, and within that order, in groups
of identical (but unordered) values of
LASTNAME. Therefore, using control break logic,
DB2 can return distinct DEPTNO,
LASTNAME combinations without sorting.
The rows will be in DEPTNO order but will not be in
LASTNAME order within each
DEPTNO. The answer is accurate because we did
not ask that the rows be returned in ORDER,
only GROUPed.
A01 Yothers 1 A01 Josten 2 A01 Miller 3 B01 Zagelow 1 B01 Bossman 2 |
Because LASTNAME is random in the index, if we
want our rows to be in LASTNAME order within
DEPTNO, we must add an
ORDER BY clause. This requirement to add an
ORDER BY clause in addition to the
GROUP BY clause is new with the
RANDOM option. Before DB2 9, we needed to add
an ORDER BY only if we had more than one index
that could be chosen to do the grouping. Remember, we do not have an
ASC or DESC option
on GROUP BY. Therefore, if we have an
INDEX on COL1 ASC,
COL2 DESC, as well as an index on
COL1 DESC,
COL2 DESC, either index can be used to avoid
the GROUP BY sort, and the one chosen will
determine the order of the output rows. To avoid unpredictable output in
this situation, we would need to code the
ORDER BY clause (even before DB2 9).
Consider the following realistic example of a situation where the
RANDOM option might be preferable.
A user defines an index on the INVOICE_NUMBER
column of the INVOICE_MASTER table in
ASCENDING order. The user then inserts rows
with an ascending sequence of values for the
INVOICE_NUMBER column (0000100, 0000200,
0000300, ...). The index rows will be inserted at the end of the index,
creating a hot spot. In this particular index, the user looks up specific
invoices only by their INVOICE_NUMBER—in other
words, only equality predicates are applied to the
INVOICE_NUMBER column. To reduce contention,
DROP and re-CREATE
the index using the RANDOM option.
With the RANDOM specification, the unique
INVOICE_NUMBER values will now be stored
randomly throughout the index, preventing the contention that results from
always inserting values at the end. Because the common use of this index
is for looking up specific invoices by
INVOICE_NUMBER, the new
RANDOM ordering option provides a good
solution.
We must be careful with the RANDOM option and
ensure that our solution is not creating more cost from increased
CPU, I/O, and sort overhead. The option is best used when we are actually
solving a contention problem, when our predicates on the random column are
EQUAL predicates or, if not
EQUAL, the random columns do not often appear
in ORDER BY clauses. The benefit must outweigh
the additional cost or the RANDOM option will
cause more problems than it solves. And, once we begin using the
RANDOM option, we must be very careful to code
both a GROUP BY and an
ORDER BY when order matters.
- Read more articles in
IBM Data Management Magazine.
Bonnie Baker specializes in teaching on-site classes for corporations, agencies, and DB2 user groups. She is an IBM DB2 Gold Consultant, an IBM Information Champion, a five-time winner of the IDUG Best Speaker award, and a member of the IDUG Speakers' Hall of Fame. She is best known for her ability to demystify complex concepts through analogies and war stories.




