Minimizing the cost of retrieving few rows
You can improve the performance of applications that retrieve only a small subset of many qualifying rows.
About this task
When an application issues a SELECT statement, Db2 assumes that the application retrieves all the qualifying rows. This assumption is most appropriate for batch environments. However, for interactive SQL applications, such as SPUFI, queries commonly define a large potential result set but retrieve only the first few rows.
- Retrieve only a subset of the result set
- Give priority to the retrieval of the first few rows
Db2 uses the OPTIMIZE FOR n ROWS clause to choose access paths that minimize the response time for retrieving the first few rows. For distributed queries, the value of n determines the number of rows that Db2 sends to the client on each DRDA network transmission.
The OPTIMIZE FOR n ROWS clause does the retrieval of all the qualifying rows. However, if you use OPTIMIZE FOR n ROWS, the total elapsed time to retrieve all the qualifying rows might be greater than when Db2 optimizes for the entire result set.
OPTIMIZE FOR n ROWS is effective only on queries that can be processed incrementally. If the query causes Db2 to gather the entire result set before returning the first row, Db2 ignores the OPTIMIZE FOR n ROWS clause. Examples include the following situations:
- The query uses SELECT DISTINCT or a set function distinct, such as COUNT(DISTINCT C1).
- Either GROUP BY or ORDER BY is used, and no index can provide the necessary ordering.
- An aggregate function is used and no GROUP BY clause is used.
- The query uses UNION.
Procedure
To optimize applications the retrieve few of a large qualifying set of rows, use the following approaches:
Results
- The join method might change. Nested loop join is the most likely choice, because it has a lower cost for the retrieval of only one row.
- An index that matches the ORDER BY clause is more likely to be picked because no sort would be needed for the ORDER BY.
- List prefetch is less likely to be picked.
- Sequential prefetch is less likely to be requested by Db2 because it infers that you want to retrieve only few rows.
- In a join query, the table with the columns in the ORDER BY clause is likely to be picked as the outer table if an index created on that outer table gives the ordering needed for the ORDER BY clause.
Example
Suppose that you query the employee table regularly to determine the employees with the highest salaries. You might use the following query:
SELECT LASTNAME, FIRSTNAME, EMPNO, SALARY
FROM EMP
ORDER BY SALARY DESC;
Suppose that an index is defined on column EMPNO, so employee records are ordered by EMPNO. If descending index also exists on the SALARY column, that index is likely to be poorly clustered. To avoid many random synchronous I/O operations, Db2 is likely to use a table space scan, then sort the rows on SALARY. The sort operation is likely to delay the return of the first few rows to the application. However, you might use the following statement to avoid the cost of the sort operation:
SELECT LASTNAME,FIRSTNAME,EMPNO,SALARY
FROM EMP
ORDER BY SALARY DESC
OPTIMIZE FOR 20 ROWS;
When you use this statement, Db2 probably uses the SALARY index directly. The query now indicates that you expect to retrieve the salaries of only the 20 most highly paid employees.