Minimizing the cost of retrieving few rows

You can improve the performance of applications that retrieve only a small subset of many qualifying rows.

About this task

Begin program-specific programming interface information.When an application issues a SELECT statement, Db2 assumes that the application retrieves all the qualifying rows. This assumption is most appropriate for batch environments. However, for interactive SQL applications, such as SPUFI, queries commonly define a large potential result set but retrieve only the first few rows.

The OPTIMIZE FOR n ROWS clause declares the intent of an application to take one of the following actions:
  • Retrieve only a subset of the result set
  • Give priority to the retrieval of the first few rows

Db2 uses the OPTIMIZE FOR n ROWS clause to choose access paths that minimize the response time for retrieving the first few rows. For distributed queries, the value of n determines the number of rows that Db2 sends to the client on each DRDA network transmission.

The OPTIMIZE FOR n ROWS clause does the retrieval of all the qualifying rows. However, if you use OPTIMIZE FOR n ROWS, the total elapsed time to retrieve all the qualifying rows might be greater than when Db2 optimizes for the entire result set.

OPTIMIZE FOR n ROWS is effective only on queries that can be processed incrementally. If the query causes Db2 to gather the entire result set before returning the first row, Db2 ignores the OPTIMIZE FOR n ROWS clause. Examples include the following situations:

  • The query uses SELECT DISTINCT or a set function distinct, such as COUNT(DISTINCT C1).
  • Either GROUP BY or ORDER BY is used, and no index can provide the necessary ordering.
  • An aggregate function is used and no GROUP BY clause is used.
  • The query uses UNION.

Procedure

To optimize applications the retrieve few of a large qualifying set of rows, use the following approaches:

  • To avoid sort operations, specify OPTIMIZE FOR 1 ROW.
    OPTIMIZE FOR 1 ROW tells Db2 to select an access path that returns the first qualifying row quickly. The result is that Db2 avoids a sort whenever possible. When you specify any value for n other than 1, Db2 chooses an access path based on cost, and sort operations remain a possible.

    You can use OPTIMIZE FOR 1 ROW for both local and remote queries. This value does not prevent or restrict block fetch for distributed queries.

    If you continue to encounter sort operations when OPTIMIZE FOR 1 ROW is specified, you can set the value of the OPT1ROWBLOCKSORT subsystem parameter to ENABLE.

  • For local queries, specify OPTIMIZE FOR n ROWS only in applications that frequently fetch only a small percentage of the total rows in the query result set.
    For example, an application might read only enough rows to fill a single terminal screen. In such, the application might read the remaining part of the query result set only rarely. For such applications, OPTIMIZE FOR n ROWS can result in better performance by causing Db2 to favor SQL access paths that deliver the first n rows as fast as possible.
  • For remote queries, specify an appropriate value of n for the specific situation:
    • Specify a small value for n to limit the number of rows that flow across the network on any single transmission.
    • Specify a large value for n to improve the performance for receiving a large result set. When you specify a large value, Db2 attempts to send the n rows in multiple transmissions. For better performance when retrieving a large result set, do not issue other SQL statements until the entire result set for the query is processed. If retrieval of data for several queries overlaps, Db2 might need to buffer result set data in the DDF address space (ssnmDIST).
  • For a Call Level Interface (CLI) application, you can specify that Db2 uses OPTIMIZE FOR n ROWS for all queries. To do that, specify the OPTIMIZEFORNROWS keyword in the initialization file.

Results

The following access path changes are likely when you specify the OPTIMIZE FOR n ROWS clause:
  • The join method might change. Nested loop join is the most likely choice, because it has a lower cost for the retrieval of only one row.
  • An index that matches the ORDER BY clause is more likely to be picked because no sort would be needed for the ORDER BY.
  • List prefetch is less likely to be picked.
  • Sequential prefetch is less likely to be requested by Db2 because it infers that you want to retrieve only few rows.
  • In a join query, the table with the columns in the ORDER BY clause is likely to be picked as the outer table if an index created on that outer table gives the ordering needed for the ORDER BY clause.End program-specific programming interface information.

Example

Suppose that you query the employee table regularly to determine the employees with the highest salaries. You might use the following query:

SELECT LASTNAME, FIRSTNAME, EMPNO, SALARY
  FROM EMP
  ORDER BY SALARY DESC;

Suppose that an index is defined on column EMPNO, so employee records are ordered by EMPNO. If descending index also exists on the SALARY column, that index is likely to be poorly clustered. To avoid many random synchronous I/O operations, Db2 is likely to use a table space scan, then sort the rows on SALARY. The sort operation is likely to delay the return of the first few rows to the application. However, you might use the following statement to avoid the cost of the sort operation:

SELECT LASTNAME,FIRSTNAME,EMPNO,SALARY
  FROM EMP
  ORDER BY SALARY DESC
  OPTIMIZE FOR 20 ROWS;

When you use this statement, Db2 probably uses the SALARY index directly. The query now indicates that you expect to retrieve the salaries of only the 20 most highly paid employees.