Cost estimation
At runtime, the Db2 for i optimizer chooses an optimal access method for the query by calculating an implementation cost based on the current state of the tables referenced in the query and any access paths (indexes) which are available.
The optimizer models the access cost of each of the following:
- Reading rows directly from the table (dataspace scan processing)
- Reading rows through an access path (using either key selection or key positioning)
- Creating an access path directly from the dataspace
- Creating an access path from an existing access path (index-from-index)
- Using the query sort routine (if conditions are satisfied)
The cost of a particular method is the sum of:
- The start-up cost
- The cost associated with the given optimization mode. The OPTIMIZE
FOR n ROWS clause indicates to the optimizer the optimization goal
to be achieved. The optimizer can optimize SQL queries with one of
two goals:
- Minimize the time required to retrieve the first buffer of rows
from the table. This goal biases the optimization towards not creating
an index. Note: This is the default if you do not use OPTIMIZE FOR n ROWS.Either a data scan or an existing index is preferred. This mode can be specified by:
- The OPTIMIZE FOR n ROWS allowing the users to specify the number
of rows they expect to retrieve from the query.
The optimizer using this value to determine the percentage of rows that will be returned and optimizes accordingly. A small value instructs the optimizer to minimize the time required to retrieve the first n rows.
- The OPTIMIZE FOR n ROWS allowing the users to specify the number
of rows they expect to retrieve from the query.
- Minimize the time to process the whole query assuming that all
selected rows are returned to the application. This does not bias
the optimizer to any particular access method. Specify this mode by
using OPTIMIZE FOR n ROWS, which allows the users to specify the number
of rows they expect to retrieve from the query.
The optimizer uses this value to determine the percentage of rows that will be returned and optimizes accordingly. A value greater than or equal to the expected number of resulting rows instructs the optimizer to minimize the time required to run the entire query.
- Minimize the time required to retrieve the first buffer of rows
from the table. This goal biases the optimization towards not creating
an index.
- The cost of any access path creations.
- The cost of the expected number of page faults to read the rows
and the cost of processing the expected number of rows.
Page faults and number of rows processed may be predicted by statistics the optimizer obtains from the database objects, including:
- Table size
- Row size
- Index size
- Key size
A weighted measure of the expected number of rows to process. This is based on what the relational operators in the row selection predicates (default filter factors) are likely to retrieve:
- 10% for equal
- 33% for less-than, greater-than, less-than-equal-to, or greater-than-equal-to
- 90% for not equal
- 25% for BETWEEN range
- 10% for each IN list value
Key range estimate is a method that the optimizer uses to gain more accurate estimates of the number of expected rows that are selected from one or more selection predicates. The optimizer estimates by applying the selection predicates against the left-most keys of an existing index. The default filter factors then can be further refined by the estimate based on the key range. If the left-most keys in an index match columns that are used in row-selection predicates, use that index to estimate the number of keys that match the selection criteria. The estimate of the number of keys is based on the number of pages and key density of the machine index. It is performed without actually accessing the keys. Full indexes over columns that are used in selection predicates can significantly help optimization.