Join cost estimation and index selection

As the query optimizer compares the various possible access choices, it must assign a numeric cost value to each candidate. The optimizer uses that value to determine the implementation which consumes the least amount of processing time. This costing value is a combination of CPU and I/O time

In steps 3 and 5 in Join order optimization, the optimizer estimates cost and chooses an access method for a given dial combination. The choices made are like the choices for row selection, except that a plan using a probe must be chosen.

The costing value is based on the following assumptions:
  • Table pages and index pages must be retrieved from auxiliary storage. For example, the query optimizer is not aware that an entire table might be loaded into active memory as the result of a Set Object Access (SETOBJACC) CL command. Use of this command could significantly improve the performance of a query. However, the optimizer does not change the query implementation to take advantage of the memory resident state of the table.
  • The query is the only process running on the system. No allowance is given for system CPU utilization or I/O waits which occur because of other processes using the same resources. CPU-related costs are scaled to the relative processing speed of the system running the query.
  • The values in a column are uniformly distributed across the table. For example, if 10% of the table rows have the same value, then on average, every 10th row in the table contains that value.
  • The column values are independent from any other column values in a row, unless there is an index available whose key definition is (A, B). Multi-key field indexes allow the optimizer to detect when the values between columns are correlated.

    For example, a column named A has a value of 1 in 50% of the rows in a table. A column named B has a value of 2 in 50% of the rows. It is expected that a query which selects rows where A = 1, and B = 2 selects 25% of the rows in the table.

The main factors in the join cost calculation for secondary dials are:
  • the number of rows selected in all previous dials
  • the number of rows which match, on average, each of the rows selected from previous dials.
Both of these factors can be derived by estimating the number of matching rows for a given dial.
When the join operator is something other than equal, the expected number of matching rows is based on the following default filter factors:
  • 33% for less-than, greater-than, less-than-equal-to, or greater-than-equal-to
  • 90% for not equal
  • 25% for BETWEEN range (OPNQRYF %RANGE)
  • 10% for each IN list value (OPNQRYF %VALUES)

For example, when the join operator is less-than, the expected number of matching rows is 0.33 * (number of rows in the dial). If no join specifications are active for the current dial, the Cartesian product is assumed to be the operator. For Cartesian products, the number of matching rows is every row in the dial, unless local row selection can be applied to the index.

When the join operator is equal, the expected number of rows is the average number of duplicate rows for a given value.