Interpolation formulas

For a predicate that uses a range of values, DB2® calculates the filter factor by an interpolation formula.

Begin program-specific programming interface information.
The formula is based on an estimate of the ratio of the number of values in the range to the number of values in the entire column of the table.

The formulas

The formulas that follow are rough estimates, which are subject to further modification by DB2. They apply to a predicate of the form col op. constant. The value of (Total Entries) in each formula is estimated from the values in columns HIGH2KEY and LOW2KEY in catalog table SYSIBM.SYSCOLUMNS for column col: Total Entries = (HIGH2KEY value - LOW2KEY value).

  • For the operators < and <=, where the constant is not a host variable:

    (constant value - LOW2KEY value) / (Total Entries)

  • For the operators > and >=, where the constant is not a host variable:

    (HIGH2KEY value - constant value) / (Total Entries)

  • For LIKE or BETWEEN:

    (High constant value - Low constant value) / (Total Entries)

Example

For column C2 in a predicate, suppose that the value of HIGH2KEY is 1400 and the value of LOW2KEY is 200. For C2, DB2 calculates Total Entries = 1400 - 200, or 1200.

For the predicate C1 BETWEEN 800 AND 1100, DB2 calculates the filter factor F as:

F = (1100 - 800)/1200 = 1/4 = 0.25

Interpolation for LIKE

DB2 treats a LIKE predicate as a type of BETWEEN predicate. Two values that bound the range qualified by the predicate are generated from the constant string in the predicate. Only the leading characters found before the escape character ('%' or '_') are used to generate the bounds. So if the escape character is the first character of the string, the filter factor is estimated as 1, and the predicate is estimated to reject no rows.

Defaults for interpolation

DB2 might not interpolate in some cases; instead, it can use a default filter factor. Defaults for interpolation are:

  • Relevant only for ranges, including LIKE and BETWEEN predicates
  • Used only when interpolation is not adequate
  • Based on the value of COLCARDF
  • Used whether uniform or additional distribution statistics exist on the column if either of the following conditions is met:
    • The predicate does not contain constants
    • COLCARDF < 4.

The following table shows interpolation defaults for the operators <, <=, >, >= and for LIKE and BETWEEN.

Table 1. Default filter factors for interpolation
COLCARDF Factor for Op1 Factor for LIKE or BETWEEN
>=100000000 1/10,000 3/100000
>=10000000 1/3,000 1/10000
>=1000000 1/1,000 3/10000
>=100000 1/300 1/1000
>=10000 1/100 3/1000
>=1000 1/30 1/100
>=100 1/10 3/100
>=2 1/3 1/10
=1 1/1 1/1
<=0 1/3 1/10
 
Note:
  1. Op is one of these operators: <, <=, >, >=.

End program-specific programming interface information.