Overriding predicate selectivities at the statement level

You can override the default selectivity estimates that Db2 uses to select access paths for SQL statements.

Before you begin

Begin program-specific programming interface information.The following prerequisites are met:
  • Prepare to manage access paths.
  • You have one of the following authorities:
    • SQLADM
    • SYSOPR
    • SYSCTRL
    • SYSADM
  • Instances of the following user tables exist under your schema, or under a separate schema for input tables:
    • DSN_USERQUERY_TABLE
    • DSN_PREDICAT_TABLE
    • DSN_PREDICATE_SELECTIVITY
    For more information about using tables under a separate schema, see Creating input EXPLAIN tables under a separate schema. You can find sample CREATE statements for the tables and associated indexes in members DSNTESC and DSNTESH of the prefix.SDSNSAMP library.
  • The package that contains the statement was created by a BIND PACKAGE statement. Statement-level methods for influencing access paths are not supported for statements in packages that are created by other statements, such as CREATE FUNCTION, CREATE TRIGGER, and CREATE PROCEDURE statements.

About this task

Db2 cannot estimate filter factors for certain types of predicates. It must rely on inaccurate estimates or even default filter factors to select access paths for statements that use such predicates. Examples include predicates that involve the following items, among others:

  • Host variables
  • Parameter markers
  • Expressions
  • Table self-joins
  • Subqueries

However, you can override these default filter factors for certain predicates by creating selectivity overrides. Each predicate selectivity override describes the selectivity of a particular predicate in a particular SQL statement. When a statement contains more than one predicate, you can create separate selectivity overrides for each predicate in the statement.

A statement that is issued many times might have different filtering characteristics at different times. A predicate that filters many rows with one literal value might filter far fewer rows when the literal value is different. Therefore, a single set of overrides for a statement might not adequately describe the filtering of the predicates across all executions. So, you can create more than one set of overrides for each statement. Each set of overrides is a selectivity instance. Each selectivity instance has a weight value. The weight value represents the percentage of executions of the statement in which you expect that the predicates to have the selectivities of that instance.

The set of all selectivity instances for a single statement is called the selectivity profile of the statement. The selectivity profile describes the selectivities of the predicates in the SQL statement, over all instances of the statement. However, the set of overrides for a particular statement does not need to completely describe the selectivity of every predicate for every instance of the statement. It is best to create overrides only when default selectivities for particular predicates result in inefficient access paths and a correction is needed. By creating overrides only when they are needed, you can also minimize the chances that unneeded overrides can become stale and cause performance regressions. Like data statistics, selectivity overrides can also become stale over time as the data changes.

Restriction: Start of changeDo not override predicate selectivity for access paths that use predicate push-down. Db2 issues SQLCODE +395 for such cases. The following PLAN_TABLE values usually indicate predicate pushdown cases:
  • ACCESSTYPE='O'
  • TABLE_TYPE='W'
  • METHOD=1
End of change

Procedure

To create statement-level selectivity overrides:

  1. Populate DSN_PREDICAT_TABLE and DSN_PREDICATE_SELECTIVITY tables with values for the predicate selectivity that you want to enforce.
    You can use either the tables that are qualified by your schema, or create more instances under a different schema. When you issue the BIND QUERY command, you can specify the input table schema to be used. By creating the extra tables, you can simplify the process of ensuring that only the needed rows are present in the input tables.
    To populate the input tables:
    1. Capture EXPLAIN information.
      DSN_PREDICAT_TABLE and DSN_PREDICATE_SELECTIVITY are populated with model rows that you can use to create selectivity overrides.
    2. Optional: Select the relevant rows from the EXPLAIN output and insert them into tables of a separate schema.
      You can specify this schema value later when you issue the BIND QUERY command.
    3. Modify the DSN_PREDICATE_SELECTIVITY table to contain the details of the override.
      You can create a single or multiple selectivity instances for a predicate. You can create single instances by modifying the following values in the existing rows of the table. If you want to create multiple selectivity instances, you must insert new rows for the additional instances:
      INSTANCE
      Specify a value greater than 0 to identify the selectivity instance. Each instance can contain multiple predicates, but you cannot duplicate the same predicate within a single instance. By using the same value in several rows, you can group related predicates. You do not have to override every predicate for a statement. Similarly, when multiple instances exist, they can contain different numbers of rows. One instance might define overrides for several predicates, while another instance might override only a single important predicate.

      You can specify as many as 20 different instance values, if the sum of weight values for all instances equals 1.0. If the sum of weight values is less than 1.0, you can specify only 19 instance values. The INSTANCE values do not have to be consecutive.

      SELECTIVITY
      Specify the selectivity estimate for the predicate.
      WEIGHT
      Specify a value 0–1 to indicate the weight of the selectivity instance. This WEIGHT value must match for every row that contains the same INSTANCE value. The sum of the WEIGHT values from multiple instances must not exceed 1. The recommendation is to weight each instance so that the sum of the weights is equal to 1. However, when the sum of the weights for multiple instances is less than 1, Db2 assumes the existence of another instance. The assumed instance uses the default selectivity assumption, and a weight of 1 minus the sum of the weights for the specified overrides.
      ASSUMPTION
      Specify 'OVERRIDE' to indicate each row that specifies an override. Rows that are not used as selectivity overrides must contain 'NORMAL', or another value. The recommendation is to create overrides only for default selectivities that require some sort of correction.
  2. INSERT rows into the DSN_USERQUERY_TABLE table.
    1. Insert values in the following columns to specify the SQL statement that the override applies to, and the scope of the override:
      QUERYNO
      Insert the value that correlates to the value of the QUERYNO column of the related DSN_PREDICAT_TABLE and DSN_PREDICATE_SELECTIVITY table rows that describe the selectivity overrides.
      QUERY_TEXT
      Insert the text of the statement whose access path you want to influence.

      The text that you provide must match the statement text that Db2 uses when binding static SQL statements and preparing dynamic SQL statements. For more information about how to enable successful text matching, see Populating query text for statement-level matching.

      SCHEMA
      If the SQL statement contains unqualified object names that might resolve to different default schemas, insert the schema name that identifies the unqualified database objects. If the statement contains unqualified objects names because it might apply to different schemas at different times, you must create separate hints or overrides for each possible SCHEMA value. If the statement contains only fully qualified object names, the SCHEMA value is not required. However, you can still insert a SCHEMA value to help you identify that the hint relates to a certain schema.
      HINT_SCOPE
      Insert a value to specify that context in which to match the statement.
      0
      System wide. Db2 uses only the text of the SQL statement and the value of the SCHEMA column, when it contains a value, to determine whether the statement matches.
      1
      Package-level. Db2 uses the values of the COLLECTION, PACKAGE, and VERSION columns to determine whether the statement matches.
      COLLECTION
      Insert the collection ID of the package. This value is required only when the value of HINT_SCOPE is 1.

      When the value of HINT_SCOPE is 0, the value is optional, and when a value is specified Db2 issues an error message when you bind the query if the matching value is not found in the SYSIBM.SYSPACKAGE catalog table. When HINT_SCOPE is 0, either specify both COLLECTION and PACKAGE or leave both fields blank.

      For static SQL statements and dynamic SQL statements that use the DYNAMICRULES(BIND) option, you might need to specify the value of this column so that Db2 can retrieve the correct application default values from the SYSIBM.SYSPACKSTMT catalog table.

      PACKAGE
      Insert the name of the package. This value is required only when the value of HINT_SCOPE is 1.

      When the value of HINT_SCOPE is 0, the value is optional, and when a value is specified Db2 issues an error message when you bind the query if the matching value is not found in the SYSIBM.SYSPACKAGE catalog table. When HINT_SCOPE is 0, either specify both COLLECTION and PACKAGE or leave both fields blank.

      For static SQL statements and dynamic SQL statements that use the DYNAMICRULES(BIND) option, you might need to specify the value of this column so that Db2 can retrieve the correct application default values from the SYSIBM.SYSPACKSTMT catalog table.

      The package-specific scope is intended primarily to support the staging, validation, and testing of statement-level hints, before they are deployed with a system-wide scope.

      SELECTVTY_OVERRIDE
      Set the value of this column to 'Y' to indicate that the row creates a selectivity override.
      ACCESSPATH_HINT
      Set the value to 'N' to specify that the row does not create a statement-level access path. Access paths and selectivity overrides cannot be specified for the same statements.
      OPTION_OVERRIDE
      Insert a value to specify whether to create statement-level optimization parameters. Unless you specifically want to enable both option overrides and selectivity overrides for the same statement, specify 'N' in this column. Specify 'Y' in this column to create option overrides.

      For more information about other values to specify for creating option overrides, see the following topic: Specifying optimization parameters at the statement level.

  3. Issue a BIND QUERY command.
    You must omit the LOOKUP option or specify LOOKUP(NO).
    If you created input tables under a dedicated schema, specify the EXPLAININPUTSCHEMA bind option and the name of the schema.
    Db2 takes the input from every DSN_USERQUERY_TABLE row, and from related input tables, and inserts data into the following catalog tables:
    • SYSIBM.SYSQUERY
    • SYSIBM.SYSQUERYPREDICATE
    • SYSIBM.SYSQUERYSEL

    The QUERYID column correlates rows in these tables.

Example

The following table shows a subset of columns values for a set of rows in the DSN_PREDICATE_SELECTIVITY table. It illustrates a selectivity profile that is made up of two selectivity instances. Each row corresponds to a single selectivity override.

Table 1. Sample values from DSN_PREDICATE_SELECTIVITY table
INSTANCE WEIGHT PREDNO SELECTIVITY
1 0.45 1 0.36
1 0.45 3 0.15
1 0.45 7 0.15
2 0.25 1 0.99
2 0.25 3 0.20
2 0.25 7 0.001

The six selectivity overrides apply to three different predicates in the statement. The overrides are grouped into two selectivity instances. The rows that contain INSTANCE=1 describe the selectivities that are expected 45% of the times that the statement is issued. The rows that contain INSTANCE=2 describe the selectivities that are expected 25% of the time. Notice that the sum of the weight values for the two instances is only 0.70. That value means that the specified overrides represent only 70% of the expected selectivities. If only these overrides are defined for the statement, Db2 assumes that a third selectivity instance exists. The implied selectivity instance has a weight of 0.30. Db2 uses the estimated filter factors for that selectivity instance, as if no override was specified for that instance.

When Db2 uses multiple selectivity instances to optimize a statement, the weighted averages of the selectivity instances are shown in the EXPLAIN cost estimates. The values that are reported in DSN_PREDICAT_TABLE also reflect the weighted-average filtering.

As the PREDNO values suggest, overrides were not created for all predicates in the SQL statement. It is best to create overrides only for predicates that require some sort of correction. It is best to override only predicates that are difficult to estimate. Otherwise, the selectivity profile for a statement is likely to become inaccurate and require maintenance.

For example, Consider the following statement:

SELECT * FROM T1
WHERE T1.C1 > ?
AND T1.ACCT_NUM = ?
AND T1.SALES_DT BETWEEN ? AND ?

The RANGE predicate (T1.C1 > ?) and BETWEEN predicate (T1.SALES_DT BETWEEN ? AND ?) have unreliable default selectivity estimates. Because Db2 does not know the literal values that are used, the default estimates are guesses.

However, the ACCT_NUM = ? predicate is likely to have a high COLCARDF value, and the selectivity estimate is likely to be reliable.

For example, assume that the T1 table contains 1,000,000 rows and the COLCARDF value of the ACCT_NUM column is 250,000. In that case, an average four rows are returned for the ACCT_NUM = ? predicate. You might create a selectivity override for the ACCT_NUM = ? predicate that specifies a filter factor of 1/250,000, but it is best not to.

If the T1 table doubles in size over time because of new accounts, the CARDF value changes to 2,000,000. The COLCARDF value for the ACCT_NUM column also changes to 500,000. The actual selectivity of the predicate changes to 1/500,000 and any previously created selectivity override becomes inaccurate. You want the selectivity estimate for the ACCT_NUM = ? predicate to change as the number of rows in the table cardinality of the column increase.

By contrast, the range predicate selectivities benefit more from selectivity overrides because they do not often become more accurate as the column cardinality values change.

End program-specific programming interface information.

What to do next

Consider taking the following actions:

  1. Validate that the appropriate catalog table rows have been created:
    1. Insert row into the DSN_USERQUERY_TABLE table that contain values in the QUERY_TEXT and SCHEMA columns.
    2. Issue the following command:
      BIND QUERY LOOKUP(YES)
      Db2 issues the following messages to indicate whether the catalog tables contain valid rows that correspond to the DSN_USERQUERY_TABLE rows.
      • A DSNT280I message for each DSN_USERQUERY_TABLE row that has matching rows in the catalog tables.
      • A DSNT281I message for each DSN_USERQUERY_TABLE row that does not have matching rows in the catalog table.
      • A single DSNT290I message if some matching rows were found in the catalog tables or a DSNT291I message if no matching rows were found.
      Db2 also updates the value of QUERYID column in the DSN_USERQUERY_TABLE table to match the value from the matching rows in the SYSIBM.SYSQUERY catalog table.
  2. Delete the DSN_USERQUERY_TABLE rows to prevent the replacement of existing catalog table rows when you issue subsequent BIND QUERY commands. When you issue a BIND_QUERY command, catalog tables rows are created or replaced for every row in DSN_USERQUERY_TABLE row. Changes to data in other input tables might have unintended consequences if old rows remain in the DSN_USERQUERY_TABLE and you issue the BIND_QUERY command again.