Relational index planning tips

A well-designed index can make it easier for queries to access relational data.

Use the Design Advisor (db2advis command) to find the best indexes for a specific query or for the set of queries that defines a workload. This tool can make performance-enhancing recommendations, such as include columns or indexes that are enabled for reverse scans.

The following guidelines can also help you to create useful relational indexes.
  • Retrieving data efficiently
    • To improve data retrieval, add include columns to unique indexes. Good candidates are columns that:
      • Are accessed frequently and would benefit from index-only access
      • Are not required to limit the range of index scans
      • Do not affect the ordering or uniqueness of the index key
      For example:
         create unique index idx on employee (workdept) include (lastname)
      Specifying LASTNAME as an include column rather than part of the index key means that LASTNAME is stored only on the leaf pages of the index.
    • Create relational indexes on columns that are used in the WHERE clauses of frequently run queries.
      In the following example, the WHERE clause will likely benefit from an index on WORKDEPT, unless the WORKDEPT column contains many duplicate values.
         where workdept='A01' or workdept='E21'
    • Create relational indexes with a compound key that names each column referenced in a query. When an index is specified in this way, relational data can be retrieved from the index only, which is more efficient than accessing the table.
      For example, consider the following query:
         select lastname
           from employee
           where workdept in ('A00','D11','D21')
      If a relational index is defined on the WORKDEPT and LASTNAME columns of the EMPLOYEE table, the query might be processed more efficiently by scanning the index rather than the entire table. Because the predicate references WORKDEPT, this column should be the first key column of the relational index.
  • Searching tables efficiently

    Decide between ascending and descending key order, depending on the order that will be used most often. Although values can be searched in reverse direction if you specify the ALLOW REVERSE SCANS option on the CREATE INDEX statement, scans in the specified index order perform slightly better than reverse scans.

  • Accessing larger tables efficiently
    Use relational indexes to optimize frequent queries against tables with more than a few data pages, as recorded in the NPAGES column of the SYSCAT.TABLES catalog view. You should:
    • Create an index on any column that you will use to join tables.
    • Create an index on any column that you will be searching for specific values on a regular basis.
  • Improving the performance of update or delete operations
    • To improve the performance of such operations against a parent table, create relational indexes on foreign keys.
    • To improve the performance of such operations against REFRESH IMMEDIATE and INCREMENTAL materialized query tables (MQTs), create unique relational indexes on the implied unique key of the MQT, which is composed of the columns in the GROUP BY clause of the MQT definition.
  • Improving join performance

    If you have more than one choice for the first key column in a multiple-column relational index, use the column that is most often specified with an equijoin predicate (expression1 = expression2) or the column with the greatest number of distinct values as the first key column.

  • Sorting
    • For fast sort operations, create relational indexes on columns that are frequently used to sort the relational data.
    • To avoid some sorts, use the CREATE INDEX statement to define primary keys and unique keys whenever possible.
    • Create a relational index to order the rows in whatever sequence is required by a frequently run query. Ordering is required by the DISTINCT, GROUP BY, and ORDER BY clauses.
      The following example uses the DISTINCT clause:
         select distinct workdept
           from employee
      The database manager can use an index that is defined on the WORKDEPT column to eliminate duplicate values. The same index could also be used to group values, as in the following example that uses a GROUP BY clause:
         select workdept, average(salary)
           from employee
           group by workdept
  • Keeping newly inserted rows clustered and avoiding page splits

    Define a clustering index, which should significantly reduce the need to reorganize the table. Use the PCTFREE option on the CREATE TABLE statement to specify how much free space should be left on each page so that rows can be inserted appropriately. You can also specify the pagefreespace file type modifier on the LOAD command.

  • Saving index maintenance costs and storage space
    • Avoid creating indexes that are partial keys of other existing indexes. For example, if there is an index on columns A, B, and C, another index on columns A and B is generally not useful.
    • Do not create arbitrary indexes on many columns. Unnecessary indexes not only waste space, but also cause lengthy prepare times.
      • For online transaction processing (OLTP) environments, create one or two indexes per table.
      • For read-only query environments, you might create more than five indexes per table.
        Note: For workloads involving many ad-hoc queries with many different predicates where index gap cardinality is small and the selectivity of non index gaps after the index gap is small, it is likely more appropriate to create a few large composite indexes for a table, as opposed to many smaller indexes.
      • For mixed query and OLTP environments, between two and five indexes per table is likely appropriate.
  • Enabling online index defragmentation

    Use the MINPCTUSED option when you create relational indexes. MINPCTUSED enables online index defragmentation; it specifies the minimum amount of space that must be in use on an index leaf page.