Contents


How to identify unused indexes in DB2 for Linux, UNIX, and Windows

Get to know different approaches for determining indexes never or rarely used by database queries

Comments

DB2 Design Advisor (db2advis)

The DB2 Design Advisor represents a simple method to identify unused indexes. You can use Design Advisor to analyze a set of SQL statements and their execution frequency. As a result of such an analysis, the Design Advisor returns the following information:

  • Additional index definitions that would improve performance of the queries considered by reducing the runtime of the queries)
  • Names of existing indexes that are used for query execution by DB2
  • Names of existing indexes that are not used for query execution by DB2

Let's look at a Design Advisor usage scenario based on the DB2 test database SAMPLE.

First, create the SAMPLE database by executing the db2sampl command on the command line as shown in Listing 1.

Listing 1. Create the SAMPLE database
db2sampl

Before calling the Design Advisor, you must run the EXPLAIN.DDL script to create the Explain tables that it requires. You can find the script in the sqllib/misc subdirectory of your DB2 installation. You only have to run the script once. Listing 2 shows the commands to run the script.

Listing 2. Create the Explain tables
db2 "CONNECT TO SAMPLE"
db2 -tf "C:\Program Files\IBM\SQLLIB\MISC\EXPLAIN.DDL"

When calling the Design Advisor, you have to define the set of SQL statements that you want to analyze. You can do this in several ways. One possibility is to provide a text file containing the SQL statements to analyze. For example, Listing 3 shows a sample file named queries.sql that contains a list of SQL statements.

Listing 3. DB2 Design Advisor input file queries.sql
-- EMPLOYEE queries

--#SET FREQUENCY 123
SELECT LASTNAME FROM EMPLOYEE WHERE EMPNO = '000010';
--#SET FREQUENCY 456
SELECT LASTNAME FROM EMPLOYEE WHERE WORKDEPT = 'A00';


-- DEPARTMENT queries

--#SET FREQUENCY 245
SELECT DEPTNAME FROM DEPARTMENT WHERE DEPTNO = 'A00';
--#SET FREQUENCY 678
SELECT DEPTNAME FROM DEPARTMENT WHERE MGRNO = '000010';


-- PROJECT queries

--#SET FREQUENCY 345
SELECT PROJNAME FROM PROJECT WHERE PROJNO = 'OP1000';

Following are some things to consider when creating a text file of SQL statements you want Design Advisor to analyze:

  • Comment lines start with -- and are ignored by the Design Advisor.
  • If known, you can optionally specify an estimate for the execution frequency of each query. This helps Design Advisor to better estimate how much database performance would be improved by creating new indexes. To do this, add a separate line starting with --#SET FREQUENCY just before the query definition (although these lines start with --, they are not treated as comment lines).
  • Each SQL statement in the text file must be terminated with a semicolon (;).
  • The SQL statements may contain parameter markers. Parameter markers are allowed because the Design Advisor does not actually execute the statements, it only evaluates alternative access plans for them.
  • For all tables and views not fully qualified (no explicit schema is given), the authorization ID of the caller is used as the default schema. Optionally, you can overwrite the default schema with the -q option when calling the Design Advisor.

Listing 4 shows a sample command you can use to call the Design Advisor and specify queries.sql as the input file that contains the SQL statements to analyze.

Listing 4. Call the DB2 Design Advisor with the input file queries.sql
db2advis -d sample -i queries.sql -m I -l -1 -t 0 -o db2advis_file.txt

The options on the above command have the following meanings:

  • -d Database name.
  • -i Input file containing the SQL statements to analyze.
  • -m Besides relational indexes, the Design Advisor can also recommend Materialized Query Tables (MQTs), Multidimensional Clustering (MDC) tables, and re-partitioning of partitioned tables. In this sample scenario, -m has a value of I to indicate that only relational indexes should be recommended (this is also the default).
  • -l Size limit for the definition of new indexes or MQTs. The value of -1 means no size limit.
  • -t Runtime limit for execution of the Design Advisor. The value of 0 means no time limit.
  • -o Output file where you want to write the recommendations of the Design Advisor.

The output file created by the Design Advisor consists of three sections: recommended additional indexes, recommended existing indexes, and unused indexes. For this scenario, you are primarily interested in the section that lists the unused indexes. Listing 5 shows a sample of the unused indexes section.

Listing 5. Unused indexes section in DB2 Design Advisor output
-- UNUSED EXISTING INDEXES
-- ============================
-- DROP INDEX "FECHNER "."XDEPT3";
-- DROP INDEX "FECHNER "."XPROJ2";
-- ===========================

For this scenario, the Design Advisor indicates that the indexes XDEPT3 and XPROJ2 are not used. However, keep in mind that this recommendation is based only on the set of SQL statements that the Design Advisor analyzed (the statements contained in the input file). Besides the SQL statements in the input file, there may be other statements that depend on the identified indexes for fast execution. Therefore, you should carefully check and consider any index identified as unused before you actually drop it.

As an alternative to an input file containing SQL statements, you can also have Design Advisor evaluate the contents of the database's SQL statement cache. You do this by using the -g option on the db2advis command, as shown in Listing 6.

Listing 6. Call the DB2 Design Advisor for the SQL statement cache contents
db2advis -d sample -g -m I -l -1 -t 0 -o db2advis_sql_cache.txt

Calling the Design Advisor this way requires less effort than using a manually created input file containing SQL statements. However, you still need to be aware that the results depend on the current contents of the SQL statement cache. Statements not found in the cache when the Design Advisor is called are not analyzed. Therefore, when using the SQL statement cache as input, you should run the Design Advisor at different points throughout the day to evaluate a broader range of statements.

db2pd utility

The db2pd utility is generally used for problem determination and monitoring purposes, but you can also use it to query information about database activity. One piece of information that you can obtain with the db2pd utility is the number of times indexes have been accessed since the database was activated. Listing 7 shows how to use the db2pd utility to get this information for all indexes on all tables in a database.

Listing 7. Query table and index metrics with the db2pd utility
db2pd -db sample -tcbstats all -file db2pd_tab_all.txt

The options on the above call to the db2pd utility have the following meanings:

  • -db Database name
  • -tcbstats all Show all table and index metrics
  • -file Output file

If you want to restrict the output of the db2pd utility to show metrics for only a certain table and its indexes, use the -tcbstats option to specify the tablespace ID and table ID. To do this, you first need to determine the tablespace ID and table ID of the table by executing a SELECT statement on the SYSCAT.TABLES catalog view, as shown in Listing 8.

Listing 8. Query database catalog to determine tablespace ID and table ID of a table
db2 "SELECT TBSPACEID, TABLEID
     FROM SYSCAT.TABLES
     WHERE TABSCHEMA = 'FECHNER' AND TABNAME = 'DEPARTMENT'"

Listing 9 shows a sample result set from a tablespace ID and table ID query such as the one above.

Listing 9. Result set of tablespace ID and table ID query
TBSPACEID TABLEID
--------- -------
        2       5

  1 record(s) selected.

After determining the tablespace ID and table ID of the table you want to query, you can restrict the db2pd output to just that table. You do this by including the tbspaceid and tableid suboptions following the -tcbstats option, as shown in Listing 10.

Listing 10. Restrict db2pd -tcbstats output to a specific table
db2pd -db sample -tcbstats all tbspaceid=2 tableid=5 -file db2pd_tab_dept.txt

Listing 11 shows a sample result set from a db2pd query such as the one above.

Listing 11. Table and index metrics as shown by the db2pd utility
Database Partition 0 -- Database SAMPLE -- Active -- Up 0 days 00:20:34

TCB Table Information:
Address    TbspaceID TableID PartID MasterTbs MasterTab TableName          SchemaNm
ObjClass DataSize   LfSize     LobSize    XMLSize   
0x797DF2B8 2         5       n/a    2         5         DEPARTMENT         FECHNER
Perm     1          0          0          0         

TCB Table Stats:
Address    TableName          SchemaNm Scans      UDI        RTSUDI               PgReorgs
NoChgUpdts Reads      FscrUpdates Inserts    Updates    Deletes    OvFlReads  OvFlCrtes
RowsComp   RowsUncomp CCLogReads StoreBytes BytesSaved
0x797DF2B8 DEPARTMENT         FECHNER  0          0          0                    0
0          951        0           0          0          0          0          0
0          0          0          -          -         

TCB Index Information:
Address    InxTbspace ObjectID PartID TbspaceID TableID MasterTbs MasterTab
TableName          SchemaNm IID   IndexObjSize
0x797E0330 2          5        n/a    2         5       2         5
DEPARTMENT         FECHNER  3     8         
0x797E0330 2          5        n/a    2         5       2         5
DEPARTMENT         FECHNER  2     8         
0x797E0330 2          5        n/a    2         5       2         5
DEPARTMENT         FECHNER  1     8         

TCB Index Stats:
Address    TableName          IID   PartID EmpPgDel   RootSplits BndrySplts PseuEmptPg
EmPgMkdUsd Scans      IxOnlyScns KeyUpdates InclUpdats NonBndSpts PgAllocs   Merges
PseuDels   DelClean   IntNodSpl  
0x797E0330 DEPARTMENT         3     n/a    0          0          0          0
0          0          0          0          0          0          1          0
0          0          0          
0x797E0330 DEPARTMENT         2     n/a    0          0          0          0
0          678        0          0          0          0          1          0
0          0          0          
0x797E0330 DEPARTMENT         1     n/a    0          0          0          0
0          245        0          0          0          0          1          0
0          0          0

The part of the result set that concerns index usage is in the TCB Index Stats section. The sample output above shows that the DEPARTMENT table has three indexes. The indexes are shown by their index IDs (the IID column), not by their names. The Scans column shows how many times the indexes were accessed since the database was activated:

  • The index with ID 1 was accessed 245 times.
  • The index with ID 2 678 times.
  • The index with ID 3 was never accessed.

Because the index with ID 3 has not been accessed, you can conclude that it is not used, or at least that it is seldom used. To get the name of the index with ID 3, query the SYSCAT.INDEXES catalog view using a SELECT statement, as shown in Listing 12.

Listing 12. Query database catalog to determine an index name based on its index ID (IID)
db2 "SELECT INDSCHEMA, INDNAME
     FROM SYSCAT.INDEXES
     WHERE TABSCHEMA = 'FECHNER' AND TABNAME = 'DEPARTMENT' AND IID = 3"

The output of the above SELECT statement is shown in Listing 13.

Listing 13. Result set of index name query
INDSCHEMA                      INDNAME
------------------------------ ------------------------------
FECHNER                        XDEPT3

  1 record(s) selected.

In addition to the Scans column, the TCB Index Stats section of the db2pd utility results has a IxOnlyScns column that shows the number of index only scans. An Index only scan is an index access that is not followed by a table access because the index itself contains all the requested data. So if the IxOnlyScns counter is not zero, you have to add the numbers of both the Scans column and the IxOnlyScns column to get the total number of times the index has been accessed.

As already mentioned when discussing use of the Design Advisor, you have to be careful when interpreting index usage information. The same is true when using the db2pd approach. The fact that a certain index has not been used up until a certain point in time, does not necessarily mean that the index will never be used. So be cautious when deciding whether or not to drop an index that appears not to be used. If you do decide to drop an index, you should first save its CREATE INDEX statement so that you can easily recreate it if necessary.

DB2 Workload Manager (WLM)

The DB2 Design Advisor as well as the db2pd utility were introduced in DB2 Version 8 and continue to be available in DB2 Version 9. Starting with DB2 Version 9.5, you can also use the DB2 Workload Manager (WLM) to gathering index usage information. Generally, WLM is a priced DB2 feature that must be separately licensed. However, the extended database monitoring options that are part of WLM can be utilized without a separate license. This section does not provide an introduction to DB2 WLM. It does provide step-by-step instructions you can use to gather and evaluate index usage information with the help of WLM-specific event monitors and two Perl scripts that are part of every DB2 installation.

Like the Design Advisor, WLM requires that you have created the Explain tables before you can use it (see Listing 2 for details on running the EXPLAIN.DLL script to create the Explain tables). You must also first run the wlmevmon.ddl script that is in the sqllib/misc subdirectory. The script creates and enables the WLM-specific event monitors: DB2ACTIVITIES, DB2STATISTICS, and DB2THRESHOLDVIOLATIONS. These event monitors write the data they collect to tables. By default, these tables are created in tablespace USERSPACE1. If you want the tables to reside in another tablespace, alter the script accordingly before you run it. Listing 14 shows the command to run the wlmevmon.ddl script.

Listing 14. Create the WLM-specific event monitors
db2 -tf "C:\Program Files\IBM\SQLLIB\MISC\wlmevmon.ddl"

By default, the event monitors are created with the AUTOSTART option, which means that the event monitors are automatically started the next time the database is activated. Before you can use the event monitors after you first create them, you either have to deactivate and activate the database or manually start the event monitors, as shown in Listing 15.

Listing 15. Start the WLM-specific event monitors
db2 "SET EVENT MONITOR DB2ACTIVITIES STATE 1"
db2 "SET EVENT MONITOR DB2STATISTICS STATE 1"
db2 "SET EVENT MONITOR DB2THRESHOLDVIOLATIONS STATE 1"

To understand the next command, you need a basic knowledge of the WLM concept of service class. WLM allows you to assign workloads with different response time requirements to different service classes and service subclasses. Beginning with DB2 Version 9.5, the usage of service classes is deeply integrated into the DB2 engine. Even if WLM is not explicitly used (that is, no service classes are defined by the database administrator), all user transactions are executed within the context of the predefined service class. The predefined service class is SYSDEFAULTUSERCLASS and its corresponding subclass is SYSDEFAULTSUBCLASS. WLM-specific event monitors are always activated for a certain service (sub-)class. This means that they only collect data for service (sub-)classes they have been activated for.

In this sample scenario, you activate the DB2ACTVITIES event monitor for the service subclass SYSDEFAULTSUBCLASS. This way, you gather the monitor data for all user transactions because application-specific service classes have not been defined. Because there are no application-specific service classes, all user transactions are executed within the default service class and its subclass. Listing 16 shows the command to activate the activity event monitor for the default service (sub-)class.

Listing 16. Activate the activity event monitor for the default service (sub-)class
db2 "ALTER SERVICE CLASS SYSDEFAULTSUBCLASS UNDER SYSDEFAULTUSERCLASS
     COLLECT ACTIVITY DATA ON ALL DATABASE PARTITIONS WITH DETAILS"

Your collection of monitor data should cover a time frame in which you would expect normal database activity (for example, between 9 a.m. and 5 p.m. during a workday). After you finish collecting monitor data, deactivate the activity event monitor, as shown in Listing 17.

Listing 17. Deactivate the activity event monitor for the default service (sub-)class
db2 "ALTER SERVICE CLASS SYSDEFAULTSUBCLASS UNDER SYSDEFAULTUSERCLASS
     COLLECT ACTIVITY DATA NONE"

For evaluation of index usage based on collected WLM event monitor data, DB2 provides two Perl scripts named wlmhist.pl and wlmhistrep.pl. You can find these scripts in the samples\perl subdirectory of your DB2 installation. Both scripts expect the database name and the database username and password as input parameters.

Run the wlmhist.pl script first. It reads the SQL statements executed from the event monitor tables and runs the Explain utility for each statement to generate the corresponding access plan. After reading the SQL statements, the script extracts access plan information from the Explain tables and writes it to a table named WLMHIST (if the WLMHIST table does not already exist the script creates it).

Next, run the wlmhistrep.pl script to evaluate the data stored in the WLMHIST table. The script creates a report that provides details about table and index usage. When running wlmhistrep.pl, you also have to specify the name of the resulting report file in addition to the database name and the database username and password.

Listing 18 shows the commands for running the Perl scripts that help you evaluate the index usage based on collected WLM event monitor data.

Listing 18. Execute the wlmhist.pl and wlmhistrep.pl Perl scripts
cd "C:\Program Files\IBM\SQLLIB\samples\perl"
perl wlmhist.pl sample userid password
perl wlmhistrep.pl sample userid password wlmhistrep.txt

The report generated by wlmhistrep.pl consists of four sections: used (hit) tables, unused (not hit) tables, used (hit) indexes, and unused (not hit) indexes. For this scenario, you are mainly interested in the unused indexes section. Listing 19 shows a sample report.

Listing 19. Sample report generated by wlmhistrep.pl
               INDEXES HIT REPORT FOR DATABASE sample
              _______________________________________________________


TABLE NAME         TABLE SCHEMA    OBJECT NAME        OBJECT SCHEMA   TOTAL HITS
__________________ _______________ __________________ _______________ __________

...
DEPARTMENT         FECHNER         PK_DEPARTMENT      FECHNER                245
DEPARTMENT         FECHNER         XDEPT2             FECHNER                678
EMPLOYEE           FECHNER         PK_EMPLOYEE        FECHNER                123
EMPLOYEE           FECHNER         XEMP2              FECHNER                456
PROJECT            FECHNER         PK_PROJECT         FECHNER                345
...

               INDEXES NOT HIT REPORT FOR DATABASE sample
              ___________________________________________________________


TABLE NAME         TABLE SCHEMA    INDEX NAME         INDEX SCHEMA    INDEX TYPE
__________________ _______________ __________________ _______________ __________

...
DEPARTMENT         FECHNER         XDEPT3             FECHNER         REG
PROJECT            FECHNER         XPROJ2             FECHNER         REG
...

The INDEXES HIT REPORT section has a TOTAL HITS column that indicates how many times an index was accessed. Indexes that were never accessed are shown in the INDEXES NOT HIT REPORT section.

MON_GET_INDEX table function

The db2pd utility section of this article showed you how to use the db2pd utility to retrieve index metrics. However, if you are using DB2 Version 9.7, you can retrieve the same information by querying the new MON_GET_INDEX table function. Using this table function instead of db2pd simplifies evaluation of the results because the index IDs (IIDs) can be converted to index names by joining with the SYSCAT.INDEXES catalog view. Listing 20 shows a sample of how to call the MON_GET_INDEX table function.

Listing 20. Sample call to the MON_GET_INDEX table function (file name mon_get_index.sql)
SELECT
     SUBSTR(SI.INDSCHEMA, 1, 30) AS INDSCHEMA,
     SUBSTR(SI.INDNAME, 1, 30) AS INDNAME,
     MGI.INDEX_SCANS,
     MGI.INDEX_ONLY_SCANS
FROM
    TABLE(MON_GET_INDEX('FECHNER', 'DEPARTMENT', -2)) as MGI,
    SYSCAT.INDEXES AS SI
WHERE
    MGI.TABSCHEMA = SI.TABSCHEMA
    AND MGI.TABNAME = SI.TABNAME
    AND MGI.IID = SI.IID
ORDER BY
    MGI.INDEX_SCANS DESC;

MON_GET_INDEX expects the following parameters:

  • Table schema
  • Table name
  • Database partition number

For a single partition database, you can just specify -1 (current partition) or -2 (all partitions) for the database partition number and the results will be the same. If you omit a table name (that is, you specify NULL or an empty string), information is returned for all tables in the schema specified. If you omit table name and table schema, information is returned for all tables in the database.

Listing 21 shows a sample result set from a query such as the one above.

Listing 21. Results from the MON_GET_INDEX table function
db2 -tf mon_get_index.sql

INDSCHEMA            INDNAME              INDEX_SCANS          INDEX_ONLY_SCANS
-------------------- -------------------- -------------------- --------------------
FECHNER              XDEPT2                                678                    0
FECHNER              PK_DEPARTMENT                         245                    0
FECHNER              XDEPT3                                  0                    0

  3 record(s) selected.

The MON_GET_INDEX table function and db2pd utility provide the same index metrics (number of index scans and index only scans). However, you may find it easier to work with the result set returned by the MON_GET_INDEX table function than it is to evaluate the output of the db2pd utility.

Summary

Unused and seldom used indexes cause UID operations (UPDATE, INSERT, and DELETE) and database maintenance activities (such as RUNSTATS and REORG) to run unnecessarily long. Such indexes also require storage space, within tablespace containers as well as within backup images, that you might better use for other purposes. This article demonstrates different approaches to identify unused and seldom used indexes defined within a database. Based on that information, you can delete indexes that are not required. By doing so, you may be able to significantly improve the overall performance of your DB2 LUW database.


Downloadable resources


Related topics


Comments

Sign in or register to add and subscribe to comments.

static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management
ArticleID=433370
ArticleTitle=How to identify unused indexes in DB2 for Linux, UNIX, and Windows
publish-date=10082009