Improve index analysis with DB2 10.1

Taking advantage of usage lists

One of the most challenging parts of index design is verifying that IBM® DB2® is using a particular index in a way that was intended. Explain output for an SQL operation can show whether an index is being utilized, but it can be extremely difficult and time consuming to generate Explain data for every SQL statement found in an application. Not to mention the fact that looking for index usage in the volumes of Explain data that such an endeavor might produce would be extremely tedious. Furthermore, many applications use dynamic SQL that is generated on-the-fly, which means that one or more SQL statements coded in an application can have hundreds or even thousands of variations. And each variation can result in a different data access path, with some utilizing available indexes and others relying solely on table scans. This article is designed to introduce you to an object and a set of new table functions that can be used to analyze index usage in DB2 10.1 for Linux®, UNIX®, and Windows® databases.

Paul Turpin (paul.turpin@aciworldwide.com), Senior Software Engineer, ACI Worldwide

Author photo of Paul TurpinPaul Turpin is a Senior Software Engineer at ACI Worldwide. He has been working with DB2 for over 17 years and has been a DB2 DBA for more than 15 years. Paul holds the IBM Certified Database Administrator - DB2 10.1 for Linux, UNIX, and Windows certification, as well as the IBM Certified Database Administrator - DB2 10 for z/OS certification. He serves on the Board of Directors for the International DB2 Users Group (IDUG) and has been an IBM Champion since 2008.



01 November 2012

Also available in Russian

Why use an index?

Indexes serve two major purposes. They provide a logical ordering of the rows in a table, and they speed up data access by providing a fast, efficient method for locating specific rows of data. When properly defined, indexes can also guarantee uniqueness of records stored in a table. But, this functionality does not come without a cost, as additional overhead is needed to maintain indexes when INSERT, UPDATE, and DELETE operations are performed. And in some environments, such as On-Line Transaction Processing (OLTP) environments, this extra overhead can reduce performance to unacceptable levels. Thus, there is a delicate balance between using indexes to speed up data access and the impact to performance that is introduced by having indexes. And it is critical, particularly in OLTP environments, to ensure that any indexes that exist are being used effectively.


Monitoring index usage with db2pd

The DB2 Problem Determination tool (db2pd) is a handy diagnostic tool that can be used to obtain specific information about a DB2 environment. Unlike other monitoring tools, db2pd attaches to DB2 shared memory sets to obtain information; because it doesn't go through the DB2 engine, db2pd does not have to compete for database resources, making it very lightweight and efficient. And since db2pd works directly with memory, it can retrieve data quickly in a very non-intrusive manner.

More than 50 commands and options can be used to control the type of information that db2pd collects, and one option in particular, -tcbstats, can be used to tell db2pd to capture only information about index usage. For example, to use db2pd to obtain information about how indexes defined for a database named SAMPLE are being utilized, you would execute a command as shown in Listing 1.

Listing 1. Use db2p command
db2pd -db sample -tcbstats index

Analyzing the output produced by this command can be difficult, because indexes are referenced by their index identifier (IID) rather than by name. The format of the output produced is also challenging to work with; there is a technique for parsing -tcbstats output to extract index usage data, however, the process is a bit convoluted and can easily be broken if the db2pd output format changes.

The technique consists of parsing the -tcbstats output to extract a subset of index usage information, and loading the information retrieved into a table that has been assigned a name like INDEX_USAGE. Explain information is then generated for a number of SQL queries, and the INDEX_USAGE table is later joined with the SYSCAT.TABLES and SYSCAT.INDEXES system catalog views and one or more Explain tables to determine which SQL statements are utilizing which indexes. Unfortunately, it's easy to miss important SQL statements with this technique.

Another issue is that, because data retrieved by db2pd is stored in various DB2 memory sets, the data goes away when the database being examined is deactivated. Consequently, index usage information that has been collected over days or weeks can disappear in seconds. The information about index usage that is returned by db2pd has changed over time, and with DB2 10.1, index usage information is only collected for DML operations.


Monitoring index usage with administrative table functions

In DB2 9.7, improvements were made to the ways in which information about index usage information can be obtained. For example, a LASTUSED column was added to the SYSCAT.INDEXES system catalog view. And because this information is stored in a table, as opposed to DB2 memory sets, the information is retained long after a database is deactivated. More importantly, this information is maintained by a background daemon process named db2lused, and is updated every fifteen minutes. The MON_GET_INDEX() table function was also added, greatly simplifying the extraction of index usage information.

This function returns information about indexes that have been used since the last database activation. In fact, the same information that is returned by the db2pd -tcbstats index command can be obtained using the MON_GET_INDEX() table function. A query that uses this function to obtain information about how indexes being used looks similar to what is shown in Listing 2.

Listing 2. Query to obtain information using indexes
SELECT SUBSTR(S.INDSCHEMA, 1, 10) AS indschema,
    SUBSTR(S.TABNAME, 1, 30) AS table_name,
    SUBSTR(S.INDNAME, 1, 30) AS index_name,
    T.INDEX_SCANS,
    T.INDEX_ONLY_SCANS
FROM TABLE(MON_GET_INDEX('','', -2)) AS t, SYSCAT.INDEXES AS s
WHERE T.TABSCHEMA = S.TABSCHEMA
    AND T.TABNAME = S.TABNAME
    AND T.IID = S.IID
    AND S.INDSCHEMA NOT LIKE 'SYS%'
ORDER BY S.TABNAME, S.INDNAME DESC

This particular query will show the number of index scans where the results of a query were satisfied by using an index to locate data in a table, and index-only scans (where the results of a query were satisfied by accessing the index only) for each index in a database, sorted by table name, and then by index name. This information is quite valuable, but with this information alone, it can be difficult to determine what SQL statement(s) actually caused an index to be used.


Monitoring index usage with usage lists

In DB2 10.1, a new database object known as a usage list was introduced to help with table and index usage monitoring. As the name implies, a usage list is a database object that is used to record information about DML statements that reference a particular table or index. Each entry in a usage list contains information about the number of times a particular DML operation was performed, as well as aggregate statistics that indicate how each statement affected the table or index during execution. Usage lists are created by executing the CREATE USAGE LIST command; the basic syntax for this command is shown in Listing 3.

Listing 3. CREATE USAGE LIST basic syntax
CREATE USAGE LIST [ULName]
FOR [TABLE | INDEX] [ObjectName]
<LIST SIZE [100 | NumEntries]>
<WHEN FULL [WRAP | DEACTIVATE]>
<[INACTIVE | ACTIVE] ON START DATABASE>

Where:

  • ULName identifies the name that is to be assigned to the usage list being created.
  • ObjectName identifies by name, the existing table or index the usage list is to be created for.
  • NumEntries identifies the maximum number of entries that can be stored in the usage list being created.

Note: Parameters or options shown in normal brackets ([ ]) are required; parameters/options shown in angle brackets (< >) are not.

Once a usage list has been created and activated, two monitor table functions that were also introduced in DB2 10.1 - MON_GET_TABLE_USAGE_LIST() and MON_GET_INDEX_USAGE_LIST() - can be used to obtain and format any usage information collected.

An example of how to use a usage list to monitor index usage

To demonstrate how usage lists can be used to monitor index usage, look at some example scenarios that were developed using the PRODUCT table that is part of the SAMPLE database that ships with DB2 for Linux, UNIX, and Windows. This database can be constructed by executing the command db2sampl.

If you examine the physical design of the SAMPLE database carefully, you will discover that an index named PK_PRODUCT is the only REGULAR index that has been defined on the PRODUCT table. Any of the DB2 commands/SQL statements shown in Listing 4 can be used to determine what columns were used to define the PK_PRODUCT index.

Listing 4. Sample DB2 commands/SQL statements
DESCRIBE INDEXES FOR TABLE product SHOW DETAIL
                
SELECT SUBSTR(INDNAME, 1, 20) AS index_name, COLNAMES 
    FROM SYSCAT.INDEXES
    WHERE TABNAME = 'PRODUCT'
                
db2look -d SAMPLE -e -tw PRODUCT -o PRODUCT.DDL

To create an index usage list named PK_PRODUCT_USE for the PK_PRODUCT index, simply execute the SQL statement shown in Listing 5.

Listing 5. Sample SQL statement for index usage
CREATE USAGE LIST pk_product_use FOR INDEX PK_PRODUCT
    LIST SIZE 500
    WHEN FULL WRAP

This particular statement sets the maximum number of entries allowed in the usage list being created to 500, and tells DB2 that when the usage list is full, it is to wrap and overwrite earlier entries that were stored in the list.

Before a usage list can be used to record information about DML statements that reference a table or index, it must first be activated. If the ACTIVE ON START DATABASE clause was specified with the CREATE USAGE LIST statement that was used to create it, a usage list will be activated automatically when the database the list is stored in is activated or the first time a connection to the database is established. Otherwise, a usage list can be activated by executing the SET USAGE LIST statement. The statement shown in Listing 6 can be used to activate the PK_PRODUCT_USE usage list that was created previously.

Listing 6. Activate PK_PRODUCT_USE list
SET USAGE LIST pk_product_use STATE ACTIVE

Beginning with DB2 10.1, the ALTER SQL statement can be used to change the definition of a usage list that was created previously, for example, the statement shown in Listing 7 will change the number of entries allowed in the specified usage list to 1,000.

Listing 7. Use ALTER SQL statement
ALTER USAGE LIST pk_product_use LIST SIZE 1000

On the other hand, the statement shown in Listing 8 will change the behavior of the usage list such that when the list becomes full, DB2 will deactivate it, thereby preventing data that has already been collected from being overwritten.

Listing 8. Statement changing behavior of usage list
ALTER USAGE LIST pk_product_use WHEN FULL DEACTIVATE

To obtain a list of all usage lists that have been defined for a particular database, including those that have been assigned a state of Released, which indicates that memory associated with the usage list has been released, query the SYSCAT.USAGELISTS system catalog table with a SELECT statement that looks similar to Listing 9.

Listing 9. Using the SELECT statement
SELECT * from SYSCAT.USAGELISTS

To determine the status of one or more usage lists that are not in the Released state, construct a query that calls the MON_GET_USAGE_LIST_STATUS() monitor function instead, as shown in Listing 10.

Listing 10. Monitor function
SELECT * FROM TABLE(MON_GET_USAGE_LIST_STATUS(NULL, NULL, -2))

Once a usage list for an index has been activated, it will begin collecting information about DML statements that access the index for which it was defined. And, as was mentioned previously, information stored in a usage list for an index can be obtained by constructing a query that utilizes the MON_GET_INDEX_USAGE_LIST() table function. For example, the query shown in Listing 11 could be used to obtain information stored in the PK_PRODUCT_USE usage list you created previously.

Listing 11. Obtain information in PK_PRODUCT_USE usage list
SELECT NUM_REFERENCES, EXECUTABLE_ID
FROM TABLE(MON_GET_INDEX_USAGE_LIST(NULL, 'PK_PRODUCT_USE', -2))

The second parameter passed to the MON_GET_INDEX_USAGE_LIST() table function is the name of a specific usage list; if you specify the NULL value for this parameter, the function will return information for all usage lists that have been defined and are active. Thus, if you have multiple usage lists defined, and one or more are active, and you want to obtain information from all of them simultaneously, then you can do so by executing a query that looks similar to Listing 12.

Listing 12. Obtain information simultaneously
SELECT 
      SUBSTR(USAGELISTNAME,1,20) AS usagelistname,
      SUBSTR(INDNAME,1,20) AS indname, NUM_REFERENCES, EXECUTABLE_ID
    FROM TABLE(MON_GET_INDEX_USAGE_LIST(NULL, NULL, -2))

There is additional information that can be returned by the MON_GET_INDEX_USAGE_LIST() table function, such as the number of logical index reads and the number of physical index reads, but when trying to determine whether or not a particular index is being utilized, this information is not needed.

Notice that in both examples shown, EXECUTABLE_ID values were retrieved. Every entry in the package cache is assigned a unique EXECUTABLE_ID. Therefore, by examining the EXECUTABLE_ID it is possible to determine which SQL statements are associated with the use of a particular index. Joining the results of the MON_GET_INDEX_USAGE_LIST() and MON_GET_PKG_CACHE_STMT() table functions on the EXECUTABLE_ID column is a powerful way to determine which SQL statements are causing a particular index to be utilized by the DB2 Optimizer.

Going back to the PRODUCT table in the SAMPLE database, if the SQL statements shown in Listing 13 are executed, the DB2 Optimizer should elect to use the PK_PRODUCT index to deliver optimum query performance.

Listing 13. Use PK_PRODUCT index
SELECT pid FROM PRODUCT WHERE pid = '100-201-01';
SELECT pid FROM PRODUCT WHERE pid = '100-201-01';
SELECT pid FROM PRODUCT WHERE pid = '100-103-01';

Assuming a usage list has been created and activated for the PK_PRODUCT index, the following query, which joins the MON_GET_PKG_CACHE_STMT() and MON_GET_INDEX_USAGE_LIST() table functions on the EXECUTABLE_ID column, should produce a report that verifies that the SQL statements just executed are indeed using the PK_PRODUCT index, as shown in Listing 14.

Listing 14. Query joining table functions
SELECT
      SUBSTR(i.INDNAME,1,14) AS index_name,
      SUBSTR(t.STMT_TEXT,1,50) AS sql_statment, t.NUM_EXECUTIONS
    FROM TABLE (MON_GET_PKG_CACHE_STMT(NULL, NULL, NULL, -2 )) AS t,
         TABLE(MON_GET_INDEX_USAGE_LIST(NULL, 'PK_PRODUCT', -2)) AS i
    WHERE t.EXECUTABLE_ID  = i.EXECUTABLE_ID
    ORDER BY NUM_EXECUTIONS DESC;

And if that is the case, the output produced should look similar to Listing 15.

Listing 15. Example output
INDEX_NAME     SQL_STATEMENT                                      NUM_EXECUTIONS
-------------- -------------------------------------------------- --------------
PK_PRODUCT     SELECT pid FROM PRODUCT WHERE pid = '100-201-01'                2
PK_PRODUCT     SELECT pid FROM PRODUCT WHERE pid = '100-103-01'                1

In this example, the results are sorted so that the most frequently used statements are presented first. To limit the results to just the top 20 statements, add the FETCH FIRST 20 ROWS ONLY clause to the query used.

There are two things to keep in mind when using the Package Cache in this manner. First, you need to make sure that the value of the pckcachesz database configuration parameter has been set high enough to store all the SQL statements that you plan to test. In this example, the pckcachesz configuration parameter was set to 10000, because only a few simple SQL statements were being evaluated. In your environment, it may be necessary to set this parameter much higher.

Second, during testing, make sure that a dynamic flush package cache operation is NOT performed. Otherwise, you run the risk of losing the SQL statement text data you were hoping to capture.

Previously, it was mentioned that a usage list is associated with a single table or index. However, often it is important to look at index usage for all indexes that have been defined for a given table. There is a small amount of overhead associated with managing multiple usage lists, but having multiple usage lists available is much more efficient, in terms of person-hours required, if you want to examine at all the index usage at one time. Therefore, it is always better to create a usage list for each index for a given table.

Suppose three additional indexes were created for the PRODUCT table in the SAMPLE database, using the SQL statements shown in Listing 16.

Listing 16. Add three indexes
CREATE INDEX product_gi1 ON PRODUCT (name ASC);
                
CREATE INDEX product_gi2 ON PRODUCT (promostart ASC, promoend ASC);
                
CREATE INDEX product_gi3 ON PRODUCT (price ASC);

The system catalog for the SAMPLE database can help us generate the CREATE USAGE LIST and ACTIVATE USAGE LIST statements that are needed to create and activate usage lists for each of these new indexes. For instance, the queries shown in Listing 17 could be used to create the necessary DDL.

Listing 17. Queries to create DDL
SELECT
    'CREATE USAGE LIST ' 
    CONCAT INDNAME 
    CONCAT ' FOR INDEX '
    CONCAT RTRIM(INDSCHEMA) CONCAT '.' 
    CONCAT INDNAME 
    CONCAT ' LIST SIZE 500 WHEN FULL WRAP ;'
 FROM SYSCAT.INDEXES i
 WHERE TABNAME = 'PRODUCT'
    AND INDSCHEMA NOT LIKE 'SYS%' i
    AND INDEXTYPE = 'REG'
 ORDER BY INDNAME;
                
SELECT
    'SET USAGE LIST ' 
    CONCAT INDNAME 
    CONCAT ' STATE ACTIVE;'
  FROM SYSCAT.INDEXES i
  WHERE TABNAME = 'PRODUCT'
    AND INDSCHEMA NOT LIKE 'SYS%'
    AND INDEXTYPE = 'REG'
   ORDER BY INDNAME;

When these two queries are executed, they generate DDL that can be used to both create and activate four usage lists - one for each index that has been defined for the PRODUCT table. Similar queries can be used to drop the usage lists as well.

Once the appropriate usage lists have been created and activated, a test designed to execute all of the SQL statements against the table that you want to gather index usage information for should be developed and run. For example, the SQL statements (in addition to the previous three) shown in Listing 18 could be used to evaluate how the new indexes that were defined for the PRODUCT table earlier are being utilized.

Listing 18. An example of using SQL statements to evaluate the new indexes
SELECT pid FROM product 
    WHERE name = 'SNOW SHOVEL, BASIC 22 INCH';
                
SELECT pid FROM product 
    WHERE name = 'SNOW SHOVEL, SUPER DELUXE 26 INCH';
                
SELECT pid, name FROM product WHERE price = 19.99;
SELECT pid, name FROM product WHERE price = 49.99;
                
SELECT pid, name, price FROM product WHERE promostart = '11/19/2004';
                
SELECT pid, name, price FROM product 
    WHERE promostart = '12/22/2005' AND promoend = '02/22/2006';

Once these SQL statements are executed, the query, as shown in Listing 19, can be used to determine which, if any, of the new indexes were chosen by the DB2 Optimizer.

Listing 19. Query to determine which new indexes were chosen by the DB2 Optimizer
SELECT
        SUBSTR(i.INDNAME,1,12) AS index_name,
        SUBSTR(t.STMT_TEXT,1,59) AS sql_statment, t.NUM_EXECUTIONS
    FROM TABLE (MON_GET_PKG_CACHE_STMT(NULL, NULL, NULL, -2 )) AS t,
        TABLE(MON_GET_INDEX_USAGE_LIST(NULL, NULL, -2)) AS i
    WHERE t.EXECUTABLE_ID = i.EXECUTABLE_ID
    ORDER BY NUM_EXECUTIONS DESC

And when this query is executed, the output produced should look similar to Listing 20.

Listing 20. Output of the executed query
INDEX_NAME   SQL_STATEMENT                                               NUM_EXECUTIONS
------------ ----------------------------------------------------------- --------------
PK_PRODUCT   SELECT pid FROM PRODUCT WHERE pid = '100-201-01'                         2
PRODUCT_GI2  SELECT pid, name, price FROM product WHERE promostart = '11              1
PK_PRODUCT   SELECT pid FROM PRODUCT WHERE pid = '100-103-01'                         1
PRODUCT_GI1  SELECT pid FROM product WHERE name = 'SNOW SHOVEL, BASIC 22              1
PRODUCT_GI3  SELECT pid, name FROM product WHERE price = 49.99                        1
PRODUCT_GI1  SELECT pid FROM product WHERE name = 'SNOW SHOVEL, SUPER DE              1
PRODUCT_GI2  SELECT pid, name, price FROM product WHERE promostart = '12              1
PRODUCT_GI3  SELECT pid, name FROM product WHERE price = 19.99                        1

Because the second parameter of the MON_GET_INDEX_USAGE_LIST() table function is set to NULL, the function will return information for all active index usage lists. Assuming all SQL statements that query a given table have been executed, the results produced by this query should show how all the indexes available are being utilized. In this case, all of the queries used were designed such that the DB2 Optimizer would utilize every index available. However, in a real-world situation, testing would most likely reveal unused indexes or indexes that are being used inefficiently.


Conclusion

As DB2 has evolved over the years, it has become easier to get index usage information. And with DB2 10.1, tracking down the exact SQL statements that are using a particular index has gotten much easier. By using index usage lists and the new MON_GET_INDEX_USAGE_LIST() table function combined with the older MON_GET_PKG_CACHE_STMT() table function, it is now possible to quickly evaluate whether the DB2 Optimizer is taking advantage of indexes that have been defined.

Resources

Learn

Get products and technologies

  • Build your next development project with IBM trial software, available for download directly from developerWorks.
  • Evaluate IBM products in the way that suits you best: Download a product trial, try a product online, use a product in a cloud environment, or spend a few hours in the SOA Sandbox learning how to implement Service Oriented Architecture efficiently.

Discuss

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into Information management on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management
ArticleID=842891
ArticleTitle=Improve index analysis with DB2 10.1
publish-date=11012012