Improve index analysis with DB2 10.1
Taking advantage of usage lists
Why use an index?
Indexes serve two major purposes. They provide a logical ordering of the
rows in a table, and they speed up data access by providing a fast,
efficient method for locating specific rows of data. When properly
defined, indexes can also guarantee uniqueness of records stored in a
table. But, this functionality does not come without a cost, as additional
overhead is needed to maintain indexes when
DELETE operations are performed. And in some
environments, such as On-Line Transaction Processing (OLTP) environments,
this extra overhead can reduce performance to unacceptable levels. Thus,
there is a delicate balance between using indexes to speed up data access
and the impact to performance that is introduced by having indexes. And it
is critical, particularly in OLTP environments, to ensure that any indexes
that exist are being used effectively.
Monitoring index usage with db2pd
The DB2 Problem Determination tool
db2pd) is a handy
diagnostic tool that can be used to obtain specific information about a
DB2 environment. Unlike other monitoring tools,
db2pd attaches to DB2 shared memory sets to
obtain information; because it doesn't go through the DB2 engine,
db2pd does not have to compete for database
resources, making it very lightweight and efficient. And since
db2pd works directly with memory, it can
retrieve data quickly in a very non-intrusive manner.
More than 50 commands and options can be used to control the type of
db2pd collects, and one option
-tcbstats, can be used to tell
db2pd to capture only information about index
usage. For example, to use
db2pd to obtain
information about how indexes defined for a database named SAMPLE are
being utilized, you would execute a command as shown in Listing 1.
Listing 1. Use db2p command
db2pd -db sample -tcbstats index
Analyzing the output produced by this command can be difficult, because
indexes are referenced by their index identifier (IID) rather than by
name. The format of the output produced is also challenging to work with;
there is a technique for parsing
output to extract index usage data, however, the process is a bit
convoluted and can easily be broken if the
db2pd output format changes.
The technique consists of parsing the
output to extract a subset of index usage information, and loading the
information retrieved into a table that has been assigned a name like
INDEX_USAGE. Explain information is then generated for a number of SQL
queries, and the INDEX_USAGE table is later joined with the SYSCAT.TABLES
and SYSCAT.INDEXES system catalog views and one or more Explain tables to
determine which SQL statements are utilizing which indexes. Unfortunately,
it's easy to miss important SQL statements with this technique.
Another issue is that, because data retrieved by
db2pd is stored in various DB2 memory sets, the
data goes away when the database being examined is deactivated.
Consequently, index usage information that has been collected over days or
weeks can disappear in seconds. The information about index usage that is
db2pd has changed over time, and
with DB2 10.1, index usage information is only collected for DML
Monitoring index usage with administrative table functions
In DB2 9.7, improvements were made to the ways in which information about
index usage information can be obtained. For example, a LASTUSED column
was added to the SYSCAT.INDEXES system catalog view. And because this
information is stored in a table, as opposed to DB2 memory sets, the
information is retained long after a database is deactivated. More
importantly, this information is maintained by a background daemon process
db2lused, and is updated every fifteen
MON_GET_INDEX() table function was
also added, greatly simplifying the extraction of index usage information.
This function returns information about indexes that have been used since
the last database activation. In fact, the same information that is
returned by the
db2pd -tcbstats index command
can be obtained using the
function. A query that uses this function to obtain information about how
indexes being used looks similar to what is shown in Listing 2.
Listing 2. Query to obtain information using indexes
SELECT SUBSTR(S.INDSCHEMA, 1, 10) AS indschema, SUBSTR(S.TABNAME, 1, 30) AS table_name, SUBSTR(S.INDNAME, 1, 30) AS index_name, T.INDEX_SCANS, T.INDEX_ONLY_SCANS FROM TABLE(MON_GET_INDEX('','', -2)) AS t, SYSCAT.INDEXES AS s WHERE T.TABSCHEMA = S.TABSCHEMA AND T.TABNAME = S.TABNAME AND T.IID = S.IID AND S.INDSCHEMA NOT LIKE 'SYS%' ORDER BY S.TABNAME, S.INDNAME DESC
This particular query will show the number of index scans where the results of a query were satisfied by using an index to locate data in a table, and index-only scans (where the results of a query were satisfied by accessing the index only) for each index in a database, sorted by table name, and then by index name. This information is quite valuable, but with this information alone, it can be difficult to determine what SQL statement(s) actually caused an index to be used.
Monitoring index usage with usage lists
In DB2 10.1, a new database object known as a usage list was
introduced to help with table and index usage monitoring. As the name
implies, a usage list is a database object that is used to record
information about DML statements that reference a particular table or
index. Each entry in a usage list contains information about the number of
times a particular DML operation was performed, as well as aggregate
statistics that indicate how each statement affected the table or index
during execution. Usage lists are created by executing the
CREATE USAGE LIST command; the basic syntax for
this command is shown in Listing 3.
Listing 3. CREATE USAGE LIST basic syntax
CREATE USAGE LIST [ULName] FOR [TABLE | INDEX] [ObjectName] <LIST SIZE [100 | NumEntries]> <WHEN FULL [WRAP | DEACTIVATE]> <[INACTIVE | ACTIVE] ON START DATABASE>
- ULName identifies the name that is to be assigned to the usage list being created.
- ObjectName identifies by name, the existing table or index the usage list is to be created for.
- NumEntries identifies the maximum number of entries that can be stored in the usage list being created.
Note: Parameters or options shown in normal brackets ([ ]) are required; parameters/options shown in angle brackets (< >) are not.
Once a usage list has been created and activated, two monitor table
functions that were also introduced in DB2 10.1 -
MON_GET_INDEX_USAGE_LIST() - can be used to
obtain and format any usage information collected.
An example of how to use a usage list to monitor index usage
To demonstrate how usage lists can be used to monitor index usage, look at
some example scenarios that were developed using the PRODUCT table that is
part of the SAMPLE database that ships with DB2 for Linux, UNIX, and
Windows. This database can be constructed by executing the command
If you examine the physical design of the SAMPLE database carefully, you will discover that an index named PK_PRODUCT is the only REGULAR index that has been defined on the PRODUCT table. Any of the DB2 commands/SQL statements shown in Listing 4 can be used to determine what columns were used to define the PK_PRODUCT index.
Listing 4. Sample DB2 commands/SQL statements
DESCRIBE INDEXES FOR TABLE product SHOW DETAIL SELECT SUBSTR(INDNAME, 1, 20) AS index_name, COLNAMES FROM SYSCAT.INDEXES WHERE TABNAME = 'PRODUCT' db2look -d SAMPLE -e -tw PRODUCT -o PRODUCT.DDL
To create an index usage list named PK_PRODUCT_USE for the PK_PRODUCT index, simply execute the SQL statement shown in Listing 5.
Listing 5. Sample SQL statement for index usage
CREATE USAGE LIST pk_product_use FOR INDEX PK_PRODUCT LIST SIZE 500 WHEN FULL WRAP
This particular statement sets the maximum number of entries allowed in the usage list being created to 500, and tells DB2 that when the usage list is full, it is to wrap and overwrite earlier entries that were stored in the list.
Before a usage list can be used to record information about DML statements
that reference a table or index, it must first be activated. If the
ACTIVE ON START DATABASE clause was specified
CREATE USAGE LIST statement that was
used to create it, a usage list will be activated automatically when the
database the list is stored in is activated or the first time a connection
to the database is established. Otherwise, a usage list can be activated
by executing the
USAGE LIST statement.
The statement shown in Listing 6 can be used to activate the
PK_PRODUCT_USE usage list that was created previously.
Listing 6. Activate PK_PRODUCT_USE list
SET USAGE LIST pk_product_use STATE ACTIVE
Beginning with DB2 10.1, the
ALTER SQL statement
can be used to change the definition of a usage list that was created
previously, for example, the statement shown in Listing 7 will change the
number of entries allowed in the specified usage list to 1,000.
Listing 7. Use ALTER SQL statement
ALTER USAGE LIST pk_product_use LIST SIZE 1000
On the other hand, the statement shown in Listing 8 will change the behavior of the usage list such that when the list becomes full, DB2 will deactivate it, thereby preventing data that has already been collected from being overwritten.
Listing 8. Statement changing behavior of usage list
ALTER USAGE LIST pk_product_use WHEN FULL DEACTIVATE
To obtain a list of all usage lists that have been defined for a particular
database, including those that have been assigned a state of
Released, which indicates that memory associated with the
usage list has been released, query the SYSCAT.USAGELISTS system catalog
table with a
SELECT statement that looks
similar to Listing 9.
Listing 9. Using the SELECT statement
SELECT * from SYSCAT.USAGELISTS
To determine the status of one or more usage lists that are not in the
Released state, construct a query that calls the
MON_GET_USAGE_LIST_STATUS() monitor function
instead, as shown in Listing 10.
Listing 10. Monitor function
SELECT * FROM TABLE(MON_GET_USAGE_LIST_STATUS(NULL, NULL, -2))
Once a usage list for an index has been activated, it will begin collecting
information about DML statements that access the index for which it was
defined. And, as was mentioned previously, information stored in a usage
list for an index can be obtained by constructing a query that utilizes
MON_GET_INDEX_USAGE_LIST() table function.
For example, the query shown in Listing 11 could be used to obtain
information stored in the PK_PRODUCT_USE usage list you created
Listing 11. Obtain information in PK_PRODUCT_USE usage list
SELECT NUM_REFERENCES, EXECUTABLE_ID FROM TABLE(MON_GET_INDEX_USAGE_LIST(NULL, 'PK_PRODUCT_USE', -2))
The second parameter passed to the
MON_GET_INDEX_USAGE_LIST() table function is
the name of a specific usage list; if you specify the NULL value for this
parameter, the function will return information for all usage lists that
have been defined and are active. Thus, if you have multiple usage lists
defined, and one or more are active, and you want to obtain information
from all of them simultaneously, then you can do so by executing a query
that looks similar to Listing 12.
Listing 12. Obtain information simultaneously
SELECT SUBSTR(USAGELISTNAME,1,20) AS usagelistname, SUBSTR(INDNAME,1,20) AS indname, NUM_REFERENCES, EXECUTABLE_ID FROM TABLE(MON_GET_INDEX_USAGE_LIST(NULL, NULL, -2))
There is additional information that can be returned by the
MON_GET_INDEX_USAGE_LIST() table function, such
as the number of logical index reads and the number of physical index
reads, but when trying to determine whether or not a particular index is
being utilized, this information is not needed.
Notice that in both examples shown, EXECUTABLE_ID values were retrieved.
Every entry in the package cache is assigned a unique EXECUTABLE_ID.
Therefore, by examining the EXECUTABLE_ID it is possible to determine
which SQL statements are associated with the use of a particular index.
Joining the results of the
MON_GET_PKG_CACHE_STMT() table functions on the
EXECUTABLE_ID column is a powerful way to determine which SQL statements
are causing a particular index to be utilized by the DB2 Optimizer.
Going back to the PRODUCT table in the SAMPLE database, if the SQL statements shown in Listing 13 are executed, the DB2 Optimizer should elect to use the PK_PRODUCT index to deliver optimum query performance.
Listing 13. Use PK_PRODUCT index
SELECT pid FROM PRODUCT WHERE pid = '100-201-01'; SELECT pid FROM PRODUCT WHERE pid = '100-201-01'; SELECT pid FROM PRODUCT WHERE pid = '100-103-01';
Assuming a usage list has been created and activated for the PK_PRODUCT
index, the following query, which joins the
MON_GET_INDEX_USAGE_LIST() table functions on
the EXECUTABLE_ID column, should produce a report that verifies that the
SQL statements just executed are indeed using the PK_PRODUCT index, as
shown in Listing 14.
Listing 14. Query joining table functions
SELECT SUBSTR(i.INDNAME,1,14) AS index_name, SUBSTR(t.STMT_TEXT,1,50) AS sql_statment, t.NUM_EXECUTIONS FROM TABLE (MON_GET_PKG_CACHE_STMT(NULL, NULL, NULL, -2 )) AS t, TABLE(MON_GET_INDEX_USAGE_LIST(NULL, 'PK_PRODUCT', -2)) AS i WHERE t.EXECUTABLE_ID = i.EXECUTABLE_ID ORDER BY NUM_EXECUTIONS DESC;
And if that is the case, the output produced should look similar to Listing 15.
Listing 15. Example output
INDEX_NAME SQL_STATEMENT NUM_EXECUTIONS -------------- -------------------------------------------------- -------------- PK_PRODUCT SELECT pid FROM PRODUCT WHERE pid = '100-201-01' 2 PK_PRODUCT SELECT pid FROM PRODUCT WHERE pid = '100-103-01' 1
In this example, the results are sorted so that the most frequently used
statements are presented first. To limit the results to just the top 20
statements, add the
FETCH FIRST 20 ROWS ONLY
clause to the query used.
There are two things to keep in mind when using the Package Cache in this manner. First, you need to make sure that the value of the pckcachesz database configuration parameter has been set high enough to store all the SQL statements that you plan to test. In this example, the pckcachesz configuration parameter was set to 10000, because only a few simple SQL statements were being evaluated. In your environment, it may be necessary to set this parameter much higher.
Second, during testing, make sure that a dynamic flush package cache operation is NOT performed. Otherwise, you run the risk of losing the SQL statement text data you were hoping to capture.
Previously, it was mentioned that a usage list is associated with a single table or index. However, often it is important to look at index usage for all indexes that have been defined for a given table. There is a small amount of overhead associated with managing multiple usage lists, but having multiple usage lists available is much more efficient, in terms of person-hours required, if you want to examine at all the index usage at one time. Therefore, it is always better to create a usage list for each index for a given table.
Suppose three additional indexes were created for the PRODUCT table in the SAMPLE database, using the SQL statements shown in Listing 16.
Listing 16. Add three indexes
CREATE INDEX product_gi1 ON PRODUCT (name ASC); CREATE INDEX product_gi2 ON PRODUCT (promostart ASC, promoend ASC); CREATE INDEX product_gi3 ON PRODUCT (price ASC);
The system catalog for the SAMPLE database can help us generate the
CREATE USAGE LIST and
ACTIVATE USAGE LIST statements that are needed
to create and activate usage lists for each of these new indexes. For
instance, the queries shown in Listing 17 could be used to create the
Listing 17. Queries to create DDL
SELECT 'CREATE USAGE LIST ' CONCAT INDNAME CONCAT ' FOR INDEX ' CONCAT RTRIM(INDSCHEMA) CONCAT '.' CONCAT INDNAME CONCAT ' LIST SIZE 500 WHEN FULL WRAP ;' FROM SYSCAT.INDEXES i WHERE TABNAME = 'PRODUCT' AND INDSCHEMA NOT LIKE 'SYS%' i AND INDEXTYPE = 'REG' ORDER BY INDNAME; SELECT 'SET USAGE LIST ' CONCAT INDNAME CONCAT ' STATE ACTIVE;' FROM SYSCAT.INDEXES i WHERE TABNAME = 'PRODUCT' AND INDSCHEMA NOT LIKE 'SYS%' AND INDEXTYPE = 'REG' ORDER BY INDNAME;
When these two queries are executed, they generate DDL that can be used to both create and activate four usage lists - one for each index that has been defined for the PRODUCT table. Similar queries can be used to drop the usage lists as well.
Once the appropriate usage lists have been created and activated, a test designed to execute all of the SQL statements against the table that you want to gather index usage information for should be developed and run. For example, the SQL statements (in addition to the previous three) shown in Listing 18 could be used to evaluate how the new indexes that were defined for the PRODUCT table earlier are being utilized.
Listing 18. An example of using SQL statements to evaluate the new indexes
SELECT pid FROM product WHERE name = 'SNOW SHOVEL, BASIC 22 INCH'; SELECT pid FROM product WHERE name = 'SNOW SHOVEL, SUPER DELUXE 26 INCH'; SELECT pid, name FROM product WHERE price = 19.99; SELECT pid, name FROM product WHERE price = 49.99; SELECT pid, name, price FROM product WHERE promostart = '11/19/2004'; SELECT pid, name, price FROM product WHERE promostart = '12/22/2005' AND promoend = '02/22/2006';
Once these SQL statements are executed, the query, as shown in Listing 19, can be used to determine which, if any, of the new indexes were chosen by the DB2 Optimizer.
Listing 19. Query to determine which new indexes were chosen by the DB2 Optimizer
SELECT SUBSTR(i.INDNAME,1,12) AS index_name, SUBSTR(t.STMT_TEXT,1,59) AS sql_statment, t.NUM_EXECUTIONS FROM TABLE (MON_GET_PKG_CACHE_STMT(NULL, NULL, NULL, -2 )) AS t, TABLE(MON_GET_INDEX_USAGE_LIST(NULL, NULL, -2)) AS i WHERE t.EXECUTABLE_ID = i.EXECUTABLE_ID ORDER BY NUM_EXECUTIONS DESC
And when this query is executed, the output produced should look similar to Listing 20.
Listing 20. Output of the executed query
INDEX_NAME SQL_STATEMENT NUM_EXECUTIONS ------------ ----------------------------------------------------------- -------------- PK_PRODUCT SELECT pid FROM PRODUCT WHERE pid = '100-201-01' 2 PRODUCT_GI2 SELECT pid, name, price FROM product WHERE promostart = '11 1 PK_PRODUCT SELECT pid FROM PRODUCT WHERE pid = '100-103-01' 1 PRODUCT_GI1 SELECT pid FROM product WHERE name = 'SNOW SHOVEL, BASIC 22 1 PRODUCT_GI3 SELECT pid, name FROM product WHERE price = 49.99 1 PRODUCT_GI1 SELECT pid FROM product WHERE name = 'SNOW SHOVEL, SUPER DE 1 PRODUCT_GI2 SELECT pid, name, price FROM product WHERE promostart = '12 1 PRODUCT_GI3 SELECT pid, name FROM product WHERE price = 19.99 1
Because the second parameter of the
MON_GET_INDEX_USAGE_LIST() table function is
set to NULL, the function will return information for all active index
usage lists. Assuming all SQL statements that query a given table have
been executed, the results produced by this query should show how all the
indexes available are being utilized. In this case, all of the queries
used were designed such that the DB2 Optimizer would utilize every index
available. However, in a real-world situation, testing would most likely
reveal unused indexes or indexes that are being used inefficiently.
As DB2 has evolved over the years, it has become easier to get index usage
information. And with DB2 10.1, tracking down the exact SQL statements
that are using a particular index has gotten much easier. By using index
usage lists and the new
MON_GET_INDEX_USAGE_LIST() table function
combined with the older
MON_GET_PKG_CACHE_STMT() table function, it is
now possible to quickly evaluate whether the DB2 Optimizer is taking
advantage of indexes that have been defined.
- Evaluate IBM products in the way that suits you best: Download a product trial, try a product online, use a product in a cloud environment, or spend a few hours in the SOA Sandbox learning how to implement Service Oriented Architecture efficiently.