Troubleshooting IBM Tivoli Directory Server Performance, Part 2: Resolving slow queries using DB2 snapshots

IBM® Tivoli® Directory Server (TDS) is a key component of many corporate environments. Other IBM Tivoli products such as IBM Tivoli Access Manager for e-Business and IBM Tivoli Identity Manager use IBM Tivoli Directory Server as a key middleware component. This document discusses ways to identify and correct performance issues in the underlying DB2 database.

Casey A Peel (cpeel@us.ibm.com), IBM Tivoli Identity Manager Performance Lead, IBM

Casey A. Peel is the performance lead for IBM Tivoli Identity Manager. Based in Denver, Colorado, he has been working on the performance of various Tivoli Security products since joining IBM in 2000, including IBM Tivoli Access Manager (TAM), IBM Tivoli Identity Manager (TIM), IBM Tivoli Directory Server (TDS), and IBM Tivoli Directory Integrator (TDI). He has been directly involved with IBM Tivoli Identity Manager performance since version 4.4 and is the author of the ITIM Performance and Tuning Guide and related tuning scripts. He earned his B.S. in Computer Science from Texas A&M University.



15 September 2008

Performance tuning IBM Tivoli Directory Server

In this series we're exploring techniques and tools to use when troubleshooting poor performance for IBM Tivoli Directory Server. In this article we're focusing on the underlying DB2 database and ways to identify and correct bottlenecks therein.

The IBM Tivoli Directory Server DB2 database

IBM Tivoli Directory Server uses IBM DB2 to store and access the data in the directory. Roughly speaking, TDS translates LDAP queries into DB2 queries against the underlying datastore. DB2 executes each query and returns the requested data to the TDS process (ibmslapd), which responds to the original LDAP request with the results.

In many cases DB2 can be considered a black box. The database is created, tuned, and managed by TDS processes and tools. Most TDS deployments require little, if any, DB2 knowledge to administer. There are the rare cases, however, when an LDAP administrator is unable to improve the performance of slow queries using the conventional TDS interfaces and there is a need to delve down into DB2 to resolve the problem. This article addresses some of the common scenarios when this investigation of the DB2 layer is necessary.

The remainder of this article assumes you have some experience with databases, ideally DB2, and are comfortable collecting DB2-level information in your environment.

DB2 monitoring and snapshots

DB2 supports monitoring of several different areas:

  • statements
  • buffer pools
  • locks
  • sorts
  • tables
  • timestamp
  • units of work

To troubleshoot TDS performance problems at the DB2 level it is strongly suggested that you enable monitoring for all areas except tables. It has been our experience that having table monitoring enabled can have a non-negligible performance impact. Enabling monitoring in the other areas does not have a noticeable performance impact and these monitors can, and should, be left enabled in production environments.

Information from these monitors can be obtained through the collection of database snapshots.

Enabling, disabling, and resetting monitoring

DB2 monitoring can be enabled at the connection or database level. If monitors are enabled at the connection level, any metrics gathered are cleared and gathering stops when the connection is terminated. Enabling monitors at the database level has the benefit of providing metrics beginning immediately after the database is started without requiring an active connection.

To get the current monitor states of the database, connect to the database as the database administrator and run the following command:

$ db2 get monitor switches

            Monitor Recording Switches

Switch list for db partition number 0
Buffer Pool Activity Information  (BUFFERPOOL) = OFF
Lock Information                        (LOCK) = OFF
Sorting Information                     (SORT) = OFF
SQL Statement Information          (STATEMENT) = OFF
Table Activity Information             (TABLE) = OFF
Take Timestamp Information         (TIMESTAMP) = ON  07/28/2008 09:46:44.988974
Unit of Work Information                 (UOW) = OFF

The monitor states above are the default for new DB2 databases. Notice that the only monitor flag enabled by default is the TIMESTAMP monitor. For TDS monitoring, we need all of the monitors except the TABLE monitor enabled.

To enable connection-level monitoring, use the db2 update monitor switches command. Remember that with connection-level monitoring, metrics are only gathered while the connection is open and are discarded when it is closed. Examples of how to enable connection-level monitoring are:

db2 update monitor switches using BUFFERPOOL on
db2 update monitor switches using LOCK on
db2 update monitor switches using SORT on
db2 update monitor switches using STATEMENT on
db2 update monitor switches using TIMESTAMP on
db2 update monitor switches using UOW on

Note that we left the TABLE monitor turned off for performance reasons.

For TDS it is recommended that you enable monitoring at the database level. This ensures that monitoring metrics are gathered when the database is started. This makes it easier to spot performance problems as they occur instead of needing to enable connection-level monitoring and recreate the problem. Database-level monitoring is controlled by the DFT_MON_* database manager parameters:

$ db2 get dbm cfg | grep DFT_MON
   Buffer pool                         (DFT_MON_BUFPOOL) = OFF
   Lock                                   (DFT_MON_LOCK) = OFF
   Sort                                   (DFT_MON_SORT) = OFF
   Statement                              (DFT_MON_STMT) = OFF
   Table                                 (DFT_MON_TABLE) = OFF
   Timestamp                         (DFT_MON_TIMESTAMP) = ON
   Unit of work                            (DFT_MON_UOW) = OFF

To enable database-level monitoring, update the desired DFT_MON_ switches:

db2 update dbm cfg using DFT_MON_BUFPOOL on
db2 update dbm cfg using DFT_MON_LOCK on
db2 update dbm cfg using DFT_MON_SORT on
db2 update dbm cfg using DFT_MON_STMT on
db2 update dbm cfg using DFT_MON_TIMESTAMP on
db2 update dbm cfg using DFT_MON_UOW on

Note that we left the DFT_MON_TABLE monitor turned off for performance reasons. There's no need to enable DFT_MON_TIMESTAMP as it is enabled by default, but it is included here for completeness.

For database-level monitoring to take effect, the database must be stopped and restarted using db2stop and db2start. Before executing db2stop, the ibmslapd process must be stopped along with any other connected applications; after executing db2start, ibmslapd must be restarted. Neglecting to restart DB2 means monitoring will not be enabled, which can be frustrating if a problem arises later for which you were hoping to have metrics. This happens often enough that it bears repeating: be sure to restart DB2 after enabling database-level monitoring.

If you want to disable database-level monitoring, run the following commands as the database administrator:

db2 update dbm cfg using DFT_MON_STMT off
db2 update dbm cfg using DFT_MON_BUFPOOL off
db2 update dbm cfg using DFT_MON_LOCK off
db2 update dbm cfg using DFT_MON_SORT off
db2 update dbm cfg using DFT_MON_TABLE off
db2 update dbm cfg using DFT_MON_UOW off

Note that we left DFT_MON_TIMESTAMP enabled as this is enabled by default. There should be no need to disable DFT_MON_TABLE as we didn't enable it, but it is included here for completeness.

After disabling monitoring, the database must be stopped and restarted, using db2stop/db2start, before the changes are picked up.

After monitoring is enabled, either at the connection or the database level, the collected values can be reset. This is useful if you want to pull performance metrics for a specific activity within the system. Resetting the database monitors always happens at the connection level, regardless of whether monitoring was enabled at the connection or database level. For example, consider the case where database-level monitoring is enabled and the monitors are reset. Any subsequent snapshots for that connection show only the metrics gathered since the reset, but other connections to the database still see all of the metrics gathered since the database was restarted. The only way to reset the metrics at the database level is to stop and restart DB2.

Monitor switches can be reset either for a specific database (if there is more than one database in the instance) or for all databases at once. If there is only one database in the instance, the two are equivalent:

# Individual database
db2 reset monitor for database database_name

# All databases
db2 reset monitor for all

Collecting snapshots

After monitoring has been enabled, use snapshots to pull the monitoring data out of the database. A snapshot extracts the monitoring data at a point in time as text, which is typically redirected to a file. Snapshots can be taken for the following components:

  • applications
  • buffer pools
  • database
  • dynamic SQL
  • locks
  • tables
  • table spaces

You can collect snapshots on these components individually or all at once using the following commands as the database administrator:

# Individually
db2 get snapshot for applications on database_name
db2 get snapshot for database on database_name
db2 get snapshot for dynamic sql on database_name
db2 get snapshot for bufferpools on database_name
db2 get snapshot for locks on database_name
db2 get snapshot for tables on database_name
db2 get snapshot for tablespaces on database_name

# All at once
db2 get snapshot for all on database_name

Usually the output of these commands is redirected to a file for processing. You can then use a script or viewer to read and analyze the data.

Here is the top section of a database snapshot to give you an idea of what one looks like:

              Database Snapshot

Database name                              = LDAPDB2
Database path                              = /home/ldapdb2db/ldapdb2/NODE0000/SQL00001/
Input database alias                       = LDAPDB2
Database status                            = Active
Catalog database partition number          = 0
Catalog network node name                  =
Operating system running at database server= AIX 64BIT
Location of the database                   = Local
First database connect timestamp           = 04/24/2008 19:02:16.542927
Last reset timestamp                       =
Last backup timestamp                      = 04/25/2008 01:45:01.000000
Snapshot timestamp                         = 04/25/2008 16:12:25.914501

The snapshot includes some useful information about the monitoring itself, such as when the first connection to the database occurred (effectively when monitoring began, if database-level monitoring is enabled), when the monitors were last reset, and when the snapshot was taken. This tells us that the metrics in the snapshot were gathered between the time of the last reset (or the first database connect, if the reset timestamp is empty) and the time the snapshot was taken.
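As a rough illustration, the monitoring window a snapshot covers can be computed from these header timestamps. This is a sketch, not part of the TDS tooling; it assumes the MM/DD/YYYY timestamp format shown in the sample snapshot above:

```python
from datetime import datetime

# Timestamp format used in the snapshot header shown above
FMT = "%m/%d/%Y %H:%M:%S.%f"

def monitoring_window(first_connect, last_reset, snapshot_ts):
    """Return the length in seconds of the period the snapshot covers.

    Metrics run from the last reset (or the first database connect,
    if no reset has occurred) up to the snapshot timestamp.
    """
    start = last_reset if last_reset else first_connect
    begin = datetime.strptime(start, FMT)
    end = datetime.strptime(snapshot_ts, FMT)
    return (end - begin).total_seconds()

# Values taken from the sample snapshot above; the empty string means
# the monitors were never reset, so the window starts at first connect
window = monitoring_window("04/24/2008 19:02:16.542927", "",
                           "04/25/2008 16:12:25.914501")
print(round(window))  # 76209 seconds, roughly 21 hours of data
```

Knowing the window length matters when comparing counters between snapshots: a raw total is only meaningful relative to how long the monitors have been collecting.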

Snapshot analysis

Each snapshot provides a different set of information that can help in identifying performance problems. This section digs deeper into the snapshots and how they can be useful.

All scripts mentioned in this section can be found within the IBM Tivoli Directory Server performance scripts under Download.

Database snapshots

A database snapshot provides some generally useful information about the state of the database. Of particular interest are items such as:

  • buffer pool aggregate summary information
  • number of transactions
  • number of current and total connections
  • number of locks held currently
  • number of deadlocks detected
  • number of lock escalations
  • number of sorts and sort time
  • number of package cache inserts, lookups, and overflows

Most of the data of interest in a database snapshot is easily accessible with a text editor. To make it easier to locate possible trouble areas, the perfanalyze_database.pl script was created to process a database snapshot. The script also does some further analysis on the aggregate buffer pool summary and makes buffer pool sizing recommendations, if necessary.

perfanalyze_database.pl example
$ perfanalyze_database.pl -i database.snapshot
Bufferpools:
  Bufferpool: ALL
    Hit ratios (the higher the better)
      Data:   53.453%
      Index:  99.918%
      Total:  99.678%
    Sync write ratios (the lower the better)
      Data:    2.542% (     6 syncs,    230 asyncs)
      Index:   1.873% (     5 syncs,    262 asyncs)
      Total:   2.187% (    11 syncs,    492 asyncs)
    Direct reads:  46
    Direct writes: 0
    Database files closed: 482
    NOTICE: Database files closed should be = 0
            Increase MAXFILOPs until this counter stops growing.
            see http://www.db2mag.com/db_area/archives/2001/q1/hayes.shtml

Connections:
  Total: 29
  Current: 25
  High Water Mark: 32

Transactions:
  Total: 313

Sorts:
  Total: 3
  Average Time per Sort (ms): 0
  Average Sorts per Transaction: 0.00958466453674121
  Percentage Overflows: 0%

Locks:
  Current: 36
  Total: 0
  Average Time per Lock (ms): 0
  Average Lock per Transaction: 0
  Deadlocks: 0
  Escalations: 0

Package cache:
  Lookups: 7334
  Inserts: 5
  Overflows: 0

In the example above, the script recommends increasing the MAXFILOP database parameter to reduce the number of files DB2 is opening and closing. The number of sorts reported in the system is low, only 3. If this number were very large and the average sorts per transaction very high, you should confirm that the calling application is not using server-side sorting unnecessarily, as this can impact performance. Such sorts would show up in the dynamic SQL snapshot processing, which we'll discuss later.
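The per-transaction figures in the script output above are simple ratios over the monitoring window. A sketch of the arithmetic, using the totals from the sample output (the variable names are illustrative, not the script's own):

```python
# Totals taken from the sample perfanalyze_database.pl output above
total_sorts = 3
total_transactions = 313
total_sort_time_ms = 0

# Average Sorts per Transaction and Average Time per Sort (ms)
sorts_per_txn = total_sorts / total_transactions
avg_sort_time = total_sort_time_ms / total_sorts if total_sorts else 0.0

print(f"Average Sorts per Transaction: {sorts_per_txn:.5f}")  # 0.00958
print(f"Average Time per Sort (ms): {avg_sort_time:.0f}")     # 0
```

A sorts-per-transaction ratio near zero, as here, indicates sorting is not a bottleneck; a ratio well above 1 would be the cue to look for server-side sorting in the dynamic SQL snapshot.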

Note that the buffer pool output is reporting the aggregate information for all buffer pools in the database. More precise buffer pool recommendations can be seen by processing a buffer pool snapshot, which we'll cover next.

Buffer pool snapshots

Because the logic for processing buffer pool snapshots closely mirrors that for processing the buffer pool aggregate information in a database snapshot, both snapshots are processed by the perfanalyze_database.pl script.

perfanalyze_database.pl example for buffer pool snapshots
$ perfanalyze_database.pl -i bufferpool.snapshot
  Bufferpool: IBMDEFAULTBP
    Current size: 366211 pages
    Hit ratios (the higher the better)
      Data:   53.595%
      Index:  99.919%
      Total:  99.686%
    Sync write ratios (the lower the better)
      Data:    2.632% (     6 syncs,    222 asyncs)
      Index:   1.880% (     5 syncs,    261 asyncs)
      Total:   2.227% (    11 syncs,    483 asyncs)
    Direct reads:  46
    Direct writes: 0
    Database files closed: 475
    NOTICE: Database files closed should be = 0
            Increase MAXFILOPs until this counter stops growing.
            see http://www.db2mag.com/db_area/archives/2001/q1/hayes.shtml

  Bufferpool: LDAPBP
    Current size: 15259 pages
    Hit ratios (the higher the better)
      Data:   48.675%
      Index:  99.654%
      Total:  95.748%
    Sync write ratios (the lower the better)
      Data:    0.000% (     0 syncs,      8 asyncs)
      Index:   0.000% (     0 syncs,      1 asyncs)
      Total:   0.000% (     0 syncs,      9 asyncs)
    Direct reads:  0
    Direct writes: 0
    Database files closed: 7
    NOTICE: Database files closed should be = 0
            Increase MAXFILOPs until this counter stops growing.
            see http://www.db2mag.com/db_area/archives/2001/q1/hayes.shtml

Automatic buffer pool sizing

DB2 V9 and later can automatically size buffer pools to ensure a good buffer pool hit ratio using the Self-Tuning Memory Manager. This takes another level of guess-work away from the administrator and lets the software tune itself. More information can be found in the DB2 Performance Guide.

idsperftune

The idsperftune utility included with TDS can assist with tuning the buffer pool and ibmslapd cache sizes. See the TDS product documentation for more information on the idsperftune tool.

The buffer pool snapshot analysis is most useful for determining if the buffer pools are sized adequately. The primary metric in evaluating this goal is the total buffer pool hit ratio. This metric highlights how frequently DB2 was able to find what it was looking for in a buffer pool rather than loading the data from the disk. The higher the buffer pool hit ratio, the fewer trips to the disk are necessary and the better the performance due to reduced I/O. If the total buffer pool hit ratio is below 90%, the script will recommend that you increase the size of the buffer pools.

While the buffer pool hit ratio is a good metric to use when improving DB2 performance, a high hit ratio alone doesn't prove that DB2 is performing efficiently. With a large enough buffer pool, DB2 may be able to cache an entire table in memory and thus provide decent performance against that table despite a missing index. Adding the missing index would further improve performance and remove the need to cache the entire table. Don't assume that a high buffer pool hit ratio means that the system is tuned optimally.
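The hit ratios in the output above follow the standard formula: logical reads minus physical reads, divided by logical reads. A minimal sketch, using hypothetical read counters (the 90% threshold mirrors the script's recommendation rule described above):

```python
def hit_ratio(logical_reads, physical_reads):
    """Buffer pool hit ratio as a percentage.

    A physical read means DB2 had to go to disk; a logical read that
    did not require a physical read was satisfied from the buffer pool.
    """
    if logical_reads == 0:
        return 100.0
    return (logical_reads - physical_reads) / logical_reads * 100

# Hypothetical counters for illustration
index_ratio = hit_ratio(9_623_904, 7_900)
print(f"Index hit ratio: {index_ratio:.3f}%")  # 99.918%
if index_ratio < 90:
    print("Consider increasing the buffer pool size")
```

The same formula applies separately to the data, index, and total counters, which is why a snapshot can show a poor data hit ratio alongside an excellent index hit ratio.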

Dynamic SQL snapshots

Dynamic SQL snapshots are one of the most useful pieces of information when tuning TDS. This snapshot provides information about the queries performed against the database and how long they took to execute. Because TDS uses parameter markers in the SQL, there is not a one-to-one correlation between the LDAP query sent to TDS and the query TDS sends to DB2; a single SQL statement in the snapshot corresponds to many LDAP queries. Each SQL statement includes the total time it took to complete and the number of executions. For a better picture of which queries take longer than others, it is necessary to estimate a per-execution time.

Here's a sample query from a dynamic SQL snapshot:

 Number of executions               = 319
 Number of compilations             = 1
 Worst preparation time (ms)        = 13
 Best preparation time (ms)         = 13
 Internal rows deleted              = 0
 Internal rows inserted             = 0
 Rows read                          = 0
 Internal rows updated              = 0
 Rows written                       = 0
 Statement sorts                    = 0
 Statement sort overflows           = 0
 Total sort time                    = 0
 Buffer pool data logical reads     = 0
 Buffer pool data physical reads    = 0
 Buffer pool temporary data logical reads   = 0
 Buffer pool temporary data physical reads  = 0
 Buffer pool index logical reads    = 9623904
 Buffer pool index physical reads   = 0
 Buffer pool temporary index logical reads  = 0
 Buffer pool temporary index physical reads = 0
 Total execution time (sec.ms)      = 264.341145
 Total user cpu time (sec.ms)       = 263.680000
 Total system cpu time (sec.ms)     = 0.220000
 Statement text                     = SELECT distinct E.EID FROM LDAPDB2.LDAP_ENTRY AS E,
LDAPDB2.LDAP_ENTRY as pchild WHERE E.EID=pchild.EID AND pchild.PEID=? AND E.EID IN ((SELE
CT EID FROM LDAPDB2.ERPARENT WHERE ERPARENT = ?) INTERSECT (SELECT EID FROM LDAPDB2.ERJAV
ASCRIPT) INTERSECT (SELECT EID FROM LDAPDB2.OBJECTCLASS WHERE OBJECTCLASS = ?))  FOR FETC
H ONLY

This query took 264 seconds to execute 319 times, or 0.828 seconds per execution. To avoid doing this calculation for each query in the snapshot, the perfanalyze_dynamicsql.pl script will do it for you.
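The calculation the script performs for each statement is straightforward: divide the total execution time by the number of executions. A sketch using the counters from the sample statement above:

```python
# Counters from the sample dynamic SQL snapshot entry above
total_execution_time_sec = 264.341145
number_of_executions = 319

# Estimated per-execution time, as reported in the secPerExec column
sec_per_exec = total_execution_time_sec / number_of_executions
print(f"secPerExec: {sec_per_exec:.4f}")  # 0.8287, matching the script output
```

Note this is an average: a statement with a low per-execution time can still hide a handful of pathologically slow executions, which is why the execution count is reported alongside it.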

perfanalyze_dynamicsql.pl example
$ perfanalyze_dynamicsql.pl -i dynamicsql.snapshot -t0

Dynamic SQL analysis for database: IDSDB2
Total number of statements processed: 687
Legend
   secPerExec - seconds per execution
   numExec - number of executions
   rowsW - number of rows written per execution
   statement - statement executed

Skipping system statements, override this with the -a option
Skipping statements that executed in less than 0.1 second...
secPerExec   numExec  statement
    0.1038         1  DELETE FROM LDAPDB2.IBMENTRYUUID WHERE EID = ?
    0.1355         1  DELETE FROM LDAPDB2.OBJECTCLASS WHERE EID = ?
    0.2775         1  DELETE FROM LDAPDB2.LDAP_DESC WHERE DEID = ?
    0.8287       319  SELECT distinct E.EID FROM LDAPDB2.LDAP_ENTRY AS E, LDAPDB2.LDAP_EN
TRY as pchild WHERE E.EID=pchild.EID AND pchild.PEID=? AND E.EID IN  ((SELECT EID FROM LD
APDB2.ERPARENT WHERE ERPARENT = ?) INTERSECT (SELECT EID FROM LDAPDB2.ERJAVASCRIPT) INTER
SECT (SELECT EID FROM LDAPDB2.OBJECTCLASS WHERE OBJECTCLASS = ?))  FOR FETCH ONLY
   19.0689         2  SELECT distinct D.DEID FROM LDAPDB2.LDAP_DESC AS D WHERE D.AEID=? A
ND D.DEID IN  ((SELECT EID FROM LDAPDB2.TS WHERE TS = ?) INTERSECT ((SELECT EID FROM LDAP
DB2.ROLES WHERE ROLES = ?) UNION (SELECT EID FROM LDAPDB2.ROLES WHERE ROLES = ?)) INTERSE
CT (SELECT EID FROM LDAPDB2.OBJECTCLASS WHERE OBJECTCLASS = ?) INTERSECT (SELECT EID FROM
 LDAPDB2.OBJECTCLASS WHERE OBJECTCLASS = ?))  FOR FETCH ONLY

If run without parameters, perfanalyze_dynamicsql.pl will show all queries that took more than 0.1 seconds to execute and will truncate the SQL at 80 characters. The -t parameter can be used to control the truncation. In the example above the truncation was disabled by setting it to zero. The -c parameter can be used to limit which queries are shown by setting a cutoff threshold in seconds. The default threshold is 0.1 seconds and any queries that complete faster than that are not shown. To show all queries use -c0.

In the above example one query is taking 19 seconds to complete on average. More investigation will be needed on this query to determine why that is. We'll tackle that point later.

Table space snapshots

The table space snapshot can provide some information about the configuration and current state of each table space in the system. The perfanalyze_tablespaces.pl script can be used to identify common problem areas for table spaces such as a prefetch/extent size mismatch or best practices on automatic DMS resizing and file system caching.

Here's a sample of the perfanalyze_tablespaces.pl output for just the TDS table spaces from a non-standard installation:

$ perfanalyze_tablespaces.pl -i tablespace.snapshot
Tablespace: LDAPSPACE (id: 4)

        Tablespace Type: Database managed space
    Auto-resize enabled: No
       NOTICE: You may want to enable auto-resize for this tablespace
               to reduce administrative overhead although care should be
               taken to ensure the file system doesn't run out of disk space.
    File system caching: Yes
       NOTICE: You may want to disable file system caching for this tablespace
               if your bufferpool hit ratio is very high to take advantage of
               Direct I/O (DIO) or Concurrent I/O (CIO) on supported platforms.

     Total pages: 800000
      Free pages: 249440 (31.180%)

     Extent size (pages): 32
   Prefetch size (pages): 576 (18 extents)

Tablespace: USERSPACE1 (id: 2)

        Tablespace Type: System managed space
    File system caching: Yes
       NOTICE: You may want to disable file system caching for this tablespace
               if your bufferpool hit ratio is very high to take advantage of
               Direct I/O (DIO) or Concurrent I/O (CIO) on supported platforms.

     Extent size (pages): 256
   Prefetch size (pages): 256 (1 extents)
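The prefetch/extent relationship the script checks can be sketched as follows. As a rule of thumb, the prefetch size should be a whole multiple of the extent size (ideally extent size times the number of containers) so that prefetch requests line up with extent boundaries. This is an illustrative sketch, not the script's actual code:

```python
def check_prefetch(extent_pages, prefetch_pages):
    """Express the prefetch size in extents and flag misalignment.

    Returns (extents, aligned): aligned is True when the prefetch
    size is a whole multiple of the extent size.
    """
    extents = prefetch_pages / extent_pages
    return extents, prefetch_pages % extent_pages == 0

# Values from the sample table spaces above
print(check_prefetch(32, 576))    # LDAPSPACE:  (18.0, True)
print(check_prefetch(256, 256))   # USERSPACE1: (1.0, True)
```

Both sample table spaces are aligned; a mismatch, such as a prefetch size of 576 pages over a 256-page extent, would be flagged for correction.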

Putting it together: analyzing a problem

So far we have discussed gathering information and doing some rough analysis on it. Let's put it all together and show how to troubleshoot a TDS problem at the database level.

Finding slow queries

The first place to start when troubleshooting a TDS problem from the DB2 side is to find slow queries. Much like the perfanalyze_audit.pl script does for the audit log (see Part 1), the perfanalyze_dynamicsql.pl script is the place to start in finding slow DB2 queries. Let's revisit the dynamic SQL analysis discussed previously:

$ perfanalyze_dynamicsql.pl -i dynamicsql.snapshot -t0

Dynamic SQL analysis for database: IDSDB2
Total number of statements processed: 687
Legend
   secPerExec - seconds per execution
   numExec - number of executions
   rowsW - number of rows written per execution
   statement - statement executed

Skipping system statements, override this with the -a option
Skipping statements that executed in less than 0.1 second...
secPerExec   numExec  statement
    0.1038         1  DELETE FROM LDAPDB2.IBMENTRYUUID WHERE EID = ?
    0.1355         1  DELETE FROM LDAPDB2.OBJECTCLASS WHERE EID = ?
    0.2775         1  DELETE FROM LDAPDB2.LDAP_DESC WHERE DEID = ?
    0.8287       319  SELECT distinct E.EID FROM LDAPDB2.LDAP_ENTRY AS E, LDAPDB2.LDAP_EN
TRY as pchild WHERE E.EID=pchild.EID AND pchild.PEID=? AND E.EID IN  ((SELECT EID FROM LD
APDB2.ERPARENT WHERE ERPARENT = ?) INTERSECT (SELECT EID FROM LDAPDB2.ERJAVASCRIPT) INTER
SECT (SELECT EID FROM LDAPDB2.OBJECTCLASS WHERE OBJECTCLASS = ?))  FOR FETCH ONLY
   19.0689         2  SELECT distinct D.DEID FROM LDAPDB2.LDAP_DESC AS D WHERE D.AEID=? A
ND D.DEID IN  ((SELECT EID FROM LDAPDB2.TS WHERE TS = ?) INTERSECT ((SELECT EID FROM LDAP
DB2.ROLES WHERE ROLES = ?) UNION (SELECT EID FROM LDAPDB2.ROLES WHERE ROLES = ?)) INTERSE
CT (SELECT EID FROM LDAPDB2.OBJECTCLASS WHERE OBJECTCLASS = ?) INTERSECT (SELECT EID FROM
 LDAPDB2.OBJECTCLASS WHERE OBJECTCLASS = ?))  FOR FETCH ONLY

The place to start digging is the query that takes the longest, the 19-second query in this case. It is worth noting that although the query took 19 seconds to complete, it was only ever seen twice. If the performance problem being investigated (you must be investigating some specific issue or you wouldn't be digging in the DB2 snapshots) is thought to be related, then continue with the analysis. If, however, it doesn't appear related to your problem, skip it and look elsewhere. There's no point in spending hours optimizing a query that is executed very infrequently and isn't manifesting itself as a problem somewhere.

Identifying missing indexes

Long query times are often the result of missing indexes. Finding missing indexes involves doing a comparison between all attributes that TDS knows are indexed via its schema and the columns being queried against by DB2. To do this we use the perfanalyze_indexes.pl script introduced in Part 1. It not only works with TDS audit.log files but tries its best with dynamic SQL snapshots as well.

perfanalyze_indexes.pl example for dynamic SQL snapshots
$ perfanalyze_indexes.pl -i dynamicsql.snapshot -r ldapserver -u cn=root -p password

reading schema from ldapserver, this may take a while....
processing input file
Printing only unindexed attributes
Notes when using Dynamic SQL snapshots:
  1) 'Seen' may not be completely accurate
  2) Names in ALL CAPS were pulled from the snapshot and may not be actual LDAP attributes

  Attribute                 Seen  Index status
----------------------------------------------------------------------
- ts                           2  NOT Indexed
- erjavascript               319  NOT Indexed
- role                       319  NOT Indexed

It appears that several of the attributes used in the long-running queries are unindexed. The perfanalyze_indexes.pl script can generate an LDIF for use in indexing these attributes using the -l parameter. See Part 1 for more detailed information on how to use the perfanalyze_indexes.pl script.

This is a good time to discuss an important point: let TDS manage the attribute indexes, don't create them directly within DB2. While it is true that updating the TDS schema to index an attribute will, in the end, result in a new DB2 index, it is important to let TDS manage the indexes. Bypassing TDS and creating attribute indexes in DB2 directly can cause several problems:

  1. Creating indexes directly in DB2 will result in the TDS schema and the underlying database being out of sync. This may not show up until subsequent schema updates, and it can lead to problems in future data migrations, as the LDAP schema is often migrated and relied on to accurately reflect the state of the underlying database. If the indexes aren't in the TDS schema, they will not be created, resulting in poor performance.
  2. Depending on the attribute type and size, TDS may create more than one index. If TDS determines that the attribute may be searched against using wildcard suffix searches (that is, attribute=*value) it may create an index not just on the ATTRIBUTE column but on RATTRIBUTE as well. Similarly, if the attribute is large enough, TDS will create and manage the index against the truncated attribute column (ATTRIBUTE_TRUNC) instead.
  3. Indexing done through TDS schema updates will be propagated to other TDS nodes in the replication topology if schema replication has been configured. This simplifies administration as the updates are pushed out to the other servers automatically ensuring index consistency between all peer nodes.
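To see why a reversed column such as RATTRIBUTE helps, consider a suffix search like attribute=*value: reversing both the stored values and the search term turns it into a prefix match, which a B-tree index can satisfy efficiently. A small illustration in plain Python (hypothetical values, not actual TDS code):

```python
# Hypothetical attribute values; TDS stores a reversed copy of
# eligible values (the RATTRIBUTE column) for exactly this reason.
values = ["jsmith", "bsmith", "jjones"]
rvalues = sorted(v[::-1] for v in values)  # what a reversed index holds

# The suffix search attribute=*smith becomes a prefix search on the
# reversed column: values ending in "smith" start with "htims" reversed
term = "smith"[::-1]                       # "htims"
matches = [rv[::-1] for rv in rvalues if rv.startswith(term)]
print(sorted(matches))                     # ['bsmith', 'jsmith']
```

An index created by hand in DB2 would cover only the forward column, which is one concrete way a hand-made index falls short of what TDS would have created.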

Generating explain plans

If no attributes of a slow-performing query show up as being unindexed, it's time to determine how DB2 is fulfilling the query. This is done by generating an explain plan, which shows the indexes and tables DB2 will use to satisfy the SQL query and in what order. Generating an explain plan requires that the explain tables be set up in the database. This is a one-time activity: connect to the database the problem query runs against and load the EXPLAIN.DDL file located in the db2_instance_home/sqllib/misc directory on Unix systems and the db2_install_directory/misc directory on Windows systems. The file is loaded with the command db2 -tvf explain_ddl_path/EXPLAIN.DDL.

After the explain tables have been loaded, we need to create the query file. This file (query.sql in our examples) contains the single SQL statement we want to explain. The query should be all on one line (the following example is wrapped):

SELECT DISTINCT D.DEID FROM LDAPDB2.LDAP_DESC AS D WHERE D.AEID=? AND D.DEID IN ((SELECT
EID FROM LDAPDB2.CN WHERE CN_T = ?) INTERSECT (SELECT EID FROM LDAPDB2.SN WHERE SN = ?))
FOR FETCH ONLY

We create an explain plan with the following commands:

db2 connect to database_name
db2 set current explain mode explain
db2 -f query.sql
db2 commit work
db2 connect reset
db2 terminate
db2exfmt -d database_name -g TIC -e instance_name -n % -s % \
  -w -1 -# 0 -o query.sql.exfmt

The query.sql.exfmt file will contain the explain plan. To facilitate generating explain plans, the explainSQL.sh and explainSQL.bat scripts included with the TDS tuning scripts do all of the above for you.

The following is the Access Plan from an explain of the above query:

Access Plan:
-----------
        Total Cost:             90.1134
        Query Degree:           1

                       Rows
                      RETURN
                      (   1)
                       Cost
                        I/O
                        |
                    1.0833e-11
                      UNIQUE
                      (   2)
                      90.1134
                         7
                        |
                    1.0833e-11
                      NLJOIN
                      (   3)
                      90.1133
                         7
                /--------+-------\
         8.1273e-07            1.33292e-05
           MSJOIN                IXSCAN
           (   4)                (   8)
           77.2173               38.6004
              6                     3
          /---+---\                |
     3.96673    2.04887e-07    2.01315e+07
     IXSCAN       FILTER     INDEX: LDAPDB2
     (   5)       (   6)           SN
     38.6055      38.6088
        3            3
       |            |
   2.01316e+07    5.45757
 INDEX: LDAPDB2   IXSCAN
       CN         (   7)
                  38.6088
                     3
                    |
                1.45373e+08
              INDEX: LDAPDB2
              LDAP_DESC_DEID

Based on this plan, DB2 first joins rows from the LDAP_DESC_DEID and CN indexes, and then joins that result with rows from the SN index. The number above each "INDEX: LDAPDB2" entry shows how many rows DB2 believes are in that index, based on the last runstats. In this case the LDAP_DESC_DEID index has roughly 145 million rows (1.45373e+08).
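The row estimates in the plan come from the most recent statistics collection. If they look badly out of date, statistics for an individual table can be refreshed with a runstats command such as the following sketch (the table name matches the example above). Be aware, as discussed below, that running runstats outside of idsrunstats can overwrite TDS's artificial cardinality settings, which must then be re-applied:

```shell
# Refresh optimizer statistics for a single table (sketch).
# Caution: raw runstats can overwrite the artificial cardinalities
# that TDS and idsrunstats set; re-apply them afterwards.
db2 connect to ldapdb2
db2 "RUNSTATS ON TABLE LDAPDB2.LDAP_DESC WITH DISTRIBUTION AND DETAILED INDEXES ALL"
db2 connect reset
```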

The LDAP_DESC table encodes the hierarchical tree structure: each LDAP entry has one row in this table for each of its ancestors, all the way up to the suffix. If a branch of the tree contains many entries, a subtree search against that branch can perform poorly. In the case above, if the values queried against CN and SN are selective (i.e., few CN and SN records will match), DB2 would do better to join the CN and SN index results first and only then join against the LDAP_DESC_DEID results. To force DB2 to do this we set the cardinality, or table size, for LDAP_DESC to 9E18. This changes the query plan to the following:

Access Plan:
-----------
        Total Cost:             110.068
        Query Degree:           1

                       Rows
                      RETURN
                      (   1)
                       Cost
                        I/O
                        |
                    5.28733e-05
                      UNIQUE
                      (   2)
                      110.068
                      7.50666
                        |
                    5.28733e-05
                      NLJOIN
                      (   3)
                      110.068
                      7.50666
                /--------+-------\
         5.28733e-05             14249.4
           MSJOIN                IXSCAN
           (   4)                (   8)
           84.0942               58.4036
           6.50666                  3
          /---+---\                |
     3.96673    1.33292e-05       9e+18
     IXSCAN       FILTER     INDEX: LDAPDB2
     (   5)       (   6)     LDAP_DESC_DEID
     38.6055      45.4103
        3         3.50666
       |            |
   2.01316e+07    268.337
 INDEX: LDAPDB2   IXSCAN
       CN         (   7)
                  45.4103
                  3.50666
                    |
                2.01315e+07
              INDEX: LDAPDB2
                    SN

The above version will perform better for large subtree searches, although it may be slower for small subtree searches where LDAP_DESC is the more selective table. Artificial cardinalities are a valuable, if far-reaching, tuning technique for TDS. TDS sets artificial cardinalities on specific tables at start-up, as does the idsrunstats utility. If runstats is run outside of the idsrunstats utility -- including the automatic runstats available in DB2 v9 and later -- it is important to ensure the artificial cardinalities are re-applied, or poor performance may result. Note that other TDS clients, such as IBM Tivoli Access Manager and IBM Tivoli Identity Manager, include additional artificial cardinality tunings to ensure good performance for those products; consult their documentation and tuning guides for more information.
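For reference, an artificial cardinality can be applied by hand through DB2's updatable SYSSTAT catalog views. The following sketch sets the 9E18 cardinality used in the example above; the database and schema names match the earlier examples, and idsrunstats and the TDS tuning scripts normally handle this for you:

```shell
# Sketch: manually set an artificial cardinality on LDAP_DESC.
db2 connect to ldapdb2
db2 "UPDATE SYSSTAT.TABLES
     SET CARD = 9000000000000000000
     WHERE TABSCHEMA = 'LDAPDB2' AND TABNAME = 'LDAP_DESC'"
db2 commit work
# Flush the dynamic SQL cache so new queries are reoptimized
# with the updated statistics
db2 flush package cache dynamic
db2 connect reset
```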

Conclusion

Troubleshooting TDS performance at the database level is more involved than doing so at the audit log level and requires more experience with DB2. However, there are times when it becomes necessary to tune the underlying DB2 datastore to ensure good performance, and the results can be well worth the effort.


Download

Description: TDS audit scripts used in this article
Name: tds_perf_scripts.tar.gz
Size: 38KB
