Troubleshooting IBM Tivoli Directory Server performance, Part 1: Resolving slow queries using the TDS audit log

IBM® Tivoli® Directory Server (TDS) is a key component of many corporate environments. Other IBM Tivoli products such as IBM Tivoli Access Manager for e-Business and IBM Tivoli Identity Manager use IBM Tivoli Directory Server as a key middleware component. This document discusses ways to identify and correct performance issues for TDS using the audit log.


Casey A Peel (cpeel@us.ibm.com), IBM Tivoli Identity Manager Performance Lead, IBM

Casey A. Peel is the performance lead for IBM Tivoli Identity Manager. Based in Denver, Colorado, he has been working with the performance of various Tivoli Security products since joining IBM in 2000 -- including IBM Tivoli Access Manager (TAM), IBM Tivoli Identity Manager (TIM), IBM Tivoli Directory Server (TDS), and IBM Tivoli Directory Integrator (TDI). He has been directly involved with IBM Tivoli Identity Manager performance since version 4.4 and is the author of the ITIM Performance and Tuning Guide and related tuning scripts. He earned his B.S. in Computer Science from Texas A&M University.



07 July 2008

Performance tuning IBM Tivoli Directory Server

In this series, we're exploring techniques and tools to use when troubleshooting poor performance for IBM Tivoli Directory Server. In this article, we're taking a close look at the TDS audit log and tools available for identifying trouble spots.

The audit log

The first step when trying to determine why TDS is not performing well is to identify any slow queries via the audit log. The audit log is the primary logging facility for TDS. When a query comes in, audit entries log who requested it, what system the request originated from, and details of the request.

Let's take a closer look at a sample audit record. This record was generated using the following query: ldapsearch -D cn=root -w password -b "dc=com" -s one "objectclass=*" dn

Sample TDS audit entry (wrapped)
AuditV3--2008-02-29-17:55:55.068+00:00--V3 Search--bindDN: cn=root--
  client: 127.0.0.1:33250--connectionID: 13113--
  received: 2008-02-29-17:55:55.044+00:00--Success
controlType: 1.3.6.1.4.1.42.2.27.8.5.1
criticality: false100
base: dc=com
scope: singleLevel
derefAliases: neverDerefAliases
typesOnly: false
filter: (objectclass=*)
attributes: dn

This entry shows that the search was submitted at 2008-02-29-17:55:55.044 and completed at 2008-02-29-17:55:55.068. We can tell the query completed successfully because the AuditV3 line ends in Success rather than Sizelimit Exceeded or some other status message. Other items of interest are the bindDN (cn=root) and the client (127.0.0.1), which can be used to identify where the query came from. These are particularly useful for determining whether troublesome queries are coming from the same bind DN or machine. The base (dc=com) shows where in the directory tree the search starts. The scope (singleLevel) limits how far below the base the search descends. The filter (objectclass=*) specifies which entries we want returned. Every entry in an LDAP directory has one or more objectclass attributes, so objectclass=* says to return all entries under the base within the desired scope.

TDS supports auditing of the following operations:

  • Bind
  • Un-bind
  • Add
  • Modify
  • Modify DN
  • Delete
  • Search
  • Extended operation

In addition, the server can be configured to audit only failed operations rather than all operations. Each audit option can be enabled independently, allowing fine-grained control over the information recorded.

Enabling the audit log has a performance impact, so in production environments it should be enabled only while you are trying to isolate a problem (auditing only failed operations is the exception, since it records far less). When enabling auditing in production or in high-throughput test environments, limit it to the operations you actually need in order to minimize the impact. In most cases auditing only search events is sufficient to detect poorly performing transactions, although update operations (add, modify, delete) must also be audited when gathering information about replication.


Enabling and disabling auditing

The audit log can be enabled or disabled on the fly by doing an ldapmodify with an appropriate LDIF file, such as: ldapmodify -D cn=root -w password -f auditingControlFile.ldif, where the contents of auditingControlFile.ldif determine which auditing options are changed.

After auditing is enabled, entries are written to the audit.log file only after the corresponding operations complete. This is an important distinction: if a query takes several minutes to complete and auditing is disabled between the time the query starts and the time it finishes, no audit record is written for that query. Similarly, if TDS terminates before a query finishes, that query will not show up in the audit.log.

Here's a sample LDIF that shows all available auditing options. This file enables global auditing and search operation auditing for all events and disables all other operation auditing:

# enableAuditing.ldif
#
# set the appropriate attribute to TRUE to enable or FALSE to disable
#
dn: cn=Audit, cn=Log Management, cn=Configuration
changetype: modify
replace: ibm-audit
ibm-audit: TRUE
-
replace: ibm-auditBind
ibm-auditBind: FALSE
-
replace: ibm-auditUnBind
ibm-auditUnBind: FALSE
-
replace: ibm-auditAdd
ibm-auditAdd: FALSE
-
replace: ibm-auditModify
ibm-auditModify: FALSE
-
replace: ibm-auditModifyDN
ibm-auditModifyDN: FALSE
-
replace: ibm-auditDelete
ibm-auditDelete: FALSE
-
replace: ibm-auditSearch
ibm-auditSearch: TRUE
-
replace: ibm-auditExtOpEvent
ibm-auditExtOpEvent: FALSE
-
replace: ibm-auditFailedOpOnly
ibm-auditFailedOpOnly: FALSE

Auditing is enabled at a global level (by changing the ibm-audit attribute) and per operation (by changing the individual ibm-audit<operation> attributes, such as ibm-auditSearch). Both are further modified by the ibm-auditFailedOpOnly attribute: when it is set to TRUE, only failed operations are logged. It is important to remember these subtleties when modifying auditing levels. If you create an LDIF file that only sets ibm-audit to TRUE without changing any of the audit operation attributes, the operations that get audited, if any, depend on the previous state of those attributes. If you are uncertain about the previous state of the audit operation attributes, we recommend that you explicitly disable all operations that you do not want audited in addition to explicitly enabling the ones that you do.

To enable global auditing for all update events in addition to search events, the following LDIF can be used:

# enableUpdateAuditing.ldif
#
# set the appropriate attribute to TRUE to enable or FALSE to disable
#
dn: cn=Audit, cn=Log Management, cn=Configuration
changetype: modify
replace: ibm-audit
ibm-audit: TRUE
-
replace: ibm-auditBind
ibm-auditBind: FALSE
-
replace: ibm-auditUnBind
ibm-auditUnBind: FALSE
-
replace: ibm-auditAdd
ibm-auditAdd: TRUE
-
replace: ibm-auditModify
ibm-auditModify: TRUE
-
replace: ibm-auditModifyDN
ibm-auditModifyDN: TRUE
-
replace: ibm-auditDelete
ibm-auditDelete: TRUE
-
replace: ibm-auditSearch
ibm-auditSearch: TRUE
-
replace: ibm-auditExtOpEvent
ibm-auditExtOpEvent: TRUE
-
replace: ibm-auditFailedOpOnly
ibm-auditFailedOpOnly: FALSE

To disable all auditing at the global level, use the following LDIF:

# disableAuditing.ldif
#
dn: cn=Audit, cn=Log Management, cn=Configuration
changetype: modify
replace: ibm-audit
ibm-audit: FALSE

To find out if auditing is enabled and, if so, which operations are being audited, you can do an ldapsearch against the auditing DN:

-bash-3.00$ ldapsearch -D cn=root -w password \
  -b "cn=Audit,cn=Log Management,cn=Configuration" -s base "objectclass=*"

cn=Audit, cn=Log Management, cn=Configuration
cn=Audit
ibm-audit=TRUE
ibm-auditAdd=FALSE
ibm-auditAttributesOnGroupEvalOp=FALSE
ibm-auditBind=FALSE
ibm-auditCompare=FALSE
ibm-auditDelete=FALSE
ibm-auditExtOp=FALSE
ibm-auditExtOPEvent=FALSE
ibm-auditFailedOPonly=FALSE
ibm-auditGroupsOnGroupControl=FALSE
ibm-auditModify=FALSE
ibm-auditModifyDN=FALSE
ibm-auditPerformance=FALSE
ibm-auditPTABindInfo=TRUE
ibm-auditSearch=TRUE
ibm-auditUnbind=FALSE
ibm-auditVersion=3
ibm-slapdLog=/export/home/db2inst1/idsslapd-db2inst1/logs/audit.log
objectclass=ibm-auditConfig
objectclass=ibm-slapdLogConfig
objectclass=ibm-slapdConfigEntry
objectclass=top

Finding slow queries

The core function of an LDAP server is to accept queries and provide answers to those queries. Therefore, isolating performance problems is a case of finding queries that are slow (for some value of slow) and trying to make them faster.

How long does a query have to take before it is considered slow? That depends entirely on the requirements of the requesting application and the frequency of the query. If a query occurs once during an off-hours batch job that runs nightly and takes 10 seconds to complete, that might be perfectly acceptable. If a query occurs often in a time-critical part of a Web-based application, even 0.5 seconds per query may be too slow. A good general rule when looking for slow queries is to start with queries that take more than 1 second to return.

Let's take another look at our sample audit entry:

AuditV3--2008-02-29-17:55:55.068+00:00--V3 Search--bindDN: cn=root--
  client: 127.0.0.1:33250--connectionID: 13113--
  received: 2008-02-29-17:55:55.044+00:00--Success
controlType: 1.3.6.1.4.1.42.2.27.8.5.1
criticality: false100
base: dc=com
scope: singleLevel
derefAliases: neverDerefAliases
typesOnly: false
filter: (objectclass=*)
attributes: dn

Calculating how long this query took to execute is fairly straightforward. The time that the search was submitted is the timestamp after received:, or 2008-02-29-17:55:55.044. The time the search completed is the timestamp immediately after AuditV3--, or 2008-02-29-17:55:55.068. Subtracting the received time from the completed time yields the total duration: 55.068 - 55.044 = 0.024 seconds.

Doing this math for one or even several queries is tedious but possible. Very few audit logs consist of only a handful of queries so we turn, instead, to a script to do it for us.
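To make the arithmetic concrete, here is a minimal sketch of the calculation such a script has to perform: pull the completed and received timestamps out of each AuditV3 header line and subtract them. This is an illustration only, not part of the downloadable scripts; it assumes the timestamp format shown above, with each header on a single line as it appears in the actual log, and it ignores the time-zone suffix, which is safe because both timestamps in a record share the same offset.

Sketch: computing per-query durations
#!/usr/bin/perl
# audit_durations.pl -- illustrative sketch only; use perfanalyze_audit.pl for real analysis.
# Expects header lines of the form:
#   AuditV3--<completed>--V3 Search--...--received: <received>--<status>
use strict;
use warnings;
use Time::Local;

# Convert a timestamp such as 2008-02-29-17:55:55.044+00:00 into epoch seconds,
# ignoring the time-zone suffix.
sub to_seconds {
    my ($ts) = @_;
    my ($y, $mo, $d, $h, $mi, $s) =
        $ts =~ /^(\d{4})-(\d{2})-(\d{2})-(\d{2}):(\d{2}):([\d.]+)/ or return;
    return timegm(0, $mi, $h, $d, $mo - 1, $y) + $s;
}

while (<>) {
    next unless /^AuditV3--(\S+?)--.*received:\s+(\S+?)--/;
    my $completed = to_seconds($1);
    my $received  = to_seconds($2);
    next unless defined $completed && defined $received;
    printf "%8.3f seconds  %s", $completed - $received, $_;
}

Run against the sample entry above, the sketch prints 0.024 seconds; perfanalyze_audit.pl does the same work and then groups, averages, and sorts the results.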

The perfanalyze_audit.pl script is primarily used to calculate the execution time per query and show which queries are taking the longest to execute on average. Let's run the script against an audit.log from an IBM Tivoli Identity Manager system.

Default filter timings report, fuzzy filters
-bash-3.00$ perfanalyze_audit.pl -i audit.log 
processing ITDS log

Timing results:
  Total records: 4685
  Skipping queries that executed in less than 0.1 second...
  secPerExec numExec filter
     0.11775       4 (eroperationnames=)
     9.09786      14 (customercode=)
    25.65133       6 (&(&(sn=))(|(objectclass=)(objectclass=)))
    44.05600       6 (&(&(cn=))(|(objectclass=)(objectclass=)))
    45.94200       2 (&(cn=)(objectclass=))
   153.95400       3 (&(erparent=)(|(objectclass=)(objectclass=)))

Without additional parameters, perfanalyze_audit.pl does a filter timings report using fuzzy filters and only shows queries that took more than 0.1 seconds to execute on average. With fuzzy filters, filters are normalized to remove attribute values before being averaged. This is useful for identifying long-running queries that might be caused by unindexed attributes.

This report shows very long execution times (154 seconds) for the (&(erparent=)(|(objectclass=)(objectclass=))) filter. Often when this happens, one or more attributes in the query aren't indexed. Before we start trying to locate missing indexes, let's see if any control types were used on these queries.

-bash-3.00$ perfanalyze_audit.pl -i audit.log -l
processing ITDS log

Timing results:
  Total records: 4690
  Skipping queries that executed in less than 0.1 second...
  sss = server-side sorts
  paging = paging results
  secPerExec numExec sss paging filter
     0.11775       4            (eroperationnames=)
     9.09786      14            (customercode=)
    25.65133       6            (&(&(sn=))(|(objectclass=)(objectclass=)))
    44.05600       6            (&(&(cn=))(|(objectclass=)(objectclass=)))
    45.94200       2          1 (&(cn=)(objectclass=))
   153.95400       3   1      1 (&(erparent=)(|(objectclass=)(objectclass=)))

Showing the control types is often useful because some of them, such as server-side sorting, can have a negative performance impact. In this case, the 154-second query is using server-side sorting. We'll want to check whether the calling application actually needs server-side sorting and, if not, have the application changed so that it doesn't request it.

The perfanalyze_audit.pl script has many other parameters and modes; see the perfanalyze_audit.pl section for a detailed description.

Now let's see if any of the attributes are missing indexes.


Finding missing indexes

Missing indexes are the most common performance problem but are also the easiest to identify and the easiest to correct. When indexes are added to attributes through TDS, TDS creates one or more indexes in the underlying DB2® database. When a search is done on an attribute without an index, DB2 must scan through all instances of that attribute to find the ones matching the search criteria. While this can often be done quickly for attributes with few instances, the more instances that exist, the longer the search takes.

TDS indexes are maintained in the schema files under the $TDS_INSTANCE_HOME/etc directory. Indexes on custom attributes are stored in the V3.modifiedschema file; indexes on built-in attributes are stored in the V3.*.at files. In addition to the standard attributeType entry, attributes with indexes have an IBMattributeType entry specifying what kind of index it is (EQUALITY, SUBSTR, and so on). While it is possible to slog through the schema files to determine whether a given attribute is indexed, it is not generally recommended.

TDS 6.0 and later versions helpfully write messages to the ibmslapd.log file when searches are done against unindexed attributes. Scanning ibmslapd.log for these messages is an easy way to find frequently searched attributes that are not indexed.

Example lines from ibmslapd.log
11/19/07 13:49:26 GLPRDB091W Attribute eroperationnames is not indexed but is used
in searches 10 times.
11/19/07 19:08:50 GLPRDB091W Attribute eroperationnames is not indexed but is used
in searches 100 times.
11/22/07 19:49:57 GLPRDB091W Attribute eroperationnames is not indexed but is used
in searches 1000 times.
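On a busy server these warnings can be scattered through a sizable ibmslapd.log. A few lines of Perl will summarize them per attribute; this is an illustrative sketch, not one of the downloadable scripts, and it assumes the GLPRDB091W message wording shown above (on a single line, as it appears in the actual log file).

Sketch: summarizing GLPRDB091W warnings per attribute
#!/usr/bin/perl
# unindexed_warnings.pl -- illustrative sketch: report the highest search count
# logged for each unindexed attribute flagged by GLPRDB091W in ibmslapd.log.
use strict;
use warnings;

my %searches;
while (<>) {
    next unless /GLPRDB091W Attribute (\S+) is not indexed but is used in searches (\d+) times/;
    my ($attr, $count) = ($1, $2);
    $searches{$attr} = $count
        if !exists $searches{$attr} || $count > $searches{$attr};
}
printf "%-30s searched %s times\n", $_, $searches{$_} for sort keys %searches;

Run it as perl unindexed_warnings.pl ibmslapd.log; attributes that appear with large counts are the first candidates for indexing.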

Another, more direct, way of finding searched-on attributes that do not have indexes is to use the perfanalyze_indexes.pl script. This script does the work of slogging through the schema files to determine whether each attribute is indexed. Let's run it against the same audit.log file used above that had the 154-second query.

perfanalyze_indexes.pl output
-bash-3.00$ perfanalyze_indexes.pl -i audit.log -d /home/db2inst1/idsslapd-db2inst1/etc/ \
  -l attribute_indexes.ldif
processing schema file: /home/db2inst1/idsslapd-db2inst1/etc//V3.system.at...
processing schema file: /home/db2inst1/idsslapd-db2inst1/etc//V3.ibm.at...
processing schema file: /home/db2inst1/idsslapd-db2inst1/etc//V3.user.at...
processing schema file: /home/db2inst1/idsslapd-db2inst1/etc//V3.config.at...
processing schema file: /home/db2inst1/idsslapd-db2inst1/etc//V3.system.oc...
processing schema file: /home/db2inst1/idsslapd-db2inst1/etc//V3.ibm.oc...
processing schema file: /home/db2inst1/idsslapd-db2inst1/etc//V3.user.oc...
processing schema file: /home/db2inst1/idsslapd-db2inst1/etc//V3.config.oc...
processing schema file: /home/db2inst1/idsslapd-db2inst1/etc//V3.modifiedschema...
processing input file....
Printing only unindexed attributes

  Attribute                 Seen  Index status
----------------------------------------------------------------------
- customercode                14  NOT Indexed
- eroperationnames             4  NOT Indexed

LDIF file attribute_indexes.ldif created to index unindexed attributes

In this case, the customercode and eroperationnames attributes were searched on in the audit.log but are not indexed in the schema. This is likely the cause of the poor performance of the two queries using these attributes. The attribute_indexes.ldif file contains the schema modifications needed to index the two unindexed attributes; apply it with ldapmodify: ldapmodify -h hostname -D cn=root -w password -f attribute_indexes.ldif

It is worth mentioning that not every searched-on attribute needs an index. If an attribute is not searched on frequently, it probably doesn't need an index even if that one search takes longer than desired. If, on the other hand, an attribute is searched on frequently, it should certainly have an index. The rationale is simple: indexes are not free. While indexes speed up searches, they add overhead to update operations.

The perfanalyze_indexes.pl script has many other parameters; see the perfanalyze_indexes.pl section for a detailed description.

A lot more information can be gleaned from the audit logs beyond what has been discussed so far. The next section discusses the IBM Tivoli Directory Server performance scripts and how to use them to pull more useful data out of the audit log.


Analysis scripts

The IBM Tivoli Directory Server performance scripts (see Download) contain several scripts for extracting useful information from the audit log into a human-friendly format. The scripts are written in Perl and will work on any platform with Perl installed.

perfanalyze_audit.pl

The perfanalyze_audit.pl script can show many different perspectives on audit.log data. Its primary use is to calculate the execution time per query and show which queries are taking the longest to execute on average. In addition, it can show the search distribution frequency, a transaction summary, and time-interval statistics. Let's run the script against an audit.log from an IBM Tivoli Identity Manager system.

Default filter timings report, fuzzy filters
-bash-3.00$ perfanalyze_audit.pl -i audit.log 
processing ITDS log

Timing results:
  Total records: 4685
  Skipping queries that executed in less than 0.1 second...
  secPerExec numExec filter
     0.11775       4 (eroperationnames=)
     9.09786      14 (customercode=)
    25.65133       6 (&(&(sn=))(|(objectclass=)(objectclass=)))
    44.05600       6 (&(&(cn=))(|(objectclass=)(objectclass=)))
    45.94200       2 (&(cn=)(objectclass=))
   153.95400       3 (&(erparent=)(|(objectclass=)(objectclass=)))

Without additional parameters, perfanalyze_audit.pl does a filter timings report using fuzzy filters and only shows queries that took more than 0.1 seconds to execute on average. With fuzzy filters, filters are normalized to remove any attribute values before being averaged. This is useful for identifying long-running queries that may be caused by unindexed attributes.
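The script's exact normalization isn't shown here, but conceptually the fuzzy form is produced by a single substitution: strip whatever follows each equals sign up to the closing parenthesis, so that structurally identical filters collapse together regardless of the values searched for. The following is an illustrative sketch under that assumption, not the script's actual code:

Sketch: fuzzy-filter normalization
#!/usr/bin/perl
# fuzzy_filters.pl -- illustrative sketch of fuzzy-filter normalization.
# Reads audit log lines and prints the fuzzy form of each search filter, so that
# (&(cn=*SMITH*)(objectclass=ERPERSONITEM)) becomes (&(cn=)(objectclass=)).
use strict;
use warnings;

while (<>) {
    next unless /^filter:\s*(.+)/;
    (my $fuzzy = $1) =~ s/=[^)]*/=/g;   # drop the attribute values
    print "$fuzzy\n";
}

Piping the output through sort | uniq -c gives a quick count of how often each filter shape occurs, which is roughly what the -g (search filter frequencies) report provides.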

The perfanalyze_audit.pl script accepts many other parameters as well, a list of which can be obtained by using the -h parameter:

perfanalyze_audit.pl usage
-bash-3.00$ perfanalyze_audit.pl -h

Usage: ./perfanalyze_audit.pl [ -i inputFile ] [ -o outputFile ] [ -f filterMethod ]
[ -t [ -c cutOff ] | -gsdbplav | -m timeFrame ] 
Filter options:
  -f - filter method, the following options are valid:
           all - all filters, don't collect similar filters
         fuzzy - use fuzzy filters (that is, no attribute values), default
          full - use full filters

Sort options:
  -r - sort options for use with filter timings report (-t)
          secPerExec - seconds per execution, default
             numExec - number of executions
              stdDev - standard deviation
             aggTime - aggregate time of searches

Output options
  -t - show search filter timings
  -d - show search distribution timings
  -s - show transaction summary
  -g - show search filter frequencies
  -m - show time-interval stats, timeFrame is one of: second, minute, hour, day, month
  -c - statements longer than this time are not included in timings report, default is 0.1
  -v - show standard deviation
  -b - show search bases
  -p - show search scopes
  -l - show control types (like server-side sorts)
  -a - show aggregate timings

Other options:
  -h - displays this help information
  -i - file containing log for processing
  -o - file to put the processed results in, default is STDOUT

If no inputFile is given, the program will read input from STDIN.

The "full" filter method, using the -f full parameter, shows all attribute values:

-bash-3.00$ perfanalyze_audit.pl -i audit.log -f full
processing ITDS log

Timing results:
  Total records: 4685
  Skipping queries that executed in less than 0.1 second...
  secPerExec numExec filter
     0.29500       1 (eroperationnames=CHANGEPASSWORD)
     9.09786      14 (customercode=1556)
    38.16200       1 (&(&(sn=INSIDER*))(|(objectclass=ERPERSONITEM)(objectclass=ERBPPERSON
ITEM)))
    38.55933       3 (&(&(sn=JONES*))(|(objectclass=ERPERSONITEM)(objectclass=ERBPPERSONIT
EM)))
    41.63200       1 (&(cn=*JONES*)(objectclass=ERPERSONITEM))
    50.25200       1 (&(cn=*SMITH*)(objectclass=ERPERSONITEM))
   132.13250       2 (&(&(cn=IVUSER1*)(|(objectclass=INETORGPERSON)(objectclass=ORGANIZATI
ONALPERSON)))(|(objectclass=ERPERSONITEM)(objectclass=ERBPPERSONITEM)))
   153.95400       3 (&(erparent=ERGLOBALID=257739109411056098,OU=ORGCHART,ERGLOBALID=0000
0000000000000000,OU=ACME,DC=COM)(|(objectclass=ERPERSONITEM)(objectclass=ERBPPERSONITEM)))

This report still averages the times but uses the full, or exact, filter when doing so. If you compare this run to the last one, you'll see that only two of the six (&(&(cn=))(|(objectclass=)(objectclass=))) entries from the first set appear in the second set; the other entries are below the 0.1 second cutoff. Seeing both the fuzzy and the full output together can be useful for determining whether all queries with the same filter take about the same amount of time or whether the timings differ based on the attribute values specified.

There is a third filter method in addition to fuzzy and full: all. The all method shows every filter without collapsing or averaging similar ones. This mode can be useful if you want a list of every audit log record and how long each took to execute.

The filter timings report can show additional information about the filters, such as the base and scope used and any control types used in the call. Showing control types, using the -l parameter, can be very useful for identifying queries that use server-side sorting. Server-side sorting often has a high performance impact and can be a cause of high CPU utilization.

perfanalyze_audit.pl filter report with control types
-bash-3.00$ perfanalyze_audit.pl -i audit.log -l
processing ITDS log

Timing results:
  Total records: 4690
  Skipping queries that executed in less than 0.1 second...
  sss = server-side sorts
  paging = paging results
  secPerExec numExec sss paging filter
     0.11775       4            (eroperationnames=)
     9.09786      14            (customercode=)
    25.65133       6            (&(&(sn=))(|(objectclass=)(objectclass=)))
    44.05600       6            (&(&(cn=))(|(objectclass=)(objectclass=)))
    45.94200       2          1 (&(cn=)(objectclass=))
   153.95400       3   1      1 (&(erparent=)(|(objectclass=)(objectclass=)))
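If you want a raw inventory of every control OID that appears in a log, including any that the report does not break out by name, the controlType lines can be tallied directly. This is an illustrative sketch, not one of the downloadable scripts:

Sketch: tallying controlType OIDs
#!/usr/bin/perl
# control_types.pl -- illustrative sketch: count how often each controlType OID
# appears in an audit log (for example, 1.3.6.1.4.1.42.2.27.8.5.1 in the sample entry).
use strict;
use warnings;

my %count;
while (<>) {
    $count{$1}++ if /^controlType:\s+(\S+)/;
}
printf "%8d  %s\n", $count{$_}, $_ for sort { $count{$b} <=> $count{$a} } keys %count;

Mapping an unfamiliar OID back to a control name is then a matter of checking the product documentation.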

The perfanalyze_audit.pl script has several other reporting modes that can be used to view the audit log from different angles. The timing distribution report, using the -d parameter, shows what percentage of the searches fell into discrete time buckets. This can help you decide whether slow queries actually matter: if the filter timings report shows that one query takes 10 seconds, it might be something to investigate; if the timing distribution report shows that the 10-second query is only a very small percentage of the entire traffic, it might not be a concern.

Timing distribution report
-bash-3.00$ perfanalyze_audit.pl -i audit.log -d
processing ITDS log

Search Distribution:
  Total records: 4690
  time            count   percentage
  <=   0.001          0     0.00%
  <=   0.010       4566    97.36%
  <=   0.100         89     1.90%
  <=   1.000          4     0.09%
  <=   2.000          0     0.00%
  <=   5.000          0     0.00%
  <=  10.000         14     0.30%
  <= 100.000         14     0.30%
   > 100.000          1     0.30%

This report shows what percentage of queries executed within a given amount of time. In this case, most of the queries (97%) executed in 0.01 seconds or less, which is excellent.

The time interval report, using the -m interval parameter, shows how many searches occurred in each time interval. This can be useful for determining when load on the system is heaviest throughout the day. The time interval can be second, minute, hour, day, or month.

Time interval report
-bash-3.00$ perfanalyze_audit.pl -i audit.log -m day 
processing ITDS log

Time Interval Report (day):
  time interval              count  percent   
  2008-02-21                  3790   80.81%
  2008-02-26                   252    5.37%
  2008-02-27                   588   12.54%
  2008-02-28                    24    0.51%
  2008-02-29                    36    0.77%

This often-overlooked mode, when combined with the perfanalyze_audit_grep.pl script, is an excellent way of seeing when specific activities are occurring in the environment (more on that later).

perfanalyze_indexes.pl

This script accepts a variety of inputs, such as an audit log, along with the location of the TDS schema, and determines which of the attributes used in the filters are not indexed.

perfanalyze_indexes.pl default output
-bash-3.00$ perfanalyze_indexes.pl -i audit.log -d /home/db2inst1/idsslapd-db2inst1/etc/ \
  -l attribute_indexes.ldif
processing schema file: /home/db2inst1/idsslapd-db2inst1/etc//V3.system.at...
processing schema file: /home/db2inst1/idsslapd-db2inst1/etc//V3.ibm.at...
processing schema file: /home/db2inst1/idsslapd-db2inst1/etc//V3.user.at...
processing schema file: /home/db2inst1/idsslapd-db2inst1/etc//V3.config.at...
processing schema file: /home/db2inst1/idsslapd-db2inst1/etc//V3.system.oc...
processing schema file: /home/db2inst1/idsslapd-db2inst1/etc//V3.ibm.oc...
processing schema file: /home/db2inst1/idsslapd-db2inst1/etc//V3.user.oc...
processing schema file: /home/db2inst1/idsslapd-db2inst1/etc//V3.config.oc...
processing schema file: /home/db2inst1/idsslapd-db2inst1/etc//V3.modifiedschema...
processing input file....
Printing only unindexed attributes

  Attribute                 Seen  Index status
----------------------------------------------------------------------
- customercode                14  NOT Indexed
- eroperationnames             4  NOT Indexed

LDIF file attribute_indexes.ldif created to index unindexed attributes

This example uses an audit log and the location of the directory server's etc directory to determine which attributes are unindexed. In this case, the customercode and eroperationnames attributes were searched on in the audit.log but are not indexed in the schema. This is likely the cause of the poor query performance noted in the first perfanalyze_audit.pl analysis. The attribute_indexes.ldif file contains the schema modifications needed to index the two unindexed attributes; apply it with ldapmodify: ldapmodify -h hostname -D cn=root -w password -f attribute_indexes.ldif

The perfanalyze_indexes.pl script is very flexible about its input format; it accepts audit logs, DB2 dynamic SQL snapshots, output from perfanalyze_audit.pl, and even a file containing a list of attributes, one per line. In addition, it can read the schema from the local file system or directly from the LDAP server if the Net::LDAP Perl module is installed.

Use the -h parameter to get a full usage statement.

Using the perfanalyze_audit.pl and perfanalyze_indexes.pl scripts together is a powerful combination. For example, queries whose average execution time is one second or longer should almost certainly have all of their filter attributes indexed. The following command checks for that:

-bash-3.00$ perfanalyze_audit.pl -i audit.log -c 1 | \
  perfanalyze_indexes.pl -d /home/db2inst1/idsslapd-db2inst1/etc/ \
    -l attribute_indexes.ldif

Any attributes that were not indexed would show up, and the attribute_indexes.ldif file would contain the changes needed to index them. As noted earlier, not every searched-on attribute needs an index; reserve indexes for attributes that are searched on frequently.

The perfanalyze_indexes.pl script can generate a custom LDIF (using -l filename.ldif) that can then be applied with ldapmodify to index the unindexed attributes. After the LDIF is generated, it is strongly recommended that you edit the file and remove the stanzas for any attributes that you do not want to index.

perfanalyze_audit_grep.pl

Often it is useful to pull audit log stanzas matching some pattern out into a separate file. Doing this by hand is tedious and, with gigabyte-sized audit logs (they do exist), sometimes virtually impossible. Unfortunately, the standard UNIX® grep command won't work because it is line-based rather than audit-log-stanza-based, which is where perfanalyze_audit_grep.pl comes in. It is designed to work like a very basic grep command and can match anything in an audit log stanza.
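To make the distinction concrete, here is a minimal sketch of stanza-based matching: it treats every block of lines beginning with AuditV3-- as one record and prints the whole record when any line in it matches. It is only an illustration of the idea; the real script adds the options listed in its usage statement below.

Sketch: stanza-based matching
#!/usr/bin/perl
# stanza_grep.pl -- illustrative sketch of stanza-based matching for audit logs.
# Each record starts with a line beginning "AuditV3--"; print the whole record
# if any of its lines match the pattern given on the command line.
use strict;
use warnings;

my $pattern = shift or die "usage: $0 pattern [auditlog...]\n";
my $stanza  = '';

sub flush {
    print $stanza if $stanza ne '' && $stanza =~ /$pattern/;
    $stanza = '';
}

while (<>) {
    flush() if /^AuditV3--/;   # a new record starts; decide about the previous one
    $stanza .= $_;
}
flush();                        # don't forget the final record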

One specific use is to pull out all audit records for a specific bind DN or client IP. This is handy for determining whether a particular application (if each application binds with its own DN) or machine (identified by client IP) is the source of poorly performing queries that may be dragging down overall system performance. The following example pulls all of the audit records for bind DN "cn=idiapp" into a separate file for further analysis:

-bash-3.00$ perfanalyze_audit_grep.pl -i audit.log "bindDN: cn=idiapp" > idiapp.log

It may also be useful to pull out all stanzas that do not match a specific string, for instance all stanzas with a bind DN other than "cn=idiapp". For this, use the -v parameter:

-bash-3.00$ perfanalyze_audit_grep.pl -i audit.log -v "bindDN: cn=idiapp" > not_idiapp.log

The usage statement suggests other ways to isolate the data you are looking for:

Usage statement
-bash-3.00$ perfanalyze_audit_grep.pl -h

Usage: ./perfanalyze_audit_grep.pl [ -i inputFile ] [ -o outputFile ] [ -h ]
[ -v | -n ] searchString

Search hints:
  * Bind DN: search on the bind DN using: "bindDN: <DN>"
  * Operations: search on operations using one of the following, followed by --:
      Search, Bind, Unbind, Add, Modify, Delete
    for example: Search--
  * Client IP: search on the client IP by using: "client: <ip:port>"
  * Connection: search on the connection ID by using: "connectionID: <num>"
  * Time: search on a time in the format: 2006-06-14-04:12:24.952-06:00DST
      Hint: to limit times to completed times, prefix with "--":
         --2006-06-14-04:12:24.952-06:00DST
      Hint: to limit to received times, prefix with "received:":
         received: 2006-06-14-04:12:24.952-06:00DST
  * Status: search on result status using one of the following:
      --Success, --No such object, --Sizelimit Exceeded

Search options:
  -n - case insensitive search (sorry, -i was already taken)
  -v - negative search, find stanzas that do not match the criteria

Other options:
  -h - displays this help information
  -i - file containing log for processing
  -o - file to put the processed results in, default is STDOUT

If no inputFile is given, the program will read input from STDIN.

The output of this script is another audit log, so it can be (and often is) used as input to the other perfanalyze_audit scripts in creative ways. The following example is taken from a real-world engagement. We were working to improve system performance for a specific customer and needed to determine the application's throughput using only the audit log. The transaction in question involved multiple LDAP calls, so we couldn't simply count audit records. One specific query, however, used a filter that uniquely identified it as one step of the transaction. We used this to pull out all audit records with that filter and piped them into perfanalyze_audit.pl with the time interval report set to minutes, which gave us the application's throughput per minute:

-bash-3.00$ perfanalyze_audit_grep.pl -i audit.log '(&(eruid=' | \
  perfanalyze_audit.pl -m minute
processing audit log (invertSearch: 0, caseInsensitive: 0)
no input file specified (-i), reading input from STDIN
Hint: use the -h option to get the usage statement
processing ITDS log....

Time Interval Report (minute):
  time interval              count  percent   
  2008-02-28 20:30             188    0.45%
  2008-02-28 20:31             215    0.52%
  2008-02-28 20:32             215    0.52%
  2008-02-28 20:33             219    0.53%
  2008-02-28 20:34             212    0.51%
  2008-02-28 20:35             219    0.53%
  2008-02-28 20:36             213    0.51%
  2008-02-28 20:37             213    0.51%
  2008-02-28 20:38             214    0.51%
  2008-02-28 20:39             217    0.52%
  2008-02-28 20:40             217    0.52%
  2008-02-28 20:41             215    0.52%
  2008-02-28 20:42             216    0.52%
  2008-02-28 20:43             217    0.52%
  2008-02-28 20:44             212    0.51%
  2008-02-28 20:45             214    0.51%
[output truncated]

perfanalyze_audit_replay.pl

After doing performance tuning, such as indexing some attributes or increasing the buffer pool sizes, you will want to confirm that the performance of the slow queries actually improved. The preferred way to do this is to re-run the application traffic that caused the slow queries to show up, although this might or might not be possible depending on the calling application. If it isn't possible, replaying the searches recorded in the audit log might suffice.

The perfanalyze_audit_replay.pl script accepts an audit log as input and creates a script that can be run to re-create the queries. Alternatively, it can run the queries directly if the Net::LDAP Perl module is installed.

To re-create the queries as closely as possible, the replay should bind with the same DN as the original query. This requires creating a bindFile that lists the bind DNs that will be encountered in the audit log along with the password to bind as each one. If that isn't possible, the script will accept a single DN and password to bind with. If nothing is specified, the script binds anonymously.

The next two examples use the following bindFile:

cn=root:passw0rd
cn=admin:password1

This example creates a shell script that can be used to run the searches against the LDAP server ldapServer on port 3389:

-bash-3.00$ perfanalyze_audit_replay.pl -i audit.log -r ldapServer -p 3389 \
  -b bindFile -s replayScript.sh

This example assumes the Net::LDAP module is installed; it connects to localhost on port 389 and replays the traffic:

-bash-3.00$ perfanalyze_audit_replay.pl -i audit.log -b bindFile
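If you only need to spot-check one or two queries after tuning, rather than replay a whole log, a few lines of Net::LDAP (the same module the replay script can use) are enough. This is a hedged sketch; the host, credentials, base, and filter are placeholders taken from earlier examples and should be replaced with your own values.

Sketch: timing a single search with Net::LDAP
#!/usr/bin/perl
# time_one_query.pl -- illustrative sketch: run one search and report how long it took.
use strict;
use warnings;
use Net::LDAP;
use Time::HiRes qw(gettimeofday tv_interval);

# Placeholder connection details -- substitute your own server and credentials.
my $ldap = Net::LDAP->new('ldapServer', port => 389) or die "connect failed: $@";
my $mesg = $ldap->bind('cn=root', password => 'password');
die 'bind failed: ' . $mesg->error if $mesg->code;

my $t0 = [gettimeofday];
$mesg = $ldap->search(
    base   => 'dc=com',
    scope  => 'sub',
    filter => '(customercode=1556)',   # one of the slow filters seen in the audit log
    attrs  => ['dn'],
);
die 'search failed: ' . $mesg->error if $mesg->code;

printf "%d entries returned in %.3f seconds\n", $mesg->count, tv_interval($t0);
$ldap->unbind;

Comparing the reported time before and after a tuning change, such as adding an index, gives a quick sanity check; for anything more thorough, use perfanalyze_audit_replay.pl against the full audit log.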

perfanalyze_audit_repldelay.pl

In a master-slave or peer-to-peer replication environment it is often useful to determine the replication delay between two servers. The perfanalyze_audit_repldelay.pl script attempts to calculate the delay between a modification operation on one server and the corresponding replication event on another server using the audit log.

To use the script, updates (add, modify, delete) must be audited on both servers.

After enabling update auditing, perform some update operations on server 1 so that they propagate to server 2, then collect the audit logs from both servers, renaming them for easy identification, for example server1_audit.log and server2_audit.log. perfanalyze_audit_repldelay.pl shows the total number of update requests it was able to match between the two files and calculates the minimum, maximum, and average delay between when those requests were received on server 1 and when the matching requests were received on server 2. Note that this delay also includes the time it takes server 1 to process a request before replicating it to server 2.

perfanalyze_audit_repldelay.pl
-bash-3.00$ perfanalyze_audit_repldelay.pl -m server1_audit.log -r server2_audit.log
processing audit logs.
Replication times (in seconds):
   Num: 126 (update requests only)
   Min: 0.0999999046325684
   Max: 0.399999856948853
   Avg: 0.24999988079071

To obtain accurate results, the time and date on both machines must be in sync; the ideal way to ensure this is to use ntpd to maintain the time. If you do not have control over the machines, you can use the -t option to provide the time skew between them. In addition, both audit logs should start with approximately the same audit information, because the script tries to match up entries between the logs. The best way to arrange this is to disable all auditing, remove or rename the audit.log files on both machines, and then enable update auditing on both machines as close to the same time as possible.

perfanalyze_audit_replrequirement.pl

TDS servers scale horizontally by adding new peer masters and spreading the traffic across them. Applications that write data and then, shortly afterward, read the same data back cannot take advantage of this scaling pattern if the replication delay is too high. The common scenario is that a write comes into server 1 and, before the change can be propagated to server 2, server 2 receives a request for the changed data and returns stale data. To determine whether an application load can make use of horizontal scalability, we need to see how quickly the application follows a write with a read that encompasses the changed data.

The perfanalyze_audit_replrequirement.pl script attempts to calculate the maximum replication delay that can be tolerated between two servers before stale data would be returned to the querying application. It does this by looking for updates and then finding subsequent searches that would return the changed entry. Because of this, both update and search operations must be audited.

perfanalyze_audit_replrequirement.pl
-bash-3.00$ perfanalyze_audit_replrequirement.pl -i audit.log -s sub
processing audit log
Output format:
   delay secs   collision type    object
   2.000 secs - search: sub     - DN: cn=bob,dc=com - search base: dc=com

The script uses the search base and scope, along with the DN of the updated object, to determine when a search would return an updated entry. This is not entirely accurate, because a query might not return the updated entry if the filter did not select it. To mitigate this, the script supports three different modes for deciding when a collision occurs based on scope resolution: base, one, and sub. Base is the most accurate, because it only flags a collision when a search with a scope of base explicitly returns the entry. One and sub are less accurate because they can flag more false positives, since search filters are not used in the collision detection. Because of these limitations, the script's results should be treated only as a rough estimate of the replication requirements, and the script will likely overestimate the time required.


Download

Description                               Name                       Size
TDS audit scripts used in this article    tds_perf_scripts.tar.gz    38KB
