Level: Intermediate
Dave Bachmann (bachmann@us.ibm.com), Senior Software Engineer, IBM US
Ramakrishna Gorthi (rjgorthi@in.ibm.com), Software Engineer, IBM Global Services, India
Amit Bhate (abhate@in.ibm.com), Staff Software Engineer, IBM India
14 Sep 2007
IBM® Tivoli® Monitoring monitors and manages system and network applications on a variety of platforms and keeps track of the availability and performance of all parts of your enterprise. This article details how IBM Tivoli Monitoring can be used to monitor the performance of the IBM Tivoli Directory Server.
Introduction
Any successful deployment of a directory solution requires ongoing monitoring to ensure that system performance and availability meet the organization's goals. This article documents a procedure for monitoring the Tivoli Directory Server using the Tivoli Monitoring Universal Agent. It is based on a set of best practices for monitoring Tivoli Directory Server and its component parts, and the parameters it monitors come from a list formulated over years of research into successful deployments of the directory server. The article also provides insights into setting situations to catch the problems that administrators should be concerned about.
This article focuses on monitoring Tivoli Directory Server version 5.2 and 6.0 servers, but it can be extended to other versions of Tivoli Directory Server by following the documented examples.
Some of the key questions that this article addresses, which are important to any Tivoli Directory Server deployment, are:
- Is the server available?
- How busy is the server?
- Is the server about to run out of memory?
- Is the server backlogged?
- How responsive is the server?
- Is the database about to run out of space?
- Are there any errors being logged?
There are numerous ways of presenting data in the Tivoli Enterprise Portal Client (TEPC). Figure 1 shows one of them: TDS monitoring statistics displayed in a TEPC workspace.
Figure 1: TDS monitoring statistics in Tivoli Enterprise Portal Client
The table below contains acronyms used throughout this article. These acronyms might or might not be the official names associated with the respective products.
| LDAP: | Lightweight Directory Access Protocol |
| TDS: | IBM Tivoli Directory Server |
| ITM: | IBM Tivoli Monitoring |
| UA: | Universal Agent |
| TEPC: | Tivoli Enterprise Portal Client |
| TEMS: | Tivoli Enterprise Monitoring Server |
IBM Tivoli Directory Server
The IBM Tivoli Directory Server (TDS) implements the Internet Engineering Task Force (IETF) LDAP V3 specifications. It also includes functional and performance enhancements added by IBM. TDS uses IBM DB2® as the backing store to provide transaction integrity for each LDAP operation, high-performance operations, and online backup and restore capability. The IBM Tivoli Directory Server interoperates with IETF LDAP V3-based clients.
Figure 2 provides a high level overview of the various components of the directory server and how the clients interact with it.
Figure 2: Tivoli Directory Server
IBM Tivoli Monitoring
IBM Tivoli Monitoring monitors and manages system and network applications on a variety of platforms and keeps track of the availability and performance of all parts of your enterprise. IBM Tivoli Monitoring provides reports you can use to track trends and troubleshoot problems.
You can use IBM Tivoli Monitoring to perform the following tasks:
- Visualize real-time monitoring data from your environment.
- Monitor resources in your environment for certain conditions, such as high CPU or an unavailable application.
- Establish performance thresholds and raise alerts when thresholds are exceeded or values are matched.
- Trace the causes leading to an alert.
- Create and send commands to systems in your managed enterprise by means of the Take Action feature.
- Use integrated reporting to create comprehensive reports about system conditions.
- Monitor conditions of particular interest by defining custom queries using the attributes from an installed agent or from an ODBC-compliant data source.
Software pre-requisites
The article is written with the following software requirements:
- IBM Tivoli Monitoring version 6.1.0
- IBM Tivoli Universal Agent version 6.1.0
- IBM Tivoli Directory Server version 5.2 or 6.0
The installation information for these products is available through the links in the Resources section below.
This article covers the information on setting up the Universal Agent. For information about configuring and setting up the rest of the products, see the documents in the Resources section below.
Universal Agent essentials
Tivoli Enterprise Monitoring Agents are installed on the systems or subsystems whose applications and resources you want to monitor. The agent collects monitoring data from the managed system and passes it to the monitoring server to which it is connected. The client gathers the current values of the attributes and produces reports formatted into tables and charts. It can also test the values against a threshold and display an alert icon when that threshold is exceeded or a value is matched. These tests are called "situations".
The IBM Tivoli Universal Agent is a generic agent of IBM Tivoli Monitoring. You can configure the IBM Tivoli Universal Agent to monitor any data you collect. You can view the data in real-time and historical workspaces on the Tivoli Enterprise Portal and manage it with Tivoli Enterprise Portal monitoring situations and automation policies, the same as data from other Tivoli Enterprise Monitoring Agents.
The Universal Agent is thoroughly documented in the IBM Tivoli Universal Agent User's Guide, but a few details are worth mentioning here.
Starting and stopping the Universal Agent
You can start and stop the Universal Agent with the itmcmd command.
You can start the Universal Agent with:
$CANDLEHOME/bin/itmcmd agent start um
You can stop the Universal Agent with:
$CANDLEHOME/bin/itmcmd agent stop um
On Solaris, CANDLEHOME is typically set to /opt/IBM/ITM, if the default installation path is followed.
Another environment variable that needs to be set for the Universal Agent to work is the UAHOME variable. On Solaris, the UAHOME variable is set to /opt/IBM/ITM/sol286/um. These settings might change with the operating system and with the path of installation.
If you prefer the GUI for the settings, you can right-click on the Universal Agent in the Navigator view and choose Start, Stop or Restart.
Defining data in the Universal Agent
An IBM Tivoli Universal Agent application consists of one or more attribute groups, each group consisting of one or more attributes. The application to be monitored is defined in a data definition file, called a “metafile”, which is imported by the IBM Tivoli Universal Agent. The command to import the metafile is the um_console command.
The metafile is a plain text file. It contains the following control statements in the order shown (if present):
SNMP: For SNMP Data Providers only, introduces the data definition for IBM Tivoli Monitoring provided SNMP MIB applications. SNMP TEXT introduces the data definition for user-defined SNMP applications.
APPL: Specifies the name that IBM Tivoli Monitoring uses for the application.
NAME: Defines the name of an attribute group, the type of data being collected, and the period for which the data is valid.
INTERNAL: Provides for data redirection between attribute groups as a way to perform additional processing.
SOURCE: Defines the location of the data you are collecting.
RECORDSET: For File Data Providers only, defines the set of records from which the data provider extracts data.
CONFIRM: For Socket Data Providers only, specifies the requirements for data acknowledgment.
SQL: For ODBC Data Providers only, defines the Select statement or stored procedure to use for collecting relational data.
SUMMARY: Defines the requirements for gathering the frequency of data input during monitoring.
ATTRIBUTES: Introduces the attribute definitions and specifies the attribute delimiters in the data string. Below the ATTRIBUTES control statement, list the individual attribute definition statements.
Note: Ensure that the $UAHOME/work/KUMPCNFG file lists the mdl file you have written. um_console normally takes care of this for you, but it is worth cross-checking.
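As a quick way to do the cross-check mentioned in the note, a small shell function can confirm that a metafile name appears in the configuration file. This is a minimal sketch: the function name `metafile_registered` is hypothetical, and it only looks for the file name as a fixed string anywhere in the file, because the exact layout of KUMPCNFG is not described here.

```shell
# Return success if the metafile name appears in the given config file.
# Usage: metafile_registered <config-file> <metafile-name>
metafile_registered() {
    cfg=$1
    mdl=$2
    # -F: treat the name as a fixed string; -q: report only via exit status
    [ -r "$cfg" ] && grep -F -q -- "$mdl" "$cfg"
}

# Example (path follows the defaults mentioned above):
# metafile_registered "$UAHOME/work/KUMPCNFG" tds.mdl && echo "tds.mdl is registered"
```

The function prints nothing and reports its result through the exit status, so it can be used directly in an `if` or `&&` chain.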
Choosing a data collection approach
The Universal Agent supports several different ways of providing data to the monitoring server, including log file parsing, script execution, and receiving data over a socket, all of which will be used in this article. The Universal Agent can also receive events via SNMP and API calls, as well as monitor Web servers via URL polling, and databases via ODBC; however, this article does not make use of those capabilities.
The best choice of data provider is going to be influenced by the source of the data and how you want to monitor it, that is: polled, continuously, or event-driven. Because data manipulation (for example, calculating response times, hit ratios, or percentages) must be done by the data provider before the TEMS receives it and passes it on to the portal, the choice of data provider becomes important.
When the data of interest is already in a log file and just needs to be parsed and provided to the monitoring server as it is written to the log file, the “log file adapter” is the most useful tool to use. If only simple calculations (add, subtract, divide or multiply integers) need to be performed on the data before sending it to the monitoring server, the calculations can be embedded in the metafile data descriptions and performed by the log file adapter.
If more advanced processing needs to be done (for example, calculating response times from timestamps), the best approach is the “socket provider”, which can receive data from a program or script that reads the log file, performs the processing, and writes the data to a socket connected to the Universal Agent.
When the data needs to be retrieved periodically from some utility (for example, using ldapsearch to fetch the cn=monitor statistics), then a shell script invoked by the script data provider is the best approach.
Describing the data to be monitored
All of the definitions developed in the rest of this article are contained in the metafile for the TDS application. The file is named tds.mdl, and it starts by declaring the name of the application, as shown below.
//APPL TDS @ Monitoring for Tivoli Directory Server
Note: Everything after the "@" is a comment.
This declares the name of the application as TDS, which shows up in the Portal underneath the node name as an application named TDSnn, where nn is the version number of the metafile, starting at 00 and incrementing each time you make significant changes to it. The IBM Tivoli Universal Agent User's Guide lists the types of metafile updates that result in major or minor version changes.
After the name of the application, the data description is provided to the Universal Agent in the format shown below.
//NAME LDAP_Monitor P 90 AddTimeStamp @ LDAP stats from cn=monitor
//SOURCE SCRIPT TDSMon.sh envfile=TDS.env Interval=60
//INTERNAL OUTPUT LdapMonitor
//ATTRIBUTES ';'
In the above example, the data description starts with the //NAME line, which names the attribute group LDAP_Monitor, gives it a type of P (polled) with a 90-second lifetime, and has the Universal Agent add a timestamp each time the script is run. The //SOURCE line declares that the source of the data is a script named TDSMon.sh, run every 60 seconds with environment variables set from the TDS.env file. The //INTERNAL line saves the data to an internal buffer named LdapMonitor for further processing by another attribute group. The //ATTRIBUTES line declares that the data is separated by semicolons.
After the data description shown above, the type of each attribute is specified. Here are some examples of the different attribute types:
Attribute types
APP_Version D 50
TotalConnections C 99999999
ElapsedTimeSeconds C 99999999 Scale{9} Precision{11}
Display strings, such as the APP_Version string, are declared with a "D" followed by the maximum length of the string. Integer values, such as the TotalConnections counter, are declared with a "C" followed by the maximum value of the integer. Floating point values, such as ElapsedTimeSeconds, can also have the Scale and Precision specified.
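To see how a delimited record lines up with the attribute definitions, the following sketch splits a sample line on semicolons and pairs each value with an attribute name, in order. The Universal Agent does this mapping internally. The function name `label_record` and the sample record are made up for illustration.

```shell
# Pair each semicolon-separated value with an attribute name, in order.
# Usage: label_record "<name1> <name2> ..." < record
label_record() {
    awk -F';' -v names="$1" '
        BEGIN { n = split(names, name, " ") }
        {
            for (i = 1; i <= n && i <= NF; i++) {
                value = $i
                gsub(/^ +| +$/, "", value)   # trim spaces around the value
                printf "%s=%s\n", name[i], value
            }
        }'
}

# Hypothetical record: a display string, an integer, and a decimal value
echo "IBM Tivoli Directory (SSL), Version 5.2; 18366; 12.5" |
    label_record "APP_Version TotalConnections ElapsedTimeSeconds"
```

The order of the attribute definitions matters: the first value in the record always belongs to the first attribute, and so on.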
Important Tivoli Directory Server metrics
Some of the questions that a monitoring deployment should be able to answer about Tivoli Directory Server are as follows:
- Is the server available?
- How busy is the server?
- Is the server about to run out of memory?
- Is the server backlogged?
- How responsive is the server?
- Is the database about to run out of space?
- Are there any errors being logged?
The data that the set of monitors, described in this article, collects for the Tivoli Directory Server answers those questions by reporting the following:
- Server availability
- Server process activity
- Size of the server process
- Workflow queue sizes
- LDAP request response times
- Tablespace sizes and utilization
- Errors logged to the Tivoli Directory Server message log
Each of these metrics is discussed separately below and a monitoring solution is developed to report that metric to the monitoring server.
Tivoli Directory Server monitoring
This section provides details as to how the monitor search results can be analyzed to derive the directory server metrics mentioned in the previous section.
Tivoli Directory Server availability and workload
The easiest way of checking the availability of the Tivoli Directory Server is by sending it a search request. If a monitor search request is used to check the server availability, it can also help us get workload statistics. A base-level search against the base cn=monitor for "objectclass=*" will return the following information:
CN=MONITOR
version=IBM Tivoli Directory (SSL), Version 5.2
totalconnections=18366
total_ssl_connections=0
total_tls_connections=0
currentconnections=51
maxconnections=65516
writewaiters=0
readwaiters=0
opsinitiated=55365
livethreads=1
opscompleted=55365
entriessent=21840
searchesrequested=18681
searchescompleted=18680
bindsrequested=18366
bindscompleted=18366
unbindsrequested=18315
unbindscompleted=18315
addsrequested=0
addscompleted=0
deletesrequested=0
deletescompleted=0
modrdnsrequested=0
modrdnscompleted=0
modifiesrequested=4
modifiescompleted=4
comparesrequested=0
comparescompleted=0
abandonsrequested=0
abandonscompleted=0
extopsrequested=0
extopscompleted=0
unknownopsrequested=0
unknownopscompleted=0
slapderrorlog_messages=21
slapdclierrors_messages=0
auditlog_messages=55365
auditlog_failedop_messages=4
filter_cache_size=25000
filter_cache_current=21
filter_cache_hit=275
filter_cache_miss=103
filter_cache_bypass_limit=100
entry_cache_size=25000
entry_cache_current=1061
entry_cache_hit=2504
entry_cache_miss=1061
acl_cache=TRUE
acl_cache_size=25000
cached_attribute_total_size=0
cached_attribute_configured_size=0
currenttime=2006-12-03 23:21:12 GMT
starttime=2006-11-27 14:41:05 GMT
trace_enabled=FALSE
The above output needs to be formatted into a single line of semicolon-separated data for the Universal Agent's consumption. This can be done with a simple awk script, which skips the first line (the entry DN) and prints everything after the first "=" on each remaining line:
NR>2{printf "; "}NR>1{printf "%s", substr($0,index($0,"=")+1)}END{print ""}
The output of the monitor search above can be piped to the awk script to get the data in the desired format. TDSMon.sh is the script in which the ldapsearch command and the awk construct are put together. The contents of the TDSMon.sh script would be:
ldapsearch -h "$LDAPSERVER" -s base -b cn=monitor "objectclass=*" | awk 'NR>2{printf "; "}NR>1{printf "%s", substr($0,index($0,"=")+1)}END{print ""}'
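The flattening step can be tried without a directory server by feeding it a few lines of the cn=monitor output shown earlier. The sketch below wraps the awk logic in a function (the name `flatten_monitor` is hypothetical) and prints each value with an explicit "%s" format, so that a value containing a "%" character cannot be misread as a format string.

```shell
# Flatten "name=value" lines into one "; "-separated record,
# skipping the first line (the entry DN, CN=MONITOR).
flatten_monitor() {
    awk 'NR > 2 { printf "; " }
         NR > 1 { printf "%s", substr($0, index($0, "=") + 1) }
         END    { print "" }'
}

# A shortened sample of the monitor search output
flatten_monitor <<'EOF'
CN=MONITOR
version=IBM Tivoli Directory (SSL), Version 5.2
totalconnections=18366
currentconnections=51
EOF
```

For this sample the function prints the three values on one line, separated by "; ", in the same order as the input lines.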
Now the mdl file needs to be updated to display the attributes gathered out of the monitor search. The associated mdl file (tds.mdl) will look like the following:
LDAP_Monitor Attribute group
//NAME LDAP_Monitor P 90 AddTimeStamp @ LDAP stats from cn=monitor
//SOURCE SCRIPT TDSMon.sh envfile=TDS.env Interval=60
//INTERNAL OUTPUT LdapMonitor
//ATTRIBUTES ';'
APP_Version D 50
TotalConnections C 99999999
TotalSSLConnections C 99999999
TotalTLSConnections C 99999999
CurrentConnections C 99999999
MaxConnections C 99999999
WriteWaiters C 99999999
ReadWaiters C 99999999
OpsInitiated C 999999999
LiveThreads C 999999
OpsCompleted C 999999999
EntriesSent C 999999999
SearchesRequested C 999999999
SearchesCompleted C 999999999
BindsRequested C 999999999
BindsCompleted C 999999999
UnbindsRequested C 999999999
UnbindsCompleted C 999999999
AddsRequested C 999999999
AddsCompleted C 999999999
DeletesRequested C 999999999
DeletesCompleted C 999999999
ModRdnsRequested C 999999999
ModRdnsCompleted C 999999999
ModifiesRequested C 999999999
ModifiesCompleted C 999999999
ComparesRequested C 999999999
ComparesCompleted C 999999999
AbandonsRequested C 999999999
AbandonsCompleted C 999999999
ExtOpsRequested C 999999999
ExtOpsCompleted C 999999999
UnknownOpsRequested C 999999999
UnknownOpsCompleted C 999999999
SlapdErrorLogMessages C 999999
SlapdCliErrorsMessages C 999999
AuditLogMessages C 999999
AuditLogFailedOpMessages C 999999
FilterCacheSize C 999999
FilterCacheCurrent C 999999
FilterCacheHit C 999999
FilterCacheMiss C 999999
FilterCacheBypassLimit C 999999
EntryCacheSize C 999999
EntryCacheCurrent C 999999
EntryCacheHit C 999999
EntryCacheMiss C 999999
AclCache C 999999
AclCacheSize C 999999
CachedAttributeTotalSize C 999999
CachedAttributeConfiguredSize C 999999
CurrentTime D 100
StartTime D 100
TraceEnabled D 100
TraceMessageLevel D 100
TraceMessageLog D 100
EnCurrentRegs C 999999
EnNotificationsSent C 999999
BypassDerefAliases D 10
AvailableWorkers C 999999
CurrentWorkqueueSize C 999999
LargestWorkqueueSize C 999999
IdleConnectionsClosed C 999999
AutoConnectionCleanerRun C 999999
EmergencyThreadRunning C 99
TotaltimesEmergencyThreadRun C 99999
LasttimeEmergencyThreadRun C 999999
ElapsedTimeSeconds C 99999999 Scale{9} Precision{11}
The contents of TDS.env file, which is used as an environment to the above script, are:
LDAPSERVER=1.2.3.4
LDAPDN=cn=root
LDAPPW=root
Where:
LDAPSERVER is the hostname or IP address of the directory server to be monitored.
LDAPDN is the bind DN.
LDAPPW is the bind password.
It is assumed that the directory server listens on port 389.
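Because availability is judged by whether the monitor search succeeds, the exit status of the search can be turned into a simple up/down value. The following is a minimal sketch rather than part of TDSMon.sh: the function name `ldap_status` is hypothetical, and the command to run is passed in as arguments so the function can be tried with any command.

```shell
# Print UP if the given command succeeds, DOWN otherwise.
# Usage: ldap_status <command> [args...]
ldap_status() {
    if "$@" >/dev/null 2>&1; then
        echo UP
    else
        echo DOWN
    fi
}

# Example, using the variable from TDS.env and the assumed port 389:
# ldap_status ldapsearch -h "$LDAPSERVER" -p 389 -s base -b cn=monitor "objectclass=*"
```

Passing the port explicitly with -p makes the port assumption visible and easy to change if the server listens elsewhere.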
Tivoli Directory Server workload rates
A lot of the data returned by cn=monitor is in the form of counters, for example, OpsInitiated and OpsCompleted. By doing some simple arithmetic, useful results can be derived, such as current operations in progress, which are OpsInitiated - OpsCompleted. The Universal Agent can do this arithmetic for us when derived variables are declared in the metafile. For the example of the operations in progress, the OpsOutstanding counter is declared like this:
OpsOutstanding (OpsInitiated - OpsCompleted)
One can also get rates, such as OpsPerSecond, using the "?" type declaration, like this:
OpsPerSecond ? (OpsInitiated)
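The arithmetic behind these two kinds of derived attribute can be sketched in a few lines of shell. This is an illustration only: it assumes that a rate is the change in a counter between two consecutive samples divided by the sampling interval, which is not spelled out here for the "?" type, and the function names and the earlier OpsInitiated sample (55305) are hypothetical.

```shell
# Operations in progress: requests started minus requests finished.
outstanding() {
    echo $(( $1 - $2 ))
}

# Average rate between two samples of a counter taken $3 seconds apart.
# Usage: per_second <previous> <current> <interval-seconds>
per_second() {
    awk -v prev="$1" -v curr="$2" -v secs="$3" \
        'BEGIN { printf "%.2f\n", (curr - prev) / secs }'
}

outstanding 18681 18680      # SearchesRequested - SearchesCompleted
per_second 55305 55365 60    # two OpsInitiated samples, 60 seconds apart
```

With these inputs the first call prints 1 (one search still in progress) and the second prints 1.00 (60 operations in 60 seconds).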
Earlier the data from the LDAP_Monitor attribute group was saved to an internal buffer named LdapMonitor. Here an attribute group is declared that uses that buffer as a source of data.
//NAME LDAPRates P 90 AddTimeStamp @ LDAP stats with additional calculations
//INTERNAL INPUT LdapMonitor
//ATTRIBUTES ';'
In the LDAPRates attribute group, the attributes that aren't useful for calculations are skipped, for example, APP_Version. The attributes to be skipped are prefixed with "-". Here is the entire attribute group:
LDAPRates attribute group
//NAME LDAPRates P 90 AddTimeStamp @ LDAP stats with additional calculations
//INTERNAL INPUT LdapMonitor
//ATTRIBUTES ';'
-APP_Version D 50
TotalConnections ? 999999
-TotalSSLConnections C 999999
-TotalTLSConnections C 999999
CurrentConnections ? 999999
-MaxConnections C 999999
-WriteWaiters C 999999
-ReadWaiters C 999999
OpsInitiated C 9999999
LiveThreads C 999999
OpsCompleted C 9999999
OpsPerSecond ? (OpsInitiated)
OpsOutstanding (OpsInitiated - OpsCompleted)
EntriesSent ? 9999999
SearchesRequested C 9999999
SearchesCompleted C 9999999
SearchesPerSecond ? (SearchesRequested)
SearchesOutstanding (SearchesRequested - SearchesCompleted)
BindsRequested C 9999999
BindsCompleted C 9999999
BindsPerSecond ? (BindsRequested)
BindsOutstanding (BindsRequested - BindsCompleted)
UnbindsRequested C 9999999
UnbindsCompleted C 9999999
UnbindsPerSecond ? (UnbindsRequested)
UnbindsOutstanding (UnbindsRequested - UnbindsCompleted)
AddsRequested C 9999999
AddsCompleted C 9999999
AddsPerSecond ? (AddsRequested)
AddsOutstanding (AddsRequested - AddsCompleted)
DeletesRequested C 9999999
DeletesCompleted C 9999999
DeletesPerSecond ? (DeletesRequested)
DeletesOutstanding (DeletesRequested - DeletesCompleted)
ModRdnsRequested C 9999999
ModRdnsCompleted C 9999999
ModRdnsPerSecond ? (ModRdnsRequested)
ModRdnsOutstanding (ModRdnsRequested - ModRdnsCompleted)
ModifiesRequested C 9999999
ModifiesCompleted C 9999999
ModifiesPerSecond ? (ModifiesRequested)
ModifiesOutstanding (ModifiesRequested - ModifiesCompleted)
ComparesRequested C 9999999
ComparesCompleted C 9999999
ComparesPerSecond ? (ComparesRequested)
ComparesOutstanding (ComparesRequested - ComparesCompleted)
AbandonsRequested C 9999999
AbandonsCompleted C 9999999
AbandonsPerSecond ? (AbandonsRequested)
AbandonsOutstanding (AbandonsRequested - AbandonsCompleted)
ExtOpsRequested C 9999999
ExtOpsCompleted C 9999999
ExtOpsPerSecond ? (ExtOpsRequested)
ExtOpsOutstanding (ExtOpsRequested - ExtOpsCompleted)
UnknownOpsRequested C 9999999
UnknownOpsCompleted C 9999999
UnknownOpsPerSecond ? (UnknownOpsRequested)
UnknownOpsOutstanding (UnknownOpsRequested - UnknownOpsCompleted)
-SlapdErrorLogMessages C 999999
-SlapdCliErrorsMessages C 999999
-AuditLogMessages C 999999
-AuditLogFailedOpMessages C 999999
-FilterCacheSize C 999999
-FilterCacheCurrent C 999999
FilterCacheUsage (FilterCacheCurrent % FilterCacheSize)
-FilterCacheHit C 999999
-FilterCacheMiss C 999999
-FilterCacheAttempts (FilterCacheHit + FilterCacheMiss)
FilterCacheHitRatio (FilterCacheHit % FilterCacheAttempts)
-FilterCacheBypassLimit C 999999
-EntryCacheSize C 999999
-EntryCacheCurrent C 999999
EntryCacheUsage (EntryCacheCurrent % EntryCacheSize)
-EntryCacheHit C 999999
-EntryCacheMiss C 999999
-EntryCacheAttempts (EntryCacheHit + EntryCacheMiss)
EntryCacheHitRatio (EntryCacheHit % EntryCacheAttempts)
-AclCache C 999999
-AclCacheSize C 999999
AclCacheUsage (AclCache % AclCacheSize)
-CachedAttributeTotalSize C 999999
-CachedAttributeConfiguredSize C 999999
CachedAttributeCacheUsage (CachedAttributeTotalSize % CachedAttributeConfiguredSize)
-CurrentTime D 100
-StartTime D 100
-TraceEnabled D 100
-TraceMessageLevel D 100
-TraceMessageLog D 100
-EnCurrentRegs C 999999
-EnNotificationsSent C 999999
-BypassDerefAliases D 10
-AvailableWorkers C 999999
-CurrentWorkqueueSize C 999999
-LargestWorkqueueSize C 999999
Workload (CurrentWorkqueueSize % AvailableWorkers)
-IdleConnectionsClosed C 999999
-AutoConnectionCleanerRun C 999999
-EmergencyThreadRunning C 99
-TotaltimesEmergencyThreadRun C 99999
-LasttimeEmergencyThreadRun C 999999
-ElapsedTimeSeconds C 99999999 Scale{9} Precision{11}
Note that "%" is used here to divide and multiply by 100, which is very useful for calculating percentages. Also note that attributes such as FilterCacheAttempts are intermediate derived values: they provide input to the calculation of attributes such as FilterCacheHitRatio and are not actually displayed.
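The percentage calculation can be checked by hand with the counters from the sample cn=monitor output. The sketch below is an illustration under two assumptions that are not stated here: that the result is truncated to an integer, and that a zero denominator yields 0. The function name `percent` is hypothetical.

```shell
# Integer percentage of part relative to whole (divide, then scale by 100).
# Prints 0 when whole is 0 to avoid dividing by zero.
percent() {
    if [ "$2" -eq 0 ]; then
        echo 0
    else
        echo $(( $1 * 100 / $2 ))
    fi
}

# Filter cache hit ratio: hits relative to all attempts (hits + misses)
hits=275; misses=103; attempts=$(( hits + misses ))
percent "$hits" "$attempts"

# Filter cache usage: current entries relative to the configured size
percent 21 25000
```

With the sample values, the hit ratio comes out as 72 and the usage as 0, since 21 entries out of 25000 is well under one percent.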
Viewing Tivoli monitoring data
This section provides screenshots of the graphs and reports that the portal provides, in response to the metafile and scripts shown in the earlier section. There are a couple of ways for viewing the graphs and reports:
- Launch the GUI for managing the Tivoli Enterprise Monitoring Services using the command:
$CANDLEHOME/bin/itmcmd manage
In the GUI, right-click on the Tivoli Enterprise Portal Desktop Client and click Start.
- Launch the Tivoli Enterprise Portal using the link:
http://host:1920///cnp/kdb/lib/cnp.html
where host is the name of the system where the Tivoli Enterprise Portal Server is running.
When the Tivoli Enterprise Portal is launched, the navigator on the left shows the hostname of the monitoring server listed under the node of the corresponding operating system. Under the Universal Agent, select the application host:TDSxx to list the attribute groups defined in your mdl file.
Here is a screenshot of the navigator showing a Universal Agent on a Linux system.
Figure 3: Universal Agent navigator
The attribute groups describing the monitor statistics are LDAP_Monitor and LDAPRates. As explained earlier, the LDAPRates attribute group contains a set of derived fields, one for each type of operation, showing the number of operations currently being worked on by the server. A bar chart can be plotted with all the "...Outstanding" counters, replacing the default Table view at the top of the screen.
- Open the report for the LDAPRates attribute group by clicking on it in the navigator on the left.
- Click the Bar Chart tool icon and then click on the Table view which contains the data to be plotted.
- A dialog box pops up, letting you select the attributes to be plotted in the bar chart.
- Select all of the "...Outstanding" items (use the Ctrl key to make multiple selections) and click OK.
The bar chart will now display in the top view, using default settings:
Figure 4: Outstanding operations
To change the label for the chart above:
- Right-click on the new chart and choose Properties....
- Click on the Style tab and change the text field from Bar Chart to Current Workload.
- Click on the picture of the legend on the right, then click on the Legend Label tab, where you can remove "Outstanding" from all the labels.
- Click on OK and you will see the updated chart.
Here's the chart with custom labels:
Figure 5: Directory Server workload
To save all this work, click on any other attribute group, and answer Yes when asked if you want to save your changes to the LDAPRates attribute group.
The LDAPRates attribute group also has several attributes related to the cache utilizations and hit ratios. Figure 1 provides the report on cache utilizations and hit ratios.
Tivoli Directory Server audit log
This section details the procedure for analyzing the audit log to calculate the response times of operations.
Enabling the Tivoli Directory Server audit log
In order to get Tivoli Directory Server response times, the audit log needs to be enabled. This can be done either via the Tivoli Directory Server Web Administration GUI or via an ldapmodify command. See the Tivoli Directory Server Administration Guide for information on enabling the audit log via the GUI. You can easily turn auditing on and off for different operations via an ldapmodify operation against the cn=audit,cn=localhost object. You can also query the current state of auditing via an ldapsearch against the cn=audit,cn=localhost object. For example, on Tivoli Directory Server v5.2:
# ldapsearch -Dcn=root -wtivoli -s base -b cn=audit,cn=localhost "objectclass=*"
CN=AUDIT,CN=LOCALHOST
objectclass=ibm-auditConfig
objectclass=ibm-slapdConfigEntry
objectclass=top
cn=audit
ibm-auditLog=/var/ldap/audit.log
ibm-auditVersion=2
ibm-auditBind=true
ibm-auditUnbind=true
ibm-audit=false
ibm-auditfailedoponly=false
ibm-auditsearch=false
ibm-auditadd=false
ibm-auditmodify=false
ibm-auditdelete=false
ibm-auditmodifydn=false
ibm-auditextopevent=false
ibm-auditextop=false
For TDS v6.0 the audit log information is stored under cn=Audit,cn=Log Management,cn=Configuration. Hence, if you want to enable or disable audit log for TDS v6.0, use that as the base. For the sake of this example, TDS v5.2 is being assumed.
By default auditing is off (ibm-audit=false). You'll want to set ibm-audit=true as well as ibm-auditFailedOPonly=false and ibm-auditXXX=true for each operation XXX you are interested in monitoring. Create the following LDIF file (ldapaudit.52.on) to be used in enabling the audit log:
dn: cn=audit,cn=localhost
changetype: modify
replace: ibm-audit
ibm-audit: true

dn: cn=audit,cn=localhost
changetype: modify
replace: ibm-auditFailedOPonly
ibm-auditFailedOPonly: false

dn: cn=audit,cn=localhost
changetype: modify
replace: ibm-auditSearch
ibm-auditSearch: true

dn: cn=audit,cn=localhost
changetype: modify
replace: ibm-auditAdd
ibm-auditAdd: true

dn: cn=audit,cn=localhost
changetype: modify
replace: ibm-auditModify
ibm-auditModify: true

dn: cn=audit,cn=localhost
changetype: modify
replace: ibm-auditDelete
ibm-auditDelete: true

dn: cn=audit,cn=localhost
changetype: modify
replace: ibm-auditModifyDN
ibm-auditModifyDN: true

dn: cn=audit,cn=localhost
changetype: modify
replace: ibm-auditExtOPEvent
ibm-auditExtOPEvent: true

dn: cn=audit,cn=localhost
changetype: modify
replace: ibm-auditExtOp
ibm-auditExtOp: true
This turns on auditing for every operation. The modifications are passed to the Tivoli Directory Server using ldapmodify:
# ldapmodify -Dcn=root -wpassword -f ldapaudit.52.on
modifying entry cn=audit,cn=localhost
modifying entry cn=audit,cn=localhost
modifying entry cn=audit,cn=localhost
modifying entry cn=audit,cn=localhost
modifying entry cn=audit,cn=localhost
modifying entry cn=audit,cn=localhost
modifying entry cn=audit,cn=localhost
modifying entry cn=audit,cn=localhost
modifying entry cn=audit,cn=localhost
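Because every record in the LDIF file has the same shape, the file can also be generated rather than typed by hand. The following is a minimal sketch for the TDS v5.2 audit entry: the function name `audit_ldif` is hypothetical, and it simply prints one modify record per attribute name, separated by blank lines.

```shell
# Print one LDIF modify record per attribute, setting each to the given value.
# Usage: audit_ldif <true|false> <attribute> [attribute...]
audit_ldif() {
    value=$1
    shift
    for attr in "$@"; do
        printf 'dn: cn=audit,cn=localhost\n'
        printf 'changetype: modify\n'
        printf 'replace: %s\n' "$attr"
        printf '%s: %s\n\n' "$attr" "$value"   # blank line ends the record
    done
}

# Records that enable auditing for search operations only
audit_ldif true ibm-audit ibm-auditSearch
```

Redirecting the output to a file gives an input suitable for the -f option of ldapmodify. Calling the function with false instead of true produces the matching file for turning the same settings off again.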
Audit log with calculated response times
When the Tivoli Directory Server audit log is enabled, the server will write information to a file (/var/ldap/audit.log file) every time it sends the response to a request. Here is a snippet of an audit log for a bind followed by a search.
AuditV2--2006-08-08-12:02:25.797-06:00DST--V3 Bind--bindDN: cn=root--client: 9.48.171.95:62516--connectionID: 1642--received: 2006-08-08-12:02:25.797-06:00DST--Success
name: cn=root
authenticationChoice: simple
AuditV2--2006-08-08-12:02:32.881-06:00DST--V3 Search--bindDN: cn=root--client: 9.48.171.95:60905--connectionID: 155--received: 2006-08-08-12:02:32.875-06:00DST--Success
base: eruid=ITIM Manager,ou=systemUser,ou=itim,ou=acme,dc=com
scope: baseObject
derefAliases: neverDerefAliases
typesOnly: false
filter: (objectclass=*)
The first line of each record, starting with "AuditV2", has the same layout for every operation.
After "AuditV2--" is the timestamp of the reply, followed by "--", then the operation, then "--bindDN: " and the DN that the client was bound as, followed by "--" again.
The client IP address and port are delimited by "client: " and "--", the connection ID is delimited by "connectionID: " and "--", then after "received: " is the timestamp at which the request was received, then "--" and finally the status.
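To make the header layout concrete, here is a minimal Python sketch that pulls the fields out of one header line with a regular expression. It is only an illustration of the format described above, not part of the Universal Agent or of the authors' scripts; the function name parse_audit_header and the dictionary keys are hypothetical.

```python
import re

# Hypothetical pattern mirroring the header layout described in the text:
# AuditV2--<reply time>--<operation>--bindDN: <dn>--client: <ip:port>
#   --connectionID: <id>--received: <request time>--<status>
AUDIT_HEADER = re.compile(
    r"^AuditV2--(?P<timestamp>.+?)"
    r"--(?P<operation>.+?)"
    r"--bindDN: (?P<bind_dn>.*?)"
    r"--client: (?P<client>.+?)"
    r"--connectionID: (?P<connection_id>\d+)"
    r"--received: (?P<received>.+?)"
    r"--(?P<status>.*)$"
)


def parse_audit_header(line):
    """Return the header fields as a dict, or None if the line is not a header."""
    match = AUDIT_HEADER.match(line.rstrip("\r\n"))
    return match.groupdict() if match else None


if __name__ == "__main__":
    sample = (
        "AuditV2--2006-08-08-12:02:25.797-06:00DST--V3 Bind--bindDN: cn=root"
        "--client: 9.48.171.95:62516--connectionID: 1642"
        "--received: 2006-08-08-12:02:25.797-06:00DST--Success"
    )
    for name, value in parse_audit_header(sample).items():
        print(f"{name}: {value}")
```

Detail lines such as "name: cn=root" do not match the pattern, so the function returns None for them.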
The built-in capability of the Universal Agent to read formatted log files can be used to parse the audit log. The attribute group to parse the audit log can be defined as follows:
Attribute group to parse audit log
//NAME AuditLog E @ LDAP audit log
//SOURCE FILE '/var/ldap/audit.log' tail
//RECORDSET 10 NEW(0,==,AuditV2)
//ATTRIBUTES
Timestamp     D 33 DLMSTRBGN='AuditV2--' DLMSTREND='--' +FILTER={MATCH(0,200)}
Operation     D 20 DLMSTREND='--'
BindDN        D 20 DLMSTRBGN='bindDN: ' DLMSTREND='--'
Client        D 24 DLMSTRBGN='client: ' DLMSTREND='--'
ConnectionID  D 12 DLMSTRBGN='connectionID: ' DLMSTREND='--'
ReceivedTime  D 33 DLMSTRBGN='received: ' DLMSTREND='--'
Status        D 10
After the filename, the tail keyword tells the Universal Agent to watch for new records being appended to the end of the log file. The "//RECORDSET" keyword tells the Universal Agent that each record consists of up to 10 lines and that a new record starts with a line beginning with "AuditV2". The DLMSTRBGN keyword gives the string immediately preceding the desired data and the DLMSTREND keyword gives the string that follows it. The lines of interest are the ones that begin with "AuditV2--" followed by a timestamp (which begins with "200"), so "+FILTER={MATCH(0,200)}" is used to require that the timestamp begin with "200". If it does not, the line is ignored.
All of the fields are strings so the type of "D" is used.
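The record grouping and the timestamp filter can be pictured with a short Python sketch. This is only an approximation of the behavior the metafile asks for, written under assumptions: the function names are hypothetical, and skipping lines beyond the tenth is a guess at how an over-long record would be handled, not documented Universal Agent behavior.

```python
def group_audit_records(lines, marker="AuditV2", max_lines=10):
    """Group log lines into records that each start with a marker line."""
    records = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(marker):
            # A line beginning with the marker opens a new record.
            records.append([line])
        elif records and len(records[-1]) < max_lines:
            # Detail line belonging to the current record.
            records[-1].append(line)
        # Assumption: lines before the first marker, or beyond max_lines,
        # are simply skipped.
    return records


def passes_timestamp_filter(record, prefix="AuditV2--", expected="200"):
    """Approximate MATCH(0,200): the text after the prefix must start with "200"."""
    header = record[0]
    return header.startswith(prefix) and header[len(prefix):].startswith(expected)


if __name__ == "__main__":
    log = [
        "AuditV2--2006-08-08-12:02:25.797-06:00DST--V3 Bind--bindDN: cn=root",
        "name: cn=root",
        "authenticationChoice: simple",
    ]
    for record in group_audit_records(log):
        print(len(record), passes_timestamp_filter(record))
```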
The objective of parsing the audit log is to know how long the server is taking to respond to client requests. This can be calculated by taking the difference between the two timestamps in each audit record. This is a little more complicated arithmetic than the Universal Agent is capable of, so a custom script needs to be written to do the calculations.
Suppose that ldapaudit.awk is the awk script that does this. The results of running ldapaudit.awk against the bind and search examples are:
2006-08-08-12:02:25.797 2006-08-08-12:02:25.797 0 ms Bind cn=root 9.48.171.95:62516 1642 Success
2006-08-08-12:02:32.875 2006-08-08-12:02:32.881 6 ms Search cn=root 9.48.171.95:60905 155 Success base: eruid=ITIM Manager,ou=systemUser,ou=itim,ou=acme,dc=com scope: baseObject filter: (objectclass=*)
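The ldapaudit.awk script itself is not listed here. As a rough illustration of the arithmetic it has to perform, the following Python sketch computes the difference between the "received" timestamp and the reply timestamp in milliseconds. It is a stand-in written for this example, not the authors' script. It assumes that the "-06:00" part of each timestamp is a UTC offset and that the trailing "DST" marker can be ignored; the function names are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

# Date and time with fractional seconds, then a signed hh:mm offset.
# Anything after the offset (such as "DST") is ignored.
STAMP = re.compile(
    r"^(\d{4}-\d{2}-\d{2}-\d{2}:\d{2}:\d{2}\.\d{1,6})([+-])(\d{2}):(\d{2})"
)


def parse_stamp(text):
    """Convert an audit-log timestamp into a timezone-aware datetime."""
    match = STAMP.match(text)
    if match is None:
        raise ValueError(f"unrecognized timestamp: {text!r}")
    local, sign, hours, minutes = match.groups()
    offset = timedelta(hours=int(hours), minutes=int(minutes))
    if sign == "-":
        offset = -offset
    naive = datetime.strptime(local, "%Y-%m-%d-%H:%M:%S.%f")
    return naive.replace(tzinfo=timezone(offset))


def response_ms(received, replied):
    """Return the elapsed time between request and reply in whole milliseconds."""
    delta = parse_stamp(replied) - parse_stamp(received)
    return round(delta / timedelta(milliseconds=1))


if __name__ == "__main__":
    print(response_ms("2006-08-08-12:02:32.875-06:00DST",
                      "2006-08-08-12:02:32.881-06:00DST"), "ms")
```

Carrying the offset along keeps the subtraction correct even if the two timestamps of a record were written with different offsets.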
The data needs to be passed to the socket data provider, which is listening on port 7500. There are several different ways of associating the passed data with the correct attribute definition, but the most flexible way is to pass the name of the metafile when the connection is first made, then include the name of the application and attribute group on each line of the passed data, in the following form:
<ApplName=TDS><AttrGroup=Audit_Times>
The following Perl script (feedsock.pl) takes care of this when invoked as "perl feedsock.pl -m tds.mdl -n TDS -g Audit_Times":
#!/usr/bin/perl -w
# feedsock.pl
# a simple UA client using IO::Socket
# takes stdin and feeds data to the UA
#----------------
use strict;
use IO::Socket;
use Getopt::Std;
our $opt_h; # h = host
our $opt_p; # p = port
our $opt_m; # m = metafile
our $opt_n; # n = application name
our $opt_g; # g = attribute group
getopts('h:p:m:n:g:');
# Fall back to defaults for any option that was not given
my $host = $opt_h ? $opt_h : 'localhost';
my $port = $opt_p ? $opt_p : 7500;
my $metafile = $opt_m ? $opt_m : 'Sock';
my $applname = $opt_n ? $opt_n : 'SockEvent';
my $attrgroup = $opt_g ? $opt_g : 'Events';
my $prefix = "<ApplName=$applname><AttrGroup=$attrgroup>";
print "Prefix is $prefix\n";
print "Connecting to $port on $host\n";
# Initialize socket connection to UA
#----------------
my $line;
my $sock = IO::Socket::INET->new(PeerAddr => $host, PeerPort => $port, Proto => 'tcp');
$sock or die "no socket :$!";
# Send explicit specification of metafile
#----------------
syswrite $sock, "//$metafile\r";
# Prefix each input line with the application and attribute group names
while ($line=<>) {
    syswrite $sock, "$prefix$line\r";
}
# Finalization Processing
#---------------
syswrite $sock, "//END-DP-INPUT\n";
close $sock;
As evident from the code snippet, the Universal Agent will accept the above data in the attribute group Audit_Times, as defined below:
Audit_Times attribute group
//NAME Audit_Times E @ LDAP audit log with response times
//SOURCE SOCK localhost
//ATTRIBUTES ' '
StartTime     D 33
StopTime      D 33
ResponseTime  C 99999
-msTag        D 2
Operation     D 12
BindDN        D 40
ClientIP      D 24
ConnID        D 12
Status        D 12
Parameters    Z 1024
This attribute group tells the socket data provider to expect values separated by spaces. The "ms" label after the response time is ignored. The "Z" type for the "Parameters" attribute assigns the remainder of the line following the status value to it. In other words, the search parameters and the name of the entry being modified or deleted are collected in the attribute named Parameters.
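As a rough picture of how one space-delimited line maps onto these attributes, the following Python sketch splits a line into the ten fields and lets the last one absorb the rest of the line. It only imitates the mapping for illustration; it is not how the Universal Agent is implemented, and the function name split_audit_times is hypothetical.

```python
# Attribute names in the order they appear in the Audit_Times group.
FIELDS = [
    "StartTime", "StopTime", "ResponseTime", "msTag", "Operation",
    "BindDN", "ClientIP", "ConnID", "Status", "Parameters",
]


def split_audit_times(line):
    """Split one data line on spaces; the last field takes the remainder."""
    values = line.rstrip("\r\n").split(" ", len(FIELDS) - 1)
    row = dict(zip(FIELDS, values))
    row.setdefault("Parameters", "")   # a bind has no trailing parameters
    row["ResponseTime"] = int(row["ResponseTime"])
    return row


if __name__ == "__main__":
    sample = (
        "2006-08-08-12:02:32.875 2006-08-08-12:02:32.881 6 ms Search cn=root "
        "9.48.171.95:60905 155 Success base: eruid=ITIM Manager,"
        "ou=systemUser,ou=itim,ou=acme,dc=com scope: baseObject "
        "filter: (objectclass=*)"
    )
    row = split_audit_times(sample)
    print(row["Operation"], row["ResponseTime"], row["Parameters"])
```

One consequence of a space delimiter is that a bind DN containing a space would shift the fields that follow it, so the script producing the data has to account for such values.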
A bar chart of response times can be plotted in the AUDIT_TIMES attribute group's workspace to spot any high response times.
Figure 6: AUDIT_TIMES attribute group
Setting up situations
A “situation” is a mechanism whereby the tool can flag conditions that need to be brought to the attention of administrators, for example, if response times are starting to get too high. When a situation fires, a warning icon reflecting the severity of the situation will be displayed next to the attribute group's name in the navigator. Situations are rolled up in the hierarchy of the navigator. Higher levels in the navigator will display the icon of the most severe situation, so one can immediately drill down to the worst problems first.
The IBM Tivoli Monitoring 6.1 User's Guide describes how to set up situations in its information on “situations for event-based monitoring”. This article walks through setting up a situation for the Tivoli Directory Server response times.
Start by right-clicking the AUDIT_TIMES attribute group in the navigator and selecting Situations... from the popup menu to open the Situation editor. Right-click AUDIT_TIMES and select Create New.... Fill in the fields for the name and description of the situation.
Figure 7: Situations for AUDIT_TIMES
Click OK and then select the Response Time field from the Attribute Item list in the "Select condition" dialog.
Figure 8: Condition for situation
Click OK and the Situation editor will now let you specify a formula. Click the cell immediately below the attribute name heading, then click the == and select > Greater than from the menu.
Figure 9: Formula for the situation
Now enter the value at which you want the situation to be triggered, for example 10000 (10 seconds). Click on the next cell down and click on the formula box (with the "v") so you can choose the Average value formula.
Figure 10: Enter value for triggering situation
Fill in 1000 (1 second) for the average value to trigger from. Now click on the Expert Advice tab and you can add helpful suggestions.
Figure 11: Expert advice
Now click OK. The default sampling interval is 15 minutes so you'll need to wait a little while before the situation fires. Then the navigator will show the critical icon at each level from AUDIT_TIMES up to the Enterprise.
Figure 12: Critical icon
The Enterprise-level workspace will also display current situations when you first log in to the Portal.
Figure 13: Enterprise-level workspace
If you right-click on the entry in the Situation Event Console, you can select Situation Event Results...
Figure 14: Situation event console
Here you can see the values that caused the situation along with the help page that you created.
Figure 15: Help page
You can create a similar situation for the cache hit ratios, as well as process memory utilization. If you see that your hit ratio is low while the cache is full, then it would potentially be useful to increase the size of the cache. If, however, the process size is getting close to any limit you've set, then you might not want to increase the cache size, and might even want to consider reducing the cache size to reduce the process size.
More tuning information is in the IBM Tivoli Directory Server Performance Tuning Guide, in the Resources section.
For information on more active notification of situations, such as sending a text message, see the IBM Tivoli Monitoring 6.1 User's Guide (search for "reflex automation").
Prototypes
A directory server monitoring solution has been developed that gives more insight into the scripts and metafiles mentioned above. A white paper describes the solution. The complete package, containing the latest version of the white paper along with the available scripts and metafiles, is available on the Open Process Automation Library (OPAL). Search for "directory server using universal agent".
Conclusion
This document provides the steps to set up a solution to monitor a Tivoli Directory Server environment, based upon best practices observed from successful customer deployments. Additionally, the instructions provide enough background to develop new views or to customize the offering to suit individual needs.
Resources
About the authors
Dave Bachmann is the performance lead for Tivoli Security Products. Based in Austin, Texas in the United States, he has been working on the performance of various distributed systems since coming to IBM in 1992. He has helped with the performance of IBM's Distributed Computing Environment (DCE), IBM Tivoli Directory Server (LDAP), IBM Tivoli Access Manager, IBM Public Key Infrastructure, IBM Risk Manager, IBM Tivoli Identity Manager, IBM Tivoli Federated Identity Manager, IBM Tivoli Directory Integrator and IBM Tivoli Security Operations Manager products. He received a B.S. at Iowa State, an M.S. from The University of Michigan, and a Ph.D. from The University of Michigan, where he worked on performance modeling of distributed systems.
Ramakrishna J Gorthi is a developer for the IBM Tivoli Directory Server, Pune center in India. He has six years of experience in the IT industry, all at IBM, with one year in Level 2 Customer Support for the various versions of the IBM Tivoli Directory Server and the rest in IBM Tivoli Directory Server development and testing. He has authored the TDS IBM Redbook® titled Understanding LDAP. He has written developerWorks articles pertaining to migration and distributed directory scenarios. He holds a degree in Computer Engineering from Pune Institute of Computer Technology, Pune (India). His areas of expertise include IBM Tivoli Directory Server from the Tivoli Security Products and DB2®.
Amit Bhate is the development lead for IBM Tivoli Directory Server, Pune center in India. He has a Computer Engineering degree from University of Pune (India). He has over seven years of experience at IBM and has worked on various IBM products. He spent three years in the Level 3 Support team for the IBM Distributed File System product. He was involved in the development activities of IBM DFS 3.0 Client on Windows, Lotus Notes 7.0 and IBM Tivoli Directory Server 6.1.