Optim Performance Manager helps optimize the performance and availability of business-critical DB2 databases and applications. This solution delivers a proactive, comprehensive performance management approach that allows organizations to identify, diagnose, solve, and prevent performance problems.
Optim Performance Manager Extended Edition extends the capability of Optim Performance Manager with end-to-end database monitoring for Java™ technology and DB2 Call Level Interface (CLI) applications, with out-of-the-box configurations for SAP, IBM WebSphere®, IBM Cognos®, IBM InfoSphere® DataStage, and InfoSphere SQL Warehouse applications. The Extended Edition also includes integration with IBM Tivoli® monitoring solutions to provide deep database insights into existing Tivoli ITCAM application-monitoring environments. For the purposes of this article, we use the name Optim Performance Manager to refer to both editions of the product. For more information about Optim Performance Manager, see the article "What's new in Optim Performance Manager Extended Edition for DB2 for Linux, UNIX, and Windows" (developerWorks, April 2010).
Although Tivoli ITCAM for Transactions integration is built into the Extended Edition, it may also be desirable to send performance alerts to other monitoring solutions. This article describes how you can extend Optim Performance Manager by using a user exit to integrate DB2 database monitoring with other enterprise monitoring systems such as Tivoli Netcool/OMNIbus. We use the Simple Network Management Protocol (SNMP) to communicate database alert information. SNMP is a popular standard protocol that allows system management software to request and receive information from network devices, servers, and software.
Although we use the user exit in this article to send SNMP traps, the Optim Performance Manager User Exit is a simple, general-purpose mechanism that can be configured to call any executable program, making it possible for you to extend your event-based database monitoring in a wide variety of ways.
Why integrate Optim Performance Manager with SNMP?
The overview and diagnostic dashboards provided by Optim Performance Manager are a powerful way to present a lot of information about DB2 system health in a single screen. Highlighting quickly draws your attention to system alerts or thresholds that have been exceeded and need attention. DBAs make frequent use of these dashboards to get an overview of system status and to quickly identify and resolve potential problems as they go about the business of keeping the database healthy and happy, making sure that service levels are met and the business keeps running.
In the majority of modern enterprises, IT assets (both hardware and software), including network devices, servers, system software, and applications, are critical to keeping the business running. Many organizations use centralized system management tools and processes, coupled to ticketing systems and support processes to manage the infrastructure 24/7. SNMP Managers are usually at the center of such systems. An SNMP Manager can poll SNMP Agents residing on critical hardware and software for key information, so that you know the business infrastructure is alive and well. Network interfaces, routers and bridges, server computers, rack systems, and blades may be regularly polled to return status information in response to SNMP Get requests. Key middleware, including DB2 and software applications, can also participate in this process. In critical situations, SNMP Agents can immediately send information to the SNMP Manager using a message known as a trap.
We can use Optim Performance Manager's User Exit facility to call our sample program whenever a critical event occurs or whenever an Optim Performance Manager alert threshold is exceeded. This gives us an extensive and robust DB2 monitoring solution that is integrated with enterprise system-monitoring solutions. Our code will gather information about the alerts and send an SNMP trap to an SNMP Manager. If you configure suitable warning and error thresholds for key DB2 metrics in Optim Performance Manager, your SNMP Manager will be notified whenever a threshold is exceeded, so you can schedule preventative measures before the problems escalate or take immediate action in an emergency.
About alerts, events, and threshold exceptions
Optim Performance Manager defines two types of alerts or exceptions for DB2 systems:
- Event alerts - system-defined alerts that are triggered when deadlocks are detected.
- Periodic alert processing or threshold alerts - user-defined alerts that are based on thresholds that you define for many of the operating system and DB2 counters, metrics, and ratios.
Optim Performance Manager provides a sophisticated, user-friendly, web-based user interface that is used to access information about the DB2 systems being monitored by the Optim Performance Manager server. This article focuses on alerts and the exception-processing features of the system. The Optim Performance Manager Health Overview (illustrated in Figure 1) gives you a summary view of all your monitored databases, including alerts (either red or amber, depending on whether a warning or error threshold has been exceeded, or green, if values are in range).
Figure 1. IBM Optim Performance Manager Health Overview
When you see a red or amber problem indicator on the health overview, you can quickly get more information about that system from the Overview Dashboard, as shown in Figure 2:
Figure 2. IBM Optim Performance Manager Overview Dashboard
From the overview, you can use the Task Manager drop-down menu (as shown in Figure 3) to select Alerts to display details of both periodic and event alerts or exceptions that were processed by the server.
Figure 3. The Task Manager
An example of the alert dashboard is shown in Figure 4.
Figure 4. Optim Performance Manager Alerts Dashboard
When you select one of the alerts, Optim Performance Manager displays more detailed information about the condition on the same screen. Along with the details, Optim Performance Manager provides links to help you quickly navigate to one of the more detailed diagnostic dashboards for the problem under investigation. An example is shown in Figure 5. From the Health Summary screen, you can also click on any warning or error indicators and immediately see an Alert Summary. The powerful combination of linked overview, summary, and detailed information helps you quickly identify potential problems and drill into the details to understand what actions may be required to fix the problem.
Figure 5. Drill down into one alert
In addition to viewing the Overview Dashboard and Health Summary, Optim Performance Manager offers two additional ways for you to be notified about alerts as they occur. The simplest way is to use the built-in notification service, which provides email alerts. The most flexible way is a user exit, like the one described in this article.
The notification service capability is built into Optim Performance Manager. It uses Simple Mail Transfer Protocol (SMTP), a standard protocol for the transfer of email. It is configured from the Services entry of the Task Manager, as shown in Figure 6.
Figure 6. Use Services to configure email alerts
When you select Services, you are taken to a list of services. Select Notification Service, and then click on the Configure button, and you will see the configuration dialog shown in Figure 7. You will need to supply host and port information for your mail server, sender and recipient email addresses, and, in most cases, server authentication information. Depending on your email infrastructure, this function may allow you to receive notifications in your email client, on mobile devices, or using a web browser.
Figure 7. Configuring the built-in email notification service
Delivering alerts with a user exit for SNMP
When you need an active alerting mechanism in addition to email, or you need to carry out some custom processing in response to an alert, Optim Performance Manager offers a user exit that can be configured to call any executable program when an alert occurs. The called program is passed information about the alert as an XML document. The program can be implemented in any language to accomplish whatever task you need. You could automatically open a problem ticket or add the problem to a "to-do" list. You could even parse and understand the alert and take some automated remedial actions.
For this article, we developed a sample user exit program (see the Download section) to convert incoming alerts into SNMP traps. This allows us to integrate DB2 monitoring with Tivoli Netcool enterprise system management software or other SNMP managers. See the architecture diagram in Figure 8.
Figure 8. Optim Performance Manager architecture with user exit and SNMP Manager
When an alert occurs, Optim Performance Manager runs the program specified as the user exit, passing information about the alert to the program through standard input in the form of an XML document. Our program takes the alert information and uses it to create SNMP traps that are broadcast to the network management infrastructure. The program carries out the following main tasks:
- Parse the incoming XML document into a DOM tree
- Access the DOM tree to extract alert information that will be sent to the SNMP Managers
- Call the appropriate SNMP APIs to construct an SNMP v2c trap message and send it to the SNMP Managers
The program is made up of the following Java technology classes and supporting files:
- Class SNMPExit
- Class PEXMLDom
- Class ThresholdException
- Classes ThresholdValues and ThresholdInfo
- Class InstanceInfo
- Class EventException
- Class AgentInfo
- Class Exception
- Class Trace
- File SNMPExit.bat
- File SNMPExit.xml
- File IBM-Db2PerfExpertNotifications.mib
- Files PEThresholdXML and PEEvent.xml
- Files snmp4j-1.11.jar and log4j-1.2.14.jar
Class SNMPExit
This class contains our main() method. If you specify any command line parameters, tracing will be turned on and trace messages will be written to a temporary file in the current directory. Optim Performance Manager invokes the user exit, passing details of the alert on the stdin stream, so we create a PEXMLDom object, passing System.in. This calls DOM to parse the XML into a memory-based tree of XML nodes.
We can process both types of exceptions: the threshold exceptions that are user-defined and the event exceptions for deadlocks. Each type of alert has a different XML document associated with it. The documents are different enough that the program constructs either a ThresholdException object or an EventException object and then calls that object's send() method.
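The dispatch step described above can be sketched with the standard Java DOM API. This is a minimal illustration, not the code shipped in the download: the class name AlertDispatchSketch and both method names are hypothetical, and the sketch assumes that any alert whose type attribute is not "threshold" is an event (deadlock) alert. The sample document in main() is a cut-down stand-in for the XML that Optim Performance Manager passes on stdin.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical stand-in for the article's SNMPExit class; not the shipped code.
class AlertDispatchSketch {

    /** Parses the alert XML and returns the "type" attribute of the root element. */
    static String alertType(InputStream in) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(in);
            return doc.getDocumentElement().getAttribute("type");
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse alert XML", e);
        }
    }

    /** Names the class that would handle the alert and later have send() called on it. */
    static String handlerFor(String type) {
        if ("threshold".equals(type)) {
            return "ThresholdException";
        }
        // Assumption: everything that is not a threshold alert is an event alert.
        return "EventException";
    }

    public static void main(String[] args) {
        // In the user exit, the stream would be System.in; a literal is used here
        // so that the sketch runs without any redirected input.
        String sample = "<PEException type=\"threshold\"/>";
        InputStream in = new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8));
        System.out.println(handlerFor(alertType(in)));
    }
}
```

Keeping the type check separate from the parsing makes it easy to test the dispatch logic with the sample XML files before connecting it to a live server.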
Class PEXMLDom
This class contains a constructor and a number of convenience methods that help retrieve information from the DOM tree. The incoming XML files contain a great deal of detailed information about the alerts. Our program only sends a small subset of that information to the SNMP Manager, so many of the methods in this class are designed to help retrieve elements or sets of elements from the DOM tree.
Example XML documents are shipped with Optim Performance Manager and as part of the download accompanying this article. You can examine those documents to see all the supplied information. The XML document for a threshold alert is shown in Listing 1. You will notice that in many cases, the XML elements have generic tag names such as "metric", and the real identifier is carried on the name attribute of the generic element. This makes parsing the document a little more complex than it might be. For example, to get the value of a specific metric element, such as the current value of the monitored metric, we have to check each metric element until we find the one with the right name attribute. An alternate implementation of this program might be more efficient if it used the SAX parsing method, rather than DOM. But for overall simplicity and clarity, we decided to use DOM.
Listing 1. Threshold alert XML
<PEException timestamp="2005-12-14 10:08:33.68047" type="threshold">
  <product_info platform="Windows 2000" product="DB2 Performance Expert Server" version="18.104.22.168.379" />
  <monitored_instance db2_version="V8R2FP3" host_name="bergamotte" instance_alias="tst8s1" instance_name="tst8s1" node_name="NODE0009" operating_system="AIX" port_number="64610" />
  <threshold_exception category="statistics" exception_field="deadlocks" subcategory="databases">
    <datagroup name="pe_excplog">
      <metric name="pel_startts">2005-12-14 10:06:32.226035</metric>
      <metric name="pel_pets_global">N</metric>
      <metric name="pel_maxmints">2005-12-14 10:08:33.68047</metric>
      <metric name="pel_id">5</metric>
      <metric name="pel_currentvalue">3</metric>
      <metric name="pel_petd_warningvalue">0</metric>
      <metric name="pel_petd_errorvalue">1</metric>
      <metric name="pel_owner">USER1</metric>
      <metric name="pel_currentts">2005-12-14 10:08:33.68047</metric>
      <metric name="pel_maxminvalue">3</metric>
      <metric name="pel_startvalue">1</metric>
      <metric name="pel_pets_id">1</metric>
      <metric name="pel_petd_countername">DBSE423</metric>
      <metric name="pel_level">STATISTICS</metric>
      <datagroup name="excplog_detail">
        <metric name="db_name">SAMPLE</metric>
        <metric name="db_path">/pedev/home/tst8s1/tst8s1/NODE0000/SQL00001/</metric>
        <metric name="total_cons">9</metric>
        <metric name="db_status_st">Database is active</metric>
        <metric name="db_conn_time">2005-12-14 10:00:02.000341</metric>
        <metric name="member">PART0</metric>
        <metric name="member_id">0</metric>
      </datagroup>
    </datagroup>
  </threshold_exception>
</PEException>
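The lookup described before Listing 1, where each metric element is checked until one with the right name attribute is found, might look like the following. This is a sketch using the standard org.w3c.dom API, not the actual PEXMLDom class; the class name MetricLookupSketch and the methods parse() and metricValue() are hypothetical names chosen for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper in the spirit of PEXMLDom; not the shipped code.
class MetricLookupSketch {

    /** Parses an XML string into a DOM tree. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse alert XML", e);
        }
    }

    /** Returns the trimmed text of the first metric element with the given name, or null. */
    static String metricValue(Document doc, String name) {
        NodeList metrics = doc.getElementsByTagName("metric");
        for (int i = 0; i < metrics.getLength(); i++) {
            Element metric = (Element) metrics.item(i);
            if (name.equals(metric.getAttribute("name"))) {
                return metric.getTextContent().trim();
            }
        }
        return null; // no metric with that name in the document
    }

    public static void main(String[] args) {
        // A cut-down version of Listing 1.
        String xml = "<PEException type=\"threshold\"><threshold_exception>"
                + "<datagroup name=\"pe_excplog\">"
                + "<metric name=\"pel_currentvalue\">3</metric>"
                + "<metric name=\"pel_petd_errorvalue\">1</metric>"
                + "</datagroup></threshold_exception></PEException>";
        System.out.println(metricValue(parse(xml), "pel_currentvalue")); // prints 3
    }
}
```

The linear scan is simple rather than fast, which is acceptable here because each alert document is small and is read only once.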
Class ThresholdException
This class uses the helper methods on the PEXMLDom object to retrieve the values we will send in the SNMP trap message. An SNMP message is essentially a set of name/value pairs. The names are identifiers expressed as Object IDentifiers, or OIDs. OIDs are unique and are expressed as strings of digits; for example, the OID for our trap reporting a threshold exception is 22.214.171.124.126.96.36.199.206.0.3.
These OIDs and the data types of the associated values are defined in a document known as a Management Information Base, or MIB. You can find documentation about SNMP, OIDs, and MIBs on the Internet. A good way to understand the MIB is to use one of the available MIB browsers to explore the MIB shipped in this package. See Figure 9 for an example of our MIB in KS_Soft's MIB Browser, which is available from KS-Soft - Network Management Solutions (see Resources).
The constructor for this object pulls together information from the incoming XML document so that the name value pairs specified in the MIB can be populated. These include: product version, trap time stamp, name of the host system, information about the DB2 instance where the alert occurred, the category and severity of the problem, and the current values and threshold values for warning and errors. Some of this information is also built into a message string.
Figure 9. The Management Information Base (MIB) for our SNMP traps
The second method in this class is send(). This method takes the information we put together in the constructor and binds each variable into an SNMP trap message using SNMP name/value pairs. Each pair consists of the OID and the associated value. These are wrapped by SNMP objects known as VarBinds, and each of these VarBinds is added to the SNMP message object, which is known as a Protocol Data Unit, or PDU. When the PDU is complete, we send it. You will notice that both this class and the EventException class extend the base class Exception.
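To make the ordering of the name/value pairs concrete, here is a plain-Java model of the binding step. It deliberately does not use the SNMP4J API (in the real program, SNMP4J's PDU and VariableBinding classes play these roles); an insertion-ordered map stands in for the PDU. The two constants are the standard OIDs for sysUpTime.0 and snmpTrapOID.0, which an SNMP v2c trap carries as its first two bindings. The class name, the method name, and the enterprise OIDs in main() are hypothetical placeholders and are not taken from the MIB in the download.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java model of a trap's variable bindings; not SNMP4J and not the shipped code.
class TrapVarBindSketch {

    // Standard OIDs for the two bindings that must come first in an SNMP v2c trap.
    static final String SYS_UP_TIME = "1.3.6.1.2.1.1.3.0";
    static final String SNMP_TRAP_OID = "1.3.6.1.6.3.1.1.4.1.0";

    /** Builds the ordered OID/value pairs: mandatory bindings first, then the alert data. */
    static Map<String, String> buildVarBinds(String trapOid, Map<String, String> payload) {
        Map<String, String> varBinds = new LinkedHashMap<>(); // keeps insertion order
        varBinds.put(SYS_UP_TIME, "0");       // dummy uptime value
        varBinds.put(SNMP_TRAP_OID, trapOid); // identifies which trap this is
        varBinds.putAll(payload);             // alert-specific values follow
        return varBinds;
    }

    public static void main(String[] args) {
        Map<String, String> payload = new LinkedHashMap<>();
        // Hypothetical OID for a "host name" object.
        payload.put("1.3.6.1.4.1.99999.1.1", "bergamotte");
        Map<String, String> varBinds = buildVarBinds("1.3.6.1.4.1.99999.0.3", payload);
        varBinds.forEach((oid, value) -> System.out.println(oid + " = " + value));
    }
}
```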
Classes ThresholdValues and ThresholdInfo
These classes encapsulate information, values, and thresholds associated with a threshold alert. In particular, the ThresholdValues class examines the current value and the thresholds to define the severity of the alert. Methods on PEXMLDom are used to populate these objects since the required values for each object come from one set of peer elements in the XML document.
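One plausible way to derive the severity from the current value and the two thresholds is sketched below. The comparison rules are our assumption, not taken from the shipped ThresholdValues class: the sketch assumes that a threshold is exceeded only when the value goes strictly past it, and that an error threshold below the warning threshold signals a metric where lower values are worse (a hit ratio, for example). The class and method names are hypothetical.

```java
// Hypothetical severity rule in the spirit of ThresholdValues; not the shipped code.
class SeveritySketch {

    enum Severity { NORMAL, WARNING, ERROR }

    /** Classifies a value against warning and error thresholds in either direction. */
    static Severity classify(double current, double warning, double error) {
        // Assumption: if the error threshold is at or above the warning threshold,
        // higher values are worse; otherwise lower values are worse.
        boolean higherIsWorse = error >= warning;
        if (higherIsWorse ? current > error : current < error) {
            return Severity.ERROR;
        }
        if (higherIsWorse ? current > warning : current < warning) {
            return Severity.WARNING;
        }
        return Severity.NORMAL;
    }

    public static void main(String[] args) {
        // Values from Listing 1: current value 3, warning threshold 0, error threshold 1.
        System.out.println(classify(3, 0, 1)); // prints ERROR
    }
}
```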
Class InstanceInfo
This class encapsulates instance variables common to both types of alerts, for the same reason as described above for classes ThresholdValues and ThresholdInfo. In these cases, we optimized processing the XML by processing all the attributes of an element at one time. For example, the InstanceInfo class is populated based on the attributes of the monitored_instance element. We decided it was more efficient to retrieve all the attributes together in this way.
Class EventException
This class is very similar to ThresholdException. The constructor extracts information from the PEXMLDom object, and the send() method creates the appropriate VarBinds and then sends the SNMP trap. Much of the information collected is similar, but instead of dealing with a DB2 metric that is out of bounds, in this case we are dealing with a deadlock. Therefore, the important information is about the agents and applications involved in the deadlock, in particular the agent and application that were rolled back.
Class AgentInfo
This class encapsulates information about each agent involved in a deadlock alert. One of these objects will represent the agent that was rolled back. This instance will include additional information about the schema and table that were involved in the contention.
Class Exception
This is the base class that ThresholdException and EventException both extend. We implemented this class to provide some common methods and to hide the details of the SNMP interface. There are some helper methods to simplify SNMP variable binding and a method to acquire a new PDU. We also set up some initial VarBinds that are required by the SNMP v2c standard to be the first variables bound to the PDU.
One of these variables is sysUpTime, which should indicate how long the DB2 system has been up. Unfortunately, we don't have access to this information, so we hard-coded a dummy value into this required field. Both this method and the send() method retrieve properties from a properties file called SNMPExit.xml that must be present in the current directory. It is used to specify details required to complete the PDU and to identify the target SNMP Manager that the trap will be sent to.
send() creates a default transport object and an SNMP object, specifies information about the target SNMP Manager, and calls the send() method on the SNMP object.
Class Trace
This class provides a very simple trace or logging mechanism that writes trace messages to a file in the current directory. The file name is of the form PESNMPxxxxx.trc, where xxxxx is generated at run time. Each invocation of the user exit will generate a new trace file, so please don't leave trace on in production. Trace is turned on by specifying any command line parameter to the program when it is invoked.
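A trace facility of this kind needs very little code. The sketch below assumes that the run-time part of the file name comes from File.createTempFile, which is one straightforward way to get a unique PESNMPxxxxx.trc name; the shipped Trace class may generate the name differently. The class and method names are hypothetical.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.UncheckedIOException;

// Hypothetical trace helper; not the shipped Trace class.
class TraceFileSketch {

    /** Creates a new, uniquely named PESNMPxxxxx.trc file in the given directory. */
    static File newTraceFile(File directory) {
        try {
            return File.createTempFile("PESNMP", ".trc", directory);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Appends one message line to the trace file. */
    static void trace(File traceFile, String message) {
        try (PrintWriter out = new PrintWriter(new FileWriter(traceFile, true))) {
            out.println(message);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Any command line parameter switches tracing on.
        if (args.length > 0) {
            File traceFile = newTraceFile(new File("."));
            trace(traceFile, "user exit invoked");
            System.out.println("Trace written to " + traceFile.getName());
        }
    }
}
```

Because every invocation creates a new file, a busy server with tracing left on would accumulate one file per alert.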
File SNMPExit.bat
This is a batch file that OPM will invoke. By invoking a batch file, we can set up a suitable runtime environment for our program. And if we need to, we can switch on tracing by adding a parameter to the invocation. The default is:
"%java_dir%"\bin\java.exe -jar SNMPExit.jar
We can switch trace on using this form:
"%java_dir%"\bin\java.exe -jar SNMPExit.jar TRACE
File SNMPExit.xml
This is a properties file with three entries, as shown in Listing 2:
Listing 2. SNMPExit properties file
<properties>
  <comment>SNMP Host Information</comment>
  <entry key="host">SNMP-MANAGER</entry>
  <entry key="port">162</entry>
  <entry key="community">public</entry>
</properties>
- The host property specifies the host name or TCP/IP address of the SNMP Manager.
- The port property specifies the port the SNMP Manager listens on. The default is 162, but most SNMP Managers can be configured to use a different port.
- The community property specifies the community string sent in the trap message. The default is public.
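The layout of Listing 2 matches the XML format read by java.util.Properties.loadFromXML, so a program could load these three entries as sketched below. That the sample program reads the file this way is our assumption. The sample string in the sketch adds the DOCTYPE declaration that the loadFromXML documentation calls for, which Listing 2 as printed does not show. The class and method names are hypothetical, and the fallback values for port and community simply mirror the defaults described above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical loader for SNMPExit.xml; not the shipped code.
class ExitPropertiesSketch {

    // Listing 2 plus the DOCTYPE that Properties.loadFromXML documents as required.
    static final String SAMPLE =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            + "<!DOCTYPE properties SYSTEM \"http://java.sun.com/dtd/properties.dtd\">\n"
            + "<properties>\n"
            + "  <comment>SNMP Host Information</comment>\n"
            + "  <entry key=\"host\">SNMP-MANAGER</entry>\n"
            + "  <entry key=\"port\">162</entry>\n"
            + "  <entry key=\"community\">public</entry>\n"
            + "</properties>\n";

    /** Loads the properties from an XML stream. */
    static Properties load(InputStream in) {
        Properties props = new Properties();
        try {
            props.loadFromXML(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Returns the trap port, falling back to the standard trap port 162. */
    static int port(Properties props) {
        return Integer.parseInt(props.getProperty("port", "162").trim());
    }

    /** Returns the community string, falling back to "public". */
    static String community(Properties props) {
        return props.getProperty("community", "public");
    }

    public static void main(String[] args) {
        Properties props = load(new ByteArrayInputStream(SAMPLE.getBytes(StandardCharsets.UTF_8)));
        System.out.println(props.getProperty("host") + " " + port(props) + " " + community(props));
    }
}
```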
Please discuss the settings of these properties with your SNMP Manager administrator, who will also want the MIB.
File IBM-Db2PerfExpertNotifications.mib
This is the Management Information Base for the traps that we send to the SNMP Manager. Your SNMP Manager administrator will need the MIB to help the SNMP Manager system understand the incoming traps.
Files PEThresholdXML and PEEvent.xml
These are sample XML files that can be used to test our program without having to force database errors or use Optim Performance Manager. To do this, run the batch file with stdin redirected to read from one of these files: SNMPExit.bat < PEEvent.xml
Files snmp4j-1.11.jar and log4j-1.2.14.jar
These are open source libraries for the SNMP API for Java provided by SNMP4J.org, which also provides more information about this powerful set of SNMP APIs. (See Resources for a link to the site.)
Setting up the user exit
We recommend a step-by-step approach to setting up and testing the user exit, as follows:
- Unzip all the files from the package included in the Download section into a directory. You should
see something like Figure 10:
Figure 10. The package unzipped
- Edit the batch file to set here_path to the full path of the directory where the unzipped files reside, and set java_dir to the full path of your Java runtime environment.
SET here_path=C:\Test
SET java_dir=C:\Program Files\IBM\Java50\jre
We also suggest you make your first run with tracing on, so add the word TRACE to the invocation statement:
"%java_dir%"\bin\java.exe -jar SNMPExit.jar TRACE
- Save the bat file.
- From a command prompt, invoke the batch file, passing one of the sample XML files as input:
SNMPExit.bat < PEEvent.xml
- If all is correctly configured, the program will execute and terminate without you seeing any error messages. You can now check the PESNMPxxxxx.trc file to see the sort of information we write when tracing is on.
Notice that SNMP trap delivery isn't acknowledged or guaranteed, so all we know at this point is that the messages are getting to the network software.
- At this point, it's a good idea to check that the SNMP traps are capable of reaching an SNMP Manager. You can do this by running one of the many free SNMP trap utilities that are available on the Internet. Our favorite is Trap Receiver 7.02. We also did some testing with MG_Soft Trap Ringer. (See Resources for links to both trap utilities.)
If you run one of these tools on the same machine you are testing on, then you don't need to change the defaults in SNMPExit.xml. Run the batch file again and check the trap receiver. You should see a trap arrive, and then you can open the message in your trap receiver. You should see something like Figure 11.
Figure 11. Trap details using Trap Receiver
Now that you know you can receive traps on the local machine, it's a good idea to test receiving them at a remote machine. Deploy your trap receiver software on another machine on the network, change the host entry in SNMPExit.xml to the new machine name, and repeat the test run. If all is well with your network configuration, you will see the trap arrive at the remote machine.
It's now time for you to talk to the administrator of your enterprise SNMP Manager. They will need the MIB, and you will need the host name, port, and community string for the SNMP Manager. Make the appropriate changes in SNMPExit.xml, and run the test again. Your enterprise system should receive the trap.
Now, let's configure Optim Performance Manager to call your program. This is done from the heritage DB2 Performance Expert client, which ships as part of OPM.
- Right-click on the instance, and select Properties
from the context menu, as shown in Figure 12:
Figure 12. Selecting instance properties.
This brings up the dialog shown in Figure 13.
Figure 13. Enabling exception processing
- Click on the Exception tab.
- Select the check boxes to Enable event exception processing, to Enable periodic exception processing, and to Enable user exit.
- Enter the full path and file name to the batch file.
- Click on Test to make sure Optim Performance Manager calls the program successfully. This will call the program with a dummy exception event. You can verify that your SNMP Manager receives this event.
- Click on OK.
- Before starting exception processing you need to define a threshold
set. These are also defined using the Performance Expert client.
Select a DB2 instance, and then select Tools >
Exceptions > Exception
Processing, as illustrated in Figure 14:
Figure 14. Configuring exception processing
From the exception processing window (shown in Figure 15) you can see any existing threshold sets and define new ones.
Figure 15. Defining a new threshold set
- Back on the main screen in the Performance Expert client, right-click
on the instance, and then select Exceptions >
Start > Both to bring up the
exception activation window, as shown in Figure 16:
Figure 16. Start exception processing
- Make sure both exception types are stopped, and then select the
Call User Exit check box for both event and
periodic exceptions, as shown in Figure 17:
Figure 17. Activate exception processing
- Select a threshold set for periodic exception processing, and then click both run buttons (also shown in Figure 17).
- Check that the status field indicates "Running," as shown in Figure 18, and then click on
Figure 18. Exception processing is running
Now you are ready for an end-to-end test, where Optim Performance Manager detects an alert and calls the user exit to get SNMP traps sent to your SNMP Manager. You will need to force one of the threshold conditions to be triggered, resulting in an alert, and then you can check that the SNMP trap arrived at your SNMP manager.
If you haven't already done so, don't forget to delete any trace files and turn off tracing by removing the command line parameter in the batch file.
This article has described a simple program that uses the Optim Performance Manager user exit to carry out custom alert processing when a monitored threshold is exceeded or when a deadlock occurs. Our sample program processes the XML documents from Optim Performance Manager that describe the events and uses that information to create SNMP v2c traps. We use SNMP4J, a popular SNMP API for Java, to allow SNMP traps to be sent to enterprise system management software such as Tivoli Netcool. In summary, the program works in four steps:
- Creates a DOM object from the incoming XML
- Extracts interesting information to send to the SNMP Manager
- Builds an SNMP message
- Sends the message to an SNMP Manager
The first two steps can be used as the basis of other programs to carry out custom processing whenever Optim Performance Manager detects a threshold or event alert. This article should help you build your own alert processing programs to further extend and integrate Optim Performance Manager. We are very interested in hearing from you if you have any comments about this article. Please let us know if you build a custom user exit program of your own.
This article and the accompanying code wouldn't have been possible without the initial research and development carried out by M. G. Runert and F L Huya-Kouadio. A special thanks to Mylène Stolpe-Evras for helping us with our testing using Tivoli Netcool.
Download
|Description|Name|Size|
|Sample user exit|SNMPExit.zip|1.04MB|
Resources
- "What's new in Optim Performance Manager Extended Edition for DB2 for Linux, UNIX, and Windows" (developerWorks, April 2010): Take a tour of the new and enhanced capabilities in Optim Performance Manager 4.1. Understand the key enhancements in DB2 performance monitoring provided by Optim Performance Manager Extended Edition 4.1.
- SNMPLink.org - SNMP Portal: Learn more about SNMP, MIBs, Network Management, and Network Monitoring.
- "Optim Performance Management solution" demo (developerWorks, 2010 April): See how one fictional company uses Optim solutions to resolve problems before they affect the business, to accelerate performance of existing applications using pureQuery client optimization, and to build performance into applications, right from the start.
- "OPM-ITCAM Integration Overview-Part 1" demo (ChannelDB2, April 2010): In this first of a series of 5, Randy Horman, IBM STSM, introduces the capabilities and relationships between both products: IBM Tivoli Composite Application Manager (ITCAM) and IBM Optim Performance Manager Extended Edition.
- Optim Performance Manager Extended Edition: Get more information about Optim Performance Manager Extended Edition.
- IBM Optim Performance Manager for DB2 for Linux, UNIX, and Windows announcement letter: Learn more about Optim Performance Manager and how it helps resolve emergent performance problems before they impact the business.
- Integrated Data Management Information Management Center > Optim Performance Manager: Learn more about Optim Performance Manager.
- IBM Tivoli Netcool: Find out more about Tivoli Netcool.
- Package org.w3c.dom: Learn more about processing XML documents using Java and DOM.
- Optim page on developerWorks: Get the resources you need to advance your skills on the Optim family of products.
- developerWorks Information Management zone: Learn more about Information Management. Find technical documentation, how-to articles, education, downloads, product information, and more.
- Stay current with developerWorks technical events and webcasts.
Get products and technologies
- MIB Browser: Get KS_Soft's MIB Browser, which is available from KS-Soft - Network Management Solutions.
- Trap Receiver: Download Trap Receiver.
- MG_Soft Trap Ringer: Download MG_Soft Trap Ringer.
- Build your next development project with IBM trial software, available for download directly from developerWorks.