IBM Tivoli monitoring for Q Replication

A guide for integrating IBM WebSphere Information Integrator Q Replication into IBM Tivoli Monitoring

IBM® Tivoli® Monitoring is a family of products designed to monitor the health and performance of your enterprise applications. This article shows you how to access Q Replication monitoring information, how to bring this data into the Tivoli platform, and how to use Tivoli alerts and situations so that you receive notifications when critical events occur in Q Replication.


Daniel Martin (daniel.martin@iaas.uni-stuttgart.de), Research Associate, University of Stuttgart

Daniel Martin is a research associate at the University of Stuttgart. He is a member of the Triple Space Communication (Tripcom, www.tripcom.org) project, where his research focus is in the area of messaging, triple space architecture, and Web services. Before going back to university, he worked on the DB2 Performance Expert Team at the IBM Boeblingen Lab.



24 July 2006


Introduction

IBM WebSphere® Information Integrator Q Replication is designed for replicating large volumes of data at very low latency. In a mission-critical environment with replication, you want an easy way to see the overall system status and get immediate notifications when something goes wrong. This is where IBM Tivoli Monitoring comes into play. It provides a place where all business-relevant applications report their current status, allowing administrators to see the overall health of their systems.

Since there is no dedicated Q Replication agent for Tivoli Monitoring, this article shows how to access Q Replication monitoring information, how to bring this data into the Tivoli Platform, and how to use alerts and situations to receive notifications when critical events occur.

Figure 1. Apply_Monitor attribute group in the Tivoli Enterprise Portal Client

Figure 1 shows the Tivoli Enterprise Portal Client and graphs that display live data from Q Replication. The graphs are updated automatically so that the latest information is always available. Using the Tivoli Monitoring "Historical Data Collection" feature, it is also possible to show data from any point back in time.

IBM WebSphere Information Integrator Q Replication

Q Replication mainly consists of two programs:

  • Q Capture
  • Q Apply

The capture program monitors changes on source tables and converts committed transactional data into messages. These messages are sent to the target location through WebSphere MQ message queues, where they are read from the queues and converted back into transactional data by the apply program. The transactions are then applied to the target tables with a highly parallelized method that preserves the integrity of the data. Figure 2 shows a sample configuration for Q Replication.

Figure 2. Sample configuration for Q Replication

To learn more about Q Replication and SQL replication see "Introduction to Replication and Event Publishing" (GC18-7567-00), available at the DB2 UDB, DB2 Connect and DB2 Information Integrator Version 8 product manuals page.

IBM Tivoli Monitoring

The IBM Tivoli Monitoring platform is a suite of products that monitor and manage system and network applications on a variety of platforms. These products keep track of the availability and performance of all parts of your enterprise from one or more designated workstations, and provide reports you can use to track trends and troubleshoot problems.

The Monitoring platform consists of the following products:

  • The Tivoli Enterprise Portal Client (1): A Java-based user interface for viewing and monitoring your enterprise.
  • The Tivoli Enterprise Portal Server (2): Software components for the client that retrieve, manipulate, and analyze data from the agent programs monitoring your enterprise applications.
  • The Tivoli Enterprise Monitoring Server (3): Acts as a collection and control point for alerts received from the agents. It also collects performance and availability data from the agents and passes it on to the portal server.
  • Tivoli Enterprise Monitoring Agents (4): These are installed on the systems or subsystems you want to monitor. These agents collect and distribute data to a monitoring server. For example, Tivoli Monitoring for Databases contains such agents for database products.
Figure 3. IBM Tivoli Monitoring data flow

Software prerequisites

This article describes how to set up the Tivoli Universal Agent for Q Replication. It is assumed that WebSphere Information Integrator and Tivoli Monitoring are already installed. For information about setting up these products, see the documents in the Related references section below. The following product versions are used:

  • IBM Tivoli Monitoring Version 6.1.0
  • IBM Tivoli Universal Agent Version 6.1.1
  • IBM WebSphere Information Integrator Version 8.2

Related references

For more details on planning, configuring, and administering a Q Replication environment, refer to the DB2 Information Center and "Replication and Event Publishing Guide and Reference" (SC18-7568-00), available at the DB2 UDB, DB2 Connect and DB2 Information Integrator Version 8 product manuals page.

You can download all Tivoli related product documentation at the Tivoli Software Information Center. Also, if you need additional information, see the product Web site for the Tivoli Universal Agent.


Accessing Q Replication status information

Each capture and apply program has a corresponding set of control tables. These tables are stored at the same location as the Q Apply and Q Capture programs, and contain configuration values plus the status information you want to access. You can find these tables in the schemas you defined as the capture schema and the apply schema during your replication setup.
Figure 4 shows the monitor control table from the Q Apply program as an example.

Figure 4. Q Apply status table

Since each of the Q Apply and Q Capture programs potentially runs on a different system and the control tables are local to the programs, Tivoli Monitoring needs to connect to each of the systems separately. The following section provides detailed steps on how to do this.

If you want a full list of all Q Replication control tables, and what data they contain, refer to the corresponding chapters in the Replication and Event Publishing Guide and Reference.


Set up the Tivoli Universal Agent

For detailed instructions on how to install the Universal Agent refer to Installing monitoring agents and the documentation from the product CD-ROM. For additional information, refer to the Tivoli Universal Agent User's Guide.

After successful installation of the Universal Agent, follow the steps below to set up and configure a new instance of the agent that uses the Open Database Connectivity (ODBC) data provider to collect monitoring information.

  1. Open Manage Tivoli Enterprise Monitoring Services.
  2. Right-click on the Universal Agent with Task/Subsystem Primary, and select create instance....
  3. On the screen that opens, enter the name for the new Universal Agent instance, for example, "Q_REPL_AGENT."
  4. Right-click on the new Universal Agent Instance you just created, and select configure using defaults.
  5. Click Yes when asked whether you want to "update the file KUMENV_Q_REPL_AGENT prior to configuration of the universal agent."
  6. In the text editor that opens, change the line that starts with:
    *-----------------------------------------------------------------*
    * UA Startup automatic start DP options                           * 
    * (ASFS,APIS,FILE,SOCK,HTTP,SNMP,POST,WBEM,ODBC)                  *
    *-----------------------------------------------------------------*
    KUMA_STARTUP_DP=ASFS

    to
    *-----------------------------------------------------------------*
    * UA Startup automatic start DP options                           * 
    * (ASFS,APIS,FILE,SOCK,HTTP,SNMP,POST,WBEM,ODBC)                  *
    *-----------------------------------------------------------------*
    KUMA_STARTUP_DP=ODBC
  7. After closing the text editor, you are asked if you now want to configure the agent. Click Yes.
  8. To check if the configuration was successful, see if the red exclamation mark on the left of the new agent's name has changed to a green circle.
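
The edit from step 6 can also be scripted. The following is a minimal Python sketch, assuming the KUMENV file is plain text and contains a single KUMA_STARTUP_DP= line as shown above. The function name set_startup_dp is hypothetical and not part of any Tivoli tool, and the sample text is a shortened stand-in for the real file.

```python
import re


def set_startup_dp(env_text: str, provider: str = "ODBC") -> str:
    """Return env_text with the value of KUMA_STARTUP_DP replaced by provider."""
    # Match the whole assignment line; comment lines starting with '*' are left alone.
    pattern = re.compile(r"^([ \t]*KUMA_STARTUP_DP=)[^\r\n]*", re.MULTILINE)
    new_text, count = pattern.subn(lambda m: m.group(1) + provider, env_text)
    if count == 0:
        raise ValueError("no KUMA_STARTUP_DP line found")
    return new_text


# Shortened, hypothetical excerpt of a KUMENV file.
sample = (
    "* UA Startup automatic start DP options\n"
    "KUMA_STARTUP_DP=ASFS\n"
)
print(set_startup_dp(sample), end="")
```

The sketch only transforms text in memory. Reading and writing the actual KUMENV file, and keeping a backup copy, is left to the caller.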

Next, create a meta file for the Universal Agent. This file describes the data you want to monitor for the Tivoli platform and also contains configuration information like datasource name and location for the Universal Agent.

For a quick start, you can download the sample meta file found in the Download section and continue with the section Meta-file adaption below. Save the sample meta file you downloaded under IBM\ITM\TMAITM6\metafiles\.

Meta-file generation

The Universal Agent ships with a command line tool to generate a valid meta file from an existing ODBC datasource. However, the generated file needs some additional work to adapt it to your environment.

Before you can start and use the Universal Agent tool to generate the meta file, you need to set up ODBC data source names (DSN) for the databases where the Q Replication control tables are stored. To create a DSN on a Windows system, go to Administrative Tools, Data Sources (ODBC), create a system DSN using the DB2 ODBC driver, and select the appropriate database that contains the Q Replication control tables for the replication program you want to monitor.

After you have created the DSNs, use the KUMPCON tool located at IBM\ITM\TMAITM6. Start it in a console window with KUMPCON GENERATE YourDataSourceName user=YourUserID pswd=YourPassword, then follow the instructions on the screen.

When asked to pattern match on particular table names, answer yes, and use "IBMQREP" as the pattern to match. This way the KUMPCON tool only generates meta data for tables starting with IBMQREP, as all the Q Replication control tables do.

Figure 5 shows sample output from the KUMPCON tool.

Figure 5. KUMPCON sample output

Meta-file adaption

Tivoli Agent meta files are plain text files and can be edited with any text editor. After running KUMPCON GENERATE, you can find the generated file in the folder IBM\ITM\TMAITM6\metafiles. Since it was generated automatically, it is in a very raw form and needs further tweaks. Here is a list of things you might want to change:

  • Data Types: Tivoli Monitoring uses its own type system so that any data from outside must be described with a Tivoli type. The (simplified) syntax for an attribute definition in the meta file is the following:
    attribute-name attribute-type maximum-size [KEY] [ATOMIC] [@help text]

    You can find a list of Tivoli attribute types in Table 1.

    You need to modify the generated meta file because KUMPCON GENERATE tends to use very restrictive types and generates more D (DisplayString) and N (DisplayNumeric) typed attributes than necessary. Those attributes cannot be used for bar, pie, or gauge charts in the Tivoli Enterprise Portal Client. These chart types only accept pure numeric attributes (such as types G, C, A, or #). Character and date attributes can only be displayed in a table. Pure numeric types are also much more flexible in situation definitions: a rule like if x > 10 is not possible when attribute x has a character (Display*) type.

    As a general rule, you should always use a numeric type for attributes you know to contain only numeric values. In this case, where the attribute values come from DB2, you can check the column type from the table definition.

    Table 1. Tivoli data types
    Type | Description
    S | Switch. Boolean 0 or 1.
    G | Gauge. Positive or negative integer.
    C | Counter. Positive integer.
    A | Average. Data to be averaged over all collections.
    D | DisplayString. Series of characters.
    N | DisplayNumeric. Series of numeric characters.
    T | Time. The format is CYYMMDDHHMMSSmmm (where C=1 for the 21st century).
    # | Delta value. Presents the value of the attribute as the difference between samples. For example, if the value for sample 1 is 100 and for sample 2 is 120, the delta is 20.
    % | Percentage of change. Presents the value of the attribute as the difference between samples expressed as a percentage. For example, if ReceiveCount is defined as % data type, and the value for sample 1 is 100 and for sample 2 is 120, the percentage of change is 20.
    ? | Rate of change. Presents the value of the attribute as the delta per second between samples. For example, if ReceiveCount is defined as ? data type, the value for the first sample is 100, the value for the second sample is 120, and the elapsed time between samples 1 and 2 is 5 seconds, the rate of change is 4 per second.
  • Attribute properties: An attribute can have none, one, or both of the following properties:

    • Key: Indicates that an attribute is a key attribute. Tivoli Monitoring uses key attributes to determine whether multiple events have the same cause. Key attributes help correlate data rows with identical keyed attribute values. When the Universal Agent receives data rows for keyed attribute data, it checks whether it already has a data row with matching values for the keyed attributes. If so, the new data row replaces the existing one. Note: Up to five key attributes per attribute group are allowed.

    • Atomic: Indicates that an attribute is atomic. Atomizing an attribute means that a separate event is generated for each data row for which a situation on that attribute evaluates to true. For example, suppose the situation definition is IF mem_usage > 100. An atomized mem_usage attribute raises an event for every process that has allocated more than 100 MB of memory: if more than one process fulfills the condition, each process raises a separate event. A non-atomized mem_usage attribute behaves differently: only one event is raised, even if more than one process fulfills the condition at the same time.

    It is generally a good idea for ODBC meta files to use keyed tables because it prevents the same retrieved rows from being added multiple times whenever the SQL select statement is executed, and most ODBC tables have one or more indexed columns which logically correspond to key attributes in the meta file.

    Note: To be more flexible with how to display the data in the Tivoli Enterprise Portal Client, use the SQL clause in the attribute group definition only to select the latest data, and filter which attributes to display in the chart definition at the Portal Client. Also, make sure that you understand the structure and the data of the control tables before creating graphs in the Portal Client. For example, the IBMQREP_APPLYMON table (and therefore the Apply_Monitor attribute group from the sample meta file) contains data for each queue browser thread of the apply program (one per queue the apply program listens to). You should use the Enterprise Portal's filter and grouping features to create a separate graph per receive queue.

    To learn how to create charts in the Tivoli Enterprise Portal Client, refer to the Tivoli Enterprise Portal User Guide. For detailed information about Q Replication control tables, refer to Replication and Event Publishing Guide and Reference.

  • Data sampling Method: You define the data sampling method, along with other attribute group properties in the //NAME statement. Here is its full syntax:

    //NAME attribute-group-name sample-method [time-to-live] [AddSourceName]
    [AddTimeStamp] [Interval=] [SkipNonNumeric=Y/N] [@help text]

    Sample-method can be one of the four below:

    • P: Polled data becomes available periodically and only the latest set of values is available for situation monitoring and reporting.
    • S: Sampled data behaves in the same way as polled data except that more than one set of attribute data values may be available for use.
    • K: Keyed data behaves in the same way as sampled data, but allows you to correlate events. Up to five attributes in each group can be designated as key attributes.
    • E: Event data occurs unpredictably and is reported as it becomes available.

    For ODBC data, K (keyed sampling) is the most appropriate sampling method, since almost every table has at least one key column and the DBMS ensures its semantics.

    Note: You should set the Interval= property on each attribute group in the meta file to the same value as the monitor_interval parameter from the corresponding Q Apply and Q Capture programs. You get this value by issuing asnqccmd capture_server=YourDB2Alias capture_schema=YourCaptureSchema qryparms and asnqacmd apply_server=YourDB2Alias apply_schema=YourApplySchema qryparms in a command line window.
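
The three derived types from Table 1 (#, %, and ?) can be illustrated with the numbers used in the table. The following Python sketch is only a worked example of the arithmetic, assuming that the percentage is taken relative to the earlier sample, which matches the example in Table 1. It is not how the Universal Agent is implemented, and the function names are hypothetical.

```python
def delta(previous: float, current: float) -> float:
    """Type '#': difference between two samples."""
    return current - previous


def percent_change(previous: float, current: float) -> float:
    """Type '%': difference between two samples as a percentage of the earlier one."""
    return (current - previous) * 100.0 / previous


def rate_of_change(previous: float, current: float, elapsed_seconds: float) -> float:
    """Type '?': difference between two samples per second of elapsed time."""
    return (current - previous) / elapsed_seconds


# Values from Table 1: sample 1 is 100, sample 2 is 120, 5 seconds apart.
print(delta(100, 120))
print(percent_change(100, 120))
print(rate_of_change(100, 120, 5))
```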

You might want to change application and attribute group names as well. Also, delete all attribute groups and attributes you don't need. This saves network traffic and ensures proper response times for the Tivoli Enterprise Portal Client.
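
The replacement behavior described for key attributes can be pictured as a dictionary lookup. The following Python sketch assumes rows arrive as dictionaries and that MONITOR_TIME is the only key attribute. The function name merge_keyed_rows and the sample values are hypothetical, and the code illustrates the idea rather than the agent's internals.

```python
from typing import Dict, Iterable, Sequence, Tuple

Row = Dict[str, object]


def merge_keyed_rows(
    store: Dict[Tuple[object, ...], Row],
    new_rows: Iterable[Row],
    key_attributes: Sequence[str],
) -> Dict[Tuple[object, ...], Row]:
    """Add rows to store; a row whose key values already exist replaces the old row."""
    for row in new_rows:
        key = tuple(row[name] for name in key_attributes)
        store[key] = row
    return store


# Hypothetical sample data: two SQL executions that return one overlapping row.
first_fetch = [
    {"MONITOR_TIME": "2006-07-24-10.00.00", "QDEPTH": 12},
    {"MONITOR_TIME": "2006-07-24-10.00.20", "QDEPTH": 15},
]
second_fetch = [
    {"MONITOR_TIME": "2006-07-24-10.00.20", "QDEPTH": 15},
    {"MONITOR_TIME": "2006-07-24-10.00.40", "QDEPTH": 9},
]

rows: Dict[Tuple[object, ...], Row] = {}
merge_keyed_rows(rows, first_fetch, ["MONITOR_TIME"])
merge_keyed_rows(rows, second_fetch, ["MONITOR_TIME"])
print(len(rows))  # the overlapping row is stored only once
```

Without a key, the overlapping row would be stored twice, which is the duplication that keyed tables avoid.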

Listing 1. Apply_Monitor Attribute Group
//NAME Apply_Monitor K 300 Interval=20
//SOURCE ODBC DB2_TARGET user=***** pswd=*****
//SQL SELECT * FROM APPLY221222.IBMQREP_APPLYMON
order by monitor_time DESC fetch first 100 rows only
//ATTRIBUTES
MONITOR_TIME        D  28 KEY ATOMIC
RECVQ               D  48
QSTART_TIME         D  28
CURRENT_MEMORY      C  999999
QDEPTH              C  999999
END2END_LATENCY     C  999999
QLATENCY            C  999999
APPLY_LATENCY       C  999999
TRANS_APPLIED       C  999999
ROWS_APPLIED        C  999999
TRANS_SERIALIZED    C  999999
             ...
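
To spot attributes that KUMPCON GENERATE left as display types, the attribute lines can be checked with a small script. The following Python sketch assumes the simplified attribute syntax shown earlier (name, type, maximum size, optional KEY and ATOMIC, optional @help text). The function name parse_attribute is hypothetical, and the parser does not cover the full meta file grammar.

```python
from typing import Dict

# Types described above as not usable in bar, pie, or gauge charts.
DISPLAY_TYPES = {"D", "N"}


def parse_attribute(line: str) -> Dict[str, object]:
    """Parse one attribute definition line from the //ATTRIBUTES section."""
    definition = line.split("@", 1)[0]  # drop the optional help text
    parts = definition.split()
    if len(parts) < 3:
        raise ValueError(f"not an attribute definition: {line!r}")
    flags = {part.upper() for part in parts[3:]}
    return {
        "name": parts[0],
        "type": parts[1],
        "size": int(parts[2]),
        "key": "KEY" in flags,
        "atomic": "ATOMIC" in flags,
        "display_only": parts[1] in DISPLAY_TYPES,
    }


# Lines taken from Listing 1.
for text in ("MONITOR_TIME        D  28 KEY ATOMIC", "QDEPTH              C  999999"):
    attribute = parse_attribute(text)
    note = "table views only" if attribute["display_only"] else "usable in charts"
    print(f"{attribute['name']}: type {attribute['type']}, {note}")
```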

When using the sample meta file from this article's Download section, make sure that you modify it to fit your system's ODBC DSNs, control table schema, application, and attribute group names. You should also check whether you need to modify the SQL statements in the meta file's //SQL definitions. Also, make sure to set the Interval= property of each attribute group to the correct value as described in the note above.
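
If you maintain several attribute groups, generating the header lines from a few parameters helps keep the DSN, schema, and interval consistent. The following Python sketch reproduces the header layout of Listing 1. The function name render_group_header and its parameters are hypothetical, and the sketch assumes that the value after the sample method is the time-to-live, as in the //NAME syntax shown earlier.

```python
def render_group_header(
    group: str,
    dsn: str,
    schema: str,
    table: str,
    interval: int,
    user: str,
    password: str,
    time_to_live: int = 300,
    max_rows: int = 100,
) -> str:
    """Build the //NAME, //SOURCE, and //SQL lines for one keyed ODBC attribute group."""
    return "\n".join(
        [
            f"//NAME {group} K {time_to_live} Interval={interval}",
            f"//SOURCE ODBC {dsn} user={user} pswd={password}",
            f"//SQL SELECT * FROM {schema}.{table}",
            f"order by monitor_time DESC fetch first {max_rows} rows only",
            "//ATTRIBUTES",
        ]
    )


# Values from Listing 1; the credentials stay masked as in the listing.
print(
    render_group_header(
        group="Apply_Monitor",
        dsn="DB2_TARGET",
        schema="APPLY221222",
        table="IBMQREP_APPLYMON",
        interval=20,
        user="*****",
        password="*****",
    )
)
```

The attribute lines still have to be appended after //ATTRIBUTES, and the result should be checked with KUMPCON VALIDATE.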

When you are done with editing the meta file, make sure to validate its syntax with KUMPCON VALIDATE metafile-name in a command line window.

Start your Universal Agent instance

Before starting your Universal Agent instance, you need to tell it where to find the meta file it should use. This is done in a configuration file specific to the Universal Agent instance you created in the previous steps. In a default installation, you can find this file at IBM\ITM\TMAITM6\work\KUMPCNFG_[$YOUR_AGENT_INSTANCE_NAME] (for example, IBM\ITM\TMAITM6\work\KUMPCNFG_Q_REPL_AGENT), where [$YOUR_AGENT_INSTANCE_NAME] stands for the name you gave to the new agent instance in Set up the Tivoli Universal Agent. If this file doesn't exist, create an empty text file with that name. Open the file in a text editor, and add a new line with the meta file name for every meta file you want the agent to load at startup. Also, make sure that your KUM_WORK_PATH and KUMP_INIT_CONFIG_PATH environment variables are set to correct values.
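
Creating the configuration file can be scripted as well. The following Python sketch writes one meta file name per line into KUMPCNFG_ followed by the instance name. The function name write_agent_config is hypothetical, the meta file name sample.mdl is an assumption based on the name of the download archive, and the example writes into a temporary directory rather than a real Tivoli installation.

```python
import tempfile
from pathlib import Path
from typing import Iterable


def write_agent_config(work_dir: Path, instance_name: str, metafile_names: Iterable[str]) -> Path:
    """Write KUMPCNFG_<instance_name> with one meta file name per line."""
    config_path = Path(work_dir) / f"KUMPCNFG_{instance_name}"
    config_path.write_text("".join(f"{name}\n" for name in metafile_names))
    return config_path


# Demonstration in a temporary directory; point work_dir at your own work folder instead.
with tempfile.TemporaryDirectory() as tmp:
    path = write_agent_config(Path(tmp), "Q_REPL_AGENT", ["sample.mdl"])
    print(path.name)
    print(path.read_text(), end="")
```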

If the Universal Agent is already running and you want to import a new meta file without stopping the agent, you can use KUMPCON IMPORT metafile-name to activate a new meta file. Also, if you want to update an already active meta file without service interruption, use KUMPCON REFRESH metafile-name.
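
When the KUMPCON calls are driven from a script, it helps to assemble the arguments in one place. The following Python sketch only builds the argument list for the four KUMPCON actions mentioned in this article and does not run the tool. The function name kumpcon_args is hypothetical, and the argument forms are copied from the commands shown in this article, not checked against the tool itself.

```python
from typing import List, Optional

# The KUMPCON actions mentioned in this article.
KNOWN_ACTIONS = ("GENERATE", "VALIDATE", "IMPORT", "REFRESH")


def kumpcon_args(
    action: str,
    target: str,
    user: Optional[str] = None,
    password: Optional[str] = None,
) -> List[str]:
    """Return the argument list for a KUMPCON call; target is a DSN or a meta file name."""
    action = action.upper()
    if action not in KNOWN_ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    args = ["KUMPCON", action, target]
    # Credentials are only shown for GENERATE in this article.
    if action == "GENERATE" and user is not None and password is not None:
        args += [f"user={user}", f"pswd={password}"]
    return args


print(" ".join(kumpcon_args("GENERATE", "YourDataSourceName", "YourUserID", "YourPassword")))
print(" ".join(kumpcon_args("REFRESH", "metafile-name")))
```

The resulting list could be passed to subprocess.run from the IBM\ITM\TMAITM6 directory, which is left out here so that the sketch runs without a Tivoli installation.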

You can now start your new Universal Agent instance by right-clicking it in Manage Tivoli Enterprise Monitoring Services, then selecting Start in the context menu. When you open the Portal Client, you should see a new application being monitored by the universal data provider, and this application should contain a separate workspace for each of your attribute groups from the meta file. Figure 1 shows the Apply_Monitor attribute group from the sample meta file, which uses different graphs to visualize the current status of the replication.

To find out more about KUMPCON, meta files and the Universal Agent configuration, refer to the Universal Agent User Guide.

Troubleshooting

The first place to look if something is wrong is the Universal Agent's log directory, located at IBM\ITM\TMAITM6\logs. If the Universal Agent is running but doesn't behave as desired, start the Tivoli Enterprise Portal Client and look at the Data Provider Log (DPLOG) workspace of the Universal Agent instance in question. The DPLOG workspace is similar to a system console log. It provides a detailed audit trail from the data provider.

If your agent instance starts, but doesn't show up at the Portal Client, you should also check if the Universal Agent configuration has the correct IP address or hostname for the primary monitoring server.

You can configure this value under Manage Tivoli Enterprise Monitoring Services by right-clicking the Universal Agent instance, clicking Advanced, clicking Configure advanced..., and then clicking OK. The window that opens contains the agent's connection settings for the primary monitoring server.


Situations and alerts for Q Replication

A situation is a logical expression involving one or more attributes from an attribute group defined in a meta file. Situations are used to monitor the condition and health of systems in your network. They can trigger executables on the system that caused them to fire or simply notify administrators that a certain event occurred. You can manage situations from the Portal Client with the Situation Editor.

Each dedicated agent comes with a set of predefined situations. You may activate them as they are, or take them as a starting point for your own set of situations. Because of its generic nature, the Universal Agent, as used here for monitoring Q Replication, doesn't come with predefined situations. The table below therefore provides a list of situation definitions you can take as "inspiration" for your own situations for Q Replication. The Formula column from the table is also a good start for the "Threshold" feature of table views in the Portal Client.

Table 2. Situation examples for Q Replication
Name | Description | Attribute Group | Formula
APPLY_E2E_LATENCY_LIMIT | If you want to be notified, or execute a command, if replication end-to-end latency for ANY queue from the monitored apply process reached a certain value (here: >10s). | Apply_Monitor | ( END2END LATENCY > 10000 )
APPLY_DETECT_SPILLING | If you want to be notified, or execute a command, if the apply program needed to spill a row to the spill queue in WebSphere MQ. | Apply_Spilled_Rows | ( SPILLQ != XXX )
APPLY_QDEPTH_LIMIT | If you want to be notified, or execute a command, if there are too many messages waiting in ANY of the queues from the monitored apply program. | Apply_Monitor | ( QDEPTH > 50000 )
APPLY_DETECT_MONSTER_TX | If you want to be notified, or execute a command, if one or more transactions on ANY apply queue exceeded the apply program's memory limit. | Apply_Monitor | ( MONSTER TRANS != 0 )
APPLY_DETECT_MEM_FULL | If you want to be notified, or execute a command, if the apply program could not process transactions from ANY receive queue because agents were using all available memory. | Apply_Monitor | ( MEM FULL TIME != 0 )
APPLY_WARNING | If you want to be notified, or execute a command, if the apply program reported a warning message. | Apply_Trace_Log | ( SCAN(OPERATION) == WARNING )
APPLY_ERROR | If you want to be notified, or execute a command, if the apply program reported an error message. | Apply_Trace_Log | ( SCAN(OPERATION) == ERROR )
CAPTURE_TX_SPILLED | If you want to be notified, or execute a command, if the capture program spilled transactions to disk or virtual I/O. | Capture_Monitor | ( TRANS SPILLED != 0 )
CAPTURE_WARNING | If you want to be notified, or execute a command, if the capture program reported a warning message. | Capture_Trace_Log | ( SCAN(OPERATION) == WARNING )
CAPTURE_ERROR | If you want to be notified, or execute a command, if the capture program reported an error message. | Capture_Trace_Log | ( SCAN(OPERATION) == ERROR )
QREP_HOTLIST_OFFLINE | If you want to be notified, or execute a command, if one of the Q Replication processes isn't running any more. | NT Process | ( MISSING (ProcessName) == ('asnqcap.exe','asnqapp.exe') )
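
Before defining thresholds in the Situation Editor, it can be useful to try them against a few rows from the monitor table. The following Python sketch applies two of the formulas from Table 2 to rows shaped like the Apply_Monitor attribute group. The queue names and values are hypothetical sample data, the function name evaluate is an assumption, and the code only mimics the comparison, not the way Tivoli Monitoring evaluates situations.

```python
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]

# Two formulas from Table 2, written against the column names from Listing 1.
THRESHOLDS: Dict[str, Callable[[Row], bool]] = {
    "APPLY_E2E_LATENCY_LIMIT": lambda row: row["END2END_LATENCY"] > 10000,
    "APPLY_QDEPTH_LIMIT": lambda row: row["QDEPTH"] > 50000,
}


def evaluate(rows: List[Row]) -> List[Tuple[str, str]]:
    """Return (receive queue, situation name) pairs for every row that meets a threshold."""
    hits = []
    for row in rows:
        for name, condition in THRESHOLDS.items():
            if condition(row):
                hits.append((str(row["RECVQ"]), name))
    return hits


# Hypothetical sample rows, one per receive queue.
sample_rows = [
    {"RECVQ": "QUEUE_A", "END2END_LATENCY": 850, "QDEPTH": 120},
    {"RECVQ": "QUEUE_B", "END2END_LATENCY": 12500, "QDEPTH": 64000},
]
for recvq, name in evaluate(sample_rows):
    print(f"{name} is true for receive queue {recvq}")
```

Each row is checked on its own, which corresponds to the "ANY queue" wording in the table descriptions.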

When editing the situation, make sure to select an appropriate sampling interval for the situation under the Condition tab. You should also adjust the settings under the Action tab of your situation definition. In particular, parameters such as "If the condition is true for more than one monitored item" or "If the condition stays true over multiple intervals" need to be set to the correct value.

The real power of the Tivoli Monitoring system becomes visible when you create situations that are triggered from different attribute groups that come from different agents. This is very useful for Q Replication because this product uses other software to do parts of its work. Q Replication runs on top of an operating system, uses WebSphere MQ to transport messages, and uses DB2 to capture and apply transactions. Proper operation of Q Replication therefore depends heavily on the health of its underlying systems.

The Tivoli Monitoring product family already has dedicated agents for WebSphere MQ, DB2, and all major operating systems. You should install them as well and create situations or workflow definitions that span the whole software stack Q Replication uses. For example, you can do simple root cause analysis: if Q Replication shows errors and the WebSphere MQ agent tells Tivoli Monitoring that the Admin Queue is down, generate a message that replication has stopped because there is a WebSphere MQ problem.

Note: You can only create situations that use attributes from the same attribute group. If you want to use attributes from different groups in your situation, you must create another situation and embed it in the first. Alternatively, you can create a new attribute group that contains all the attributes you want to include in the situation. You can also use the Workflow Editor to assemble situations.
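
The root cause example can be written down as a small decision function. The following Python sketch assumes that two separate situations have already been evaluated, one on a Q Replication attribute group and one from the WebSphere MQ agent, and that only their true/false results are combined. The function name diagnose and the message texts are hypothetical.

```python
def diagnose(qrep_reports_error: bool, mq_admin_queue_down: bool) -> str:
    """Combine the results of two situations from different attribute groups."""
    if qrep_reports_error and mq_admin_queue_down:
        return "Replication has stopped because there is a WebSphere MQ problem"
    if qrep_reports_error:
        return "Replication reports errors; no WebSphere MQ problem detected"
    if mq_admin_queue_down:
        return "WebSphere MQ problem detected; replication reports no errors yet"
    return "No problem detected"


print(diagnose(qrep_reports_error=True, mq_admin_queue_down=True))
```

In Tivoli Monitoring itself, the same combination would be expressed with embedded situations or in the Workflow Editor, as described in the note above.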

For more information about situations and the Portal Client, refer to the Tivoli Enterprise Portal User Guide.


Other tools to monitor Q Replication

Other tools are also available for monitoring Q Replication. First, there is the Replication Alert Monitor that ships with the Q Replication product and is available from the Replication Control Center.

Also, there is the "Q Replication Live Monitor" (developerWorks, September 2005), a small, lightweight tool that graphically displays real-time latency and throughput information, available at developerWorks.

A third tool, quite similar to the live monitor but with a lot more features, is the Q Replication Dashboard, available through the Q Replication Tools page.


Conclusion

This article showed you how to monitor IBM WebSphere Information Integrator Q Replication by integrating it into the IBM Tivoli Monitoring platform. Q Replication status information is available through control tables on each of the systems where the Q Apply and Q Capture processes run. You connect to these control tables by creating a meta file for the Universal Agent's ODBC data provider. The meta file groups the data to read into attribute groups and describes each attribute with a Tivoli-specific data type. Once the Universal Agent is configured and running, you can use the Tivoli Enterprise Portal Client to create and distribute situation definitions that notify you (or even execute a custom command) if a critical event occurs.


Download

Description | Name | Size
Sample metafile | sample.mdl.zip | 2KB

