Health Checking and Troubleshooting for IBM Rational ClearCase MultiSite Replica Synchronization using Global Monitor

The IBM® Rational® ClearCase MultiSite® Global Monitor feature, released in ClearCase Version 7.1, is a powerful monitoring tool for ClearCase and ClearCase MultiSite administrators. One of the advantages of the Global Monitor feature is its alert system, which can be configured to send you a notification (such as an e-mail or a text message) when your ClearCase deployment has various problems. This article discusses how well the Global Monitor feature can be applied to a ClearCase MultiSite deployment to monitor replica synchronization. It also discusses a new ClearCase MultiSite feature in V7.1.1 for troubleshooting synchronization issues.


Ken Kumagai, Software Engineer, IBM

Ken Kumagai is an IBM software developer for the IBM Rational ClearCase MultiSite team. He works in the Software Development Laboratory in Yamato (YSL), Japan. One of his current interests is how existing systems can work seamlessly with IBM Rational Team Concert and collaboration tools, which are based on IBM Rational Jazz technology, by using the Open Services for Lifecycle Collaboration (OSLC). In his spare time, Ken enjoys reading books at the nearest coffee shop.

Yoshio Horiuchi, Software Engineer, IBM

Yoshio Horiuchi is a software engineer in the Software Development Laboratory in Japan. He currently works on IBM Rational ClearCase Global Monitor.

Masabumi Koinuma, Software Engineer, IBM

Masabumi Koinuma is a software engineer in the IBM Rational ClearCase development group, and he specializes in ClearCase MultiSite and the Global Monitor tool.

Margaret Marynowski, Software Engineer, IBM

Margaret Marynowski is a software engineer in the IBM Rational ClearCase development group. She currently works on ClearCase MultiSite and the Global Monitor tool.

02 June 2010


What is Global Monitor?

The IBM® Rational® ClearCase MultiSite® Global Monitor feature is a monitoring tool that was released as part of ClearCase MultiSite Version 7.1 (for more information on this product, see the Resources section). It enables you to see your global ClearCase deployment from a single view point in a Web-based interface. It also provides a single method of notification, as well as customizable thresholds. For example, you can receive a text message on your cell phone when the system meets your customized thresholds. Global Monitor also provides notification for several immediately available events, such as the IBM® Rational® ClearCase® ALBD (Atria Location Broker Daemon) server going down, or a scheduled job failure.

What can you find in the Global Monitor console? You can find Rational ClearCase version numbers, VOBs (Versioned Object Bases), views, scheduled jobs, ClearCase service logs, and more. As a ClearCase MultiSite administrator, you can also find replicas, feature levels of replicas, epoch numbers, and synchronization packets in incoming and outgoing bays. You can configure the monitoring system to notify you when the monitored data meets specific conditions. Figure 1 is a screen capture of the Global Monitor console that shows replica synchronization packets in the incoming and outgoing bays.

Figure 1. Replica synchronization packets
Replica Synchronization Packets

You can find the path to the packet file, the type of packet (incoming or outgoing), and the age of the packet (how long the packet has been in the bay). When you navigate to the detailed view of a packet by clicking the blue link button at the left, you find more details about the packet, such as the originating VOB tag, replica name, oplog IDs, and the packet fragment number.

The user interface (UI) provides a logical navigation tree of your global Rational ClearCase deployment. ClearCase hosts are grouped by ClearCase MultiSite site and ClearCase region. When the system finds an issue in your ClearCase deployment, it also provides context-sensitive help documents called Expert Advice, which suggest possible solutions. The Global Monitor tool provides flexible deployment options, and it can pass through firewalls with limited data flow on one or two ports. In addition, it supports reporting tools such as the open source Eclipse BIRT (Business Intelligence and Reporting Tools) project or the IBM® Tivoli® Common Reporting tool.

The Global Monitor feature uses the IBM® Tivoli® Monitoring tool (also called ITM), which is a market-leading enterprise monitoring product (for more information on this product, see the Resources section). IBM Tivoli Monitoring is bundled with ClearCase MultiSite. Global Monitor's centralized user interface console is also provided by IBM Tivoli Monitoring, and it is called IBM® Tivoli® Enterprise Portal, or TEP.

Leveraging ITM situations

One of the great features of the Global Monitor is its alert system which is called the ITM situation (for more information on this feature, see the Resources section). This document describes how to apply the ITM situations to check the status of the MultiSite synchronization of your deployment, but let's take a moment to learn what the ITM situation is and how to use it. If you are already familiar with the ITM situation, you can skip this section.

What is an ITM situation?

The event notification system of IBM Tivoli Monitoring is called a situation. It is one of the powerful features that ITM provides, and it is highly customizable. You can define your formula to trigger a situation event. You can create a situation formula on any monitored data, and it will be evaluated at specific intervals, which you can also customize. When a situation event fires, you can easily find it in the centralized console. Also, you can associate a script with a situation that is executed when the event fires. For example, an e-mail notification or text message on your cell phone can be sent by a script that is executed when a situation event fires.
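The behavior described above can be pictured with a small conceptual model. This is only a sketch of the idea, not the ITM implementation or API: the `Situation` class, its fields, and the assumption that the action runs only when the formula changes from false to true are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# One sample of monitored data: attribute name -> value (hypothetical shape).
Sample = Dict[str, float]


@dataclass
class Situation:
    """Conceptual model of a situation: a formula plus an optional action."""
    name: str
    formula: Callable[[Sample], bool]
    action: Optional[Callable[[str, Sample], None]] = None
    is_open: bool = False

    def evaluate(self, sample: Sample) -> bool:
        """Evaluate the formula once (one sampling interval).

        Returns True only when the event newly fires, that is, when the
        formula was false at the previous sample and is true now.
        """
        matched = self.formula(sample)
        fired = matched and not self.is_open
        if fired and self.action is not None:
            self.action(self.name, sample)
        # The event stays open while the formula holds and closes otherwise.
        self.is_open = matched
        return fired
```

In this model, a monitoring loop would call `evaluate` once per sampling interval with fresh data, and the action callback stands in for the script that sends an e-mail or text message.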

The Global Monitor has the following predefined situations shown in Table 1:

Table 1. Global Monitor predefined situations
Name | Auto start | Description
KRC_albddown | yes | The albd_server on this host is down
KRC_f_level_mismatch | no | The replica feature level is not supported on this host
KRC_failed_job | no | A ClearCase scheduled job has failed
KRC_family_f_level_low | no | The family feature level can be raised
KRC_family_f_level_too_high | no | The family and replica feature levels are not supported on this host
KRC_inbay_too_long | no | A packet has been in the shipping bay longer than expected
KRC_pool_space | no | The device hosting pool space has filled up beyond the configured limit
KRC_replica_f_level_low | no | The replica feature level can be raised
KRC_replica_f_level_too_high | no | The replica feature level is not supported on this host
KRC_replica_f_level_unknown | no | The replica feature level is unknown
KRC_rollback_critical | no | Epoch rollback detected: the VOB is not locked and not undergoing restorereplica
KRC_rollback_info | no | The replica is being restored with restorereplica
KRC_rollback_warning | no | Epoch rollback detected: the VOB is locked, but not undergoing restorereplica
KRC_shipping_bay_space | no | The device hosting bay space has filled up beyond the configured limit
KRC_updater | yes | An internal situation to run the Global Monitor cache updater
KRC_updater_log | yes | The Global Monitor cache updater has failed
KRC_view_space_low | no | The device hosting view space has filled up beyond the configured limit

As you can see in the Auto start column, many of these are not enabled by default, which leaves plenty of room for customization. If you install the Operating System agent on your monitored host, you can also find many situations provided by default. In addition, you can define customized situations based on either ClearCase monitored data or Operating System level data.

When you configure a situation, you will find a panel like that shown in Figure 2:

Figure 2. KRC_albddown situation
The Situation Editor - KRC_albddown

The panel consists of 5 tabs at the top. The first, the Formula tab, provides a user interface to edit your formula for the event. You will see how to edit a formula in the next section. You can also edit the sampling interval in this tab.

The Distribution tab is where you configure the host or agent on which you would like to evaluate a situation. By default, all Global Monitor agents are selected.

The Expert Advice tab is where you provide text or a link to a help document. The Global Monitor system will display the help content when you navigate to the detailed panel of a situation.

On the Action tab (shown in Figure 3), you can specify a script or operating system command to be executed when the situation fires.

Figure 3. The Action tab of the KRC_albddown situation
The Situation Editor - KRC_albddown Action panel

The monitored agent data can be included in the argument of the system command. The script can be executed either on the monitoring server or on the agent host. A typical example is to enter a command in the System Command field to send an e-mail or text message. You can add arguments in the command field to provide details of the system status, such as the host name of the machine that the events fired on, the event's severity, and so on.
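As an illustration, a notification script of the kind you might enter in the System Command field could look like the following sketch. Everything here is an assumption made for the example: the function names, the argument order, the addresses, and the use of an SMTP server on `localhost`. How the host name and severity are passed to the script depends on the arguments that you configure in the Action tab.

```python
import smtplib
from email.message import EmailMessage


def build_alert_message(host: str, situation: str, severity: str,
                        sender: str, recipient: str) -> EmailMessage:
    """Build an e-mail that describes a fired situation event."""
    msg = EmailMessage()
    msg["Subject"] = f"[{severity}] {situation} fired on {host}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(
        f"Situation {situation} fired on host {host} "
        f"with severity {severity}."
    )
    return msg


def send_alert(msg: EmailMessage, smtp_host: str = "localhost") -> None:
    """Send the message through an SMTP server (assumed to be reachable)."""
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)
```

A wrapper script would read the host name, situation name, and severity from its command-line arguments, call `build_alert_message`, and then call `send_alert`.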

On the Until tab, you can customize when to close a situation event.

How to define and run situations

You can manage situations using the Situation Editor. First, let's look at a predefined situation, KRC_view_space_low, to examine its logical expression. On the Tivoli Enterprise Portal, type Ctrl+E or click the icon composed of blue and red dots in the toolbar shown in Figure 4 to open the Situation Editor.

Figure 4. Open the Situation Editor
Situation Editor button in the toolbar

The Situation Editor is displayed. Expand the ClearCase node in the navigator pane. You will see the predefined situations for the Global Monitor tool. Select the KRC_view_space_low situation, as shown in Figure 5.

Figure 5. The Situation Editor
Situation Editor window

The right frame of the Situation Editor shows tabbed editing areas (Formula, Distribution, and so on). You can see the logical expression, Usage Percentage > 95, in the Formula tab. Usage Percentage refers to the Usage_Percentage attribute of the KRC_VIEW_SPACE query, which reports the view storage usage obtained from the cleartool space -view command. This formula specifies that the KRC_view_space_low situation fires if the view storage usage exceeds 95%.

Now let's define a sample situation and run it. The Global Monitor tool has a predefined situation that alerts you when the view storage is running low, but there is no such situation for VOB storage space, so you can create a vob_space_low situation as an example.

Open the Situation Editor, and then click the Create new Situation button shown in Figure 6.

Figure 6. Create new situation
Create new situation button

In the Create Situation dialog box, set Name to vob_space_low, and Monitored Application to ClearCase, as shown in Figure 7.

Figure 7. Create vob_space_low situation
Create Situation dialog

Click OK to close the dialog. The Select condition dialog is displayed. Because the new situation is for monitoring VOB space, set Attribute Group to KRC_VOB_SPACE and select Percent Used in the Attribute Item list, as shown in Figure 8. Like the KRC_VIEW_SPACE query, the KRC_VOB_SPACE query obtains its values from a cleartool space command, in this case cleartool space -vob.

Figure 8. Select the condition for vob_space_low situation
Select Condition dialog

In the Formula tab of the Situation Editor, set v (Value of expression), > (Greater than) and 95, respectively, as shown in Figure 9. You can also change the sampling interval here. The default value of the sampling interval is 15 minutes, but if you feel that is not appropriate for the new situation, just change it.

Figure 9. Definition of vob_space_low situation : Formula
Formula for vob_space_low situation

Open the Distribution tab and set Assigned to *CLEARCASE, as shown in Figure 10. The *CLEARCASE item represents all of the Global Monitor agents that are connected to the ITM system. You can choose specific agents if you would like to evaluate the situation on specific hosts.

Figure 10. Definition of vob_space_low situation : Distribution
Distribution for vob_space_low situation

Click the OK button to close the Situation Editor. The definition of the new situation, vob_space_low, is now completed. Now you need to associate the vob_space_low situation with a navigator node. In the Navigator view, select a navigator node, ClearCase > VOBs, right-click, and select the Situations menu item. You will see the Situation Editor that only displays Situations for - VOBs. Click the Set Situation filter criteria button, as shown in Figure 11.

Figure 11. Set situation filter criteria
Set Situation filter criteria button

The Show Situations dialog is displayed. Select the Eligible for Association check-box, and then click OK. All of the situations in the Global Monitor tool, including vob_space_low, are now listed. Select vob_space_low, and set State to Warning.

The setting for the new situation, vob_space_low, has been completed. When the specified conditions are matched (that is, KRC_VOB_SPACE.Percent_Used > 95), the situation is fired, as shown in Figure 12. In this example, lots of elements are checked in to the VOB, so pool space exceeds 95%. Note that you may need to run the space command (shown in Listing 1) or execute the Daily VOB Space scheduled job on the ClearCase host to report correct storage usage.

Listing 1. Command line to update VOB space
> cleartool space -gen -vob {vob_tag}

Figure 12. The vob_space_low situation is fired
Popup window for vob_space_low situation

You can confirm that the new situation is actually started by using the Manage Situations dialog. Select the ClearCase node in the Navigator view, right-click, and then select the Manage Situations menu item. The Manage Situations dialog is displayed, as shown in Figure 13, and you can verify that vob_space_low is currently started and opened.

Figure 13. vob_space_low situation on Manage Situations dialog
Situations on Manage Situations dialog

Monitoring incoming bays

Now that you are familiar with the ITM situation, you can learn how to monitor your ClearCase MultiSite synchronization. How do you know if your synchronization is in trouble? The most common symptom of a synchronization error is that packet files pile up in a shipping bay. The Global Monitor system collects packet file information so that you can monitor the synchronization status in your shipping bays.

The problem

Update packets are accumulating in an incoming shipping bay.

Why it happens

The problem is that synchronization update packets for a particular replicated VOB are accumulating in a shipping bay, and are not being imported. In general, this is caused by a syncreplica import failure. For example, if a packet has been lost in transit to the target host, subsequent packets will fail to be imported because they depend on changes (oplogs) that the target importer has not yet received. An import can also fail if the VOB is locked, and packets would again accumulate in the incoming shipping bay.

How to detect synchronization problems using Global Monitor

Enable the Global Monitor KRC_inbay_too_long situation, and customize the threshold to a value that makes sense for your business. The Global Monitor system collects all of the packet information, and also detects how long each packet file has been sitting in a shipping bay. If you display the Family Health workspace (select Workspace > Family Health, as shown in Figure 14), you can find all of the packet files and how long each has been in the shipping bay (as shown in Figure 1).

Figure 14. Family_Health workspace on ClearCase > MultiSite node
MultiSite Family_Health workspace

For example, if your scheduled syncreplica import job runs every 10 minutes, you may want to detect packets that remain in the bay longer than an hour. If the situation fires, you can use the Global Monitor feature to confirm failure of a multitool syncreplica -import command. Perform the following steps to enable the KRC_inbay_too_long situation:

  1. Open the Tivoli Enterprise Portal client and log in to the monitoring system.
  2. In the Navigator view, select a navigator node, ClearCase.
  3. Right-click it and select Manage Situations.
  4. In the Manage Situation at Managed System:<hostname> dialog box, right-click KRC_inbay_too_long and select Edit Situation to customize the formula of the situation, or the action when it fires (Figure 15).
  5. Click OK to close the Situations for - Situation dialog box when you finish the customization.
  6. To start the situation, select KRC_inbay_too_long, right-click, and select Start Situation (Figure 16).
  7. Close the Manage Situation at Managed System:<hostname> dialog.

Figure 15. Customize KRC_inbay_too_long situation
The situation for inbay too long

Figure 16. KRC_inbay_too_long situation on Manage Situation dialog
MultiSite situation KRC_inbay_too_long
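The threshold reasoning above (an import job every 10 minutes, an alert for packets older than an hour) amounts to a simple age comparison. The following sketch illustrates only that comparison; the function name, the data shape, and the packet paths are hypothetical and are not part of Global Monitor.

```python
from datetime import timedelta
from typing import Dict, List


def packets_in_bay_too_long(packet_ages: Dict[str, timedelta],
                            threshold: timedelta = timedelta(hours=1)
                            ) -> List[str]:
    """Return the paths of packets whose age in the bay exceeds the threshold."""
    return [path for path, age in packet_ages.items() if age > threshold]


# Hypothetical packet paths and ages, for illustration only.
ages = {
    "incoming/packet_a": timedelta(minutes=5),
    "incoming/packet_b": timedelta(hours=3),
}
stale = packets_in_bay_too_long(ages)
```

Choosing a threshold that is several times the import interval avoids alerts for packets that are simply waiting for the next scheduled import.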

How to fix it

You must correct the issue that has caused the initial import failure. For instance, if you determine that the VOB was locked, you must unlock it.

If you determine that a packet was lost in transit, you can detect the missing oplogs at the importing host, and then create a packet at the exporting host that includes only the missing oplogs. Three command-line switches were added to the syncreplica command in V7.1.1 for this purpose, as shown in Table 2.

Table 2. New command line switches of syncreplica
Switch name | Type | Explanation
-oprange | Export | You can specify a range of oplogs to export a packet. When you find one packet is missing for some reason, but all of the subsequent packets are transmitted successfully, you can create the missing packet again by running an export command with this -oprange switch.
-endrange | Export | This switch will add the end oplog information to the export packet. The end oplog information is displayed in the lspacket command output, and is used by the following -diagnose switch.
-diagnose | Import | When you add this switch to import packets, packets are actually not imported but are surveyed to determine if there is a gap of oplogs that would cause an import failure. If it detects a gap, it outputs a message about the gap like that shown in Listing 2.

To detect the missing oplogs, you can run multitool syncreplica -import -diagnose at the failing importer. Note that -diagnose parses only packets that have been created using the -endrange switch.

Listing 2. Sample command line to diagnose missing packet
> multitool syncreplica -import -diagnose -receive
Suggested Export Replica "original@/vobs/testRep1" multitool syncreplica -export -endrange -oprange original=1523:1524 testRep1
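To picture what a gap of oplogs means, the following sketch looks for oplog IDs that are not covered by the packets waiting in the bay. It is only a conceptual illustration, not how -diagnose is implemented. The function names, the idea of a known "last imported" oplog ID, and the assumption that ranges are inclusive are all hypothetical.

```python
from typing import List, Tuple

OplogRange = Tuple[int, int]  # inclusive (first_oplog, last_oplog), assumed


def find_missing_oplogs(last_imported: int,
                        packets: List[OplogRange]) -> List[OplogRange]:
    """Return the oplog ranges that no waiting packet covers."""
    gaps: List[OplogRange] = []
    expected = last_imported + 1  # next oplog ID the importer needs
    for start, end in sorted(packets):
        if start > expected:
            gaps.append((expected, start - 1))
        expected = max(expected, end + 1)
    return gaps


def format_oprange(replica: str, gap: OplogRange) -> str:
    """Format a gap in the replica=start:end style seen in Listing 2."""
    return f"{replica}={gap[0]}:{gap[1]}"


# Illustrative numbers chosen to match the range in Listing 2.
gaps = find_missing_oplogs(1522, [(1525, 1530), (1531, 1540)])
suggestion = [format_oprange("original", g) for g in gaps]
```

With these illustrative numbers, the importer holds oplogs up to 1522 and the waiting packets start at 1525, so the range 1523 to 1524 is missing.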

Create and send the packet again by running the multitool syncreplica -export -endrange -oprange command at the suggested exporter, using the oplog range suggested by -diagnose, as shown in Listing 3.

Listing 3. Sample command line to create a packet that includes specific oplogs
> multitool syncreplica -export -endrange -ship -oprange original=1523:1524 testRep1@/vobs
Generating synchronization packet /var/adm/rational/clearcase/shipping/ms_ship/outgoing/sy
Shipping order "/var/adm/rational/clearcase/shipping/ms_ship/outgoing/sync_original_2010-01-05T175206-0500_5288" generated.

At the importer, run the multitool syncreplica -import command again.

Monitoring scheduled jobs

Another possible source of ClearCase MultiSite synchronization failure is your export or import scripts, which are executed by the ClearCase scheduler. The Global Monitor system collects information about scheduled jobs so that you can be alerted to any job failure. For example, the predefined KRC_failed_job situation notifies you if any of the scheduled jobs finishes with an error code. This section explains how to create new situations, especially those that detect problems in replica synchronization.

How can the Global Monitor feature detect problems with scheduled jobs?

Commands for replica synchronization are typically executed by scheduled jobs. ClearCase provides some predefined jobs (for example, "Daily MultiSite Export", which is Job ID 12 in Figure 17). This specific job is provided for the exporting phase of the syncreplica command, and it is the job that creates and sends packets to the shipping server at the receiving replica. For the importing phase, "Daily MultiSite Receive" (Job ID 14 in Figure 17) is the job that receives packets sent from the originating server. You can schedule these jobs to run periodically (or to run just once) by using the ClearCase administrator console. If jobs do not run as scheduled, replicas in a VOB family are left unsynchronized. Therefore, it is important for ClearCase administrators to detect whether jobs run correctly.

Figure 17. Predefined scheduled jobs (ClearCase)
The list of predefined scheduled jobs

Global Monitor collects values of job properties on each ClearCase host, and provides functions to create situations made up from those job properties. To create situations for jobs, you need to perform the following steps:

  1. Open the Tivoli Enterprise Portal client and log in to the monitoring system.
  2. In the Navigator view, select a navigator node, (in this example, the Jobs node under the ClearCase node).
  3. Right-click and select Situations.
  4. In the Situations for - Jobs dialog, right-click the ClearCase node, and select Create New.
  5. Input a name in the Name field, and click OK.
  6. With KRC_JOBS highlighted, select attributes as you like from the Attribute Item list, and click OK (as shown in Figure 18).
  7. Set conditions for each attribute selected at the previous step.

Figure 18. Select attribute group and items
Job attributes

Table 3 shows the attributes of KRC_JOBS in detail. Situations for jobs are created from combinations of these attributes. Note that the attributes Last Finished Timestamp, Last Started Timestamp, and Running Started Timestamp were added in Version 7.1.1.

Table 3. KRC_JOBS attributes in detail
Attribute name | Explanation | Type
ID | This attribute is the ID for the job created by ClearCase. | Text
Job Description | This attribute explains what the job is like. | Text
Job Name | This attribute is the name for the job created by ClearCase. | Text
Last Finished/Last Finished Timestamp | These attributes are the time when the last job finished. If no jobs have been executed, these attributes are left blank. | Text/Timestamp
Last Started/Last Started Timestamp | These attributes are the time when the last job started. If no jobs have been executed, these attributes are left blank. | Text/Timestamp
Node | The format of this attribute is <hostname>:<agent code>. | Text
Running Started Timestamp | This attribute is the time when the current running job started. It is only set while the job is running. | Timestamp
Status | This attribute is the returned state of the job. | Text
Timestamp | This attribute is the time when ITM collected the data from the agent. | Timestamp

Misconfiguration of scheduled jobs

Sometimes scheduled jobs are stopped for some other troubleshooting (like restoring a VOB), and you may forget to restart them. In this case, those jobs would not have run for a long time, so you can use the Last Finished Timestamp attribute in the situation formula because the value of the Last Finished Timestamp attribute reflects the time when those jobs last finished running.

Suppose that exporting and importing jobs are executed once an hour. Then the KRC_not_run_job situation should be fired when the value of Local_Time.Timestamp exceeds the value of Last Finished Timestamp by about one hour plus the typical time of executing the jobs. (Local_Time.Timestamp is the local time when ITM collected the data from the agent.)

These are the steps that you should follow to create the situation KRC_not_run_job, as shown in Figures 19 and 20.

  1. In the Select condition dialog (shown in Figure 18):
    • Select KRC_JOBS from Attribute Group.
    • Select ID and Last Finished Timestamp (holding down Ctrl while clicking) from Attribute Item.
  2. Click OK.
  3. Select the first cell of the ID column in the Formula section, and input target Job ID.
    • Set the value to 12 to identify syncreplica export job.
    • Set the value to 14 to identify syncreplica import job.
  4. Select the next cell in the Last Finished Timestamp column in the Formula section.
    1. Click the left icon, and select Compare Time to a time + or - delta
    2. In the Select Time Comparison Criteria dialog, select Local_Time.Timestamp from Time Attribute for Comparison, and then set Time Delta to -, 70, and Minutes (customize the value 70 according to the job interval and the typical job execution time in your environment). The formula will be: Last Finished Timestamp < Local_Time.Timestamp - (interval and execution time).
      • If the ITM server and an agent run in different time zones, you must take the time difference into account.
      • The formula including the time difference will be: Last Finished Timestamp < Local_Time.Timestamp - (interval and execution time) - {(ITM time zone) - (agent time zone)}.
      • For example, when ITM and the agent run in the UTC+9 and UTC-5 zones respectively, the added part in the formula including the time difference is calculated as: - {9h - (-5h)} = -14h.
    3. Click OK.
    4. Click the middle icon, and select Less than.
  5. Set 10 minutes as the value in the Sampling interval.
  6. Click OK.

Figure 19. Create KRC_not_run_job at the exporting host
The situation for not running job

Figure 20. KRC_not_run_job fires at the exporting host when the job is stopped incautiously
The situation is fired.
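The time comparison configured in the steps above, including the time zone adjustment, can be checked with a short calculation. This is only a sketch of the arithmetic in the formula as stated above; the function and parameter names are hypothetical, and the timestamps are made-up sample values.

```python
from datetime import datetime, timedelta


def job_not_run(last_finished: datetime,
                local_time: datetime,
                interval_and_exec: timedelta,
                itm_utc_offset_hours: int = 0,
                agent_utc_offset_hours: int = 0) -> bool:
    """Evaluate: Last Finished Timestamp < Local_Time.Timestamp
    - (interval and execution time) - {(ITM time zone) - (agent time zone)}.
    """
    tz_delta = timedelta(hours=itm_utc_offset_hours - agent_utc_offset_hours)
    cutoff = local_time - interval_and_exec - tz_delta
    return last_finished < cutoff


now = datetime(2010, 6, 2, 12, 0)  # sample Local_Time.Timestamp
delta = timedelta(minutes=70)      # 60-minute interval + 10 minutes to run

# Same time zone: the cutoff is 10:50, so a job that last finished at 10:30 is late.
same_zone_late = job_not_run(datetime(2010, 6, 2, 10, 30), now, delta)

# ITM in UTC+9, agent in UTC-5: the cutoff moves back a further 14 hours,
# to 20:50 on the previous day.
cross_zone_late = job_not_run(datetime(2010, 6, 1, 20, 0), now, delta, 9, -5)
```

Both sample evaluations are true, which means that the situation would fire in each case.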

The KRC_not_run_job situation can also be applied when you move VOBs (for more information on this, see the Resources section). VOBs are sometimes moved to a new machine together with job histories and some other registry information. Before moving VOBs, scheduled jobs for synchronization must be stopped, but you may forget to restart them after the move is finished. If you apply the situation to the destination host before moving the VOBs, the KRC_not_run_job situation will fire if you have forgotten to restart the scheduled jobs, as shown in Figure 21.

Figure 21. KRC_not_run_job fires at the host to which the VOB is being moved
Apply the situation to moving VOBs

Blocked by the previous job

Sometimes exporting or importing a large number of oplogs takes longer than the interval between sessions of a scheduled syncreplica job, and the currently running session of the job blocks the next session of the same job. For example, packets that arrive during a long-running instance of job 14 are imported in its next run, not during the current one.

In this case, the job has been running for a long time, and the value of the Running Started Timestamp attribute remains the same as when the job started running. If the exporting and importing syncreplica jobs are executed once an hour, the KRC_long_run_job situation should be fired when the value of Local_Time.Timestamp exceeds the value of the Running Started Timestamp attribute by about one hour, as shown in Figures 22 and 23.

The steps to create the situation KRC_long_run_job are the same as for the KRC_not_run_job situation, except for selecting the Running Started Timestamp attribute, not the Last Finished Timestamp attribute.

Figure 22. Create the KRC_long_run_job situation on a host
The situation for long running job

Figure 23. The KRC_long_run_job situation fires on the host when the syncreplica import job runs too long
The situation is fired.
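The condition behind KRC_long_run_job can be sketched in the same way. The names below are hypothetical. Because Running Started Timestamp is set only while a job is running, this sketch uses `None` to stand for the blank value.

```python
from datetime import datetime, timedelta
from typing import Optional


def job_running_too_long(running_started: Optional[datetime],
                         local_time: datetime,
                         max_runtime: timedelta = timedelta(hours=1)) -> bool:
    """Evaluate: Running Started Timestamp < Local_Time.Timestamp - max_runtime.

    Returns False when no job is running (the timestamp is not set).
    """
    if running_started is None:
        return False
    return running_started < local_time - max_runtime


sample_now = datetime(2010, 6, 2, 12, 0)  # sample Local_Time.Timestamp
blocked = job_running_too_long(datetime(2010, 6, 2, 10, 45), sample_now)
```

Here the job started at 10:45 and is still running at 12:00, which is longer than the one-hour interval, so the condition is true.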

What you have learned

The ClearCase MultiSite Global Monitor feature allows you to see global ClearCase deployment from a single view point in a Web-based interface. Global Monitor uses IBM Tivoli Monitoring (ITM) to provide customizable thresholds to monitor generic events (for example, the ALBD server going down, a scheduled job failure, and so on), and to provide a single method of notification. As examples of typical use cases, this article described how to leverage ITM situations to monitor specific conditions (for example, running low on space on the VOB storage device). This article also explained how well the Global Monitor feature can be applied to ClearCase MultiSite deployment to monitor typical issues of replica synchronization. Timely and appropriate information provided by the Global Monitor feature helps you take actions quickly to recover from ClearCase and ClearCase MultiSite issues.


The authors would like to thank Takehiko Amano for technical advice.


