
About the restorereplica process

Question & Answer


Question

What information is available to explain what happens during the IBM® Rational® ClearCase® MultiSite® restorereplica process?

Answer

Overview

The restorereplica process ensures that a replica restored from backup recovers its locally generated oplogs by retrieving them from sibling replicas; the recovery is coordinated through special-purpose oplogs.


restorereplica

A restorereplica oplog is generated by the restorereplica command. It contains the new "incarnation date" of the replica and the replica's oplog row as of that date and time. The restorereplica oplog uses a special oplog number (0xFFFFFFFF) so that it does not increment the "next oplog number". Any syncreplica packet generated at this time contains this oplog and only this oplog.
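
To make the sequence concrete, here is a hedged sketch of this stage; the VOB tag /vobs/dev and the sibling replica name tokyo are hypothetical, and the exact restorereplica invocation for your environment is given in the MultiSite Administrator's Guide. Immediately after restorereplica has been run at the restored replica, exporting a packet to a sibling carries only the special restorereplica oplog:

multitool syncreplica -export -fship replica:tokyo@/vobs/dev

The -fship option assumes a configured shipping server; -out <packet-file> can be used instead to write the packet to a file for manual transport.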

The restorereplica command also sets flags in the replica objects to track the status of the restore process (see the example after this list):

    • multitool lsreplica -long for the local replica will show "replica is recovering"
    • multitool lsreplica -long for other replicas specified on the restorereplica command will report "replica is expected to send recovery update"
    • By default, all sibling replicas get this "awaiting reply" flag set

      Note: These flags do not get propagated to other sites.
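
For example, at the recovering site you can inspect these flags with a command like the following (the VOB tag /vobs/dev is hypothetical):

multitool lsreplica -long -invob /vobs/dev

In the output, the entry for the local replica includes the line "replica is recovering", and the entry for each replica that is expected to reply includes the line "replica is expected to send recovery update".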

ackrestorereplica

When a sibling replica receives the restorereplica oplog, it stores it and automatically generates an ackrestorereplica oplog.

At this point, any syncreplica packet generated for any sibling replica will contain the restorereplica oplog, regardless of whether it has previously been sent. Packets for the recovering replica will contain all known oplogs since those indicated in the restorereplica oplog, but the epoch table row for the recovering replica will NOT be updated to reflect that they were sent. If syncreplica -export is run twice, the second packet contains just as much data as the first.
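
A hedged illustration of this behavior (the packet file names, the replica name boston, and the VOB tag /vobs/dev are hypothetical): exporting twice in a row from a sibling to the recovering replica produces two packets with essentially the same contents, because the sibling's epoch table row for the recovering replica is not advanced:

multitool syncreplica -export -out /var/tmp/recovery_packet_1 replica:boston@/vobs/dev
multitool syncreplica -export -out /var/tmp/recovery_packet_2 replica:boston@/vobs/dev

Both packets carry the same set of oplogs until the sibling imports the recovering replica's "restore complete" oplog, described below.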

If a replica receives a restorereplica oplog but already has one stored for the generating replica, then (see the example after this list):

    • if the incarnation date is older than that of the stored oplog, the import will fail as it would for any "stale" packet
    • if the incarnation date is the same as that of the stored oplog, the oplog will be ignored
    • if the incarnation date is newer than that of the stored oplog, it will replace the stored one and a new ackrestorereplica will be generated
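
If you need to see which incarnation a sibling currently has stored for the generating (restored) replica, for example to understand why a packet is being rejected as stale, the dump command shown later in this technote can also be run at the sibling (the replica name boston and the VOB tag /vobs/dev are hypothetical):

cleartool dump replica:boston@/vobs/dev

Compare the incarnation= line in this output with the incarnation date reported at the restored replica.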


You can determine which replicas still need to send updates by running multitool lsreplica -long at the recovering replica.

The replicas that are still required to send updates are listed with the following line in the output:

replica is expected to send recovery update
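
On a UNIX or Linux host, you can filter for these entries with a sketch like the following (the VOB tag /vobs/dev is hypothetical; adjust the grep context count so that the replica name above each match remains visible, and use findstr on Windows):

multitool lsreplica -long -invob /vobs/dev | grep -B 10 "expected to send recovery update"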


restore complete
  1. When the recovering replica replays an ackrestorereplica, it resets the "awaiting reply" flag for the replica that generated the acknowledgement.
  2. When all sibling replicas have had their "awaiting reply" flags reset, the recovering replica generates a "restore complete" oplog and resets its own replica_recovery flag.
  3. When a sibling replica receives a "restore complete" oplog, it checks for a restorereplica oplog that matches on replica and incarnation date and removes it. It then updates the epoch table row for that replica to reflect what the replica now knows, and it stops propagating the restorereplica oplog at that time (see the sketch after this list).
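
As a hedged sketch of step 3 from the sibling's point of view (the replica name recovered_site, the VOB tag /vobs/dev, and the use of store-and-forward shipping are hypothetical): once the packet containing the "restore complete" oplog has been imported, the next export to the recovered replica is incremental again.

multitool syncreplica -import -receive
multitool syncreplica -export -fship replica:recovered_site@/vobs/dev

The first command imports any packets waiting in the sibling's incoming storage bay, including the one carrying the "restore complete" oplog; the second export then contains only changes the recovered replica has not already received, because the epoch table row has been updated.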



Additional considerations when using the -override or -replace options to optimize the restoration process

Both of these options configure the recovering replica to require updates from only a subset of the replicas in the family. Hub-model synchronization, or one-way synchronization for some replicas, are examples of why these options might be used.

Because the restorereplica oplog is deleted when the operation is completed, there is the potential to create a stale restorereplica oplog and/or cause incarnation errors when attempting to import sync packets into the restored replica, even after the restore operation has completed. Review technote 1151039 and technote 1131690 for examples.

Note: A stale restorereplica oplog is a restorereplica oplog that does not get deleted properly at one or more of the healthy replicas. This causes that replica to continually evaluate the oplogs to determine what needs to be sent back to the recovering replica, adding considerable time to packet creation even when there are no changes to send. A stale restorereplica oplog is usually created when a restorereplica operation is run twice against the same replica but only the first restorereplica oplog gets sent to one or more of the remote replicas.

The restorereplica oplog contains the new incarnation for the recovering replica, so if that oplog is not imported into one or more replicas before it is deleted, the incarnation those replicas record for the recovering replica will be incorrect.

To minimize the chances of running into either of these problems, all replicas in the family should be updated with the restorereplica packet, either directly or indirectly through an intermediary replica, before any update packets are sent back to the recovering replica.

Example:
A recovering replica sends a restore oplog to the hub; the hub then sends a synchronization packet to all other replicas in the family; finally, the hub sends the recovery packet to the recovering replica.
The same ordering should be followed once the restoration procedure has been completed.
Note: The packets do not have to be imported immediately at the other replicas, but they should be imported before those replicas receive any packets directly from the recovered replica.
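
A hedged command sketch of that ordering, run at the hub (the replica names spoke1, spoke2, and recovered_site, and the VOB tag /vobs/dev, are hypothetical; -fship assumes a configured shipping server):

multitool syncreplica -import -receive
multitool syncreplica -export -fship replica:spoke1@/vobs/dev replica:spoke2@/vobs/dev
multitool syncreplica -export -fship replica:recovered_site@/vobs/dev

The import brings in the restore packet from the recovering replica; the first export forwards the restorereplica oplog to the other replicas in the family; the last export sends the recovery packet back to the recovering replica.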

If a restore is done using either of these options and there is any doubt as to whether the remote replicas were updated, run chreplica -incarnation <incarnation date> at the recovered replica immediately after completing the restore operation. This notifies the remote replicas of the new incarnation when the resulting chreplica oplog is synchronized to them. The synchronization may reach the remote replicas directly or indirectly, because the chreplica oplog is permanent.

Example
At the recovered replica run:

cleartool dump replica:<recovered replica>

The output includes the current incarnation date of that replica, for example:

incarnation=06-Apr-07.15:00:15UTC

Copy the current incarnation date into the chreplica -incarnation command:

multitool chreplica -incarnation 06-Apr-07.15:00:15UTC recovering-replica

Note: The incarnation must be set to the same date and time as, or later than, the current incarnation. If it is set to an earlier date and time, other replicas in the family will be unable to import from this replica.

If a stale restorereplica oplog is suspected because export synchronization consistently takes a long time, contact IBM Rational support to obtain a customized tool to find and remove the offending oplog.



The complete directions for restoring a replica are documented in the IBM Rational ClearCase MultiSite Administrator's Guide > Troubleshooting > Troubleshooting MultiSite Operations > Restoring and Replacing VOB Replicas.

Refer to technote 1227020 for special considerations for restoring replicated VOBs.

[{"Product":{"code":"SSSH27","label":"Rational ClearCase"},"Business Unit":{"code":"BU053","label":"Cloud & Data Platform"},"Component":"Backup and Restore","Platform":[{"code":"PF002","label":"AIX"},{"code":"PF010","label":"HP-UX"},{"code":"PF015","label":"IRIX"},{"code":"PF016","label":"Linux"},{"code":"PF027","label":"Solaris"},{"code":"PF033","label":"Windows"}],"Version":"7.0;7.0.0.1;7.0.0.2;7.0.0.3;7.0.0.4;7.0.0.5;7.0.0.6;7.0.0.7;7.0.0.8;7.0.1;7.0.1.1;7.0.1.2;7.0.1.3;7.0.1.4;7.0.1.5;7.0.1.6;7.0.1.7;7.1;7.1.0.1;7.1.0.2","Edition":"","Line of Business":{"code":"LOB45","label":"Automation"}}]

Document Information

Modified date:
16 June 2018

UID

swg21131381