IBM Support

PI52370: "MULTITOOL: ERROR: GAP IN OPLOG DETECTED FOR REPLICA ..." AFTER REFORMATVOB


APAR status

  • Closed as program error.

Error description

  • Shortly after doing a reformatvob, "multitool syncreplica
    -export" starts to fail with messages
        multitool: Error: Gap in oplog detected for replica "my_local_replica".
        Wanted oplog ID: 12346.  Got oplog ID: 12348.
    OR
        multitool: Error: Can not find oplog from replica "my_local_replica" with id 12345
        multitool: Error: First oplog entry from replica "my_local_replica" is 12348
        multitool: Error: Gap in oplog entries may indicate missing oplog entries.
    
    However, the command
        multitool dumpoplog -invob \myvob -vreplica my_local_replica -short -from 12341 -to 12349
    shows that the oplogs leading up to the gap are present:
    195187: op=checkout rep=my_local_replica id=12341
    195188: op=checkout rep=my_local_replica id=12342
    195189: op=checkin rep=my_local_replica id=12343
    195190: op=checkin rep=my_local_replica id=12344
    195191: op=checkin rep=my_local_replica id=12345
    195194: op=checkout rep=my_local_replica id=12348
    195195: op=checkout rep=my_local_replica id=12349
    
    Dropping the "-vreplica" switch from the command and using a
    range based on the oplog entry order number
    reveals the missing oplog entries, apparently with an empty
    replica name ("rep="):
        multitool dumpoplog -invob \myvob -short -from 195187 -to 195195
    195187: op=checkout rep=my_local_replica id=12341
    195188: op=checkout rep=my_local_replica id=12342
    195189: op=checkin rep=my_local_replica id=12343
    195190: op=checkin rep=my_local_replica id=12344
    195191: op=checkin rep=my_local_replica id=12345
    195192: op=checkout rep= id=12346
    195193: op=checkout rep= id=12347
    195194: op=checkout rep=my_local_replica id=12348
    195195: op=checkout rep=my_local_replica id=12349
    
    Which set of error messages is shown depends on what the target
    replica still needs:
    If the target replica still needs to receive oplog id 12345
    (that is, at least one oplog entry from PRIOR to the gap),
       then the first set of messages is shown.
    If the target replica already has oplog id 12345 and the packet
    content should start with oplog id 12346,
       then the second set of messages is shown.
    
    Either way, the dumpoplog commands show that the oplog entries
    are still present but have a corrupt replica ID;
      the missing oplog entries have NOT been
    scrubbed from the database by vob_scrubber.
    
    It turns out that the corrupt oplog entries were stored in
    the VOB database (post-reformatvob)
      with the DBID that replica:my_local_replica had in the OLD
    VOB database (pre-reformatvob).
    A cached DBID was used to create the oplog entries, even
    though the reformatvob completed some 24 hours
    before the time stamp of the corrupt oplog entries.
    
    Steps to reproduce:
    This is a very rare problem and the specific conditions for
    reproducing it are not fully understood at this time.
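
The comparison of the two dumpoplog listings above can be sketched as a small script. This is only an illustration, not an IBM-provided tool: it assumes the "-short" output has exactly the line shape shown in this APAR ("order: op=... rep=... id=..."), and the names OplogEntry, parse_dumpoplog_short, find_gaps and entries_without_replica are hypothetical helpers introduced here. It reads saved dumpoplog text, reports where a replica's oplog IDs skip ahead, and lists entries whose replica name is empty.

```python
import re
from typing import List, NamedTuple, Tuple


class OplogEntry(NamedTuple):
    order: int     # oplog entry order number (the number before the colon)
    op: str        # operation name, e.g. "checkout"
    replica: str   # empty string when the rep= field is blank
    oplog_id: int  # per-replica oplog ID


# Assumed line shape, taken from the excerpts in this APAR:
#   195192: op=checkout rep= id=12346
LINE_RE = re.compile(r"^\s*(\d+):\s+op=(\S+)\s+rep=(\S*)\s+id=(\d+)\s*$")


def parse_dumpoplog_short(text: str) -> List[OplogEntry]:
    """Parse saved 'dumpoplog -short' output; lines that do not match are skipped."""
    entries = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            entries.append(
                OplogEntry(int(m.group(1)), m.group(2), m.group(3), int(m.group(4)))
            )
    return entries


def find_gaps(entries: List[OplogEntry], replica: str) -> List[Tuple[int, int]]:
    """Return (wanted, got) pairs where the replica's oplog IDs skip ahead."""
    ids = sorted(e.oplog_id for e in entries if e.replica == replica)
    return [(prev + 1, cur) for prev, cur in zip(ids, ids[1:]) if cur != prev + 1]


def entries_without_replica(entries: List[OplogEntry]) -> List[OplogEntry]:
    """Return entries whose rep= field is empty."""
    return [e for e in entries if not e.replica]


# Sample copied from the second dumpoplog listing in this APAR.
SAMPLE = """\
195187: op=checkout rep=my_local_replica id=12341
195188: op=checkout rep=my_local_replica id=12342
195189: op=checkin rep=my_local_replica id=12343
195190: op=checkin rep=my_local_replica id=12344
195191: op=checkin rep=my_local_replica id=12345
195192: op=checkout rep= id=12346
195193: op=checkout rep= id=12347
195194: op=checkout rep=my_local_replica id=12348
195195: op=checkout rep=my_local_replica id=12349
"""

if __name__ == "__main__":
    parsed = parse_dumpoplog_short(SAMPLE)
    for wanted, got in find_gaps(parsed, "my_local_replica"):
        print(f"gap: wanted oplog ID {wanted}, got {got}")
    for e in entries_without_replica(parsed):
        print(f"entry {e.order} has an empty replica name (id={e.oplog_id})")
```

If the IDs of the entries with an empty replica name fall exactly inside a reported gap, that pattern is consistent with the corruption described in this APAR rather than with entries removed by vob_scrubber. The script only reads saved output; repairing the entries still requires IBM assistance as described under Local fix.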
    

Local fix

  • Contact "IBM Client Success" for assistance to repair the oplog
    entries in the VOB database.
    

Problem summary

  • ****************************************************************
    * USERS AFFECTED:                                              *
    * Users of ClearCase with replicated VOBs.                     *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
    * Shortly after doing a ClearCase reformatvob, "multitool      *
    * syncreplica -export" starts to fail with messages due to     *
    * creation of bad oplogs. This is happening because            *
    * reformatvob changes the db id as well as db gen num.         *
    ****************************************************************
    * RECOMMENDATION:                                              *
    ****************************************************************
    

Problem conclusion

  • A fix is available in ClearCase versions 9.0.1.13, 9.0.2.5, and
    9.1.0.2.
    

Temporary fix

Comments

APAR Information

  • APAR number

    PI52370

  • Reported component name

    CC&CC MSITE WIN

  • Reported component ID

    5724G3300

  • Reported release

    801

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2015-11-12

  • Closed date

    2022-01-02

  • Last modified date

    2022-01-02

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    CC&CC MSITE WIN

  • Fixed component ID

    5724G3300

Applicable component levels

  • Business Unit: IBM Software w/o TPS (BU059)
  • Product: Rational ClearCase MultiSite (SSSH3S)
  • Platform: Platform Independent (PF025)
  • Version: 801
  • Line of Business: Automation (LOB45)

Document Information

Modified date:
21 July 2022