IBM Support

IC95937: IN AN AIX HADR ENVIRONMENT THE TIVOLI STORAGE MANAGER STANDBY SERVER MIGHT ENCOUNTER "HADR STANDBY FOUND BAD LOG"

APAR status

  • Closed as program error.

Error description

  • The Tivoli Storage Manager binaries are installed on two different
    LPARs. A shared filesystem contains the files of the Tivoli
    Storage Manager server's disk storage pools.
    HACMP switches the shared filesystem, as well as the service
    address of the Tivoli Storage Manager server, between the two
    nodes. The DB2 instances of both servers are paired in an HADR
    configuration.
    
    HACMP scripts bring up the DB2 services and activate the
    databases on both nodes to connect the standby and primary HADR
    nodes. The command "db2 takeover hadr on db TSMDB1" is then run
    on the active HACMP node to switch the roles of the HADR nodes.
    
    Afterwards the Tivoli Storage Manager server binaries are
    started on the primary server.
    
    After server startup, the HADR standby node is automatically
    deactivated and the following message is reported in the
    db2diag.log:
    FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrEduS,
    probe:21210
    RETCODE : ZRC=0x87800148=-2021654200=HDR_ZRC_BAD_LOG
              "HADR standby found bad log"
    
    The symptom has been identified as a match for DB2 APAR IC87721.
    That APAR documents the following messages reported in the
    db2diag log file:
    
    The redo phase after a forced takeover may put a log file in an
    unexpected log chain. This can cause a problem when the old
    primary requests to rejoin the HADR pair as a standby, because
    it does not find the log file in the expected log chain.
    
    The following messages, dumped on the new primary, indicate the
    change in log chain for archived logs.
    
    2012-09-27-18.11.12.048410+540 E7148849A462       LEVEL: Info
    PID     : 4653426              TID  : 4885        PROC : db2sysc
    0
    INSTANCE: db2inst               NODE : 000
    EDUID   : 4885                 EDUNAME: db2logmgr (inst) 0
    FUNCTION: DB2 UDB, data protection services,
    sqlpgArchiveLogFile, probe:3180
    DATA #1 : <preformatted>
    Completed archive for log file S0010494.LOG to
    /db2/inst/log_archive/db2inst/inst/NODE0000/C0000039/ from
    /db2/inst/log_dir/NODE0000/.
    
    2012-09-27-18.11.17.321304+540 E7149687A462       LEVEL: Info
    PID     : 4653426              TID  : 4885        PROC : db2sysc
    0
    INSTANCE: db2inst               NODE : 000
    EDUID   : 4885                 EDUNAME: db2logmgr (inst) 0
    FUNCTION: DB2 UDB, data protection services,
    sqlpgArchiveLogFile, probe:3180
    DATA #1 : <preformatted>
    Completed archive for log file S0010495.LOG to
    /db2/inst/log_archive/db2inst/inst/NODE0000/C4967295/ from
    /db2/inst/log_dir/NODE0000/.
    
    When the old primary then tries to rejoin HADR, it fails because
    it cannot find log file S0010495.LOG in the expected location.
    
    2012-09-27-18.13.08.809540+540 I7156576A367       LEVEL: Warning
    PID     : 4653426              TID  : 5656        PROC : db2sysc
    0
    INSTANCE: db2inst               NODE : 000
    EDUID   : 5656                 EDUNAME: db2lfr (inst) 0
    FUNCTION: DB2 UDB, recovery manager, sqlplfrFMOpenLog,
    probe:5120
    MESSAGE : Return code for LFR opening file S0010495.LOG was
    -2146434659
    
    2012-09-27-18.13.08.809943+540 I7156944A481       LEVEL: Error
    PID     : 4653426              TID  : 16193       PROC : db2sysc
    0
    INSTANCE: db2inst               NODE : 000
    EDUID   : 16193                EDUNAME: db2hadrp (inst) 0
    FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrEduP,
    probe:20590
    MESSAGE : HADR primary database failed to read log pages for
    remote catchup.
              sqlplfrScanNext returned rc = 0x860f000a, scanPages =
    0, scanFlagsOut
              = 0x2
    
    2012-09-27-18.14.52.088747+540 I7161882A367       LEVEL: Warning
    PID     : 4653426              TID  : 5656        PROC : db2sysc
    0
    INSTANCE: db2inst               NODE : 000
    EDUID   : 5656                 EDUNAME: db2lfr (inst) 0
    FUNCTION: DB2 UDB, recovery manager, sqlplfrFMOpenLog,
    probe:5120
    MESSAGE : Return code for LFR opening file S0010495.LOG was
    -2146434659
    
    2012-09-27-18.14.52.088997+540 I7162250A481       LEVEL: Error
    PID     : 4653426              TID  : 16193       PROC : db2sysc
    0
    INSTANCE: db2inst               NODE : 000
    EDUID   : 16193                EDUNAME: db2hadrp (inst) 0
    FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrEduP,
    probe:20590
    MESSAGE : HADR primary database failed to read log pages for
    remote catchup.
              sqlplfrScanNext returned rc = 0x860f000a, scanPages =
    0, scanFlagsOut
              = 0x2
      Tivoli Storage Manager Versions Affected: V6.1 V6.2 V6.3
      Initial Impact: Medium
      Additional Keywords: TSM zz61 zz62 zz63 IC87721
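
The log chain change described above can be spotted mechanically in a db2diag.log. The following Python sketch is only an illustration, not an IBM tool: it assumes the archive message wraps exactly as in the excerpt above, and the names archived_log_chains, chain_changes and zrc_to_signed are hypothetical helpers introduced here. It lists consecutive archive messages whose chain directory (Cnnnnnnn) differs, and converts a hexadecimal ZRC value to the signed decimal form that db2diag.log prints next to it.

```python
import re

# Matches the "Completed archive" message from the excerpt above.
# Assumption: the archive target path ends in a chain directory named
# C followed by seven digits, and the message may wrap across lines.
ARCHIVE_RE = re.compile(
    r"Completed archive for log file (S\d{7}\.LOG) to\s+\S*?/(C\d{7})/"
)


def archived_log_chains(diag_text):
    """Return (log_file, chain_dir) pairs in the order they appear."""
    return [(m.group(1), m.group(2)) for m in ARCHIVE_RE.finditer(diag_text)]


def chain_changes(pairs):
    """Return (previous, current) pairs whose chain directory differs."""
    changes = []
    for prev, cur in zip(pairs, pairs[1:]):
        if prev[1] != cur[1]:
            changes.append((prev, cur))
    return changes


def zrc_to_signed(zrc):
    """Interpret a 32-bit ZRC value as a signed decimal number."""
    zrc &= 0xFFFFFFFF
    return zrc - 0x100000000 if zrc & 0x80000000 else zrc


# The two archive messages from the excerpt above, reduced to the
# preformatted data lines.
SAMPLE = """\
Completed archive for log file S0010494.LOG to
/db2/inst/log_archive/db2inst/inst/NODE0000/C0000039/ from
/db2/inst/log_dir/NODE0000/.
Completed archive for log file S0010495.LOG to
/db2/inst/log_archive/db2inst/inst/NODE0000/C4967295/ from
/db2/inst/log_dir/NODE0000/.
"""

if __name__ == "__main__":
    for prev, cur in chain_changes(archived_log_chains(SAMPLE)):
        print(f"log chain changed: {prev[0]} in {prev[1]}, "
              f"then {cur[0]} in {cur[1]}")
    # ZRC=0x87800148 is the HDR_ZRC_BAD_LOG value shown above.
    print(zrc_to_signed(0x87800148))
```

On a real system the text would be read from the instance's db2diag.log rather than from the SAMPLE string. A reported change is only a hint that the symptom may match; it does not confirm that this APAR applies.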
    

Local fix

  • Manually copy the missing log file to the expected location,
    or create a DB backup on the primary machine and restore it on
    the secondary machine.
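
The manual copy in the first option could be scripted along the following lines. This is a minimal sketch under stated assumptions: the function name copy_missing_log is hypothetical, and both directories must be determined by the administrator from the db2diag.log messages, since this APAR does not name them. The sketch refuses to overwrite a file that already exists at the destination.

```python
import shutil
from pathlib import Path


def copy_missing_log(log_name, found_dir, expected_dir):
    """Copy log_name from found_dir to expected_dir without overwriting.

    found_dir:    directory where the log file actually is (assumption:
                  identified by the administrator).
    expected_dir: directory where DB2 looks for it (assumption: taken
                  from the db2diag.log messages).
    """
    src = Path(found_dir) / log_name
    dst = Path(expected_dir) / log_name
    if not src.is_file():
        raise FileNotFoundError(f"source log file not found: {src}")
    if dst.exists():
        raise FileExistsError(f"refusing to overwrite: {dst}")
    dst.parent.mkdir(parents=True, exist_ok=True)
    # copy2 also preserves timestamps and permission bits where possible.
    shutil.copy2(src, dst)
    return dst
```

File ownership is not changed by this sketch, so it would normally be run as the DB2 instance owner, with the database stopped or otherwise not using the files involved.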
    

Problem summary

  • ****************************************************************
    * USERS AFFECTED: All Tivoli Storage Manager server users.     *
    ****************************************************************
    * PROBLEM DESCRIPTION: See error description.                  *
    ****************************************************************
    * RECOMMENDATION: Apply fixing level when available. This      *
    *                 problem is currently projected to be fixed   *
    *                 in levels 6.2.6 and 6.3.5. Note that this    *
    *                 is subject to change at the discretion of    *
    *                 IBM.                                         *
    ****************************************************************
    

Problem conclusion

  • This problem was fixed.
    Affected platforms:  AIX, HP-UX, Solaris, Linux, and Windows.
    

Temporary fix

Comments

APAR Information

  • APAR number

    IC95937

  • Reported component name

    TSM SERVER

  • Reported component ID

    5698ISMSV

  • Reported release

    62A

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt

  • Submitted date

    2013-09-13

  • Closed date

    2013-10-29

  • Last modified date

    2013-10-29

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    TSM SERVER

  • Fixed component ID

    5698ISMSV

Applicable component levels

  • R62A PSY

       UP

  • R62H PSY

       UP

  • R62L PSY

       UP

  • R62S PSY

       UP

  • R62W PSY

       UP

  • R63A PSY

       UP

  • R63H PSY

       UP

  • R63L PSY

       UP

  • R63S PSY

       UP

  • R63W PSY

       UP

Document Information

Modified date:
29 October 2013