IBM Support

IT29648: APPLICATIONS MAY HANG DURING TAKEOVER IN AUTOMATED TSA HADR ENVIRONMENT DURING FIX PACK UPGRADE

APAR status

  • Closed as program error.

Error description

  • When performing a rolling fix pack upgrade in V11.1 in an
    automated TSA HADR environment, application/database
    connectivity may hang during a takeover from the downlevel
    primary to the uplevel standby database because the peer
    window is triggered.
    
    The hang usually lasts as long as the value of
    HADR_PEER_WINDOW.  The peer window should not be triggered in
    this rolling update scenario.
    
    For example, the db2diag.log from the uplevel standby shows
    that the takeover completed successfully within a minute, but
    application connections still hang until the message "Peer
    window ends. Peer window expired." is logged.
    
    2019-07-01-11.52.10.587133-300 E14508358A508        LEVEL: Event
    PID     : 66584616             TID : 3600           PROC :
    db2sysc 0
    INSTANCE: db2inst1              NODE : 000           DB   :
    SAMPLE
    APPHDL  : 0-9                  APPID:
    *LOCAL.db2inst1.190701165210
    AUTHID  : db2inst1              HOSTNAME: StandbyHost
    EDUID   : 3600                 EDUNAME: db2agent (SAMPLE) 0
    FUNCTION: DB2 UDB, base sys utilities,
    sqeDBMgr::StartUsingLocalDatabase, probe:13
    START   : Received TAKEOVER HADR command.
    
    2019-07-01-11.52.17.728295-300 I14535698A520        LEVEL:
    Warning
    PID     : 66584616             TID : 10796          PROC :
    db2sysc 0
    INSTANCE: db2inst1              NODE : 000           DB   :
    SAMPLE
    HOSTNAME: StandbyHost
    EDUID   : 10796                EDUNAME: db2hadrs.0.0 (SAMPLE) 0
    FUNCTION: DB2 UDB, High Availability Disaster Recovery,
    hdrStbyTkHandlePrimaryDone, probe:46590
    MESSAGE : Rolling upgrade: Standby is on old version.  Closing
    connection to
              avoid shipping new log records to standby
    
    2019-07-01-11.52.17.729368-300 E14536785A450        LEVEL: Event
    PID     : 66584616             TID : 10796          PROC :
    db2sysc 0
    INSTANCE: db2inst1              NODE : 000           DB   :
    SAMPLE
    HOSTNAME: StandbyHost
    EDUID   : 10796                EDUNAME: db2hadrs.0.0 (SAMPLE) 0
    FUNCTION: DB2 UDB, High Availability Disaster Recovery,
    hdrSetHdrState, probe:10000
    CHANGE  : HADR state set to HDR_P_DISCONN_PEER (was HDR_P_PEER),
    connId=3
    
    2019-07-01-11.52.17.817201-300 I14540828A443        LEVEL: Info
    PID     : 66584616             TID : 10796          PROC :
    db2sysc 0
    INSTANCE: db2inst1              NODE : 000           DB   :
    SAMPLE
    HOSTNAME: StandbyHost
    EDUID   : 10796                EDUNAME: db2hadrp.0.1 (SAMPLE) 0
    FUNCTION: DB2 UDB, High Availability Disaster Recovery,
    hdrStbyTkHandlePrimaryDone, probe:46630
    MESSAGE : Standby has completed takeover (now primary).
    
    
    The next relevant messages do not appear until 11.57:
    
    2019-07-01-11.57.11.036030-300 I14558583A537        LEVEL: Error
    PID     : 66584616             TID : 10796          PROC :
    db2sysc 0
    INSTANCE: db2inst1              NODE : 000           DB   :
    SAMPLE
    HOSTNAME: StandbyHost
    EDUID   : 10796                EDUNAME: db2hadrp.0.1 (SAMPLE) 0
    FUNCTION: DB2 UDB, High Availability Disaster Recovery,
    hdrSendRedirectMsgToOneAddress, probe:31070
    MESSAGE : ZRC=0xFFFFFFFF=-1
    DATA #1 : <preformatted>
    The HADR primary was not able to form a TCP connection with the
    standby: 10.27.98.91:60044.
    .
    .
    .
    2019-07-01-11.57.18.057736-300 I14559121A430        LEVEL:
    Warning
    PID     : 66584616             TID : 10796          PROC :
    db2sysc 0
    INSTANCE: db2inst1              NODE : 000           DB   :
    SAMPLE
    HOSTNAME: StandbyHost
    EDUID   : 10796                EDUNAME: db2hadrp.0.1 (SAMPLE) 0
    FUNCTION: DB2 UDB, High Availability Disaster Recovery,
    hdrEduAcceptEvent, probe:20202
    MESSAGE : Peer window ends. Peer window expired.
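
The length of the hang can be read from two timestamps in the excerpt above: the "Standby has completed takeover" record and the "Peer window ends" record. The following is a minimal Python sketch, not part of the APAR. It assumes that the first 26 characters of a db2diag.log timestamp hold the local date and time with microseconds, and that the trailing field (-300 here) is a time zone offset that is the same on both records and can be ignored. The helper name parse_db2diag_ts is hypothetical.

```python
from datetime import datetime


def parse_db2diag_ts(stamp: str) -> datetime:
    """Parse the local date/time part of a db2diag.log timestamp.

    Assumption: the first 26 characters look like
    YYYY-MM-DD-HH.MM.SS.ffffff and the rest is a time zone offset.
    """
    return datetime.strptime(stamp[:26], "%Y-%m-%d-%H.%M.%S.%f")


# Timestamps copied from the excerpt above.
takeover_done = parse_db2diag_ts("2019-07-01-11.52.17.817201-300")
peer_window_end = parse_db2diag_ts("2019-07-01-11.57.18.057736-300")

# Time between takeover completion and peer window expiry, in seconds.
hang_seconds = (peer_window_end - takeover_done).total_seconds()
print(f"Hang after takeover: {hang_seconds:.1f} seconds")
```

A gap of roughly 300 seconds would be consistent with HADR_PEER_WINDOW being set to 300 on this system, although the APAR does not state the configured value.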
    

Local fix

  • As a workaround, temporarily disable the peer window (set
    HADR_PEER_WINDOW to 0) before performing the rolling update,
    and restore the original value afterward.
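
The workaround might be sketched as the following sequence of Db2 command line processor commands. The Python snippet below only builds and prints the command strings; it does not run them. The database name SAMPLE comes from the log excerpt, the original value of 300 is an assumption, and the helper name peer_window_commands is hypothetical.

```python
from __future__ import annotations


def peer_window_commands(db_name: str, original_window: int) -> list[str]:
    """Build the Db2 CLP commands for the workaround; nothing is executed."""
    update = f"db2 update db cfg for {db_name} using HADR_PEER_WINDOW"
    return [
        f"db2 get db cfg for {db_name}",  # record the current value first
        f"{update} 0",                    # 0 disables the peer window
        f"{update} {original_window}",    # restore after the rolling update
    ]


# SAMPLE is the database in the log excerpt; 300 is an assumed original value.
for command in peer_window_commands("SAMPLE", 300):
    print(command)
```

Whether the new value takes effect immediately or only after the database is reactivated may depend on the Db2 level, and the value generally needs to match on the primary and the standby, so check the HADR_PEER_WINDOW documentation for your release before relying on this sequence.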
    

Problem summary

  • ****************************************************************
    * USERS AFFECTED:                                              *
    * ALL                                                          *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
    * See Error Description                                        *
    ****************************************************************
    * RECOMMENDATION:                                              *
    * Upgrade to Db2 11.1 Mod 4 Fixpack 5 or higher                *
    ****************************************************************
    

Problem conclusion

  • First fixed in Db2 11.1 Mod 4 Fixpack 5
    

Temporary fix

Comments

APAR Information

  • APAR number

    IT29648

  • Reported component name

    DB2 FOR LUW

  • Reported component ID

    DB2FORLUW

  • Reported release

    B10

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2019-07-08

  • Closed date

    2020-01-16

  • Last modified date

    2020-01-16

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    DB2 FOR LUW

  • Fixed component ID

    DB2FORLUW

Applicable component levels

  • RB10 PSN

       UP

Document Information

Modified date:
02 September 2021