IBM Support

IT29648: APPLICATIONS MAY HANG DURING TAKEOVER IN AUTOMATED TSA HADR ENVIRONMENT DURING FIX PACK UPGRADE

APAR status

  • Closed as program error.

Error description

  • When performing a rolling fix pack upgrade on Db2 V11.1 in an
    automated TSA HADR environment, application connections to the
    database may hang during a takeover from the down-level primary
    to the up-level standby database because the peer window is
    triggered.
    
    The hang usually lasts for the value of HADR_PEER_WINDOW (in
    seconds). The peer window should not be triggered in this
    rolling update scenario.
    
    For example, the db2diag.log on the up-level standby shows that
    the takeover completed successfully within a minute, but
    application connections continue to hang until the message "Peer
    window ends. Peer window expired." is logged.
    
    2019-07-01-11.52.10.587133-300 E14508358A508        LEVEL: Event
    PID     : 66584616             TID : 3600           PROC :
    db2sysc 0
    INSTANCE: db2inst1              NODE : 000           DB   :
    SAMPLE
    APPHDL  : 0-9                  APPID:
    *LOCAL.db2inst1.190701165210
    AUTHID  : db2inst1              HOSTNAME: StandbyHost
    EDUID   : 3600                 EDUNAME: db2agent (SAMPLE) 0
    FUNCTION: DB2 UDB, base sys utilities,
    sqeDBMgr::StartUsingLocalDatabase, probe:13
    START   : Received TAKEOVER HADR command.
    
    2019-07-01-11.52.17.728295-300 I14535698A520        LEVEL:
    Warning
    PID     : 66584616             TID : 10796          PROC :
    db2sysc 0
    INSTANCE: db2inst1              NODE : 000           DB   :
    SAMPLE
    HOSTNAME: StandbyHost
    EDUID   : 10796                EDUNAME: db2hadrs.0.0 (SAMPLE) 0
    FUNCTION: DB2 UDB, High Availability Disaster Recovery,
    hdrStbyTkHandlePrimaryDone, probe:46590
    MESSAGE : Rolling upgrade: Standby is on old version.  Closing
    connection to
              avoid shipping new log records to standby
    
    2019-07-01-11.52.17.729368-300 E14536785A450        LEVEL: Event
    PID     : 66584616             TID : 10796          PROC :
    db2sysc 0
    INSTANCE: db2inst1              NODE : 000           DB   :
    SAMPLE
    HOSTNAME: StandbyHost
    EDUID   : 10796                EDUNAME: db2hadrs.0.0 (SAMPLE) 0
    FUNCTION: DB2 UDB, High Availability Disaster Recovery,
    hdrSetHdrState, probe:10000
    CHANGE  : HADR state set to HDR_P_DISCONN_PEER (was HDR_P_PEER),
    connId=3
    
    2019-07-01-11.52.17.817201-300 I14540828A443        LEVEL: Info
    PID     : 66584616             TID : 10796          PROC :
    db2sysc 0
    INSTANCE: db2inst1              NODE : 000           DB   :
    SAMPLE
    HOSTNAME: StandbyHost
    EDUID   : 10796                EDUNAME: db2hadrp.0.1 (SAMPLE) 0
    FUNCTION: DB2 UDB, High Availability Disaster Recovery,
    hdrStbyTkHandlePrimaryDone, probe:46630
    MESSAGE : Standby has completed takeover (now primary).
    
    
    Only the messages above are logged until 11.57, when the
    following entries appear:
    
    2019-07-01-11.57.11.036030-300 I14558583A537        LEVEL: Error
    PID     : 66584616             TID : 10796          PROC :
    db2sysc 0
    INSTANCE: db2inst1              NODE : 000           DB   :
    SAMPLE
    HOSTNAME: StandbyHost
    EDUID   : 10796                EDUNAME: db2hadrp.0.1 (SAMPLE) 0
    FUNCTION: DB2 UDB, High Availability Disaster Recovery,
    hdrSendRedirectMsgToOneAddress, probe:31070
    MESSAGE : ZRC=0xFFFFFFFF=-1
    DATA #1 : <preformatted>
    The HADR primary was not able to form a TCP connection with the
    standby: 10.27.98.91:60044.
    .
    .
    .
    2019-07-01-11.57.18.057736-300 I14559121A430        LEVEL:
    Warning
    PID     : 66584616             TID : 10796          PROC :
    db2sysc 0
    INSTANCE: db2inst1              NODE : 000           DB   :
    SAMPLE
    HOSTNAME: StandbyHost
    EDUID   : 10796                EDUNAME: db2hadrp.0.1 (SAMPLE) 0
    FUNCTION: DB2 UDB, High Availability Disaster Recovery,
    hdrEduAcceptEvent, probe:20202
    MESSAGE : Peer window ends. Peer window expired.
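
To make the hang duration in the excerpt above concrete, the following is a minimal sketch that computes the gap between the "Standby has completed takeover" entry and the "Peer window expired" entry. The helper name parse_db2diag_timestamp is hypothetical, and the sketch assumes the trailing field of a db2diag.log timestamp is the UTC offset in minutes.

```python
from datetime import datetime, timedelta, timezone


def parse_db2diag_timestamp(stamp: str) -> datetime:
    """Parse a db2diag.log timestamp such as 2019-07-01-11.52.17.817201-300.

    Hypothetical helper for illustration; not part of any Db2 tooling.
    """
    # First 26 characters: date, time, and microseconds.
    local_part, offset_part = stamp[:26], stamp[26:]
    naive = datetime.strptime(local_part, "%Y-%m-%d-%H.%M.%S.%f")
    # Assumption: the trailing field is the UTC offset in minutes (-300 = UTC-5).
    return naive.replace(tzinfo=timezone(timedelta(minutes=int(offset_part))))


# Timestamps copied from the two db2diag.log entries in the excerpt.
takeover_done = parse_db2diag_timestamp("2019-07-01-11.52.17.817201-300")
peer_window_end = parse_db2diag_timestamp("2019-07-01-11.57.18.057736-300")

hang_seconds = (peer_window_end - takeover_done).total_seconds()
print(f"Connections hung for about {hang_seconds:.1f} seconds after takeover")
```

The gap is about 300 seconds. That would be consistent with a HADR_PEER_WINDOW of roughly 300 seconds, although the APAR does not state the configured value.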
    

Local fix

  • As a workaround, temporarily disable the peer window (set
    HADR_PEER_WINDOW to 0) before performing the rolling update,
    and restore the original value afterward.
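
The workaround could be scripted roughly as follows. This is only a sketch: it builds the Db2 command-line strings without running them. The function name peer_window_workaround_commands, the database name SAMPLE (taken from the log excerpt), and the original value of 300 seconds are assumptions for illustration. The APAR does not say on which hosts the change must be made, whether HADR must be restarted for it to take effect, or how it interacts with TSA automation, so check the Db2 documentation before using anything like this.

```python
def peer_window_workaround_commands(db_name: str, original_seconds: int) -> dict:
    """Build the db2 CLP commands for temporarily disabling the peer window.

    Hypothetical helper: it only returns command strings and runs nothing.
    """
    update = "db2 update db cfg for {db} using HADR_PEER_WINDOW {value}"
    return {
        # Record the current HADR_PEER_WINDOW value before changing it.
        "check": f"db2 get db cfg for {db_name}",
        # A value of 0 disables the peer window.
        "disable": update.format(db=db_name, value=0),
        # Restore the recorded value once the rolling update is complete.
        "restore": update.format(db=db_name, value=original_seconds),
    }


# Assumed values: SAMPLE is the database in the log excerpt; 300 is illustrative.
commands = peer_window_workaround_commands("SAMPLE", 300)
for step, command in commands.items():
    print(f"{step}: {command}")
```

Keeping the restore command next to the disable command makes it harder to forget to re-enable the peer window after the upgrade.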
    

Problem summary

  • ****************************************************************
    * USERS AFFECTED:                                              *
    * ALL                                                          *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
    * See Error Description                                        *
    ****************************************************************
    * RECOMMENDATION:                                              *
    * Upgrade to Db2 11.1 Mod 4 Fixpack 5 or higher                *
    ****************************************************************
    

Problem conclusion

  • First fixed in Db2 11.1 Mod 4 Fixpack 5
    

Temporary fix

Comments

APAR Information

  • APAR number

    IT29648

  • Reported component name

    DB2 FOR LUW

  • Reported component ID

    DB2FORLUW

  • Reported release

    B10

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2019-07-08

  • Closed date

    2020-01-16

  • Last modified date

    2020-01-16

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    DB2 FOR LUW

  • Fixed component ID

    DB2FORLUW

Applicable component levels

  • RB10 PSN

       UP

Document Information

Modified date:
16 January 2020