IBM Support

IT19169: MQ APPLIANCE HA QUEUE MANAGERS NOT TOLERANT OF NETWORK PING FAILURES.


APAR status

  • Closed as program error.

Error description

  • A failover occurs from the primary to the standby queue manager
    in an HA/DR environment.  This is soon followed by a failover
    back to the primary queue manager.
    
    The IBM Appliance system log contains entries such as this:
    
    Dec 16 16:50:51 (none) pengine[24787]:  warning: unpack_rsc_op:
    Processing failed op monitor for QMGR on x.y.a.b: not running
    (7)
    Dec 16 16:50:51 (none) pengine[24787]:    error: color_instance:
    Pre-allocation failed: got x.y.a.b instead of x.y.a.b
    Dec 16 16:50:51 (none) pengine[24787]:   notice: LogActions:
    Demote  QMGR_drbd:0#011(Master -> Slave x.y.a.b)
    Dec 16 16:50:51 (none) pengine[24787]:   notice: LogActions:
    Promote QMGR_drbd:1#011(Slave -> Master x.y.a.b)
    Dec 16 16:50:51 (none) pengine[24787]:   notice: LogActions:
    Stop    QMGR_fs#011(x.y.a.b)
    Dec 16 16:50:51 (none) pengine[24787]:   notice: LogActions:
    Stop    QMGR#011(x.y.a.b)
    Dec 16 16:50:51 (none) pengine[24787]:   notice: LogActions:
    Move    QMGR_DR_IP#011(Started x.y.a.b -> x.y.a.b)
    Dec 16 16:50:51 (none) pengine[24787]:   notice: LogActions:
    Demote  QMGR_DR_drbd:0#011(Master -> Slave x.y.a.b - blocked)
    Dec 16 16:50:51 (none) pengine[24787]:   notice: LogActions:
    Move    QMGR_DR_drbd:0#011(Slave x.y.a.b -> x.y.a.b)
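
    When scanning a system log for the pattern shown above, a small
    script can pull out the pengine LogActions entries and flag a
    batch that contains both a Demote and a Promote. The following is
    only a sketch: it assumes each log entry is on a single unwrapped
    line, that "#011" is the escaped tab between the resource name and
    the detail, and the names parse_log_actions and has_role_swap are
    hypothetical helpers, not part of any IBM tooling.

```python
import re

# Assumed layout of an unwrapped entry (taken from the sample above):
#   <timestamp> <host> pengine[<pid>]: notice: LogActions: <Action> <resource>#011(<detail>)
# "#011" is treated as an escaped tab; a literal tab is accepted as well.
LOG_ACTION = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+pengine\[\d+\]:"
    r"\s+notice:\s+LogActions:\s+(?P<action>\w+)\s+(?P<resource>\S+?)"
    r"(?:#011|\t)\((?P<detail>.*)\)\s*$"
)


def parse_log_actions(lines):
    """Return one dict per LogActions entry; other lines are ignored."""
    actions = []
    for line in lines:
        match = LOG_ACTION.match(line.strip())
        if match:
            actions.append(match.groupdict())
    return actions


def has_role_swap(actions):
    """True if the parsed entries include both a Demote and a Promote."""
    verbs = {entry["action"] for entry in actions}
    return {"Demote", "Promote"} <= verbs
```

    Feeding the function the lines of an exported system log and
    checking has_role_swap on the result gives a quick indication of
    whether the failover signature described in this APAR is present.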
    

Local fix

  • It may be possible to work around this problem, or to reduce
    the likelihood of it occurring, by fixing any network issues
    that affect connectivity between the IBM MQ Appliances.
    

Problem summary

  • ****************************************************************
    USERS AFFECTED:
    Users of the IBM MQ Appliance who have configured a combination
    of HA and DR and who have unreliable network connectivity may be
    affected by this problem.
    
    
    Platforms affected:
    MultiPlatform
    
    ****************************************************************
    PROBLEM DESCRIPTION:
    A transient DR ping failure could have resulted in a queue
    manager briefly starting on the HA secondary appliance before
    switching back to the HA primary appliance.  If a network is
    particularly unreliable then this failover behaviour may have
    happened frequently.
    

Problem conclusion

  • The code that detects whether the DR secondary IBM MQ Appliance
    can be contacted was modified so that it is more tolerant of
    transient network failures.
    
    ---------------------------------------------------------------
    The fix is targeted for delivery in the following PTFs:
    
    Version    Maintenance Level
    v8.0       8.0.0.7
    v9.0 CD    9.0.2
    
    The latest available maintenance can be obtained from
    'WebSphere MQ Recommended Fixes'
    http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006037
    
    If the maintenance level is not yet available, information on
    its planned availability can be found in 'WebSphere MQ
    Planned Maintenance Release Dates'
    http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006309
    ---------------------------------------------------------------
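
    The general idea of tolerating transient failures, as described
    in the problem conclusion above, can be illustrated with a check
    that reports a peer as unreachable only after several consecutive
    failed probes. This is a minimal sketch and not the appliance's
    actual code: the class name TolerantReachability, the probe
    callable, and the default threshold of 3 are all assumptions made
    for illustration.

```python
from typing import Callable


class TolerantReachability:
    """Report a peer as unreachable only after N consecutive probe failures.

    Hypothetical illustration; the real detection logic is not published.
    """

    def __init__(self, probe: Callable[[], bool], failure_threshold: int = 3):
        if failure_threshold < 1:
            raise ValueError("failure_threshold must be at least 1")
        self._probe = probe                  # returns True if the peer answered
        self._threshold = failure_threshold  # assumed value, not from the APAR
        self._consecutive_failures = 0

    def check(self) -> bool:
        """Run one probe and return whether the peer is considered reachable."""
        if self._probe():
            # Any success clears the failure streak.
            self._consecutive_failures = 0
            return True
        self._consecutive_failures += 1
        # A short streak of failures is treated as transient.
        return self._consecutive_failures < self._threshold
```

    With a threshold of 1 the check behaves like an intolerant
    detector, where a single lost ping is enough to report the peer
    as unreachable. Raising the threshold trades slower detection of
    a genuine outage for fewer spurious failovers.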
    

Temporary fix

Comments

APAR Information

  • APAR number

    IT19169

  • Reported component name

    IBM MQ APPL M20

  • Reported component ID

    5725S1400

  • Reported release

    800

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2017-02-08

  • Closed date

    2017-02-28

  • Last modified date

    2017-06-01

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    IBM MQ APPL M20

  • Fixed component ID

    5725S1400

Applicable component levels

  • R800 PSY

       UP

Document Information

Modified date:
01 June 2017