APAR status
Closed as program error.
Error description
An RDQM queue manager configured for both HA and DR is unable to run on any node after frequent connection interruptions, even with the fix for APAR IT38764 included. When a failover is initiated, a stop is issued on the stacked DRBD resource, but the resource is never restarted. The DR/HA queue manager's stacked DRBD resource remains stuck in the Diskless state.
The logs contain entries such as the following:
Jul 11 20:58:48 ### rdqmd: Waiting for Diskless stacked resource '###.dr' to be Secondary
Jul 11 20:58:54 ### rdqmd: Stopped resource 'ms_drbd_dr_###'
Jul 11 20:58:54 ### rdqmd: Set target-role=Master for resource 'ms_drbd_dr_#'
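The three log messages above can be searched for mechanically. The following is a minimal sketch, not an IBM-supplied tool: the function name find_symptom is hypothetical, the message wording is copied from the excerpt in this APAR, and the assumption that other MQ levels log the same wording has not been verified.

```python
import re

# Message patterns copied from the log excerpt in this APAR.
# Assumption: other maintenance levels use the same wording.
WAITING = re.compile(
    r"rdqmd: Waiting for Diskless stacked resource '[^']+' to be Secondary"
)
STOPPED = re.compile(r"rdqmd: Stopped resource '[^']+'")
SET_MASTER = re.compile(r"rdqmd: Set target-role=Master for resource '[^']+'")


def find_symptom(lines):
    """Return True if the three messages occur in the order shown in the APAR.

    Other log lines may appear between them; only the relative order matters.
    """
    patterns = [WAITING, STOPPED, SET_MASTER]
    stage = 0
    for line in lines:
        if patterns[stage].search(line):
            stage += 1
            if stage == len(patterns):
                return True
    return False
```

A match only indicates that the log sequence from this APAR is present. It does not by itself confirm that the stacked resource is still Diskless.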
Local fix
Suspend the primary node, then resume it:
rdqmadm -s
rdqmadm -r
Problem summary
****************************************************************
USERS AFFECTED:
All MQ users who have configured RDQM for high availability (HA) and disaster recovery (DR).
Platforms affected:
Linux on x86-64
****************************************************************
PROBLEM DESCRIPTION:
During failover, a stop is issued on the stacked DRBD resource, and MQ then sets the resource's target-role to Master. The stop is not synchronous, and MQ did not wait for the resource to stop before changing the target role. In a small timing window, when quorum returns before Pacemaker has stopped the stacked resource, setting the target-role to Master overrides the stop.
APAR IT38764 attempted to fix a similar issue previously, but did not account for this additional timing window.
Problem conclusion
The code has been modified to wait for the stop on the stacked DRBD resource to complete before changing the target role.
---------------------------------------------------------------
The fix is targeted for delivery in the following PTFs:
Version    Maintenance Level
v9.2 LTS   9.2.0.10
v9.3 LTS   9.3.0.5
v9.x CD    9.3.3
The latest available maintenance can be obtained from 'WebSphere MQ Recommended Fixes'
http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006037
If the maintenance level is not yet available, information on its planned availability can be found in 'WebSphere MQ Planned Maintenance Release Dates'
http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006309
---------------------------------------------------------------
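The ordering described in this conclusion (request the stop, wait for it to complete, then change the target role) can be illustrated with a small simulation. This is a sketch under stated assumptions and not the MQ implementation: StackedResource and stop_then_promote are hypothetical names, the polling loop and timeout are illustrative choices, and no real Pacemaker or DRBD interface is called.

```python
import time


class StackedResource:
    """Hypothetical stand-in for a Pacemaker-managed stacked DRBD resource."""

    def __init__(self, stop_delay_polls=3):
        self.running = True
        self.target_role = None
        self._stop_requested = False
        # Number of status polls before the simulated stop completes.
        self._polls_until_stopped = stop_delay_polls

    def request_stop(self):
        # Asynchronous, like the stop in the APAR: returns immediately.
        self._stop_requested = True

    def is_stopped(self):
        # Each poll moves the simulated stop one step closer to completion.
        if self._stop_requested and self.running:
            self._polls_until_stopped -= 1
            if self._polls_until_stopped <= 0:
                self.running = False
        return not self.running

    def set_target_role(self, role):
        self.target_role = role


def stop_then_promote(resource, timeout=5.0, interval=0.01):
    """Request a stop, wait until it completes, then set target-role=Master."""
    resource.request_stop()
    deadline = time.monotonic() + timeout
    while not resource.is_stopped():
        if time.monotonic() >= deadline:
            # Leave the target role untouched rather than override the stop.
            raise TimeoutError("stacked resource did not stop in time")
        time.sleep(interval)
    resource.set_target_role("Master")
```

Calling stop_then_promote(StackedResource()) returns only after the simulated stop has completed. If the stop does not complete within the timeout, the function raises and the target role is left unchanged, so the role change cannot override a pending stop.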
Temporary fix
Comments
APAR Information
APAR number
IT41625
Reported component name
MQ BASE V9.2
Reported component ID
5724H7281
Reported release
920
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2022-07-27
Closed date
2023-02-28
Last modified date
2023-02-28
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
MQ BASE V9.2
Fixed component ID
5724H7281
Applicable component levels
Document Information
Modified date:
01 March 2023