APAR status
Closed as program error.
Error description
When an RDQM node boots, systemd brings up the DRBD resources. If any DRBD resource fails to come up, no attempt is made to bring up the remaining resources; they are skipped.

On the primary node of a DR RDQM environment, for example, systemd starts the DRBD resources at boot by running the following commands in this order:

drbdadm up qm1
drbdadm primary qm1
drbdadm up qm2
drbdadm primary qm2
drbdadm up qm3
drbdadm primary qm3
...

/var/log/messages shows entries similar to the following:

Oct 11 08:21:32 node01 kernel: drbd qm2: State change failed: Need access to UpToDate data
Oct 11 08:21:32 node01 kernel: drbd qm2: Failed: role( Secondary -> Primary )
Oct 11 08:21:32 node01 rdqmd: AMQ3817E: Replicated data subsystem call '/usr/sbin/drbdadm primary qm2'
Oct 11 08:21:32 node01 rdqmd: failed with return code '17'.
Oct 11 08:21:32 node01 rdqmd: qm2: State change failed: (-2) Need access to UpToDate data
Oct 11 08:21:32 node01 rdqmd: Command 'drbdsetup primary qm2' terminated with exit code 17
Oct 11 08:21:32 node01 rdqmd: AMQ3747E: Replicated data systemd initialization failed.
Oct 11 08:21:32 node01 systemd: rdqm.service: main process exited, code=exited, status=71

As a result, QM1 is running but QM2 and QM3 are not: the DRBD resource for QM2 failed to come up, so no attempt was made to bring up QM3.
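To find which resource stopped the boot sequence, the AMQ3817E entries can be filtered out of the system log. The following is a minimal sketch, not an IBM-supplied tool: the function name failed_primary_resources is hypothetical, and it assumes the log lines have the same shape as the sample entries above.

```shell
# Hypothetical helper: read syslog text on stdin and print the name of each
# DRBD resource for which rdqmd logged a failed 'drbdadm primary' call.
# Assumes AMQ3817E lines look like the sample entries in this APAR.
failed_primary_resources() {
    sed -n "s/.*AMQ3817E: Replicated data subsystem call '[^']*drbdadm primary \([^']*\)'.*/\1/p" \
        | sort -u
}
```

For example, running failed_primary_resources < /var/log/messages would list the affected resources one per line. Resources that come later in the start order than a listed resource are the ones that were skipped.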
Local fix
Manually bring up each of the remaining resources that were not brought up by issuing the "drbdadm up" command for that resource.
Problem summary
****************************************************************
USERS AFFECTED:
Users running IBM MQ queue managers in an RDQM high availability setup.

Platforms affected:
Linux on x86-64
****************************************************************
PROBLEM DESCRIPTION:
After a reboot, the RDQM systemd service can abandon bringing up the remaining resources if it encounters a failure while running "drbdadm primary" for an earlier resource.
Problem conclusion
RDQM code has been modified so that even after an initial failure it will still attempt to bring up other resources.

---------------------------------------------------------------
The fix is targeted for delivery in the following PTFs:

Version    Maintenance Level
v9.1 LTS   9.1.0.12
v9.2 LTS   9.2.0.6
v9.x CD    9.2.5

The latest available maintenance can be obtained from 'WebSphere MQ Recommended Fixes'
http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006037

If the maintenance level is not yet available, information on its planned availability can be found in 'WebSphere MQ Planned Maintenance Release Dates'
http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006309
---------------------------------------------------------------
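The changed behavior can be pictured as a start loop that records a failure and moves on to the next resource instead of stopping. The sketch below only models that idea and is not the RDQM implementation: the function name start_resources and the DRBDADM override variable are assumptions for illustration, and the only real commands used are "drbdadm up" and "drbdadm primary" as shown in the error description.

```shell
# Illustrative model only; this is not IBM's RDQM code.
# DRBDADM may be overridden (for example, with a stub when testing).
DRBDADM="${DRBDADM:-drbdadm}"

# Hypothetical function: bring up and promote each named resource in order.
# A failure for one resource is reported and recorded, and the loop continues.
start_resources() {
    failed=""
    for res in "$@"; do
        # 'primary' is attempted only if 'up' succeeded for this resource.
        if "$DRBDADM" up "$res" && "$DRBDADM" primary "$res"; then
            echo "started: $res"
        else
            echo "failed: $res" >&2
            failed="$failed $res"
        fi
    done
    # Return non-zero if any resource failed, after trying all of them.
    [ -z "$failed" ]
}
```

With the scenario from the error description, calling start_resources qm1 qm2 qm3 would still report a failure for qm2 but would go on to attempt qm3, which is the difference from the original stop-at-first-failure behavior.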
Temporary fix
Comments
APAR Information
APAR number
IT38664
Reported component name
IBM MQ BASE MP
Reported component ID
5724H7271
Reported release
910
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2021-10-11
Closed date
2022-04-28
Last modified date
2022-06-21
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
IBM MQ BASE MP
Fixed component ID
5724H7271
Applicable component levels
Document Information
Modified date:
22 June 2022