APAR status
Closed as program error.
Error description
After FixPack 9.2.0.10 was applied to an RDQM node, it was not possible to switch the queue manager back to that node. The "rdqmadm -r" command appeared to run successfully, but the queue manager would not start on the 9.2.0.10 node when running "rdqmadm -p".

Messages from /var/log/messages indicate conflicting options in /etc/drbd.d/global_common.conf, e.g.:

= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
Apr 6 13:43:55 rdqm-test-1 drbd(p_drbd_myrdqm)[16344]: ERROR: myrdqm: Command stderr: drbd.d/global_common.conf:10: conflicting use of resource options section 'common:res_options' ...#012drbd.d/global_common.conf:7: resource options section 'common:res_options' first used here.
Apr 6 13:43:55 rdqm-test-1 lrmd[7438]: notice: p_drbd_myrdqm_stop_0:16344:stderr [ ocf-exit-reason:DRBD resource myrdqm not found in configuration file /etc/drbd.conf. ]
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =

Similar errors can be found in the output from "journalctl -xe":

= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
-- Unit drbd.service has begun starting up.
Apr 11 09:23:17 rdqm-test-1.ibm.com drbd[5900]: Starting DRBD resources:
Apr 11 09:23:17 rdqm-test-1.ibm.com drbd[5900]: drbd.d/global_common.conf:10: conflicting use of resource options section 'common:res_opt
Apr 11 09:23:17 rdqm-test-1.ibm.com drbd[5900]: drbd.d/global_common.conf:7: resource options section 'common:res_options' first used her
Apr 11 09:23:17 rdqm-test-1.ibm.com drbd[5900]: .
Apr 11 09:23:17 rdqm-test-1.ibm.com systemd[1]: drbd.service: main process exited, code=exited, status=6/NOTCONFIGURED
Apr 11 09:23:17 rdqm-test-1.ibm.com systemd[1]: Failed to start DRBD -- please disable. Unless you are NOT using a cluster manager..
-- Subject: Unit drbd.service has failed
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =

...and crm status reports failed resource actions:

= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
Stack: corosync
Current DC: rdqm-test-2.ibm.com (version 1.1.24.linbit-2.0.el7-8f22be2ae) - partition with quorum
Last updated: Tue Apr 11 09:54:20 2023
Last change: Tue Apr 11 09:24:56 2023 by root via crm_attribute on rdqm-test-1.ibm.com

3 nodes configured
6 resource instances configured

Online: [ rdqm-test-1.ibm.com rdqm-test-2.ibm.com rdqm-test-3.ibm.com ]

Full list of resources:

 myrdqm (ocf::ibm:rdqm): Started rdqm-test-2.ibm.com
 p_fs_myrdqm (ocf::heartbeat:Filesystem): Started rdqm-test-2.ibm.com
 p_rdqmx_myrdqm (ocf::ibm:rdqmx): Started rdqm-test-2.ibm.com
 Master/Slave Set: ms_drbd_myrdqm [p_drbd_myrdqm]
     Masters: [ rdqm-test-2.ibm.com ]
     Slaves: [ rdqm-test-3.ibm.com ]
     Stopped: [ rdqm-test-1.ibm.com ]

Failed Resource Actions:
* p_drbd_myrdqm_start_0 on rdqm-test-1.ibm.com 'not installed' (5): call=19, status=complete, exitreason='DRBD resource myrdqm not found in configuration file /etc/drbd.conf.',
    last-rc-change='Fri Apr 7 18:14:26 2023', queued=0ms, exec=176ms
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
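To confirm that a node is showing this symptom, the log can be searched for the DRBD message quoted above. The following is a minimal sketch only; the function name has_drbd_conflict is hypothetical, and the log path is passed in as an argument because log locations vary between systems.

```shell
#!/bin/sh
# Hypothetical helper (not part of IBM MQ or DRBD): succeeds if the given
# log file contains the DRBD "conflicting use" message quoted in this APAR.
has_drbd_conflict() {
    # $1: path to a log file, for example /var/log/messages
    grep -q "conflicting use of resource options section" "$1"
}

# Example use (reading /var/log/messages usually requires root):
#   if has_drbd_conflict /var/log/messages; then
#       echo "DRBD configuration conflict found in log"
#   fi
```

A match only shows that the message was logged at some point; it does not show that the configuration file is still in the failing state.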
Local fix
On each node where the failure has been found, remove the following duplicate lines from /etc/drbd.d/global_common.conf:

options { twopc-timeout 100; }
ping-timeout 40;
socket-check-timeout 5;
socket-check-timeout 5;

The simplest way to do this is to replace /etc/drbd.d/global_common.conf with the file in the IBM MQ samples directory, e.g.:

cp [MQ_INSTALLATION_PATH]/samp/rdqm/etc/drbd.d/global_common.conf /etc/drbd.d

Then clean up any failed resource actions. To check for failed resource actions, run:

crm status

If any are found, run:

crm resource cleanup

...on the appropriate resource(s) (for example, the queue manager name in lower case).

When all failed resource actions have been cleared, use "rdqmadm -r" and "rdqmadm -p" to resume the node and switch the RDQM to the preferred node.
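The file replacement step above could be scripted along the following lines. This is a sketch under stated assumptions: the function name restore_drbd_common is hypothetical, and keeping a ".bak" copy of the existing file is an added precaution that the APAR itself does not mention.

```shell
#!/bin/sh
# Hypothetical helper: replace a DRBD common configuration file with the
# sample shipped with IBM MQ, keeping a backup of the existing file.
# The backup step is an assumption, not part of the documented local fix.
restore_drbd_common() {
    sample="$1"   # sample file under [MQ_INSTALLATION_PATH]/samp/rdqm/etc/drbd.d
    target="$2"   # normally /etc/drbd.d/global_common.conf
    if [ ! -f "$sample" ]; then
        echo "sample file not found: $sample" >&2
        return 1
    fi
    if [ -f "$target" ]; then
        # Preserve the current file so the change can be reviewed or undone.
        cp -p "$target" "$target.bak" || return 1
    fi
    cp "$sample" "$target"
}

# Example use, run as root, with MQ_INSTALLATION_PATH set by the caller:
#   restore_drbd_common \
#       "$MQ_INSTALLATION_PATH/samp/rdqm/etc/drbd.d/global_common.conf" \
#       /etc/drbd.d/global_common.conf
```

After the file has been replaced, the remaining steps (crm status, crm resource cleanup, then "rdqmadm -r" and "rdqmadm -p") are run as described above.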
Problem summary
****************************************************************
USERS AFFECTED:
Those using RDQM who applied FixPack 9.2.0.10

Platforms affected:
Linux on x86-64
****************************************************************
PROBLEM DESCRIPTION:
An error in the 9.2.0.10 FixPack migration code caused duplicate lines to be added to the file /etc/drbd.d/global_common.conf, which prevented RDQM from restarting on the node.
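One rough way to check whether a node's file contains the duplicated lines is to count how many lines begin with a given setting name. The sketch below assumes one setting per line; the function name count_setting is hypothetical. A count above one is a hint rather than proof, since a keyword may legitimately appear in more than one section of a DRBD configuration file.

```shell
#!/bin/sh
# Hypothetical helper: print the number of lines in a configuration file
# that begin (after optional whitespace) with the given setting name.
count_setting() {
    # $1: configuration file, $2: setting name such as socket-check-timeout
    grep -c "^[[:space:]]*$2[[:space:]]" "$1"
}

# Example use:
#   count_setting /etc/drbd.d/global_common.conf socket-check-timeout
```

Comparing the file against the sample in the IBM MQ samples directory remains the more reliable check.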
Problem conclusion
The error has been fixed.

---------------------------------------------------------------
The fix is targeted for delivery in the following PTFs:

Version    Maintenance Level
v9.2 LTS   9.2.0.11
v9.3 LTS   9.3.0.10
v9.x CD    9.3.3

The latest available maintenance can be obtained from 'WebSphere MQ Recommended Fixes'
http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006037

If the maintenance level is not yet available, information on its planned availability can be found in 'WebSphere MQ Planned Maintenance Release Dates'
http://www-1.ibm.com/support/docview.wss?rs=171&uid=swg27006309
---------------------------------------------------------------
Temporary fix
Comments
APAR Information
APAR number
IT43517
Reported component name
MQ BASE V9.2
Reported component ID
5724H7281
Reported release
920
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2023-04-06
Closed date
2023-04-18
Last modified date
2023-04-25
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
MQ BASE V9.2
Fixed component ID
5724H7281
Applicable component levels
Document Information
Modified date:
25 April 2023