A fix is available
APAR status
Closed as program error.
Error description
A couple of queue managers in the Queue Sharing Group (QSG) failed with the following messages:

Queue manager CSQ2:
------------------
CSQV086E CSQ2 QUEUE MANAGER ABNORMAL TERMINATION REASON=00E50702
IEA794I SVC DUMP HAS CAPTURED:
DUMPID=002 REQUESTED BY JOB (CSQ2MSTR)
DUMP TITLE=CSQ2,ABN=5C6-00C5105B,U=SYSOPR ,C=MQ900.900.CFM
-CSQEDSS2,M=CSQGFRCV,LOC=CSQELPLM.CSQEDSS2+000018FA

CSQEDSS2+18FA is in routine free_to_map, which attempts to free the SMDS block associated with a deleted message. This message is likely to be the one that was being processed and led to the 5C6-00C94522 abend that occurred on another queue manager, which would have queued the message for recovery processing by CSQ2 (the owner of the SMDS containing the blocks for that message).

Queue manager CSQ1:
------------------
CSQY291E CSQWDSDM SDUMPX FAILED, RC=00000B08,CSQ1,ABN=5C6-00C5105B,
LOC=CSQELPLM.CSQEDSS2+000018FA
CSQV086E CSQ1 QUEUE MANAGER ABNORMAL TERMINATION REASON=00E50702

After the queue manager restarted, it failed with:

CSQE252I CSQ1 CSQEDSS4 SMDS(CSQ1) CFSTRUCT(APPLICATION1) data set <SMDS name> space map will be rebuilt by scanning the structure
IXL016I CONNECTOR CSQEQSG1CSQ102 TO STRUCTURE QSG1CSQ_ADMIN TERMINATING: JOB CSQ1MSTR ASID 0133 REQUESTED DISCONNECT REASON=FAILURE.
CSQV086E CSQ1 QUEUE MANAGER ABNORMAL TERMINATION REASON=00C94510
IEA794I SVC DUMP HAS CAPTURED:
DUMPID=006 REQUESTED BY JOB (CSQ1MSTR)
DUMP TITLE=CSQ1,ABN=5C6-00C5105B,U=SYSOPR ,C=MQ900.900.CFM
-CSQEDSS2,M=CSQGFRCV,LOC=CSQELPLM.CSQEDSS2+000015E0

CSQEDSS2+15E0 is in routine map_to_free.

To get the CSQ1 queue manager to restart, an MQ RESET command was issued in another queue manager in the QSG:

/cpf RESET CFSTRUCT(structure-name) ACTION(FAILED)

where "cpf" is the command prefix for the queue manager, and "structure-name" is the name of the application structure listed in the CSQE252I message.
Logrec shows that an ABEND5C6-00C94522 abend (CSQI_LARGE_MESSAGE_ERROR) occurred on other queue managers (CSQ3 and CSQ4) in the QSG at the same time. The 5C6-00C94522 dumps were not available in the reported case, so the underlying cause of those abends cannot be determined. This APAR corrects the recovery action to try to avoid terminating the queue manager in this situation.
Local fix
N/A
Problem summary
****************************************************************
* USERS AFFECTED: All users of IBM MQ for z/OS Version 9       *
*                 Release 0 Modification 0,                    *
*                 Release 1 Modification 0, and                *
*                 Release 2 Modification 0.                    *
****************************************************************
* PROBLEM DESCRIPTION: Abend 5C6-00C5105B occurs in CSQEDSS2   *
*                      when a SMDS contains a damaged message, *
*                      and is followed by abnormal queue       *
*                      manager termination 6C6.                *
*                                                              *
*                      When the damaged message is found       *
*                      during backup cfstruct or get           *
*                      processing the abend is preceded by     *
*                      CSQI037I and abend 5C6-00C94522 on the  *
*                      queue manager discovering the message   *
*                      (this can be a different queue manager  *
*                      to the owning queue manager that        *
*                      terminates).                            *
*                                                              *
*                      The 5C6-00C5105B can also occur during  *
*                      queue manager startup, causing abnormal *
*                      termination and preventing the queue    *
*                      manager from starting.                  *
*                                                              *
*                      MQSMDS/K                                *
****************************************************************

When a 'damaged' message (one where the CF and SMDS entries are inconsistent with each other) is identified during get processing, the queue manager abends 5C6-00C94522 and attempts to handle the bad message.

If the message is persistent, the structure is failed, allowing any persistent messages to be recovered from the logs.

If the message is non-persistent, it is deleted. However, an error in this deletion processing causes the SMDS blocks to be released twice. This is detected by the owning queue manager (i.e. the queue manager where the message was put to the queue), which abends 5C6-00C5105B and terminates.

If a queue manager terminates while any such 'damaged' messages exist on its SMDS, abend 5C6-00C5105B occurs during startup while the space map is being rebuilt, so startup fails and the queue manager terminates again.
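The double release described above can be illustrated with a small Python model. This is only a sketch under stated assumptions: `SpaceMap`, `DoubleReleaseError`, `delete_damaged_message` and the `release_twice` flag are hypothetical names invented for illustration, and they do not reflect the actual internals of CSQEDSS2 or CSQIMGES. The point is simply that a space map which tracks allocated blocks can detect a second release of the same block, and that the owner of the map is where the error surfaces.

```python
class DoubleReleaseError(Exception):
    """Raised when a block that is not currently allocated is released."""


class SpaceMap:
    """Toy space map (hypothetical): tracks which block numbers are allocated."""

    def __init__(self, total_blocks):
        self.total_blocks = total_blocks
        self.allocated = set()

    def allocate(self, block):
        if not 0 <= block < self.total_blocks:
            raise ValueError(f"block {block} is out of range")
        if block in self.allocated:
            raise ValueError(f"block {block} is already allocated")
        self.allocated.add(block)

    def release(self, block):
        # A release of a block that is already free is the inconsistency
        # that the owning queue manager detects in the scenario above.
        if block not in self.allocated:
            raise DoubleReleaseError(f"block {block} released twice")
        self.allocated.remove(block)


def delete_damaged_message(space_map, blocks, release_twice):
    """Release the blocks of a deleted message.

    release_twice=True models the defect (blocks released a second time);
    release_twice=False models the intended single release.
    """
    for block in blocks:
        space_map.release(block)
    if release_twice:
        for block in blocks:
            space_map.release(block)


if __name__ == "__main__":
    smap = SpaceMap(total_blocks=8)
    smap.allocate(3)
    try:
        delete_damaged_message(smap, [3], release_twice=True)
    except DoubleReleaseError as err:
        print("detected:", err)
```

In this model the second release raises an exception; in the reported problem the equivalent detection resulted in abend 5C6-00C5105B and termination of the owning queue manager.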
Problem conclusion
CSQIMGES is changed to correctly delete the CF entry only when detecting a damaged non-persistent message and abending 5C6-00C94522, preventing the queue manager that owns the message from abending 5C6-00C5105B and terminating abnormally.

CSQEDSS4 is changed to retry 5C6-00C5105B abends (although a dump will still be captured) when rebuilding the SMDS space map, causing the SMDS to be marked as failed rather than terminating the queue manager. This allows the structure to be failed and then recovered, so that any persistent messages are recovered from the logs.
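The change to the space map rebuild can be pictured with the following Python sketch. It is a simplified model, not the product's implementation: `SmdsStatus`, `InconsistentEntryError`, `rebuild_space_map` and `start_up` are hypothetical names, and the inconsistency is modeled, as an assumption, as a block claimed by two messages or lying outside the data set. The sketch shows the general pattern of trapping the error during the rebuild, keeping a diagnostic, and marking the data set as failed instead of letting the error end the whole process.

```python
from enum import Enum


class SmdsStatus(Enum):
    ACTIVE = "ACTIVE"
    FAILED = "FAILED"


class InconsistentEntryError(Exception):
    """Raised when scanned entries cannot form a consistent space map."""


def rebuild_space_map(entries, total_blocks):
    """Rebuild a block -> message map from (message_id, blocks) pairs.

    Hypothetical model: a block that is out of range or claimed by two
    messages is treated as the inconsistency found during the rebuild.
    """
    owner = {}
    for message_id, blocks in entries:
        for block in blocks:
            if not 0 <= block < total_blocks:
                raise InconsistentEntryError(
                    f"{message_id}: block {block} is out of range")
            if block in owner:
                raise InconsistentEntryError(
                    f"{message_id}: block {block} already used by {owner[block]}")
            owner[block] = message_id
    return owner


def start_up(entries, total_blocks):
    """Return (status, space_map, diagnostic) without raising.

    On an inconsistency the diagnostic is kept (standing in for the dump
    that is still captured) and the data set is marked FAILED so that
    startup can continue and recovery can be driven afterwards.
    """
    try:
        space_map = rebuild_space_map(entries, total_blocks)
    except InconsistentEntryError as err:
        return SmdsStatus.FAILED, {}, str(err)
    return SmdsStatus.ACTIVE, space_map, None


if __name__ == "__main__":
    status, _, diagnostic = start_up([("m1", [0, 1]), ("m2", [1])], total_blocks=4)
    print(status.value, diagnostic)
```

The design choice illustrated here is that a damaged entry degrades one resource (the SMDS is marked failed) rather than the whole queue manager, which leaves the operator free to fail and recover the structure.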
Temporary fix
Comments
APAR Information
APAR number
PH33239
Reported component name
IBM MQ Z/OS V9
Reported component ID
5655MQ900
Reported release
000
Status
CLOSED PER
PE
NoPE
HIPER
YesHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2021-01-11
Closed date
2021-06-09
Last modified date
2021-08-02
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
UI75778 UI75779 UI75780
Modules/Macros
CSQEDSS4 CSQIMGES
Fix information
Fixed component name
IBM MQ Z/OS V9
Fixed component ID
5655MQ900
Applicable component levels
R000 PSY UI75780
UP21/07/19 P F107
R100 PSY UI75779
UP21/07/19 P F107
R200 PSY UI75778
UP21/07/19 P F107
Fix is available
Select the PTF appropriate for your component level. You will be required to sign in. Distribution on physical media is not available in all countries.
Document Information
Modified date:
03 August 2021