A fix is available
APAR status
Closed as program error.
Error description
During recovery for a Queue Sharing Group (QSG), for example in a Disaster Recovery (DR) scenario, queue managers experience abends and take a long time to connect to a Shared Message Data Set (SMDS). The problem is most likely when there is a large number of messages, and it is worse when the CF structure needs to use emergency Storage Class Memory (SCM). SMF 74 data shows higher SCM usage than usual.

Slow recovery can sometimes be caused by BACKUP CFSTRUCT not being run frequently enough. In the reported instance, however, a backup had been taken less than an hour before recovery was needed, so something else was causing the long-running recovery.

The delay is related to code added by PH46037. A recovery task reads each entry in the CF structure to determine whether the structure contains any messages put by the local queue manager. The task performs this check because the queue manager's SMDS indicates that it holds no messages. PH46037 added this extra verification so that, when the SMDS is marked as empty, the CF structure is checked for entries owned by the local queue manager. The purpose is to detect cases where the SMDS and the CF structure fall out of sync. The same logic also runs when a queue manager disconnects from an SMDS.

Symptoms across queue managers in the QSG might include:

IXL041E CONNECTOR NAME: <name>, JOBNAME: <jobname>, ASID: <asid> HAS NOT RESPONDED TO THE DISCONNECTED/FAILED CONNECTION EVENT FOR SUBJECT CONNECTION: <name>. DISCONNECT/FAILURE PROCESSING FOR STRUCTURE <structure name> CANNOT CONTINUE. MONITORING FOR RESPONSE STARTED: <date time>

DUMP TITLE=ABEND=S026,REASON=08118001,CONNECTOR HANG: CONNAME=CSQExxxxxxxxxx,JOBNAME=ssidMSTR
DUMP TITLE=ssid,ABN=026-08110102,U=SYSOPR ,C=MQ900.930.CFM -CSQEDSS4,M=CSQGFRCV,PSW=070C4000818EBD46,ASID=<asid>
DUMP TITLE=ssid,ABN=5C6-00C510AD,U=xxxxx ,C=MQ900.930.CFM -CSQEOCRQ,M=CSQGFRCV,LOC=CSQELPLM.CSQEOCRQ+00000C48
DUMP TITLE=ssid,ABN=5C6-00C5110F,S=0000080A,C=MQ900.930.CFM -CSQEDB2R,M=CSQGFRCV,LOC=CSQELPLM.CSQEDB2R+00000BF4
DUMP TITLE=QP07,ABN=5C6-00C51045,U=SYSOPR ,C=MQ900.930.CFM -CSQESTE ,M=CSQGFRCV,LOC=CSQELPLM.CSQESTE +000025CC
DUMP TITLE=QP07,ABN=5C6-00C51063,U=SYSOPR ,C=MQ900.930.CFM -CSQEOPEN,M=CSQGFRCV,LOC=CSQELPLM.CSQEOPEN+00002942

CSQR031I Reading log forwards, RBA=nnnnnnnnnnnnnnnn
(where the RBA value "nnnnnnnnnnnnnnnn" remains the same)

08118001 means XES has recognized that an overdue connector response has caused a hang in a structure-related process. This abend is issued to capture diagnostic data related to the overdue response.
08110102 means XES has terminated the task or address space associated with a coupling facility structure connector to resolve a hang in a structure-related process.
The various 5C6 abend reason codes are the result of unexpected errors, namely the S026 abends.

The dumps show that the queue managers cannot reply to the EeplDiscFailConnection event because the recovery structure task for a structure is busy. A thread is looping in CSQEDSS4 (trace point IXL1DSS4, called from CSQEDSC1), issuing IXLLSTM REQUEST(READ_MULT) calls that fail with ixlRsnCodeTimeout. One of the values traced in the IXL1DSS4 exit is the number of entries returned by the call (LaaReadCnt). In the reported case, QMGR trace shows that usually only a single entry is returned, yet each call takes about 10ms. Over the 10 seconds of trace available for the task, reading an entry took 16ms on average. This delay was caused by the need to use SCM storage.
With other queue managers doing similar processing for their own SMDSs, contention increases both in XCF and for the SCM reads. This APAR is raised to prevent scenarios where the verification process keeps the structure task busy for long periods of time.
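The trace figures above allow a rough estimate of how long an unbounded verification scan can run. The sketch below is a simple linear model for illustration only, not IBM MQ code: it assumes scan time is the average per-entry read latency multiplied by the number of entries read, and the 100,000-entry structure in the example is hypothetical. At the reported 16ms average, the 10 seconds of available trace correspond to about 625 entry reads.

```python
# Rough linear model of the SMDS verification scan cost.
# Assumption: scan time = entries read x average per-entry read latency.
# Illustration only; this is not IBM MQ code.


def entries_read(trace_window_ms: int, avg_read_ms: int) -> int:
    """Entries a single task can read in a time window at a given average latency."""
    return trace_window_ms // avg_read_ms


def scan_duration_s(entry_count: int, avg_read_ms: int) -> float:
    """Estimated seconds to read entry_count entries one at a time."""
    return entry_count * avg_read_ms / 1000


if __name__ == "__main__":
    # Figures from the reported case: 10 s of trace, about 16 ms per entry.
    print(entries_read(10_000, 16))        # 625
    # Hypothetical structure holding 100,000 entries at the same latency.
    print(scan_duration_s(100_000, 16))    # 1600.0 seconds, roughly 27 minutes
```

Because the model is linear, doubling the number of entries in the structure, or the per-entry latency caused by SCM reads and XCF contention, doubles the estimated scan time.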
Local fix
N/A
Problem summary
****************************************************************
* USERS AFFECTED: All users of IBM MQ for z/OS Version 9       *
*                 Release 2 Modification 0,                    *
*                 Release 3 Modification 0 and                 *
*                 Release 4 Modification 0.                    *
****************************************************************
* PROBLEM DESCRIPTION: Connection to an OFFLOAD(SMDS) CF       *
*                      structure can take a long time if there *
*                      are lots of messages in the structure   *
*                      and the local QMGR's SMDS is empty.     *
*                                                              *
*                      The connection taking a long time can   *
*                      result in other structure work being    *
*                      delayed. In some cases this could lead  *
*                      to follow-on S026-08118001,             *
*                      S026-08110102, various S5C6 abends and  *
*                      the QMGR terminating.                   *
*                                                              *
*                      Fix CFCC EC P46603                      *
*                      MCL009(D41C Bundle S96) resolves a      *
*                      problem which may exacerbate the time   *
*                      the connection takes.                   *
****************************************************************

APAR PH46037 added extra verification logic when connecting to an SMDS. If the local QMGR's SMDS is empty, the structure is scanned for entries put by the local QMGR. In certain situations, this scan can take a long time. The verification processing is performed by a QMGR structure service task. Requests to QMGR service tasks are processed serially, so if one request takes a long time, other requests can be delayed. If structure event requests are queued to the structure task during the verification processing, S026 abends and QMGR termination can follow.
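The serial processing described above can be illustrated with a single-server FIFO queue model. This Python sketch is an assumption-laden illustration, not the structure task's real implementation: the request names, the 1600-second scan, and the 5-second arrival time of the connection event are all hypothetical. It shows that any request queued behind a long verification scan waits for almost the whole scan before it is processed.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    name: str          # hypothetical label for the request
    arrival_s: float   # when the request is queued to the task
    service_s: float   # how long the task spends processing it


def run_serially(requests):
    """Process requests one at a time in arrival order (single-server FIFO).

    Returns {name: seconds spent waiting before processing started}.
    """
    queue = deque(sorted(requests, key=lambda r: r.arrival_s))
    clock = 0.0
    waits = {}
    while queue:
        req = queue.popleft()
        start = max(clock, req.arrival_s)   # the task may be idle until the request arrives
        waits[req.name] = start - req.arrival_s
        clock = start + req.service_s       # the task is busy until this request completes
    return waits


if __name__ == "__main__":
    work = [
        Request("smds-verification-scan", arrival_s=0.0, service_s=1600.0),
        Request("disc-fail-connection-event", arrival_s=5.0, service_s=0.1),
    ]
    # The event request waits 1595.0 s behind the scan.
    print(run_serially(work))
```

In this model the short event request's wait is governed entirely by the request ahead of it, which is why bounding the verification scan, rather than speeding up event handling, is what removes the delay.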
Problem conclusion
A timeout has been added to the verification logic introduced by APAR PH46037, so that the verification processing can no longer monopolise QMGR structure tasks.
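The APAR does not describe how the timeout is implemented or what the queue manager does when it expires. The Python sketch below only illustrates the general idea of a deadline-bounded scan. The entry model (each entry reduced to the name of the queue manager that put it), the names `verify_smds_empty` and `ScanResult`, and the choice to report a separate TIMED_OUT outcome to the caller are all assumptions, not IBM MQ internals.

```python
import time
from enum import Enum
from itertools import count
from typing import Callable, Iterable


class ScanResult(Enum):
    FOUND = "found"            # an entry put by the local QMGR exists
    NOT_FOUND = "not_found"    # whole structure scanned, none put locally
    TIMED_OUT = "timed_out"    # gave up before finishing the scan


def verify_smds_empty(
    entries: Iterable[str],
    local_qmgr: str,
    timeout_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> ScanResult:
    """Scan entries for one put by local_qmgr, giving up after timeout_s.

    Hypothetical model: each entry is just the name of the QMGR that put it.
    """
    deadline = clock() + timeout_s
    for owner in entries:
        if clock() >= deadline:
            return ScanResult.TIMED_OUT   # free the task instead of monopolising it
        if owner == local_qmgr:
            return ScanResult.FOUND       # SMDS and structure are out of sync
    return ScanResult.NOT_FOUND


if __name__ == "__main__":
    # Deterministic fake clock for the example: advances 1 second per call.
    fake_clock = count().__next__
    entries = ["QP01", "QP02", "QP03", "QP07"]
    print(verify_smds_empty(entries, "QP07", timeout_s=3, clock=fake_clock))
```

Checking the deadline before each read keeps the worst-case time the task is held close to `timeout_s` plus one read, regardless of how many entries the structure contains.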
Temporary fix
Comments
APAR Information
APAR number
PH65545
Reported component name
IBM MQ Z/OS V9
Reported component ID
5655MQ900
Reported release
200
Status
CLOSED PER
PE
NoPE
HIPER
YesHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2025-03-05
Closed date
2025-05-30
Last modified date
2025-07-02
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
UO02614 UO02615 UO03443
Modules/Macros
CSQEDSC1 CSQEDSS4 CSQESTRT CSQIRECP
Fix information
Fixed component name
IBM MQ Z/OS V9
Fixed component ID
5655MQ900
Applicable component levels
R200 PSY UO03443
UP25/06/11 P F506 ¢
R300 PSY UO02615
UP25/04/10 P F504
R400 PSY UO02614
UP25/04/10 P F504
Fix is available
Select the PTF appropriate for your component level. You will be required to sign in. Distribution on physical media is not available in all countries.
Document Information
Modified date:
02 July 2025