A fix is available
APAR status
Closed as program error.
Error description
After a Coupling Facility (CF) loss of connectivity, some queue managers in the Queue Sharing Group (QSG) might never respond to the disconnect/failed connection events. The CFCONLOS attribute is set to TOLERATE.

Symptoms in the syslog, joblogs, and dumps may include:
--------------------------------------------------------------
IXL041E CONNECTOR HAS NOT RESPONDED TO THE DISCONNECTED/FAILED CONNECTION EVENT
IXL049E HANG RESOLUTION ACTION FOR CONNECTOR ... SYSTEM WILL TAKE ACTION AT mm/dd/yyyy
IXL040E CONNECTOR HAS NOT RESPONDED TO THE USER SYNC POINT EVENT. USER SYNC POINT PROCESSING FOR STRUCTURE <structure> CANNOT CONTINUE
CSQE149I CSQECONN Waiting for other queue managers to disconnect from structure <structure>
CSQV086E QUEUE MANAGER ABNORMAL TERMINATION REASON=00C510AB
  [where 00C510AB means CSQE_StrFailure_or_LossConn_Encountered]
DUMP TITLE=ssid,ABN=026-08110102,U=SYSOPR ,C=R3600.710.ASMC-
  CSQVSRX ,M=CSQVSRRX...
DUMP TITLE=ABEND=S026,REASON=08118001,CONNECTOR HANG: CONNAME=
  <connector>,JOBNAME=ssidMSTR
GRS ENQ for:
  NAME=MAJOR=SYSZCSQE MINOR=CSQERECOVER CF STRUCTURE TASK ENQqsgname SCOPE=SYSTEMS
MQ latch waits
  - for CFMXL1 ( ETHR_Chain_Latch ) from CSQEALL. The latch is owned by a thread named STRTSKnn, where "nn" is an integer.
  - for DMCSEGAL from modules including CSQMOPNI or CSQEMTIN
--------------------------------------------------------------

A three-way deadlock results from the interaction of four tasks:
1 - Application task putting to a shared IMS bridge queue on CF structure APP1
2 - Structure task for CF structure APP2
3 - Structure task for CF structure APP1
4 - IMS bridge queue task for the shared queue

Task 1 puts a message to an IMS bridge queue and closes the queue without committing the put.

The CF loss of connectivity occurs, causing task 2 and task 3 to start structure failure processing and flag the IVSAs with fLossConn.

Task 4 is triggered by the loss of connectivity and terminates, closing the queue in the process. Close processing detects that task 4 is the last one to have the queue open and attempts to close it in the CF component. This fails because there are uncommitted messages on the queue.

Task 1 goes through abort UR processing (this happens, for instance, if the application was cancelled). CSQESYNC is invoked; it gets the CF thread block latch for task 1 and resolves any outstanding work. During this processing it detects that it needs to close a shared queue to free memory in the queue manager, and it schedules a synchronous request to the structure task (task 3).

Task 2 continues processing and reaches the point where it needs to update all CF thread blocks to indicate that the structure has failed. To do this, task 2 acquires the ETHR_Chain_Latch, which succeeds, and then iterates over all the thread blocks on the chain, obtaining each individual thread block latch. It is suspended at task 1's thread block, because that latch is currently held by task 1.

Task 3 continues processing and is suspended when it tries to acquire the ETHR_Chain_Latch, which is currently held by task 2.

Task 1 cannot continue because task 3, which must service its synchronous request, is busy with structure failure processing. Task 1 therefore never releases the thread block latch, so task 2 and task 3 remain suspended indefinitely. The deadlock occurs.

Many processing paths use the ETHR_Chain_Latch, so a large number of tasks are suspended, including a structure task responding to a USYNC event. Because that response remains outstanding, the task is eventually abended with S026 due to the CF timeout.

Additional Symptom(s) Search Keyword(s): 08118001 08110102 ABEND026 ABENDS026 026 S026 S0026
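The circular wait described above can be modelled as a wait-for graph. The following is a minimal sketch in Python, not MQ code: the task labels and the `find_cycle` helper are illustrative assumptions, and only the three wait relationships come from the description (task 1 waits for task 3 to service its synchronous close request, task 2 waits for task 1's thread block latch, task 3 waits for the ETHR_Chain_Latch held by task 2).

```python
# Wait-for graph: each key is a task, each value is the task it waits on.
# Labels are illustrative; the edges follow the error description.
WAITS_FOR = {
    "task1_application": "task3_structure_APP1",     # synchronous close request
    "task2_structure_APP2": "task1_application",     # task 1's thread block latch
    "task3_structure_APP1": "task2_structure_APP2",  # ETHR_Chain_Latch
}


def find_cycle(waits_for, start):
    """Follow wait edges from start; return the cycle as a list, or None."""
    path = []
    node = start
    while node in waits_for:
        if node in path:
            # We returned to a task already on the path: a circular wait.
            return path[path.index(node):]
        path.append(node)
        node = waits_for[node]
    return None  # the chain ended at a task that is not waiting


cycle = find_cycle(WAITS_FOR, "task1_application")
print(" -> ".join(cycle + [cycle[0]]))
```

Task 4 does not appear in the graph because it terminates rather than waits; its failed close is what leaves the shared queue for task 1 to close during abort UR processing.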
Local fix
None, other than recycling the queue managers, which frees the deadlock and allows processing to continue.
Problem summary
****************************************************************
* USERS AFFECTED: All users of WebSphere MQ for z/OS Version 7 *
*                 Release 1 Modification 0.                    *
****************************************************************
* PROBLEM DESCRIPTION: Structure failure processing and abort  *
*                      UOW processing in the CF manager get    *
*                      into a dead-lock. Structure failure     *
*                      processing for a given structure never  *
*                      completes and large amounts of EBs are  *
*                      waiting for the ETHR_Chain_Latch.       *
*                      Shared queues on the CF structure are   *
*                      unusable.                               *
*                      The queue manager can abend with S026.  *
****************************************************************
* RECOMMENDATION:                                              *
****************************************************************
A deadlock occurs between abort UR processing and structure failure processing in the CF manager because of a small timing window in which CSQISQC1 may try to close a shared queue while holding the ETHR latch. If a structure failure occurs after the latch was obtained but before the close call completes, the tasks wait for each other to complete processing. Because structure failure processing holds the ETHR_Chain_Latch, a large number of EBs build up waiting for it to become available.
Problem conclusion
Abort UR processing in the CF manager was changed to close the shared queue asynchronously, allowing it to complete even if a structure failure occurs.
100Y
CSQISQC1
CSQMCSQ1
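The effect of closing the queue asynchronously can be illustrated with a small Python sketch. It is a simplified model under stated assumptions, not the actual change in CSQISQC1 or CSQMCSQ1: `close_requests`, `thread_block_latch`, `abort_ur`, `structure_task`, and the queue name are all hypothetical. The point it shows is that the abort path schedules the close and releases its latch without waiting, so the structure task can later obtain the latch and service the request.

```python
import queue
import threading

# Hypothetical stand-ins: a work queue for the structure task and a lock
# representing the per-thread-block latch held during abort UR processing.
close_requests = queue.Queue()
thread_block_latch = threading.Lock()


def abort_ur(queue_name):
    """Abort path: schedule the close and return without waiting for it."""
    with thread_block_latch:
        # Asynchronous request: nothing here waits on the structure task.
        close_requests.put(queue_name)
    # The latch is released on exit, so other tasks can obtain it.


def structure_task():
    """Structure task: take the latch, then service pending close requests."""
    closed = []
    with thread_block_latch:
        # This would block indefinitely if abort_ur still held the latch
        # while waiting for this task, as in the synchronous case.
        pass
    while not close_requests.empty():
        closed.append(close_requests.get())
    return closed


abort_ur("SHARED.IMS.BRIDGE.QUEUE")  # illustrative queue name
print(structure_task())
```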
Temporary fix
*********
* HIPER *
*********
Comments
APAR Information
APAR number
PI10257
Reported component name
WMQ Z/OS V7
Reported component ID
5655R3600
Reported release
100
Status
CLOSED PER
PE
NoPE
HIPER
YesHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2014-01-23
Closed date
2014-02-07
Last modified date
2014-04-02
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
UI14915
Modules/Macros
CSQISQC1 CSQMCSQ1
Fix information
Fixed component name
WMQ Z/OS V7
Fixed component ID
5655R3600
Applicable component levels
R100 PSY UI14915
UP14/03/04 P F403
Fix is available
Select the PTF appropriate for your component level. You will be required to sign in. Distribution on physical media is not available in all countries.
Document Information
Modified date:
02 April 2014