PI10257: WMQ Z/OS: IXL041E CONNECTOR HAS NOT RESPONDED TO THE DISCONNECTED/FAILED CONNECTION EVENT. ABENDS026.

A fix is available

APAR status

Closed as program error.

Error description

After a Coupling Facility (CF) loss of connectivity, some
queue managers in the Queue Sharing Group (QSG) might never
respond to the disconnected/failed connection events. The
CFCONLOS attribute is set to TOLERATE.

Symptoms in the syslog, joblogs, and dumps may include:
--------------------------------------------------------------
IXL041E CONNECTOR HAS NOT RESPONDED TO THE DISCONNECTED/FAILED
CONNECTION EVENT

IXL049E HANG RESOLUTION ACTION FOR CONNECTOR ...
SYSTEM WILL TAKE ACTION AT mm/dd/yyyy

IXL040E CONNECTOR HAS NOT RESPONDED TO THE USER SYNC POINT
EVENT. USER SYNC POINT PROCESSING FOR STRUCTURE <structure>
CANNOT CONTINUE

CSQE149I CSQECONN Waiting for other queue managers to
disconnect from structure <structure>

CSQV086E QUEUE MANAGER ABNORMAL TERMINATION REASON=00C510AB
 [where 00C510AB means CSQE_StrFailure_or_LossConn_Encountered]

DUMP TITLE=ssid,ABN=026-08110102,U=SYSOPR  ,C=R3600.710.ASMC-
          CSQVSRX ,M=CSQVSRRX...

DUMP TITLE=ABEND=S026,REASON=08118001,CONNECTOR HANG: CONNAME=
          <connector>,JOBNAME=ssidMSTR

GRS ENQ for:
NAME=MAJOR=SYSZCSQE MINOR=CSQERECOVER CF STRUCTURE TASK
ENQqsgname SCOPE=SYSTEMS

MQ latch waits
- for CFMXL1 (ETHR_Chain_Latch) from CSQEALL. The latch is
  owned by a thread named STRTSKnn, where "nn" is an integer.
- for DMCSEGAL from modules including CSQMOPNI or CSQEMTIN
--------------------------------------------------------------

A 3-way deadlock results from the processing performed by the
following 4 tasks:
 1 - Application task putting to a shared IMS bridge queue
     on CF structure APP1
 2 - Structure task for CF structure APP2
 3 - Structure task for CF structure APP1
 4 - IMS bridge queue task for the shared queue


Task 1 puts a message to an IMS bridge queue and closes the
queue without committing the put. The CF loss of connectivity
then occurs, causing task 2 and task 3 to start structure
failure processing and flag the IVSAs with fLossConn.

Task 4 is triggered by the loss of connectivity and terminates,
closing the queue in the process. Close processing detects that
it is the last task to have the queue open and attempts to close
it in the CF component. This fails, because there are
uncommitted messages on the queue.

Task 1 goes through abort UR processing (for example, because
the application was cancelled). CSQESYNC is invoked, which
obtains the CF thread block latch for task 1 and resolves any
outstanding work. During this processing it detects that it
needs to close a shared queue to free memory in the queue
manager, and it schedules a synchronous request to the
structure task for APP1 (task 3).

Task 2 continues processing and reaches the point where it must
update all CF thread blocks to indicate that the structure has
failed. To do this, task 2 acquires the ETHR_Chain_Latch, which
succeeds, and then iterates over all the thread blocks on the
chain, obtaining each individual thread block latch. It is
suspended on task 1's thread block, because that latch is held
by task 1.

Task 3 continues processing and is suspended when it tries to
acquire the ETHR_Chain_Latch, which is held by task 2.

Task 1 cannot continue, because task 3 is busy with structure
failure processing and cannot service the synchronous close
request. Task 1 therefore never releases its thread block latch,
and task 2 and task 3 remain suspended indefinitely. This is the
deadlock.
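The circular wait among tasks 1, 2 and 3 can be checked mechanically with a wait-for graph. The following is a minimal Python sketch, assuming that each task waits on at most one other task (true for tasks 1 to 3 in this scenario, so following single edges is enough; a general detector would need a full graph search). The task labels and the `WAITS_FOR` and `find_cycle` names are hypothetical and are not part of IBM MQ.

```python
from typing import Dict, List, Optional

# Hypothetical wait-for graph for the scenario above: each key is blocked
# until the task named by its value makes progress.
WAITS_FOR: Dict[str, str] = {
    "task1": "task3",  # waits for task 3 to service its synchronous close request
    "task2": "task1",  # waits for task 1's CF thread block latch
    "task3": "task2",  # waits for the ETHR_Chain_Latch, held by task 2
}


def find_cycle(waits_for: Dict[str, str], start: str) -> Optional[List[str]]:
    """Follow wait-for edges from `start`; return the cycle if one is reached."""
    path: List[str] = []
    seen = set()
    node: Optional[str] = start
    while node is not None and node not in seen:
        seen.add(node)
        path.append(node)
        node = waits_for.get(node)  # None when the task is not waiting on anyone
    if node is None:
        return None  # the chain ends at a task that can run: no deadlock
    return path[path.index(node):]  # the part of the path that loops back
```

With this graph, `find_cycle(WAITS_FOR, "task1")` returns `["task1", "task3", "task2"]`: task 1 waits on task 3, task 3 on task 2, and task 2 on task 1. Task 4 does not appear because it is not waiting on any of the other tasks.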

The ETHR_Chain_Latch is used by many code paths, so a large
number of tasks are suspended, including a structure task that
must respond to a USYNC (user sync point) event. Because the
response remains outstanding, that task is eventually abended
with S026 when the CF timeout expires.


Additional Symptom(s) Search Keyword(s):
08118001 08110102 ABEND026 ABENDS026 026 S026 S0026

Local fix

None, other than recycling the queue managers, which frees the
deadlock and allows processing to continue.

Problem summary

****************************************************************
* USERS AFFECTED: All users of WebSphere MQ for z/OS Version 7 *
*                 Release 1 Modification 0.                    *
****************************************************************
* PROBLEM DESCRIPTION: Structure failure processing and abort  *
*                      UOW processing in the CF manager get    *
*                      into a deadlock. Structure failure      *
*                      processing for a given structure never  *
*                      completes and large numbers of EBs are  *
*                      waiting for the ETHR_Chain_Latch.       *
*                      Shared queues on the CF structure are   *
*                      unusable.                               *
*                      The queue manager can abend with S026.  *
****************************************************************
* RECOMMENDATION:                                              *
****************************************************************
A deadlock occurs between abort UR processing and structure
failure processing in the CF manager because of a small timing
window in which CSQISQC1 may try to close a shared queue whilst
holding the ETHR (CF thread block) latch. If a structure failure
occurs after the latch is obtained but before the close request
completes, each task waits for the other to complete processing.
Because the structure failure process holds the
ETHR_Chain_Latch, a large number of EBs build up waiting for it
to become available.

Problem conclusion

Abort UR processing in the CF manager was changed to close the
shared queue asynchronously, allowing it to complete even if
a structure failure occurs.
100Y
CSQISQC1
CSQMCSQ1
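The effect of the change can be illustrated with a simplified threading model. This is a sketch under stated assumptions, not the MQ implementation: two Python locks stand in for task 1's CF thread block latch and the ETHR_Chain_Latch, a `queue.Queue` stands in for the request interface to the structure task, and `run_scenario` and the task function names are hypothetical. Because task 1 queues the close and releases its latch without waiting for a reply, no wait-for cycle can form, so every task finishes whatever the scheduling order.

```python
import queue
import threading
from typing import List


def run_scenario(timeout: float = 5.0) -> List[str]:
    """Run a model of tasks 1-3 with an asynchronous close; return the event log."""
    thread_block_latch = threading.Lock()  # models task 1's CF thread block latch
    ethr_chain_latch = threading.Lock()    # models the ETHR_Chain_Latch
    close_requests: "queue.Queue[str]" = queue.Queue()  # models requests to task 3
    log: List[str] = []
    log_lock = threading.Lock()

    def record(event: str) -> None:
        with log_lock:
            log.append(event)

    def task1_abort_ur() -> None:
        with thread_block_latch:
            # Fixed behaviour (modelled): queue the close, do not wait for task 3.
            close_requests.put("close shared queue")
            record("task1: close queued")
        record("task1: thread block latch released")

    def task2_structure_failure() -> None:
        with ethr_chain_latch:
            with thread_block_latch:  # blocks only until task 1 releases it
                record("task2: thread block flagged as failed")

    def task3_structure_task() -> None:
        with ethr_chain_latch:  # blocks only until task 2 releases it
            record("task3: structure failure processing done")
        request = close_requests.get(timeout=timeout)  # service the queued close later
        record(f"task3: handled '{request}'")

    threads = [
        threading.Thread(target=fn, daemon=True)
        for fn in (task1_abort_ur, task2_structure_failure, task3_structure_task)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout)
        if t.is_alive():
            raise RuntimeError("deadlock: a task did not finish")
    return log
```

`run_scenario()` returns the five logged events. Their order can vary between runs, but the function always returns rather than raising `RuntimeError`. If `task1_abort_ur` instead waited for task 3 to complete the close while still holding `thread_block_latch`, the circular wait described in the error description would become possible again.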

Temporary fix

```
*********
* HIPER *
*********
```

Comments

APAR Information

APAR number: PI10257
Reported component name: WMQ Z/OS V7
Reported component ID: 5655R3600
Reported release: 100
Status: CLOSED PER
PE: No
HIPER: Yes
Special Attention: No Specatt / Xsystem
Submitted date: 2014-01-23
Closed date: 2014-02-07
Last modified date: 2014-04-02

APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:

UI14915

Modules/Macros

```
CSQISQC1 CSQMCSQ1
```

Fix information

Fixed component name: WMQ Z/OS V7
Fixed component ID: 5655R3600

Applicable component levels

R100 PSY UI14915
UP14/03/04 P F403

Fix is available

Select the PTF appropriate for your component level. You will be required to sign in. Distribution on physical media is not available in all countries.


Document Information

Modified date:
02 April 2014
