IBM Support

PI10257: WMQ Z/OS: IXL041E CONNECTOR HAS NOT RESPONDED TO THE DISCONNECTED/FAILED CONNECTION EVENT. ABENDS026.

A fix is available

APAR status

  • Closed as program error.

Error description

  • After a Coupling Facility (CF) loss of connectivity, some
    queue managers in the Queue Sharing Group (QSG) might never
    respond to the disconnected/failed connection events. In this
    scenario, the CFCONLOS attribute is set to TOLERATE.
    
    Symptoms in the syslog, joblogs, and dumps may include:
    --------------------------------------------------------------
    IXL041E CONNECTOR HAS NOT RESPONDED TO THE DISCONNECTED/FAILED
    CONNECTION EVENT
    
    IXL049E HANG RESOLUTION ACTION FOR CONNECTOR ...
    SYSTEM WILL TAKE ACTION AT mm/dd/yyyy
    
    IXL040E CONNECTOR HAS NOT RESPONDED TO THE USER SYNC POINT
    EVENT. USER SYNC POINT PROCESSING FOR STRUCTURE <structure>
    CANNOT CONTINUE
    
    CSQE149I CSQECONN Waiting for other queue managers to
    disconnect from structure <structure>
    
    CSQV086E QUEUE MANAGER ABNORMAL TERMINATION REASON=00C510AB
     [where 00C510AB means CSQE_StrFailure_or_LossConn_Encountered]
    
    DUMP TITLE=ssid,ABN=026-08110102,U=SYSOPR  ,C=R3600.710.ASMC-
              CSQVSRX ,M=CSQVSRRX...
    
    DUMP TITLE=ABEND=S026,REASON=08118001,CONNECTOR HANG: CONNAME=
              <connector>,JOBNAME=ssidMSTR
    
    GRS ENQ for:
    NAME=MAJOR=SYSZCSQE MINOR=CSQERECOVER CF STRUCTURE TASK
    ENQqsgname SCOPE=SYSTEMS
    
    MQ latch waits:
    - for CFMXL1 (ETHR_Chain_Latch) from CSQEALL. The latch is
      owned by a thread named STRTSKnn, where "nn" is an integer.
    - for DMCSEGAL from modules including CSQMOPNI or CSQEMTIN
    --------------------------------------------------------------
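    As a rough triage aid, the following Python sketch scans lines
    from a syslog or joblog extract for the message IDs and abend
    codes listed above. It is not an IBM-supplied tool: the pattern
    set and the find_symptoms function are hypothetical, and a match
    only suggests, rather than confirms, that this APAR applies.

```python
import re

# Hypothetical pattern set built from the symptom list in this APAR.
SYMPTOM_PATTERNS = {
    "IXL041E": re.compile(r"\bIXL041E\b"),
    "IXL049E": re.compile(r"\bIXL049E\b"),
    "IXL040E": re.compile(r"\bIXL040E\b"),
    "CSQE149I": re.compile(r"\bCSQE149I\b"),
    # Only the abnormal termination with reason 00C510AB is of interest here.
    "CSQV086E 00C510AB": re.compile(r"\bCSQV086E\b.*REASON=00C510AB"),
    # Matches both dump title formats shown above (ABN=026-... and ABEND=S026).
    "ABEND S026": re.compile(r"ABN=026-|ABEND=S026"),
}


def find_symptoms(lines):
    """Return the set of symptom labels found in an iterable of log lines."""
    found = set()
    for line in lines:
        for label, pattern in SYMPTOM_PATTERNS.items():
            if pattern.search(line):
                found.add(label)
    return found


sample = [
    "IXL041E CONNECTOR HAS NOT RESPONDED TO THE DISCONNECTED/FAILED",
    "CSQV086E QUEUE MANAGER ABNORMAL TERMINATION REASON=00C510AB",
    "DUMP TITLE=ABEND=S026,REASON=08118001,CONNECTOR HANG: CONNAME=",
]
print(sorted(find_symptoms(sample)))
# ['ABEND S026', 'CSQV086E 00C510AB', 'IXL041E']
```

    Matching several symptoms together is more telling than any
    single message, since IXL041E and S026 can occur for other
    connector hangs.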
    
    Four tasks are involved in this scenario, and three of them
    (tasks 1, 2, and 3) end up in a three-way deadlock:
     1 - Application task putting to a shared IMS bridge queue
         on CF structure APP1
     2 - Structure task for CF structure APP2
     3 - Structure task for CF structure APP1
     4 - IMS bridge queue task for the shared queue
    
    
    Task 1 puts a message to an IMS bridge queue and closes the
    queue without committing the put. The CF connectivity loss then
    occurs, causing tasks 2 and 3 to start structure failure
    processing and flag the IVSAs with fLossConn.

    Task 4 is triggered by the loss of connectivity and terminates,
    closing the queue in the process. Close processing detects that
    task 4 is the last task to have the queue open and attempts to
    close it in the CF component. This fails because there are
    uncommitted messages on the queue.

    Task 1 goes through abort UR processing (for example, because
    the application was cancelled). CSQESYNC is invoked, which
    obtains the CF thread block latch for task 1 and resolves any
    outstanding work. During this processing, it detects that it
    needs to close a shared queue to free memory in the queue
    manager, and it schedules a synchronous request to the
    structure task (task 3).

    Task 2 continues processing and reaches the point where it
    must update all CF thread blocks to indicate that the structure
    has failed. To do this, task 2 acquires the ETHR_Chain_Latch,
    which succeeds, and then iterates over all the thread blocks on
    the chain, obtaining each thread block latch in turn. It is
    suspended on task 1's thread block latch, which task 1
    currently holds.

    Task 3 continues processing and is suspended when it tries to
    acquire the ETHR_Chain_Latch, which task 2 currently holds.

    Task 1 cannot continue, because its synchronous request is
    waiting for task 3, which is itself suspended in structure
    failure processing. Task 1 therefore never releases its thread
    block latch, and tasks 2 and 3 remain suspended indefinitely:
    the deadlock has occurred.
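
    The wait-for relationships described above can be checked
    mechanically. The Python sketch below is a simplified model,
    not MQ code: each task is a node, each edge points from a waiting
    task to the task it is waiting on, and a cycle means that no task
    in it can ever proceed. The task names and the one-edge-per-task
    graph are illustrative assumptions. Task 4 has no edge because it
    terminates rather than waits.

```python
# Hypothetical wait-for graph: "waiter" -> "task the waiter depends on".
WAIT_FOR = {
    "task1": "task3",  # waits for its synchronous close request to be run by task 3
    "task2": "task1",  # waits for task 1's CF thread block latch
    "task3": "task2",  # waits for the ETHR_Chain_Latch held by task 2
}


def find_cycle(wait_for, start):
    """Follow wait-for edges from start; return the cycle reached, or None."""
    path = []
    position = {}  # task -> index in path, to slice out the cycle when revisited
    node = start
    while node is not None and node not in position:
        position[node] = len(path)
        path.append(node)
        node = wait_for.get(node)  # None means this task is not waiting
    if node is None:
        return None
    return path[position[node]:]


print(find_cycle(WAIT_FOR, "task1"))
# ['task1', 'task3', 'task2']
```

    Every task in the returned list waits on the next one, and the
    last waits on the first, which is exactly the three-way deadlock.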
    
    Because the ETHR_Chain_Latch is used by many code paths, a
    large number of tasks are suspended waiting for it, including a
    structure task that must respond to a user sync point (USYNC)
    event. Because that response remains outstanding, the task is
    eventually abended with S026 when the CF timeout expires.
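
    To illustrate why so many tasks end up suspended, the following
    Python sketch models the ETHR_Chain_Latch as a threading.Lock
    that is acquired and never released, as happens when the
    structure failure task is stuck in the deadlock. The task names
    and simulate_latch_pileup are hypothetical, and
    acquire(timeout=...) is only a stand-in for the system's hang
    detection: in the real scenario the tasks simply wait, and the
    S026 abend comes from the system's connector hang processing,
    not from a timed lock request.

```python
import threading


def simulate_latch_pileup(num_waiters, hang_timeout):
    """Return the names of waiters that gave up after hang_timeout seconds."""
    chain_latch = threading.Lock()
    # Held by the deadlocked structure failure task; never released in this model.
    chain_latch.acquire()

    timed_out = []
    record = threading.Lock()  # protects timed_out across threads

    def waiter(name):
        # The timeout stands in for the system noticing that no response arrived.
        if chain_latch.acquire(timeout=hang_timeout):
            chain_latch.release()
        else:
            with record:
                timed_out.append(name)

    names = [f"eb{i}" for i in range(num_waiters)] + ["usync_structure_task"]
    threads = [threading.Thread(target=waiter, args=(n,)) for n in names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(timed_out)


print(simulate_latch_pileup(3, 0.05))
```

    Every waiter, including the task that should answer the USYNC
    event, times out, because the latch holder is itself part of the
    deadlock and can never release it.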
    
    
    Additional Symptom(s) Search Keyword(s):
    08118001 08110102 ABEND026 ABENDS026 026 S026 S0026
    

Local fix

  • None. Recycling the affected queue managers may free the
    deadlock and allow processing to continue.
    

Problem summary

  • ****************************************************************
    * USERS AFFECTED: All users of WebSphere MQ for z/OS Version 7 *
    *                 Release 1 Modification 0.                    *
    ****************************************************************
    * PROBLEM DESCRIPTION: Structure failure processing and abort  *
    *                      UOW processing in the CF manager can    *
    *                      deadlock. Structure failure processing  *
    *                      for the affected structure never        *
    *                      completes, and large numbers of EBs     *
    *                      wait for the ETHR_Chain_Latch. Shared   *
    *                      queues on the CF structure become       *
    *                      unusable, and the queue manager can     *
    *                      abend with S026.                        *
    ****************************************************************
    * RECOMMENDATION:                                              *
    ****************************************************************
    A deadlock occurs between abort UR processing and structure
    failure processing in the CF manager because of a small timing
    window in which CSQISQC1 may try to close a shared queue while
    holding the ETHR latch. If a structure failure occurs after the
    latch is obtained but before the close call completes, the tasks
    wait for each other indefinitely.
    Because the structure failure processing holds the
    ETHR_Chain_Latch, a large number of EBs build up waiting for it
    to become available.
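
    The timing window can be stated as a simple predicate. In this
    Python sketch the event times and the deadlock_possible function
    are hypothetical, and the strict inequalities are an assumption,
    since the APAR does not say what happens at the exact boundaries.

```python
def deadlock_possible(latch_obtained, close_completed, structure_failure):
    """Return True if a structure failure lands inside the exposure window:
    after the ETHR latch is obtained but before the close call completes.
    Arguments are hypothetical timestamps on any common clock."""
    return latch_obtained < structure_failure < close_completed


# Hypothetical timeline: latch obtained at t=100, close call completes at t=103.
for failure_time in (99, 101, 104):
    print(failure_time, deadlock_possible(100, 103, failure_time))
# 99 False
# 101 True
# 104 False
```

    A failure before the latch is obtained or after the close
    completes is handled normally; only the narrow window in between
    exposes the deadlock, which is why the problem is intermittent.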
    

Problem conclusion

  • Abort UR processing in the CF manager was changed to close the
    shared queue asynchronously, allowing abort UR processing to
    complete even if a structure failure occurs.
    100Y
    CSQISQC1
    CSQMCSQ1
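
    The effect of the fix can be illustrated with a simple wait-for
    graph, in which an edge points from a waiting task to the task it
    is waiting on. In this Python sketch (a simplified model, not the
    CSQISQC1 or CSQMCSQ1 code; the graph and has_cycle are
    hypothetical), making the close asynchronous removes task 1's
    wait on task 3. Task 1 can then finish abort UR processing and
    release its thread block latch, after which tasks 2 and 3 can
    proceed.

```python
def has_cycle(wait_for):
    """Return True if any chain of wait-for edges loops back on itself."""
    for start in wait_for:
        seen = set()
        node = start
        # Follow edges until we reach a task that is not waiting or revisit one.
        while node in wait_for and node not in seen:
            seen.add(node)
            node = wait_for[node]
        if node in seen:
            return True
    return False


# Synchronous close: task 1 holds its thread block latch while waiting for task 3.
BEFORE_FIX = {"task1": "task3", "task2": "task1", "task3": "task2"}

# Asynchronous close: task 1 queues the request and carries on, so it no
# longer waits for task 3 and its edge disappears.
AFTER_FIX = {k: v for k, v in BEFORE_FIX.items() if k != "task1"}

print(has_cycle(BEFORE_FIX), has_cycle(AFTER_FIX))
# True False
```

    Tasks 2 and 3 still wait briefly after the fix, but each chain of
    waits now ends at a task that can run to completion.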
    

Temporary fix

  • *********
    * HIPER *
    *********
    

Comments

APAR Information

  • APAR number

    PI10257

  • Reported component name

    WMQ Z/OS V7

  • Reported component ID

    5655R3600

  • Reported release

    100

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    YesHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2014-01-23

  • Closed date

    2014-02-07

  • Last modified date

    2014-04-02

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

    UI14915

Modules/Macros

  • CSQISQC1 CSQMCSQ1
    

Fix information

  • Fixed component name

    WMQ Z/OS V7

  • Fixed component ID

    5655R3600

Applicable component levels

  • R100 PSY UI14915

       UP14/03/04 P F403

Fix is available

  • Select the PTF appropriate for your component level. You will be required to sign in. Distribution on physical media is not available in all countries.


Document Information

Modified date:
02 April 2014