IBM Support

PH65545: MQ Z/OS: SMDS CONNECTION IN A QSG CAN ABEND AND TAKE HOURS TO COMPLETE IN CERTAIN CIRCUMSTANCES WITH PH46037

A fix is available

APAR status

  • Closed as program error.

Error description

  • During recovery for a Queue Sharing Group ( QSG ), for example
    in a Disaster Recovery ( DR ) scenario, queue managers
    experience abends and prolonged connection to a Shared Message
    Data Set ( SMDS ). This problem occurs especially when there is
    a large number of messages and is worse when the CF structure
    needs to use emergency Storage Class Memory ( SCM ). SMF 74 data
    indicates higher SCM usage than usual.
    
    Slow recovery can sometimes be due to BACKUP CFSTRUCT not
    being done frequently enough.  In the reported instance, a
    backup was done less than an hour before recovery was needed,
    so there was another reason for the long-running recovery.
    
    The delay is related to code added by PH46037.  A recovery task
    is reading each of the entries in the CF structure to determine
    whether there are any messages in the structure that were put
    by the local queue manager. The task is doing this checking
    because the SMDS for the queue manager says that there are no
    messages.  PH46037 added extra verification code in this case,
    to check the CF structure for entries owned by the local queue
    manager when the SMDS is marked as empty. This logic was added
    to detect a problem where the SMDS and CF structure fall out of
    sync. This logic is also driven for disconnection from an SMDS.
    
    Symptoms across queue managers in the QSG might include:
    
    IXL041E CONNECTOR NAME: <name>, JOBNAME: <jobname>, ASID: <asid>
    HAS NOT RESPONDED TO THE DISCONNECTED/FAILED CONNECTION EVENT
    FOR SUBJECT CONNECTION: <name>.
    DISCONNECT/FAILURE PROCESSING FOR STRUCTURE <structure name>
    CANNOT CONTINUE.
    MONITORING FOR RESPONSE STARTED: <date time>
    
    DUMP TITLE=ABEND=S026,REASON=08118001,CONNECTOR HANG: CONNAME=CSQExxxxxxxxxx,JOBNAME=ssidMSTR
    
    DUMP TITLE=ssid,ABN=026-08110102,U=SYSOPR  ,C=MQ900.930.CFM -CSQEDSS4,M=CSQGFRCV,PSW=070C4000818EBD46,ASID=<asid>
    
    DUMP TITLE=ssid,ABN=5C6-00C510AD,U=xxxxx   ,C=MQ900.930.CFM -CSQEOCRQ,M=CSQGFRCV,LOC=CSQELPLM.CSQEOCRQ+00000C48
    
    DUMP TITLE=ssid,ABN=5C6-00C5110F,S=0000080A,C=MQ900.930.CFM -CSQEDB2R,M=CSQGFRCV,LOC=CSQELPLM.CSQEDB2R+00000BF4
    
    DUMP TITLE=QP07,ABN=5C6-00C51045,U=SYSOPR  ,C=MQ900.930.CFM -CSQESTE ,M=CSQGFRCV,LOC=CSQELPLM.CSQESTE +000025CC
    
    DUMP TITLE=QP07,ABN=5C6-00C51063,U=SYSOPR  ,C=MQ900.930.CFM -CSQEOPEN,M=CSQGFRCV,LOC=CSQELPLM.CSQEOPEN+00002942
    
    CSQR031I Reading log forwards, RBA=nnnnnnnnnnnnnnnn
      where "nnnnnnnnnnnnnnnn" remains the same.
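
The CSQR031I symptom can be checked mechanically by looking for consecutive messages that report the same RBA. The following is a minimal sketch, not an IBM-supplied tool: the function name, the regular expression, and the sample RBA values are illustrative assumptions, and real job logs may carry extra prefixes that the pattern simply skips over.

```python
import re

# Matches the RBA value in a CSQR031I line. Any text between the message id
# and "RBA=" is tolerated because real job logs may add prefixes (assumption).
RBA_PATTERN = re.compile(r"CSQR031I\b.*?RBA=([0-9A-Fa-f]+)")


def longest_stalled_run(lines):
    """Return (rba, count) for the longest run of consecutive CSQR031I
    messages reporting the same RBA, or (None, 0) if none are found."""
    best_rba, best_count = None, 0
    current_rba, current_count = None, 0
    for line in lines:
        match = RBA_PATTERN.search(line)
        if not match:
            continue  # not a CSQR031I progress message
        rba = match.group(1).upper()
        if rba == current_rba:
            current_count += 1
        else:
            current_rba, current_count = rba, 1
        if current_count > best_count:
            best_rba, best_count = current_rba, current_count
    return best_rba, best_count


# Made-up sample data, for illustration only.
sample_log = [
    "CSQR031I Reading log forwards, RBA=00000000A1B2C3D4",
    "CSQR031I Reading log forwards, RBA=00000000A1B2C3D5",
    "CSQR031I Reading log forwards, RBA=00000000A1B2C3D5",
    "CSQR031I Reading log forwards, RBA=00000000A1B2C3D5",
]
rba, count = longest_stalled_run(sample_log)
print(rba, count)
```

A long run of identical RBA values only suggests that recovery is not progressing; it does not by itself confirm this APAR.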
    
    
    08118001 means XES has recognized that an overdue connector
    response has caused a hang in a structure-related process. This
    ABEND is issued to capture diagnostic data related to the
    overdue response.
    
    08110102 means XES has terminated the task or address space
    associated with a coupling facility structure connector to
    resolve a hang in a structure-related process.
    
    The various 5C6 abend reason codes report unexpected errors, in
    this case the S026 abends.
    
    The dumps show that queue managers are not able to reply to the
    EeplDiscFailConnection event because the recovery structure
    task for a structure is busy.  There is a thread looping in
    CSQEDSS4 (trace point IXL1DSS4 called from CSQEDSC1) making
    IXLLSTM REQUEST(READ_MULT) calls that fail with
    ixlRsnCodeTimeout.
    
    A value traced in the IXL1DSS4 exit is the number of entries
    returned by the call (LaaReadCnt). In the reported case, QMGR
    trace shows that usually only a single entry is being returned
    but that this is taking ~10ms. Over the 10 seconds of trace
    available for the task, it took on average 16ms to read an
    entry.  This delay was related to the need to use the SCM
    storage.
    
    With other queue managers doing similar processing for their
    SMDS, there is more contention in XCF and with the SCM reads.
    
    This APAR is raised to prevent scenarios where the verification
    process keeps the structure task busy for long periods of time.
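
The per-entry read times reported above (roughly 10ms to 16ms, with usually a single entry returned per call) explain how the verification scan can run for hours. The following is a back-of-the-envelope sketch only: the entry counts are hypothetical, since the APAR does not state how many entries were in the structure, and the function names are illustrative.

```python
def estimated_scan_seconds(entry_count, ms_per_entry):
    """Time to read every entry one at a time at a fixed per-entry latency."""
    return entry_count * ms_per_entry / 1000.0


def format_duration(seconds):
    """Render a number of seconds as hours, minutes and seconds."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"


# Hypothetical structure sizes; 10 ms and 16 ms are the latencies from the trace.
for entries in (100_000, 1_000_000, 5_000_000):
    for latency_ms in (10, 16):
        total = estimated_scan_seconds(entries, latency_ms)
        print(f"{entries:>9} entries at {latency_ms} ms: {format_duration(total)}")
```

Under these assumptions, one million entries at 16ms each would take about four and a half hours for a single queue manager, before allowing for contention from other queue managers doing the same scan.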
    

Local fix

  • N/A
    

Problem summary

  • ****************************************************************
    * USERS AFFECTED: All users of IBM MQ for z/OS Version 9       *
    *                 Release 2 Modification 0,                    *
    *                 Release 3 Modification 0 and                 *
    *                 Release 4 Modification 0.                    *
    ****************************************************************
    * PROBLEM DESCRIPTION: Connection to an OFFLOAD(SMDS) CF       *
    *                      structure can take a long time if there *
    *                      are lots of messages in the structure   *
    *                      and the local QMGR's SMDS is empty.     *
    *                                                              *
    *                      The connection taking a long time can   *
    *                      result in other structure work being    *
    *                      delayed. In some cases this could lead  *
    *                      to follow on S026-08118001,             *
    *                      S026-08110102, various S5C6 abends and  *
    *                      the QMGR terminating.                   *
    *                                                              *
    *                      Fix CFCC EC P46603                      *
    *                      MCL009(D41C Bundle S96) resolves a      *
    *                      problem which may exacerbate the time   *
    *                      the connection takes.                   *
    ****************************************************************
    APAR PH46037 added extra verification logic when connecting to
    an SMDS. If the local QMGR's SMDS is empty, then the structure
    is scanned for entries put by the local QMGR. In certain
    situations, this scanning can take a long time.
    
    The verification processing is performed by a QMGR structure
    service task. Requests to QMGR service tasks are performed
    serially, so a request that takes a long time can delay the
    requests queued behind it.
    
    If structure event requests are queued to the structure task
    during the verification processing, then S026 abends and the
    QMGR terminating could follow.
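
The effect of serial processing described above can be illustrated with a simple single-server queue model. This is a sketch under stated assumptions, not the queue manager's actual scheduling logic: the request names, arrival times, and service times are all hypothetical.

```python
def completion_times(requests):
    """Model one task serving requests strictly in queue order.

    requests: list of (name, arrival_seconds, service_seconds).
    Returns {name: (start_seconds, finish_seconds)}.
    """
    clock = 0.0
    result = {}
    for name, arrival, service in requests:
        start = max(clock, arrival)  # cannot start before the task is free
        finish = start + service
        result[name] = (start, finish)
        clock = finish
    return result


requests = [
    ("verify-smds", 0.0, 3600.0),    # hypothetical hour-long verification scan
    ("disconnect-event", 5.0, 0.1),  # hypothetical structure event, 5 s later
]
times = completion_times(requests)
wait = times["disconnect-event"][0] - 5.0
print(f"disconnect-event waited {wait:.0f} s")
```

In this model the short event request waits almost the full length of the scan, which is the kind of overdue response that XES reports with IXL041E and the S026 abends.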
    

Problem conclusion

  • A timeout clause has been added to the verification logic added
    by APAR PH46037 to prevent QMGR structure tasks from being
    monopolised by the processing.
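
The idea of bounding a scan with a timeout can be sketched as follows. This is an illustration of the general technique only: the APAR does not describe the actual implementation, the length of the timeout, or what the queue manager does when it expires, so every name, value, and return code here is a hypothetical assumption.

```python
import time


def scan_for_local_entries(read_batch, is_local, budget_seconds,
                           clock=time.monotonic):
    """Scan until a local entry is found, the entries run out, or the
    time budget expires. Returns "found", "clean", or "timed_out"."""
    deadline = clock() + budget_seconds
    while True:
        if clock() >= deadline:
            return "timed_out"  # give the task back to other requests
        batch = read_batch()
        if not batch:
            return "clean"      # nothing left to read
        if any(is_local(entry) for entry in batch):
            return "found"


class FakeClock:
    """Deterministic clock that advances by a fixed step on every call."""
    def __init__(self, step):
        self.now = 0.0
        self.step = step

    def __call__(self):
        value = self.now
        self.now += self.step
        return value


def make_reader(entries):
    """Return a reader that yields one entry per call, then empty batches."""
    iterator = iter(entries)

    def read_batch():
        entry = next(iterator, None)
        return [] if entry is None else [entry]

    return read_batch


# 1000 entries owned by another queue manager, 16 ms per read, 1 s budget.
others = ["QM2"] * 1000
result = scan_for_local_entries(make_reader(others),
                                lambda owner: owner == "QM1",
                                budget_seconds=1.0,
                                clock=FakeClock(0.016))
print(result)
```

Checking the deadline between reads means the scan can overrun the budget by at most one read, which keeps the loop simple while still bounding how long the task is held.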
    

Temporary fix

Comments

APAR Information

  • APAR number

    PH65545

  • Reported component name

    IBM MQ Z/OS V9

  • Reported component ID

    5655MQ900

  • Reported release

    200

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    YesHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2025-03-05

  • Closed date

    2025-05-30

  • Last modified date

    2025-07-02

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

    UO02614 UO02615 UO03443

Modules/Macros

  • CSQEDSC1 CSQEDSS4 CSQESTRT CSQIRECP
    

Fix information

  • Fixed component name

    IBM MQ Z/OS V9

  • Fixed component ID

    5655MQ900

Applicable component levels

  • R200 PSY UO03443

       UP25/06/11 P F506

  • R300 PSY UO02615

       UP25/04/10 P F504

  • R400 PSY UO02614

       UP25/04/10 P F504

Fix is available

  • Select the PTF appropriate for your component level. You will be required to sign in. Distribution on physical media is not available in all countries.

Document Information

Modified date:
02 July 2025