A fix is available
APAR status
Closed as program error.
Error description
A standalone dump was taken after the system became unresponsive: TSO and CICS users were unable to log on, and SYSLOG stopped recording messages. The z/OS team determined that a TCB ("TCB1") in JOB *MASTER* (ASID 1) was running its end of memory (EOM) resource managers. IPCS ANALYZE RESOURCE showed that the LOCAL LOCK FOR ASID 0001 was held by TCB1 with DATA=SUSPENDED AND NOT DISPATCHABLE. Another TCB ("TCB2") in ASID 1 held the LOCAL LOCK for an MQ MSTR job. That lock was needed by the MSTR job itself, its CHIN job, and other jobs, including CICS.

System trace showed that TCB2 was looping and had not released the lock for some time. The loop was in routine SetEBCncl in module CSQVSRX, which is part of CSQVCNCL processing. This code loops until it has set bit EBCancel on; because of an unexpected state in the EB control block, it never succeeded and looped indefinitely. The inconsistent EB state arose from a timing window in suspend/resume processing, when a task abends just after being resumed.

The scenario leading up to the problem: CSQMCLMT was called for local memory termination for MQ allied (application) address spaces. For CICS address spaces, it disconnects each of the associated EBs and, if they are suspended, wakes them up with a CSQVCNCL call. In this case, TCB2 was disconnecting an EB for the CICS job. There was heavy latch contention for the IVSA.csCursLatch for a queue, because many transactions were simultaneously issuing MQGET by MsgId against the queue, which had a high CURDEPTH. MQ must hold the latch while it locates the correct message. The queue was not indexed, and message CSQI004I was issued for it; CSQI004I indicates that indexing the queue (setting INDXTYPE) will improve performance. This latch contention caused the transactions' TCBs to be suspended and resumed frequently.
At the same time, storage shortages in the CICS region caused many SRB-to-task percolations, so CICS TCBs were abended S878 while running in MQ code. A combination of MQ trace and SYSTRACE shows the order of events for the CICS TCB that owned the EB which TCB2 was later disconnecting:
1) The TCB was paused in CSQVSUSP waiting for the IVSA.csCursLatch.
2) Almost immediately afterwards, the latch was released and the TCB was resumed in CSQVRESM. The resume cleared many key fields in the EB.
3) SRB-to-task percolation occurred, and the TCB was ABTERMed with ABENDS878-10. This drove recovery routine CSQVSRRX for the TCB.
4) The recovery code identified that a resume had occurred and set a temporary value in EBSROB, to prevent the field from being used improperly during the dumping process.
5) Recovery decided to create a dump for the 878-10 abend.
6) The CICS address space was memtermed, preventing further recovery processing for the TCB.
As a result, the temporary value was left in EBSROB. As part of memterm processing for the CICS application address space, each of its EBs was disconnected with a call to CSQVCNCL in CSQVSRX. This is where the loop occurred, because of the temporary value in EBSROB. Had the address space not been memtermed, CSQVSRRX would have cleared the temporary value after the dump was taken. The CSQVSRX code should not be allowed to loop while holding the local lock for the queue manager. In the reported case, the system was IPL'd to clear the problem.
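The timing window in steps 4 to 6 can be reduced to a few lines. The Python sketch below is a hypothetical model, not MQ code: the EB class, SROB_NONE, SROB_TEMP_MARKER and the memtermed flag are assumptions made for illustration only. It shows just one point: recovery sets the temporary EBSROB value before the dump and clears it afterwards, so a memterm in between leaves the value behind.

```python
from dataclasses import dataclass

# Hypothetical model: EB, the marker value and the memtermed flag are
# illustrative stand-ins, not MQ's real control-block layout or interfaces.
SROB_NONE = 0
SROB_TEMP_MARKER = -1  # stands in for the temporary value set in EBSROB


@dataclass
class EB:
    srob: int = SROB_NONE


def recover_after_abend(eb: EB, memtermed: bool) -> EB:
    """Model steps 4-6: recovery marks EBSROB, takes a dump, then clears it."""
    eb.srob = SROB_TEMP_MARKER  # step 4: keep EBSROB from being misused during the dump
    # step 5: the dump for the 878-10 abend would be taken here
    if memtermed:
        # step 6: the home address space is memtermed, so recovery never
        # regains control and the marker is left behind
        return eb
    eb.srob = SROB_NONE  # normal path: recovery clears the marker after the dump
    return eb


if __name__ == "__main__":
    print(recover_after_abend(EB(), memtermed=False).srob)  # 0
    print(recover_after_abend(EB(), memtermed=True).srob)   # -1
```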
Local fix
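None, other than reducing exposure to the timing window: prevent the ABEND878 (for example, by relieving the storage shortage in the CICS region) and alter the INDXTYPE of the queue (for a queue read by MsgId, INDXTYPE(MSGID)) to reduce latch contention.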
Problem summary
****************************************************************
* USERS AFFECTED: All users of IBM MQ for z/OS Version 9       *
*                 Release 1 Modification 0 and Release 2       *
*                 Modification 0.                              *
****************************************************************
* PROBLEM DESCRIPTION: A MEMTERM of an MQ allied address space *
*                      may result in a hang in End of Memory   *
*                      (EOM) processing for the address space. *
*                      Examining the *MASTER* TCB responsible  *
*                      for the EOM processing shows that it is *
*                      looping in CSECT CSQVSRX, while holding *
*                      the local lock for the QMGR MSTR        *
*                      address space.                          *
****************************************************************
If an address space is MEMTERMed, then recovery routines will not get control. MQ recovery routine CSQVSRRX sets a temporary value in an abended task's EBSROB. If the task's home address space is MEMTERMed, then this temporary value may persist.

During end of memory processing for an allied address space, active EBs will be disconnected from the QMGR address space. This may require a call to routine CSQVCNCL in CSECT CSQVSRX. This routine cannot handle the temporary EBSROB value, and will loop indefinitely while holding the local lock for the QMGR MSTR address space.
Problem conclusion
CSQVCNCL has been corrected to handle the temporary EBSROB value.
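The effect of the correction can be pictured with the hypothetical Python model below. The EB class, the marker value, the handles_marker switch and the max_spins bound are all assumptions for illustration; this APAR does not describe how CSQVCNCL actually recognises the temporary value. The pre-fix path treats the marker as a transient state and retries, which never completes once the owning address space has been memtermed; the post-fix path recognises the marker and lets the cancel bit be set.

```python
from dataclasses import dataclass

# Hypothetical model: names and the marker value are illustrative only.
SROB_NONE = 0
SROB_TEMP_MARKER = -1  # temporary value left in EBSROB by interrupted recovery


@dataclass
class EB:
    srob: int = SROB_NONE
    cancel: bool = False  # stands in for the EBCancel bit


def disconnect(eb: EB, handles_marker: bool, max_spins: int = 1_000) -> bool:
    """Loop until the cancel bit is set; return False if the demo bound is hit."""
    for _ in range(max_spins):
        if eb.srob == SROB_TEMP_MARKER and not handles_marker:
            # Pre-fix: treat the marker as a transient state and retry. After a
            # memterm nothing ever clears it, so this never makes progress.
            continue
        eb.cancel = True     # set the cancel bit
        eb.srob = SROB_NONE  # stands in for waking a suspended task or discarding a stale marker
        return True
    return False


if __name__ == "__main__":
    print(disconnect(EB(srob=SROB_TEMP_MARKER), handles_marker=False))  # False
    print(disconnect(EB(srob=SROB_TEMP_MARKER), handles_marker=True))   # True
```

In the model, max_spins exists only so the pre-fix case can be demonstrated; in the reported case the loop had no bound and ran while holding the local lock for the MSTR address space.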
Temporary fix
Comments
APAR Information
APAR number
PH42296
Reported component name
IBM MQ Z/OS V9
Reported component ID
5655MQ900
Reported release
100
Status
CLOSED PER
PE
Yes
HIPER
Yes
Special Attention
No
Submitted date
2021-11-22
Closed date
2022-01-10
Last modified date
2022-03-01
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
UI78860 UI78861
Modules/Macros
CSQ0CACB CSQ0COPN CSQ0DEAD CSQ0DPCS CSQ0DSVC CSQ0ERST CSQ0IPRH CSQ0LEPL CSQ3AAES CSQ3AM00 CSQ3AMFR CSQ3AUCM CSQ3AUCN CSQ3AUFR CSQ3AUGI CSQ3CT30 CSQ3CT80 CSQ3EXT0 CSQ3GCAB CSQ3ID80 CSQ3IDES CSQ3LCHX CSQ3PR00 CSQ3RIA0 CSQ3RIM0 CSQ3RIND CSQ3RRSR CSQ3RRSX CSQ3RRXF CSQ3SSES CSQ3SSFR CSQ9SCN9 CSQAPRHX CSQARIB CSQGEXIT CSQGFFRR CSQGFRCV CSQGGEPL CSQIRECP CSQJB004 CSQJC001 CSQJC003 CSQJC006 CSQJC008 CSQJC09A CSQJCR01 CSQJOFF6 CSQJOFF9 CSQJPOPN CSQJR007 CSQJR06A CSQJRE01 CSQJRE08 CSQJRE26 CSQJW008 CSQJW206 CSQJWE01 CSQMALCH CSQMCALH CSQMCCHT CSQMCDLC CSQMCFEF CSQMCFRQ CSQMCFTK CSQMCFWU CSQMCIDT CSQMCLMT CSQMCMHB CSQMCPRH CSQMCRES CSQMCTXE CSQMCTXS CSQMFMH1 CSQMXARH CSQMXCLN CSQMZLOO CSQRCAFR CSQRCRFR CSQRCRQS CSQRCRSC CSQRCSHT CSQRCURS CSQRIURS CSQRPBCS CSQRPBCW CSQRPECS CSQRPLCS CSQRRRQS CSQRRURS CSQRUA01 CSQRUB01 CSQRUC01 CSQRUE01 CSQSCON CSQSCON2 CSQSDMPS CSQSFACL CSQSFBK CSQSFPL CSQSGMN CSQSHDWN CSQSPOWN CSQSPURS CSQSRSUP CSQSTERM CSQSVPL CSQUZAP CSQV002M CSQVCFRR CSQVCONN CSQVCRTH CSQVCST0 CSQVDISC CSQVDST0 CSQVEOT1 CSQVEUS1 CSQVEUS2 CSQVEUS3 CSQVEUS4 CSQVFACE CSQVFEB CSQVGACE CSQVIALC CSQVLEPL CSQVLFRR CSQVLTT0 CSQVSDC0 CSQVSLK CSQVSLT0 CSQVSRRX CSQVSRX CSQVSUL0 CSQVTFRR CSQVTRTH CSQVUTIL CSQVXLT0 CSQVXUL0 CSQWAAPI CSQWACC6 CSQWACCV CSQWDSD0 CSQWDSDM CSQWDST2 CSQWVFRR CSQWVOPX CSQWVSMT CSQWVSR2 CSQWVZSA CSQWVZSS CSQWVZXT CSQWWFST CSQXDTRM CSQXFSTR CSQXGRIM CSQXJST CSQXSUPR CSQXTCNC CSQXTCTL CSQYALLI CSQYASCP CSQYEAT2 CSQYEATE CSQYEPL0 CSQYESCF CSQYESWE CSQYLGBL CSQYLGUN CSQYMESP CSQYMESS CSQYSIRM CSQYSTRT
Fix information
Fixed component name
IBM MQ Z/OS V9
Fixed component ID
5655MQ900
Applicable component levels
Fix is available
Select the PTF appropriate for your component level. You will be required to sign in. Distribution on physical media is not available in all countries.
Document Information
Modified date:
02 March 2022