APAR status
Closed as program error.
Error description
The control region does not start shutting down the servant regions because it still has requests for current work associated with deadlocked threads in the control region:

deadlock is for:
  java/lang/Object
  com/ibm/ws390/xmem/proxy/channel/XMemProxyCRInboundConnLink
threads are in:
  com/ibm/ws/tcp/channel/impl/ZAioTCPConnLink.destroyCommon(Exception)
    source: ZAioTCPConnLink.java:1072
  com/ibm/ws390/xmem/proxy/channel/XMemProxyCRInboundReadCallback.complete(com.ibm.wsspi.channel.framework.VirtualConnection)
    source: XMemProxyCRInboundReadCallback.java:70

A timing window causes this deadlock: the ReadComplete came into the control region (CR) at the exact time that the CR was told to drive "sendFinalResponse". The two locks (the XMemProxyCRInboundConnLink lock and the related ZAioTCPConnLink readLock) are obtained in opposite order on the two paths, which makes a deadlock possible.
Local fix
Problem summary
****************************************************************
* USERS AFFECTED:  All users of IBM WebSphere Application      *
*                  Server                                      *
*                  V9.0                                        *
****************************************************************
* PROBLEM DESCRIPTION: WebSphere Application Server for z/OS   *
*                      hang during stop processing.            *
****************************************************************
* RECOMMENDATION:                                              *
****************************************************************

A Controller would not complete stop processing because of a deadlock between two ACRW threads. Two locks are involved in the code paths of these threads, and the two paths obtain them in opposite order, which can lead to a deadlock.

One thread was driving the XMemProxyCRInboundConnLink.sendFinalResponse() method. This method obtains the XMemProxyCRInboundConnLink lock and may then drive close() while holding it. Under close(..), the HTTP channel may decide to "destroy" the Connection rather than posting another read (as it would for a persistent Connection). On the "destroy" path, the ZAioTCPConnLink.destroyCommon(..) method is invoked and attempts to obtain its readLock. That attempt is suspended if a readComplete ACRW path is running concurrently and has already obtained the readLock (in ZAioTCPChannel.readCompleted()).

The deadlock occurs when the readComplete processing calls XMemProxyCRInboundReadCallback.complete(), which attempts to obtain the XMemProxyCRInboundConnLink lock while the readComplete thread is already holding the ZAioTCPConnLink readLock.

The following is what the top of the stack would look like for the ACRW thread processing the sendFinalResponse method:

  ZAioTCPConnLink.destroyCommon(Exception)
    source: ZAioTCPConnLink.java:1072
  ZAioTCPConnLink.destroy(Exception)
    source: ZAioTCPConnLink.java:1050
  OutboundConnectorLink.close(com.ibm.wsspi.channel.framework.VirtualConnection, Exception)
    source: OutboundConnectorLink.java:50
  HttpInboundLink.close(com.ibm.wsspi.channel.framework.VirtualConnection, Exception)
    source: HttpInboundLink.java:899
  InboundApplicationLink.close(com.ibm.wsspi.channel.framework.VirtualConnection, Exception)
    source: InboundApplicationLink.java:58
  XMemProxyCRInboundConnLink.close(com.ibm.wsspi.channel.framework.VirtualConnection, Exception)
    source: XMemProxyCRInboundConnLink.java:3236
  XMemProxyCRInboundConnLink.sendFinalResponse(int, com.ibm.ws390.xmem.proxy.XMemProxyCommMetaData, com.ibm.wsspi.buffermgmt.WsByteBuffer, long)
    source: XMemProxyCRInboundConnLink.java:2850

At this point the sendFinalResponse thread has obtained the XMemProxyCRInboundConnLink lock in sendFinalResponse and is waiting for the ZAioTCPConnLink.readLock in destroyCommon.

The following is what the readComplete ACRW thread would look like:

  XMemProxyCRInboundReadCallback.complete(com.ibm.wsspi.channel.framework.VirtualConnection)
    source: XMemProxyCRInboundReadCallback.java:70
  HttpServiceContextImpl.continueRead()
    source: HttpServiceContextImpl.java:4636
  HttpISCBodyReadCallback.complete(com.ibm.wsspi.channel.framework.VirtualConnection, com.ibm.wsspi.tcp.channel.TCPReadRequestContext)
    source: HttpISCBodyReadCallback.java:87
  ZAioTCPReadRequestContextImpl.readCompleted(long, com.ibm.wsspi.channel.framework.VirtualConnection, com.ibm.wsspi.buffermgmt.WsByteBuffer, String)
    source: ZAioTCPReadRequestContextImpl.java:683
  ZAioTCPConnLink.readCompleted(long, com.ibm.wsspi.buffermgmt.WsByteBuffer, long, String)
    source: ZAioTCPConnLink.java:1248
  ZAioTCPChannel.readCompleted(com.ibm.ws.tcp.channel.impl.ZAioTCPConnLink, long, long, com.ibm.wsspi.buffermgmt.WsByteBuffer, byte[], long, String)
    source: ZAioTCPChannel.java:934
  ZAioTCPChannelCPPUtilities.readCompleted(com.ibm.ws.tcp.channel.impl.ZAioTCPChannel, com.ibm.ws.tcp.channel.impl.ZAioTCPConnLink, long, long, com.ibm.wsspi.buffermgmt.WsByteBuffer, byte[], long, String)
    source: ZAioTCPChannelCPPUtilities.java:181

At this point the readComplete thread is holding the ZAioTCPConnLink.readLock, obtained in ZAioTCPChannel.readCompleted(), and is attempting to obtain the XMemProxyCRInboundConnLink lock in the XMemProxyCRInboundReadCallback.complete() method.
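The opposite lock ordering described above can be illustrated with a small, self-contained Java sketch. It is not the WebSphere code: the class LockOrderSketch, its methods, and the two ReentrantLock objects are hypothetical stand-ins for the XMemProxyCRInboundConnLink lock and the ZAioTCPConnLink readLock, and the real locks may well be plain Java monitors. So that the sketch reports the problem instead of hanging, the second lock is requested with a timeout.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration only; none of these names exist in WebSphere.
final class LockOrderSketch {

    // Takes 'first', waits until the other thread also holds its first lock,
    // then asks for 'second' with a timeout. Returns true if 'second' was obtained.
    static boolean tryPath(ReentrantLock first, ReentrantLock second,
                           CountDownLatch bothHoldFirst, CountDownLatch bothTried) {
        first.lock();
        try {
            bothHoldFirst.countDown();
            bothHoldFirst.await();              // both threads now hold one lock each
            boolean got = second.tryLock(100, TimeUnit.MILLISECONDS);
            if (got) {
                second.unlock();
            }
            bothTried.countDown();
            bothTried.await();                  // keep 'first' until both attempts are done
            return got;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            first.unlock();
        }
    }

    // Runs the two paths concurrently; true means neither could get its second lock,
    // which with untimed lock requests would have been a deadlock.
    static boolean bothPathsBlocked() {
        ReentrantLock connLinkLock = new ReentrantLock(); // stand-in: XMemProxyCRInboundConnLink lock
        ReentrantLock readLock = new ReentrantLock();     // stand-in: ZAioTCPConnLink readLock
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);
        boolean[] acquired = new boolean[2];

        // sendFinalResponse path: ConnLink lock first, then readLock.
        Thread sendFinalResponse = new Thread(() ->
            acquired[0] = tryPath(connLinkLock, readLock, bothHoldFirst, bothTried));
        // readComplete path: readLock first, then ConnLink lock.
        Thread readComplete = new Thread(() ->
            acquired[1] = tryPath(readLock, connLinkLock, bothHoldFirst, bothTried));

        sendFinalResponse.start();
        readComplete.start();
        try {
            sendFinalResponse.join();
            readComplete.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return !acquired[0] && !acquired[1];
    }
}
```

In this sketch LockOrderSketch.bothPathsBlocked() returns true, because each thread keeps its first lock until both timed attempts have finished. The latches only make the timing window deterministic for the demonstration; in the reported problem the window depends on when the readComplete arrives.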
Problem conclusion
The sendFinalResponse() method of XMemProxyCRInboundConnLink was modified to drop its lock before driving the close(..) call. No further synchronization is needed between the remaining sendFinalResponse actions and the potential close processing.

The fix for this APAR is targeted for inclusion in fix pack 9.0.5.8. For more information, see 'Recommended Updates for WebSphere Application Server': https://www.ibm.com/support/pages/node/715553
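The general shape of this kind of change, releasing a lock before calling into code that may take other locks, can be sketched as follows. This is an assumption-laden illustration and not the actual WebSphere fix: the class DropLockBeforeCloseSketch, its fields, and the body of close() are hypothetical, and a ReentrantLock is used only so the sketch can record whether the lock is held when close() runs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration only; not the XMemProxyCRInboundConnLink implementation.
final class DropLockBeforeCloseSketch {
    private final ReentrantLock connLinkLock = new ReentrantLock();
    private boolean responseSent;
    final List<String> events = new ArrayList<>();

    void sendFinalResponse() {
        connLinkLock.lock();
        try {
            // Work that needs the lock: update state for the final response.
            responseSent = true;
            events.add("response-sent");
        } finally {
            connLinkLock.unlock();   // drop the lock before driving close()
        }
        // close() may go on to request other locks (the readLock in the APAR),
        // so it is driven without the ConnLink lock held.
        close();
    }

    void close() {
        events.add("close:lockHeld=" + connLinkLock.isHeldByCurrentThread());
    }
}
```

After calling sendFinalResponse() on a new instance, the events list holds "response-sent" followed by "close:lockHeld=false", showing that the close path no longer runs under the first lock and so cannot wait for a second lock while holding it.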
Temporary fix
Comments
APAR Information
APAR number
PH34816
Reported component name
WEBSPHERE FOR Z
Reported component ID
5655I3500
Reported release
900
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2021-02-24
Closed date
2021-03-23
Last modified date
2021-03-23
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
WEBSPHERE FOR Z
Fixed component ID
5655I3500
Applicable component levels
Document Information
Modified date:
24 March 2021