APAR status
Closed as program error.
Error description
Out of memory when the Session Initiation Protocol (SIP) Servlet Container is stuck on multiple TCP connections during SIP load. With 100 TCP connections, WebSphere Application Server hung with an OutOfMemory (OOM) condition after a long-term load run of approximately 30 hours.
Local fix
n/a
Problem summary
****************************************************************
* USERS AFFECTED: Session Initiation Protocol (SIP) users of
*                 IBM WebSphere Application Server Feature Pack
*                 for Communications Enabled Applications (CEA)
****************************************************************
* PROBLEM DESCRIPTION: There is a deadlock in the SIP
*                      container under heavy TCP load.
****************************************************************
* RECOMMENDATION:
****************************************************************
The problem occurs when the server attempts to send many messages over TCP (or TLS) concurrently. When the container needs to send a SIP message, it places the outbound message in a queue and calls on one of the worker threads to transmit the packet to the network. If no thread is available in the pool, the container thread blocks until a worker thread becomes available.

In some cases, the thread that initiates the transaction is itself allocated from the same pool as the worker threads. Under extremely high load, all worker threads can become busy while each of them concurrently requests to send a message. Each thread in the pool then waits for one of the others to become available, which produces a deadlock. The server remains unresponsive even after traffic slows down.
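The deadlock pattern described above can be reproduced in miniature with a plain `java.util.concurrent` fixed thread pool. This is a hypothetical model, not the SIP container's code: `PoolSelfDeadlockDemo` and `countStuckSenders` are illustrative names, each pool task stands in for a container thread that hands its message to the same pool, and the deadlock is made observable with a timeout (`waitMillis`) instead of hanging forever.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical model of the deadlock: each pool thread hands its outbound
// message to the SAME pool and then waits for a worker to send it.
final class PoolSelfDeadlockDemo {

    // Returns how many of the poolSize "container" tasks never saw their
    // message sent within waitMillis.
    static int countStuckSenders(int poolSize, long waitMillis) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        // Released only after every container task has been handed a pool thread.
        CountDownLatch allStarted = new CountDownLatch(1);
        // Keeps every thread occupied until all waits have expired, so no worker
        // is freed early and the result is deterministic.
        CyclicBarrier allWaited = new CyclicBarrier(poolSize);
        List<Future<Boolean>> results = new ArrayList<>();

        for (int i = 0; i < poolSize; i++) {
            results.add(pool.submit(() -> {
                allStarted.await();
                // Queue the message for a worker thread from the same pool.
                Future<?> send = pool.submit(() -> { /* write to the socket */ });
                boolean sent;
                try {
                    send.get(waitMillis, TimeUnit.MILLISECONDS);
                    sent = true;
                } catch (TimeoutException e) {
                    sent = false; // no free worker: all threads are waiting like this one
                }
                allWaited.await();
                return sent;
            }));
        }
        allStarted.countDown();

        int stuck = 0;
        for (Future<Boolean> r : results) {
            if (!r.get()) {
                stuck++;
            }
        }
        pool.shutdown(); // queued sends finally run once the container tasks return
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return stuck;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countStuckSenders(4, 200)); // prints 4
    }
}
```

With a pool of 4 threads, all 4 tasks report that their message was never sent while they waited: the only threads that could send it are the ones doing the waiting.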
Problem conclusion
The problem is fixed in the SIP container by changing the code that initiates message sending. With this fix, the container first attempts to send the message from the initiating thread instead of forcing the allocation of a worker thread. Only if the message cannot be delivered immediately is a worker thread allocated to complete the work later. This reduces the chance of draining the thread pool, eliminates the deadlock, and allows the container to recover as soon as traffic returns to normal. The fix for this APAR is currently targeted for inclusion in fix pack 1.0.1.11 for the Feature Pack for Communications Enabled Applications. Please refer to the Recommended Updates page for delivery information: http://www.ibm.com/support/docview.wss?rs=180&uid=swg27004980
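The send-inline-first approach can be sketched as follows. This is a minimal sketch under stated assumptions, not the container's actual fix: `DirectFirstSender`, `ChunkedChannel`, and `runDemo` are hypothetical names, the transport is modeled as a non-blocking `java.nio.channels.WritableByteChannel` whose `write` may accept only part of a buffer, and the worker retries in a loop where a real transport would wait for a write-ready event. A per-connection queue keeps messages in order once any message has been deferred.

```java
import java.io.*;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.*;

// Hypothetical sketch of the fixed send path for ONE connection: write on the
// initiating thread first, and hand only leftover bytes to a worker thread.
final class DirectFirstSender {
    private final WritableByteChannel channel; // assumed non-blocking: write() may take part of a buffer
    private final ExecutorService workers;
    private final Deque<ByteBuffer> pending = new ArrayDeque<>(); // deferred messages, in send order
    private boolean flushScheduled = false;

    DirectFirstSender(WritableByteChannel channel, ExecutorService workers) {
        this.channel = channel;
        this.workers = workers;
    }

    // Returns true if the whole message was written by the calling thread.
    // Never blocks waiting for a worker thread to become available.
    synchronized boolean send(ByteBuffer msg) throws IOException {
        if (pending.isEmpty()) {            // nothing queued ahead: try inline
            channel.write(msg);
            if (!msg.hasRemaining()) { return true; }
        }
        pending.addLast(msg);               // preserve order behind earlier leftovers
        if (!flushScheduled) {
            flushScheduled = true;
            workers.execute(this::flushPending); // queue the work; do not wait for it
        }
        return false;
    }

    // Runs on a worker thread and drains the pending queue in order.
    private void flushPending() {
        while (true) {
            synchronized (this) {
                ByteBuffer head = pending.peekFirst();
                if (head == null) { flushScheduled = false; return; }
                try {
                    channel.write(head);
                } catch (IOException e) {
                    pending.clear();        // connection failed: drop what is left
                    continue;
                }
                if (!head.hasRemaining()) { pending.removeFirst(); }
            }
            Thread.yield(); // simplification: a real transport would wait for write readiness
        }
    }

    // Test double: a "socket" that accepts at most `chunk` bytes per write().
    static final class ChunkedChannel implements WritableByteChannel {
        private final int chunk;
        private final ByteArrayOutputStream sink = new ByteArrayOutputStream();
        ChunkedChannel(int chunk) { this.chunk = chunk; }
        @Override public synchronized int write(ByteBuffer src) {
            byte[] bytes = new byte[Math.min(chunk, src.remaining())];
            src.get(bytes);
            sink.write(bytes, 0, bytes.length);
            return bytes.length;
        }
        synchronized String received() { return new String(sink.toByteArray(), StandardCharsets.US_ASCII); }
        @Override public boolean isOpen() { return true; }
        @Override public void close() { }
    }

    static ByteBuffer ascii(String s) {
        return ByteBuffer.wrap(s.getBytes(StandardCharsets.US_ASCII));
    }

    // One pool thread, already busy with the initiating task: the old design's worst case.
    static String[] runDemo() throws Exception {
        ChunkedChannel socket = new ChunkedChannel(8);
        ExecutorService pool = Executors.newFixedThreadPool(1);
        DirectFirstSender sender = new DirectFirstSender(socket, pool);
        Future<String> outcome = pool.submit(() -> {
            boolean a = sender.send(ascii("INVITE\r\n"));      // 8 bytes: written inline
            boolean b = sender.send(ascii("BYE sip:bob\r\n")); // 13 bytes: 5 deferred
            boolean c = sender.send(ascii("ACK\r\n"));         // queued behind the leftover
            return a + " " + b + " " + c;
        });
        String inline = outcome.get();  // completes even though no worker was free
        pool.shutdown();                // the queued flush task still runs
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return new String[] { inline, socket.received() };
    }

    public static void main(String[] args) throws Exception {
        String[] r = runDemo();
        System.out.println(r[0]); // prints: true false false
        System.out.print(r[1]);   // INVITE, BYE and ACK lines, in that order
    }
}
```

Even with a single pool thread that is already occupied by the initiating task, `send` returns without waiting for a worker; the leftover bytes are flushed, in order, after that task finishes.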
Temporary fix
Comments
APAR Information
APAR number
PM40650
Reported component name
CEA FEATUREPACK
Reported component ID
5724J0855
Reported release
700
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt
Submitted date
2011-06-02
Closed date
2011-06-09
Last modified date
2011-06-09
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
CEA FEATUREPACK
Fixed component ID
5724J0855
Applicable component levels
R700 PSY
UP
Document Information
Modified date:
09 February 2022