Solving EC3 timeout abends due to MBean deadlocks
Chad McDowell
With an increasing number of stack products available for and used with WebSphere Application Server on z/OS, we're starting to see EC3 timeouts of different varieties that are caused by MBean long waits or deadlocks. An initial review of the dump and reason code may suggest a common timeout, such as an HTTP or EJB timeout. Closer inspection of an SVC dump or javacore, however, can reveal that the thread that timed out was actually waiting on an internal MBean request.
In general, MBean requests flow from the servant region to the controller region, where they are processed. By default, the controller region has only three internal worker threads to handle such requests. If all three internal worker threads are busy, an MBean request waits until one becomes free. MBean requests are generally administrative in nature, so in the past MBean requests stemming from application flow were fairly rare, and MBeans were seldom used heavily outside of the nodeagent and deployment manager. With the increased complexity of stack products, however, administrative requests are becoming much more common in the normal operation of an application server.
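To make the waiting behavior concrete, here is a deliberately simplified queueing model in Python. It assumes every MBean request arrives at the same moment and takes the same time to run, which real traffic will not do; the function name `last_completion_time` and the numbers used with it are illustrative only, not anything WebSphere exposes.

```python
import math


def last_completion_time(requests: int, service_s: float, workers: int) -> float:
    """Time until the last of `requests` equal-length requests finishes.

    Simplified model (an assumption, not WebSphere behavior): all requests
    arrive together and each worker thread processes one request at a time.
    """
    if workers < 1:
        raise ValueError("need at least one worker thread")
    # Number of "waves" the pool must work through before the queue drains.
    waves = math.ceil(requests / workers)
    return waves * service_s


if __name__ == "__main__":
    print(last_completion_time(12, 2.0, 3))   # three worker threads
    print(last_completion_time(12, 2.0, 10))  # ten worker threads
```

Under these assumptions, 12 requests of 2 seconds each take 8 seconds to drain through three threads but 4 seconds through ten, and whichever caller is waiting on the last request sees that whole delay.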
A timeout resulting from too few internal worker threads generally has one of two causes. The first is a deadlock, where one or more of the MBeans running on the internal worker threads have spawned a new MBean request and are waiting for a thread to free up. The second is sheer volume, where the three threads cannot process the work in a timely manner. In both cases, the solution is to increase the number of internal worker threads through the priv
To set the variable, you can use the admin console Environment > WebSphere Variables panel:
Once there, set the scope to the appropriate server (or whichever level you want to set the variable at) and click New:
Then set the new variable with name: priv
I generally recommend a value of 10 to clients seeing this problem. This should be more than enough to relieve the problem, and the extra resources used will be negligible. Be sure to save and synchronize once the new variable is set. A server recycle is also required for the new value to take effect. After the server is recycled, you can check that the new value is set correctly in the joblog by searching for the inte