A StaleConnectionException is an exception that is generated by the WebSphere Application Server database connection code when a JDBC driver returns a fatal error from a connection request or operation.
Configuring a WebSphere Application Server datasource includes specifying a datasource helper class, which is selected depending on your JDBC driver type, for example, DB2UniversalDataStoreHelper for DB2, Oracle11gDataStoreHelper for Oracle 11g, etc. Each helper class contains a list of driver specific error codes associated with fatal exceptions. When a JDBC driver returns one of these codes, the WebSphere Application server will map the return code to a StaleConnectionException, the intent being to provide one generic error that will allow applications to recover database connections rather than have to code application recovery for each applicable database product or individual JDBC return code.
Specifically catching StaleConnectionException is not required in an application. Because applications are already required to catch java.sql.SQLException and StaleConnectionException extends SQLException, StaleConnectionException is automatically caught in the general catch-block. However, specifically catching StaleConnectionException makes it possible for an application to recover from bad connections. The most common time for StaleConnectionException to be thrown is the first time that a connection is used, just after it is retrieved. Because connections are pooled, a database failure is not detected until the operation immediately following its retrieval from the pool, which is the first time communication to the database is attempted. And it is only when a failure is detected that the connection is marked stale. StaleConnectionException occurs less often if each method that accesses the database gets a new connection from the pool. When a StaleConnectionException is caught from a connection in auto-commit mode, recovery is a simple matter of closing all of the associated JDBC resources and retrying the operation with a new connection.
In most cases StaleConnectionExceptions can be avoided by a few configuration changes. By far, the most common cause of StaleConnectionExceptions is due to retrieving connections from the free pool (connection reuse) and finding out that the connection has been timed out or dropped by a database server or firewall.
The configuration options below help minimize the chances of encountering StaleConnectionExceptions by helping to ensure that your WebSphere Application Server maintains control of database connections, specifically, the only one closing them, rather than other components in your network:
- The datasource MIN connections should be set to 0. By setting a MIN > 0, you are telling the application server that this number of connections will be valid for the life of the application server JVM. Unless you can guarantee the MIN number of connections will never be timed out or dropped, setting a MIN will cause a StaleConnectionException when the connection becomes invalid.
- If a firewall exist between your WebSphere Application Server and database server, set the datasource Unused Timeout to no greater than 1/2 the value configured for the firewall timeout. This is how long you will allow an unused connection to sit in the free pool waiting to be reused.
- Set the Reap Timer to a value less than the Unused Timeout. The reap time is how often, in seconds, a pool maintenance thread is run to check if an Unused Timeout has occurred. The smaller the value, the more often the check occurs and the more accurate the Unused Timer becomes. However, the Reap Time interval also affects performance. Smaller intervals mean that the pool maintenance thread runs more often and can degrade performance.
- Set the datasource Purge Policy to entire pool. This option determines if you will mark only the offending connection as stale, should a fatal error occur (FailingConnectionOnly), or if all other connections waiting in the free pool should be flushed as a result as well (EntirePool). The best assumption is usually that if one is bad, they all are, and flush the free pool. This way, subsequent connection request will always get a new connection. Note that configuring the purge policy for FailingConnectionOnly will cause the WebSphere connection manager code to go into connection pre-test mode should a fatal error be encountered on a free pool connection.
For more information about StaleConnectionExceptions, including an example of how recovery can be coded into your application, see the IBM Technote Demystifying the WebSphere StaleConnectionException
Also see this related blog entry: WebSphere Application Server - Avoiding Stale Connections
title image (modified) credit: (cc) Some rights reserved by geralt