Technical Blog Post
WebSphere Application Server - Avoiding Stale Connections
In WebSphere Application Server, the StaleConnectionException is issued when the database vendor issues an exception indicating that a connection currently in the connection pool is no longer valid. This can happen for many reasons, including:
- The application tries to get a connection and fails, as when the database is not started.
- A connection is no longer usable due to a database failure. When an application tries to use a connection it has previously obtained, the connection is no longer valid. In this case, all connections currently in use by an application can get this error when they try to use the connection.
- The application using the connection has already called close() and then tries to use the connection again.
- The application tries to use a JDBC™ resource, such as a statement, obtained on a now-stale connection.
In such cases you will see the following in exception:
The first thing to check is that the application follows the recommended usage pattern. See the Connection handles topic in the product documentation.
Socket read errors, Socket I/O errors, Socket read timeout exceptions: Check the WebSphere server logs for any of these types of exceptions. If any are present then have your networking support run packet traces to check out any network connectivity issues.
If a firewall is involved then the recommendation is to set the aged timeout on the WebSphere datasource to recycle the physical connection at an interval less than the firewall timeout. For example, if the firewall timeout is 30 minutes, set the aged timeout to 29 minutes. This way the firewall will never see an idle connection from WebSphere and will not stop forwarding requests from WebSphere. The default for the aged timeout is 0 which means keep our physical connections around forever. Please also use the Purge policy of entire pool, not per connection.
Database/DataSource config setting coordination:
min connections = 0 max connections = 20 unused timeout = 1600 aged timeout = 1700 DB2's idle thread timeout = 1800
reap time = 180 keep dynamic = 1 (true)
In this example since keep dynamic is set after the application performs a commit, the connection remains 'active' in the db2 sense - db2 is keeping the prepared statements around - and the connection has to stay active for that to occur.
WebSphere's aged timeout is less than 1800 seconds, but the reap time interval is defined as 180 seconds. The reaper is not running often enough to terminate ALL of the "old" connections before they get timed out by DB2. This is because the 180 second reap time interval is longer than the delta time interval between DB2's Idle thread time of 1800 seconds and WebSphere's aged timeout of 1700 seconds (180 second reap interval > 100 seconds). A 180 second reap time interval is unreliable, but a 60 second reap time interval would work.
Also Reap time and unused timeout settings do not cause the idle or unused connections to be discarded if the servant region is idle. This situation might cause some DB2 connections to be held longer than is necessary. If you prefer to have the connections discarded at the time specified by a combination of reaper time and unused timeout settings, even if this preference might cause an idle servant region to become active again, you can add the nondeferredreaper custom property to your JDBC driver provider data source settings.
Also see this related blog entry: WebSphere Application Server StaleConnectionExceptions