IBM Support

Common Reasons Why Connections Stay Open for a Long Period of Time in WebSphere Application Server

Technical Blog Post


Abstract

Common Reasons Why Connections Stay Open for a Long Period of Time in WebSphere Application Server

Body

 

 

 

When you are monitoring connection pool performance, you may occasionally notice that connections are staying open for long periods of time. This behavior can have many different causes and can lead to issues like the connection pool maxing out. The J2CA0045E message is a good indication that your connection pool is maxed out:

J2CA0045E: Connection not available while invoking method createOrWaitForConnection for resource jdbc/xxxxxx

Some of the most common issues that can lead to connections staying open for longer than intended are the following:

  • Applications Not Properly Closing Connections
  • Long Database Queries
  • Shared Connections with Local Transaction Containment
  • Embedded getConnection Requests

 

Because WebSphere Application Server (WAS) pools connections, a connection in the free pool for a long period of time is not considered an issue. When I talk about a connection being held open for a long period of time in this blog, I am referring to connections in the shared or unshared pool that are exhibiting this type of behavior.

 

A J2C connection pooling trace can really help troubleshoot the types of issues discussed in this post. I will walk through some of the trace points that can help identify particular issues. The trace string to enable is:

*=info:WAS.j2c=all:RRA=all:Transaction=all

 

Instructions for setting up the trace can be found at the following page:
MustGather: Connection pooling problems for WebSphere Application Server

 

When enabling this trace, you must start capturing the trace before the issue occurs. This allows you to review the complete lifecycle and queries on the connections.

 

 

Applications Not Properly Closing Connections

When writing an application to use the WAS datasource, we recommend that you follow the get/use/close pattern. There are 3 steps in this pattern:

  1. Get
    1. This is when the application makes a request to get a connection from the pool
    2. The application will first lookup the datasource name and then do a getConnection() call
  2. Use
    1. This is when the application executes a query and waits for a response
  3. Close
    1. This is the final stage when the application has received the response from the database and is now done using that connection
    2. The application should call close() in a finally block to close out the connection and return it to the free pool

If your application does not follow this pattern, you may see connections staying open for long periods of time. This is especially seen when the application does not close the connection properly. After the response is received from the database, the application should be calling close in a finally block. If the application does not explicitly close the connection, that connection will stay open forever in WAS until that server is restarted. This is known as a connection leak. To resolve this type of issue, you will have to modify your application to close every connection.

 

You can also see connections staying open for longer than expected if your application goes on to do work unrelated to that connection instead of closing it first. The other work can be time consuming which causes the connection to stay open longer than necessary. For example, you get a connection, send a query, receive a response, and then go on to do a webservices call before calling close on that connection. The connection will remain open until that webservices call has completed and the application calls close. To avoid these types of delays, it is best to follow the get/use/close pattern.

 

To verify that the connection is getting closed properly, it would be best to review the application code directly and make any necessary changes.

 

 

Long Database Queries

When the application makes a query, WAS passes that request to the driver and then waits for a response. WAS is not involved in modifying or processing the query or its contents. If the query is taking a long time to complete, the connection in the shared or unshared pool will continue to wait for a response which will cause the connection to stay open. Resolving the long running query issue will allow the connection to get closed quicker.

 

To determine if this is what is occurring in your environment, you should check the following:

  1. Check the performance of the queries on the database. Most often, the database will show that certain queries are taking a longer time to process than expected. Resolving the performance issues on the database side will allow the connection to get returned to the free pool quicker.
  2. If there are no long queries on the database side, the next step would be to check the network for any issues. Performance or other network level issues can lead to delays in WAS receiving the response from the database. This will cause the connection to stay open in the shared or unshared pool for longer than expected.
  3. You can also check the WAS trace to determine what queries are being made and how long they are taking. You will want to look for the executeQuery/execute Entry and executeQuery/execute Exit messages to determine how long it takes to receive a response for the query. You need to make sure the Entry and Exit messages you’re reviewing are on the same thread id. When reviewing a WAS log, the thread id is the value after the timestamp. In the examples below, the thread ids are 00000071 and 0000006c.  Some query examples are below:

 

[3/31/17 8:39:51:337 EDT] 00000071 WSJdbcStateme >  execute Entry

                                 com.ibm.ws.rsadapter.jdbc.WSJccStatement@64a82cfc

                                 UPDATE TEST1 SET STATUS='STEP1,SECOND_RELEASE' WHERE ITEM_ID='1234' AND ITEM_REVISION='005'

 

[3/31/17 8:39:51:341 EDT] 00000071 WSJdbcStateme <  execute Exit

                                 UPDATE

 

[3/31/17 8:44:10:092 EDT] 0000006c WSJdbcStateme >  executeQuery Entry

com.ibm.ws.rsadapter.jdbc.WSJccStatement@5449999s

                                 SELECT XY,TEST FROM TEST2 WHERE T_ID='1' AND D_ID='2'

 

Note: When reviewing a connection or query, you can keep an eye out for messages like the following. These can help identify what the app name, module name, and bean name are that made the request. This can help you identify what part of your application is involved.

 

[3/31/17 8:44:10:092 EDT] 0000006c AdapterUtil   3   The application

server name is:

                                 wascell#wasnode1#server1

[3/31/17 8:44:10:092 EDT] 0000006c AdapterUtil   3   The app name    is:

                                 TESTAPP

[3/31/17 8:44:10:092 EDT] 0000006c AdapterUtil   3   The module name is:

                                 TESTAPP.war

[3/31/17 8:44:10:092 EDT] 0000006c AdapterUtil   3   The bean name   is:

                                 testapp

 

[3/31/17 8:44:10:095 EDT] 0000006c WSJdbcStateme <  executeQuery Exit

com.ibm.ws.rsadapter.jdbc.WSJccResultSet@523846d9

 

When reviewing the examples above, you can see that both the queries took only milliseconds to complete. If you had long running queries, you would see a longer time gap between the Entry and Exit messages. In order to determine if a query is taking a long time, you should know what the expected response time for it is. Then you can compare the time you’re seeing in the trace to the expected time to verify if there are any delays.

 

In addition to fixing any performance issues on the database and network sides, you can also set a query timeout that can help reduce the number of long running connections due to long running queries. The query timeout error will be given back to the application for handling. Setting a query timeout will not resolve why the queries are taking a long time to complete, but it will timeout any long queries so the application can handle the exception and free up that connection. More information on the query timeout can be found at the following pages:

 

You can configure the query timeout using either of the following custom properties:

  • webSphereDefaultQueryTimeout establishes a default query timeout, which is the number of seconds that an SQL statement may execute before timing out. This default value is overridden during a Java Transaction API (JTA) transaction if the syncQueryTimeoutWithTransactionTimeout custom property is enabled.
  • syncQueryTimeoutWithTransactionTimeout uses the time remaining (if any) in a JTA transaction as the default query timeout for SQL statements.

 

Procedure

  1. Open the administrative console.
  2. Go to the WebSphere Application Server Data Source properties panel for the data source.
    1. Click Resources > JDBC > Data Sources > data_source
    2. Click WebSphere Application Server Data Source properties.
  3. Click Custom properties under Additional Properties.
  4. Click New.
  5. Enter webSphereDefaultQueryTimeout in the Name field.
  6. Enter the number of seconds to use for the default query timeout in the Value field. The timeout value is in seconds. A value of 0 (zero) indicates no timeout.
  7. Click OK.
  8. Click New.
  9. Enter syncQueryTimeoutWithTransactionTimeout in the Name field.
  10. Enter true or false in the Value field. A value of true indicates to use the time remaining in a JTA transaction as the default query timeout.
  11. Click OK.
  12. Save your changes. The updates go into effect after the server is restarted.

 

 

Shared Connections with Local Transaction Containment

By default, shared connections will be used for a datasource. This can be changed to unshared connections at the application level or the WAS level. In WAS, you can set either defaultConnectionTypeOverride or globalConnectionTypeOverride as a connection pool custom property to make this change. This is explained in more detail in the following page: Tuning connection pools

 

The connections in the pool must be managed within a transaction. If your application does not get the connection within a global transaction, a Local Transaction Containment (LTC) will be used. This LTC will remain open until the servlet ends. More information on LTC can be found at the following page: Local transaction containment

 

A shared connection within an LTC will not be returned to the free pool until that LTC ends. This means that if you get a connection, make a query, and then call close on that connection, it will remain in the shared pool until that LTC ends. If it takes a long time for the LTC to complete, you may see connections in the shared pool staying open for longer than expected even though the application has already called close on the connection. When shared connections are used with an LTC, the connections will stay in the shared pool after close is called to allow the application to reuse that same connection on that thread within the same LTC. This sharing only occurs if certain connection sharing requirements are met. The following page goes through these in more detail: Unshareable and shareable connections

 

This type of sharing can be beneficial to performance if your application was specifically written to take advantage of this connection sharing. In my experience, the majority of the applications I see are not written with connection sharing in mind. If that is the case, changing from shared to unshared connections can help the connections return to the free pool quicker. When unshared connections are used, the connection is returned to the free pool when the application calls close. It does not wait for the LTC to complete. Changing to unshared connections will only have an effect if your application has already called close on the connection. If your application does not properly close the connection in a finally block, changing to unshared connections will have no effect.

 

Note: If you are using Business Process Management (BPM), do not change to unshared connections. The BPM application takes advantage of connection sharing and therefore, must use shared connections. For other stack products on top of WAS, please check the specific product recommendations before changing to unshared.

 

To determine if this is what is occurring in your environment, you should check to see if the application has called close on the connection. This can be seen in the connection pool details in the trace. You will see something similar to the following:

 

[3/31/17 8:55:08:545 EDT] 0000006c PoolManager   3   reserve() exit

PoolManager name:jdbc/test1

PoolManager object:-1111222233

Total number of connections: 5 (max/min 10/1, reap/unused/aged 180/1800/0, connectiontimeout/purge 180/EntirePool)

                               (testConnection/inteval false/0, stuck timer/time/threshold 0/0/0, surge time/connections 0/-1)

                               (pool paused false, prePopulate alternate false, resourceFailBackEnabled true, isAlternateResourceEnabled false, disableDatasourceFailoverAlarm false, startFailBack false)

                               (isPartialResourceAdapterFailoverSupportEnabled false, isAlteranteResourcePoolManager false, resourceAvailabilityTestRetryInterval 10, currentInusePool null, currentMode 100, alternate jndiName null)

The waiter count is (primary) 0/0/0/0/0

The mcWrappers in waiter queue []

 

Shared Connection information (shared partitions 200)

    com.ibm.ws.LocalTransaction.LocalTranCoordImpl@99999990;RUNNING;  MCWrapper id dd4444ff  Managed connection WSRdbManagedConnectionImpl@b0a368d0  State:STATE_TRAN_WRAPPER_INUSE Thread Id: 0000006f Thread Name: WebContainer : 45 Connections being held 1 Used with transaction com.ibm.ws.LocalTransaction.LocalTranCoordImpl@99999990;RUNNING; Start time inuse Fri Mar 31 08:41:08 EDT 2017 Time inuse 180 (seconds)

    com.ibm.ws.LocalTransaction.LocalTranCoordImpl@4a781206;RUNNING;  MCWrapper id cc9999ff  Managed connection WSRdbManagedConnectionImpl@3a2347f5  State:STATE_TRAN_WRAPPER_INUSE Thread Id: 00000071 Thread Name: WebContainer : 47 Connections being held 0 Used with transaction com.ibm.ws.LocalTransaction.LocalTranCoordImpl@4a781206;RUNNING;

    com.ibm.ws.LocalTransaction.LocalTranCoordImpl@99440590;RUNNING;  MCWrapper id 9b6666f6  Managed connection WSRdbManagedConnectionImpl@175594f6  State:STATE_TRAN_WRAPPER_INUSE Thread Id: 0000006f Thread Name: WebContainer : 45 Connections being held 0 Used with transaction com.ibm.ws.LocalTransaction.LocalTranCoordImpl@99440590;RUNNING; Start time inuse Fri Mar 31 08:41:08 EDT 2017 Time inuse 180 (seconds)

    com.ibm.ws.LocalTransaction.LocalTranCoordImpl@99440590;RUNNING;  MCWrapper id 3b1111f6  Managed connection WSRdbManagedConnectionImpl@41ec447  State:STATE_TRAN_WRAPPER_INUSE Thread Id: 0000006f Thread Name: WebContainer : 45 Connections being held 0 Used with transaction com.ibm.ws.LocalTransaction.LocalTranCoordImpl@99440590;RUNNING; Start time inuse Fri Mar 31 08:41:08 EDT 2017 Time inuse 180 (seconds)

  Total number of connection in shared pool: 4

 

Free Connection information (free distribution table/partitions 5/1)

  (0)(0)MCWrapper id 125555ab  Managed connection WSRdbManagedConnectionImpl@ed33338f  State:STATE_ACTIVE_FREE

  Total number of connection in free pool: 1

 

UnShared Connection information

  No unshared connections

 

 

There are 3 things to focus on when looking at the pool details:

  1. Check to see if shared or unshared connections are being used. This can be seen easily by verifying if the connections are under the ‘Shared Connection information’ section or the ‘UnShared Connection information’ section. In the example above, you can see that shared connections are being used.
  2. Determine if LTCs are being used. This can be easily verified by looking for the com.ibm.ws.LocalTransaction.LocalTranCoordImpl statement for each connection. If this is present, an LTC is being used.
  3. Determine if the application has called close. This can be verified by looking for the ‘Connections being held’ or ‘Handle count’ value. In older WAS versions, ‘Handle count’ is used, but in newer versions, it was changed to ‘Connections being held’. If this value is 0, that means the application has called close on that connection. If the value is 1 or higher, that means the application has not yet called close.

 

In the example above, you can see that LTCs are being used and that most of the connections have a ‘Connections being held’ value of 0 in the shared pool. This means that the application has already called close on the connection, but it is waiting for the LTC to complete before it can be returned to the free pool. In this case, changing to unshared connections would allow the connections to be returned to the free pool when the application calls close instead of waiting for the LTC to complete.

 

To change to unshared connections, set one of the properties below:

  • The defaultConnectionTypeOverride property changes the default sharing value for a connection pool. This property enables you to control connection sharing for direct queries. If resource references are configured for this data source or connection factory the resource reference's configurations take precedence over the defaultConnectionTypeOverride property settings. For example, if an application is doing direct queries and unshared connections are needed, set the defaultConnectionTypeOverride property to unshared.
  • The value specified for the globalConnectionTypeOverride custom property takes precedence over all of the other connection sharing settings. For example, if you set this property to unshared, all connection requests are unshared for both direct queries and resource reference lookups. This property provides you with a quick way to test the consequences of moving all connections for a particular data source or connection factory to unshared or shared without changing any resource reference setting.

 

If you specify values for both the defaultConnectionTypeOverride and the globalConnectionTypeOverride properties, only the values specified for the globalConnectionTypeOverride property are used to determine connection sharing type.

 

To add these new custom properties to the settings for a data source or connection factory connection pool, a new connection pool custom property must be created. To add one of these properties to a data source, use the administrative console. Click Resources > JDBC > Data sources. Select your data source from the list, and then click Additional properties > Connection pool properties > Connection pool custom properties > New. For other J2C or JMS connection factories, navigate to the connection factory definition in the administrative console. Then select Additional Properties > Connection pool > Connection pool custom properties > New. Now specify defaultConnectionTypeOverride or globalConnectionTypeOverride in the Name field and shared or unshared in the Value field. A server restart is needed for the changes to take effect.

 

Note: The properties must be set in the Connection pool custom properties and NOT the general Custom properties on the data source or connection factory.

 

 

Embedded getConnection Requests

If your application has embedded getConnection calls instead of following the get/use/close pattern described in the ‘Applications Not Properly Closing Connections’ section above, you may see connections staying open longer than expected. Embedded getConnection calls will follow the get/use/get/use/get/use/……/close pattern. This means that until the close is called at the very end, each embedded getConnection request will keep a connection open in the pool. This can lead to connections being held for a long time which can lead to the pool maxing out.

 

You can often tell if embedded getConnection requests are being made by reviewing the connection pool contents in a trace. You may see something like the following:

 

[3/31/17 8:39:51:493 EDT] 00000071 PoolManager   3   reserve() exit

PoolManager name:jdbc/test2

PoolManager object:-1666600000

Total number of connections: 10 (max/min 10/1, reap/unused/aged 180/1800/0, connectiontimeout/purge 180/EntirePool)

                               (testConnection/inteval false/0, stuck timer/time/threshold 0/0/0, surge time/connections 0/-1)

                               (pool paused false, prePopulate alternate false, resourceFailBackEnabled true, isAlternateResourceEnabled false, disableDatasourceFailoverAlarm false, startFailBack false)

                               (isPartialResourceAdapterFailoverSupportEnabled false, isAlteranteResourcePoolManager false, resourceAvailabilityTestRetryInterval 10, currentInusePool null, currentMode 100, alternate jndiName null)

 

Shared Connection information (shared partitions 200)

    com.ibm.ws.LocalTransaction.LocalTranCoordImpl@f5e83c8f;RUNNING;  MCWrapper id dd4744fc  Managed connection WSRdbManagedConnectionImpl@b0a368d0  State:STATE_TRAN_WRAPPER_INUSE Thread Id: 00000071 Thread Name: WebContainer : 47 Connections being held 1 Used with transaction com.ibm.ws.LocalTransaction.LocalTranCoordImpl@f5e83c8f;RUNNING;

    com.ibm.ws.LocalTransaction.LocalTranCoordImpl@f5e83c8f;RUNNING;  MCWrapper id fd39c386  Managed connection WSRdbManagedConnectionImpl@a2bb9581  State:STATE_TRAN_WRAPPER_INUSE Thread Id: 00000071 Thread Name: WebContainer : 47 Connections being held 1 Used with transaction com.ibm.ws.LocalTransaction.LocalTranCoordImpl@f5e83c8f;RUNNING;

    com.ibm.ws.LocalTransaction.LocalTranCoordImpl@f5e83c8f;RUNNING;  MCWrapper id cceea9ff  Managed connection WSRdbManagedConnectionImpl@3a2347f5  State:STATE_TRAN_WRAPPER_INUSE Thread Id: 00000071 Thread Name: WebContainer : 47 Connections being held 1 Used with transaction com.ibm.ws.LocalTransaction.LocalTranCoordImpl@f5e83c8f;RUNNING;

    com.ibm.ws.LocalTransaction.LocalTranCoordImpl@f5e83c8f;RUNNING;  MCWrapper id 9b62e5f6  Managed connection WSRdbManagedConnectionImpl@175594f6  State:STATE_ACTIVE_INUSE Used with transaction com.ibm.ws.LocalTransaction.LocalTranCoordImpl@f5e83c8f;RUNNING;

    com.ibm.ws.LocalTransaction.LocalTranCoordImpl@f5e83c8f;RUNNING;  MCWrapper id 641ac1a4  Managed connection WSRdbManagedConnectionImpl@9da18059  State:STATE_TRAN_WRAPPER_INUSE Thread Id: 00000071 Thread Name: WebContainer : 47 Connections being held 1 Used with transaction com.ibm.ws.LocalTransaction.LocalTranCoordImpl@f5e83c8f;RUNNING;

    com.ibm.ws.LocalTransaction.LocalTranCoordImpl@f5e83c8f;RUNNING;  MCWrapper id 6b0dde6a  Managed connection WSRdbManagedConnectionImpl@72e1df9f  State:STATE_TRAN_WRAPPER_INUSE Thread Id: 00000071 Thread Name: WebContainer : 47 Connections being held 1 Used with transaction com.ibm.ws.LocalTransaction.LocalTranCoordImpl@f5e83c8f;RUNNING;

  Total number of connection in shared pool: 6

  ....

 

 

There are 3 things to focus on when looking at the pool details:

  1. Thread Id and Thread Name: All the connections above have the same Thread Id: 00000071 and Thread Name: WebContainer: 47. This is a good indication that the application may be making embedded getConnection calls. When the get/use/get/use/get/use…../close pattern is used, all the getConnection requests are made on the same thread. Therefore, you will see that multiple connections in the pool will have the same Thread Id and Thread Name.
  2. Transaction being used: All the connections above are using the same LTC com.ibm.ws.LocalTransaction.LocalTranCoordImpl@f5e83c8f. This indicates that all the connections are within the same servlet and are related. If you notice that all the connections are within the same LTC or global transaction, it is a good indication that they are all related.
  3. Connections being held or Handle count value: This is important to check because it helps identify if the application has called close on the connection. If the value is 0, then close has been called by the application. If the value is 1 or higher, the application has not called close on the connection. When the get/use/get/use/get/use……/ close pattern is used, the application does not call close until the very last request is completed. That means that the rest of the connections will remain open in the pool with a ‘Connections being held’ or ‘Handle count’ value of 1 until that very last request completes. In the example above, the Connections being held 1 value is seen for all of the connections. This indicates that all the connections are still waiting on a close request from the application.

 

When you see all 3 of the symptoms above, it is a good indication that embedded getConnection calls are being made. To resolve the issue, modify the application to use the get/use/close pattern instead of the get/use/get/use/get/use/…../close pattern.

 

[{"Business Unit":{"code":"BU053","label":"Cloud & Data Platform"},"Product":{"code":"","label":""},"Component":"","Platform":[{"code":"","label":""}],"Version":"","Edition":"","Line of Business":{"code":"","label":""}}]

UID

ibm11081353