Technical Blog Post
What to do when threads are hung in socketRead() waiting for a response from a backend
A common cause of hung threads in WebSphere Application Server is when a thread sends a request to a backend server, such as a database, and waits an unexpectedly long time for a response. When this occurs, you will see socketRead() at the top of the thread's stack trace. For example:
[9/8/15 19:19:27:702 EDT] 00000062 ThreadMonitor W WSVR0605W: Thread "WebContainer : 2301" (00001785) has been active for 1425013 milliseconds and may be hung. There is/are 21 thread(s) in total in the server that may be hung.
at java.net.SocketInputStream.socketRead0(Native Method)
In this example, the application is executing a SQL statement (a query or an update) against a DB2 database and is waiting for a response from the database server. By default, the thread will continue to wait indefinitely until it receives a response. This means that the thread could be hung in this state for a very long time if the SQL statement is very complex and takes a long time to complete, if the database is performing very slowly, if the database server becomes unresponsive, or if there is a problem in the network connection between the application server and the database.
Fortunately, there are several actions that can be taken to prevent this from happening.
First, a query timeout can be set on the SQL statement. After the number of seconds specified in the timeout, the request will be interrupted and an exception will be thrown so that the thread will no longer be stuck waiting for a response. The application developer can call setQueryTimeout() on every java.sql.Statement in the application. An easier solution, however, is for the WebSphere administrator to set the webSphereDefaultQueryTimeout data source custom property. This timeout would apply for all SQL statements created on connections that were obtained from the data source. A second property, syncQueryTimeoutWithTransactionTimeout, can also be set as a data source custom property. With this set, WebSphere will calculate the time remaining before the transaction times out (if running within a global transaction) and set the query timeout to this value automatically.
It is also possible to enable WebSphere Application Server tracing to identify the specific SQL statements that are taking a long time to run. This can be useful if some, but not all, statements are running slowly. You can see this information in a full WebSphere connection pooling trace (*=info:WAS.j2c=all:RRA=all) but it can also be seen if the following lightweight trace is enabled:
In the trace, look for the "executeQuery Entry" and "executeQuery Exit" lines. You could also see "executeUpdate" or "execute" instead of "executeQuery". If you're not sure if the application is calling executeQuery, exceuteUpdate, or execute, you can simply search for "> execute" and "< execute" to find the same lines. For example:
[9/8/15 21:29:40:693 EDT] 0000001d WSJdbcPrepare > executeQuery Entry
[9/8/15 21:33:40:364 EDT] 0000001d WSJdbcPrepare < executeQuery Exit
In this example, you can see that the query took 4 minutes to complete. If you don't get any response from the database at all, you might only see an "Entry" with no "Exit". You can look a few lines above the "Entry" to find the SQL statement that was being executed. For example:
[9/8/15 21:29:40:570 EDT] 0000001d WSJdbcConnect > prepareStatement Entry
select * from table_name
TYPE FORWARD ONLY (1003)
CONCUR READ ONLY (1007)
Here, you see that "select * from table_name" is the SQL statement that was running slowly. Once you know the SQL statement, the next step is to contact the DBA to determine why the specific SQL statement is running slower than expected.
You might also see threads hung in socketRead() when connecting to other types of backend servers besides databases. This is common when applications send HTTP requests to remote servers or send Web Services requests and wait for a response. These cases can be resolved in a similar manner, either by setting a timeout to ensure that the request cannot wait longer than the timeout period to receive a response, or by working with the administrators of the backend server to identify why it might be slow to respond.
For JAX-WS Web Services that use the WebSphere Application Server JAX-WS Web Services engine, you can set a Read Timeout so that requests will timeout if there is no response from the Web Service. By default, this timeout is set to 300 seconds (5 minutes). This can be changed by setting the JVM custom property "timeout". Although the name of this property is so generic, its specific purpose is to set a Read Timeout for JAX-WS Web Services requests. You can also set the "readTimeout" property in the HTTP Transport Policy Set for the Web Service client or set "timeout" on the org.apache.axis2.context.MessageContext in the application code.
You can also enable Web Services tracing with the following trace specification:
In the trace, you can look for these messages:
WSWS3571I: Outbound HTTP SOAP request
WSWS3570I: Inbound HTTP SOAP response
These messages will tell you the times that the Web Services request was sent (Outbound HTTP SOAP request) and when the response from the Web Service was received (Inbound HTTP SOAP response). If you didn't receive any response at all, you would only see the WSWS3571I message without a corresponding WSWS3570I message. Both messages will include the full SOAP message, which is often helpful in determining why it may have taken a long time to receive a response.
For other Web Services implementations and other HTTP clients, there are other timeouts that can be set to timeout a request, usually called read timeouts or socket timeouts.
Finally, it is possible to enable a TCP/IP trace using a tool such as Wireshark to see packets sent across the network between the client and server. This can tell you if the problem is on the client side, server side, or the network.
Be sure to also read Doug Spath's blog entitled "Analogy for threads in socketRead."