SQL30081N TCP/IP communication errors

This topic covers SQL30081N TCP/IP communication protocol errors and recommended solutions.

Cause

The cause of communication protocol errors can vary based on your platform. Each protocol error has its own definition and corresponding action plan.

The SQL30081N error message has the following format:

A communication error has been detected. Communication protocol being used: protocol. Communication API being used: interface. Location where the error was detected: location. Communication function detecting the error: function. Protocol specific error code(s): rc1, rc2, rc3.

Example

SQL30081N A communication error has been detected. Communication protocol
being used: "TCP/IP". Communication API being used: "SOCKETS". Location
where the error was detected: "". Communication function detecting the error: "connect".
Protocol specific error code(s): "111", "*", "*".

These error messages are returned when Db2® calls operating system socket APIs that receive an error message from a component outside of Db2. These socket errors are propagated back up to Db2, which encapsulates the error inside the SQL30081N message. The root cause is outside of Db2, which could be in the client/server network stack or network device situated between them. Network traces should be collected and analyzed by the network administrator from both client and server side to determine the root cause.

Solutions

The following table lists protocol specific errors that can occur on different platforms and the corresponding action plans to resolve these errors. If the error code is not listed in the table, search the operating system documentation, which can be found at /usr/include/errno.h (Linux/UNIX) or System Error Codes (Windows).

For non-Java application keywords, use reference the following materials:
  • db2cli.ini keywords: Used by CLI/ODBC applications

  • db2dsdriver.cfg keywords: Windows applications using IBM .NET provider

  • DB2 Registry variables (see Communications group)

  • FAQ JDBC ERRORCODE=-4499 connectivity: For Java applications

    Windows AIX SUN HP Linux Short Name Action Plan
    10061 79 146 239 111 ECONNREFUSED Connection Refused Client attempts to establish a connection to server using an invalid IP or port.

    Check on server side:

    Db2 environment variable DB2COMM is set such as: DB2COMM=TCPIP

    DBM CFG's SVCENAME is set to the instance's port number or service name. The command to update this parameter is: db2 update dbm cfg using svcename <port/service name>

    If service name is set, check the 'services' file to see if the name corresponds to an unused port number.

    Make sure Db2 server instance is started properly.

    Check on client's side:

    Node directory's entry:

    Service name should show the right port number or service name that corresponds to Db2 server's instance port (svcename setting)

    To check if server's port is opened: telnet <hostname> <port>

    If the command fails then the port on server is not opened and the problem is outside of Db2 area.

    For more information on ECONNREFUSED, see

    http://www.ibm.com/support/docview.wss?rs=71&uid=swg21328644
    Windows AIX SUN HP Linux Short Name Action Plan
    10053 72     103 SOCECONNABORTED Software caused a connection abort. Software closed connection

    If error is reported on client application which uses ODBC/CLI to connect to DB2 UDB server:

    Disable the Db2 CLI timeout:

    Add QUERYTIMEOUTINTERVAL=0 to the db2cli.ini file on the client's side.

    Check if applications have any timeout, for example, ADO timeout or VB timeout.

    If application connects to OS390 server, check idlethreadtimeout parameter (IDTHTOIN) on OS390.

    This parameter sets the active thread timeout limit on OS390.

    Windows AIX SUN HP Linux Short Name Action Plan
    10054 73 131 232 104 ECONNRESET Connection has been reset by partner Connected partner has closed the connection.

    Check any timeout limit on partner side, such as firewall, application, Db2 CLI layer.

    If error is reported on client application which uses ODBC/CLI to connect to Db2 UDB server:

    Disable the Db2 CLI timeout

    Add QUERYTIMEOUTINTERVAL=0 to the db2cli.ini file on the client's side.

    Check if there's any firewall between client and server.

    If it has any time limit on open connections, check if the applications have any timeout, such as ADO timeout or VB timeout.

    When a local database connection is cataloged using a different alias name than the database name, you might encounter error SQL30081 when attempting to connect to that database using a TCPIP connection. If you get the SQL30081 error, make sure the database is not cataloged using a different alias name than the database name that on the machine where that database resides.
    Windows AIX SUN HP Linux Short Name Action Plan
    10060 78 145 238 110 ETIMEDOUT Connection timeout Connection has reached the network timeout limit and is terminated by network.

    Timeout by TCP/IP layer

    TCP/IP has its own timeout value, if the open connection stayed too long, TCP/IP will force the connection off.

    Usually this is network issue

    Workarounds:
    1
    Windows AIX SUN HP Linux Short Name Action Plan
    10048 67 125 226 98 EADDRINUSE The specified address already in use A: Two instances are starting on the same machine listening on the same port (usually would trap on db2start).

    B: A client application or agent is making an outgoing connection attempt and is using a socket that is already being used by another connection to the database or is in the wait state (2MSL state).

    This usually only happens to Windows client:

    This is a Microsoft error. Winsock has given a port that is already in use (winsock defect) or is closed but still waiting in the wait state.

    Workaround for Windows:

    1. Adjust the length of time that a socket sits in wait state after being closed (default is two minutes)

    TcpTimedWaitDelay

    2

    2. Adjust the number of ports available (default is 5000)

    MaxUserPort

    3

    3. Adjust the use of connect / disconnect so that it doesn't cycle so rapidly in the program (best solution). 10048 is most often caused by rapid connection / disconnection logic in the application, which puts too many ports in the time_wait state (2MSL). Re-using the connection handle when an application is issuing multiple statements is the best way of handling this (do not disconnect then reconnect every time a statement completes)

    4. Implement client side connection pooling so that the application logic internally does not have to change. Make sure the pool is large enough to handle 80% of the connections. Make sure the pool has some form of re-connect logic in the case of a disconnect while idle.

    Windows AIX SUN HP Linux Short Name Action Plan
    10055 74 132 233 105 ENOBUFSNo buffer space available System running out of resource to complete the TCPIP call

    For Windows:

    The problem is caused by running out of Windows desktop heap or system page table entries. It is not Db2 related.

    Increase the Windows SystemPages registry entry.

    Windows AIX SUN HP Linux Short Name Action Plan
      32     32 EPIPE (Broken Pipe) Network connectivity problem between client and server, run network sniffer trace on both client and server.
    10004 (WSAEINTR)           Connection closed by client. For Db2 see technote
    10065 81     113 WSAEHOSTUNREACH No route to host. For Windows client, Linux server:Unset firewall on Linux server to allow connections to go through from clients.
The following are some situations in which no return code is returned such as the following examples:
Symptom 1
SQL30081N A communication error has been detected. Communication protocol being used 'TCP/IP'. Communication API being used 'SOCKETS'. Location where error was detected "192.168.1.200'. Communication function detecting the error 'SelectForConnectTimeout'. Protocol specific error code '0','*','*'. SQLSTATE=08001
Possible cause
192.168.1.200 is a virtual IP which was also assigned to other devices. This was causing intermittent error messages above. There could be other causes of this error message.
Symptom 2
SQL30081N A communication error has been detected. Communication protocol being used: "TCP/IP". Communication API being used: "SOCKETS". Location where the error was detected:"192.168.1.200". Communication function detecting the error: "recv".Protocol specific error code(s): "*", "*", "0". SQLSTATE=08001
Possible cause

*,*,0 indicates the connection was closed by the peer. This could be any network device (i.e. firewall, router, workload balancing device, etc...) between the client and DB2 server, or the DB2 server itself.

Disable any network security and server security software running on DB2 database server to determine this resolves the issue. Multiple security products from different vendors may interact negatively and manifest as *,*,0.

Check to determine if Db2 server has Workload Manager enabled in which SQL queries consuming longer than xx minutes will be disconnected. In example below it is 15 minutes.

db2 "select substr(workloadname,1,25) as workloadname,serviceclassname from syscat.workloads with UR"

WORKLOADNAME SERVICECLASSNAME
---------------- ------------------
SYSDEFAULTUSERWORKLOAD SYSDEFAULTSUBCLASS
SYSDEFAULTADMWORKLOAD SYSDEFAULTSUBCLASS
TEST_WL MAIN_SC

db2 "select workloadname,ENABLED from syscat.workloads with UR"

WORKLOADNAME ENABLED
-------------------- ----------
SYSDEFAULTUSERWORKLOAD Y SYSDEFAULTADMWORKLOAD Y
TEST_WL Y

CREATE THRESHOLD "TEST_UOW_CONC_TH"
FOR WORKLOAD TEST_WL ACTIVITIES
ENFORCEMENT DATABASE
WHEN UOWTOTALTIME > 15 MINUTES
COLLECT ACTIVITY DATA
ON COORDINATOR DATABASE PARTITION
WITH DETAILS
FORCE APPLICATION;

Also check Db2 server to determine if there are any communications registry variables set or if there are any messages around same timestamp in db2diag.log.

Additional troubleshooting

A) Collect Db2 trace from Db2 client and Db2 server side.

Note: This does not apply to Java applications using JDBC driver (db2jcc.jar/db2jcc4.jar). For Java applications refer to Collecting Data: Tracing with the IBM Data Server Driver for JDBC and SQLJ

Client

1) db2trc on -t -f ctrace.dmp -Madd SQLJC -Madd SQLJR -Madd SQLR -Madd SQLCC

2) Wait for SQL30081 to be reported

3) db2trc off

4) Format traces:

db2trc flw -t -wc ctrace.dmp ctrace.flw

Note: Older versions of DB2 do not support "-wc" option. Remove this parameter on older versions.

db2trc fmt ctrace.dmp ctrace.fmt

db2trc fmt -c ctrace.dmp ctrace_drda.fmt

Server

1) db2trc on -t -f strace.dmp -Madd SQLJC -Madd SQLJS -Madd SQLR -Madd SQLCC

2) Wait for SQL30081 to be reported

3) db2trc off

4) Format traces:

db2trc flw -t -wc strace.dmp strace.flw

db2trc fmt strace.dmp strace.fmt

db2trc fmt -c strace.dmp strace_drda.fmt

B) Network trace

Network sniffer traces (Linux:tcpdump, Windows: Wireshark) from client and server side should be collected and analyzed by network administrator to determine root cause of SQL30081 if the suggestions do not help since Db2 diagnostics do not provide visibility into operating system and network layers once socket API provided by operation system has been called.

C) Additional test tools pctt and ping

The stand-alone pct tool included in ~/sqllib/bin/pctt (Linux/UNIX) or C:\Program Files\IBM\SQLLIB\bin\pctt.exe (Windows) can be run in client and server mode. It sets up separate network connection between client and server outside of Db2 to test network connectivity. Note when specifying large buffer size in server mode (i.e. pctt s /b 32767) the listener may coredump. Use a smaller buffer size. This program may or may not reproduce any intermittent connectivity issues. Usage of pctt is documented in p.132 from IBM Db2 Universal Database Troubleshooting Guide.

The ping command included with Db2 may also be used to determine if there are inconsistent latency issues.

$ db2 ping dbName request 32767 5

Elapsed time: 692983 microseconds

Elapsed time: 419739 microseconds

Elapsed time: 419867 microseconds

Elapsed time: 210229 microseconds

Elapsed time: 210103 microseconds

$ db2 ping dbName response 32767 5

Elapsed time: 1241 microseconds

Elapsed time: 1217 microseconds

Elapsed time: 1236 microseconds

Elapsed time: 2575774 microseconds   <<<<< 2.5 seconds

Elapsed time: 3526 microseconds
1

Db2 uses TCP/IP's connection KEEPALIVE option to detect if there is a connection failure. This option transmits a message periodically to determine if the partner is still alive. If the partner fails to respond to this message, the connection is considered to be broken, and an error is returned.

2 This parameter determines the length of time that a connection stays in the TIME_WAIT state when being closed. While a connection is in the TIME_WAIT state, the socket pair cannot be reused. This is also known as the 2MSL state because the value should be twice the maximum segment lifetime on the network. See RFC 793 for further details.
3 This parameter controls the maximum port number used when an application requests any available user port from the system. Normally, short-lived ports are allocated in the range from 1024 through 5000. Setting this parameter to a value outside of the valid range causes the nearest valid value to be used (5000 or 65534).