IBM Support

Dropped Sessions and Disconnects

Troubleshooting


Problem

This document describes possible causes of an application session drop or disconnect and how to troubleshoot them.

Resolving The Problem

A common issue is an end user who experiences a dropped or disconnected session. The application could be FTP, Telnet, or another TCP application that connects successfully; however, at some point the user loses the session.

It is important to define the issue as precisely as possible. Are all users (local and remote) experiencing this issue? Or is only one user or one remote location experiencing it?

If all users, remote and local, connect to the same IP address (interface) on the IBM System i system and only one location or one user is experiencing disconnects, the System i system is unlikely to be the cause. The System i system services all incoming requests the same way. It routes local traffic to the Local Area Network (LAN) and remote traffic based on the route specified in CFGTCP, Option 2. The key questions are: Is only one user or one site experiencing disconnects? Do these users connect to the same TCP/IP address as other users who do not experience disconnects? If the answer to both questions is yes, the issue typically does not reside on the System i system.
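The scoping questions above can be expressed as a small check. The following Python sketch is illustrative only: the Session record, its field names, and the sample addresses in the usage note are hypothetical and are not part of any IBM tool.

```python
from dataclasses import dataclass


@dataclass
class Session:
    user: str          # hypothetical user identifier
    site: str          # "local" or the name of a remote location
    local_ip: str      # System i interface address the user connects to
    disconnects: bool  # True if this user reports dropped sessions


def shares_interface_with_healthy_users(sessions):
    """Return True when affected users connect to the same System i
    interface as users who do not experience disconnects. In that case
    the issue typically does not reside on the System i system."""
    affected_ips = {s.local_ip for s in sessions if s.disconnects}
    healthy_ips = {s.local_ip for s in sessions if not s.disconnects}
    return bool(affected_ips & healthy_ips)
```

For example, one remote user with disconnects and one local user without, both connecting to 10.1.1.5, gives True, which points the investigation away from the System i system.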

Configuration of the Line Description

If numerous users are experiencing disconnections, there is typically only one thing on the System i system that can result in this scenario. It is essential that the Line Description on the System i system is correctly configured. On the operating system command line, type the following:

WRKLIND SITEETH

Press the Enter key. Select Option 5 to display the line description. A screen similar to the following is shown:

                          Display Line Description
                                                     
 Line description . . . . . . . . . :   SITEETH      
 Option . . . . . . . . . . . . . . :   *BASIC        
 Category of line . . . . . . . . . :   *ELAN        
                                                     
 Resource name  . . . . . . . . . . :   CMN07        
 Online at IPL  . . . . . . . . . . :   *YES          
 Vary on wait . . . . . . . . . . . :   *NOWAIT      
 Network controller . . . . . . . . :   SITEENET      
 Local adapter address  . . . . . . :   00609439438A  
 Exchange identifier  . . . . . . . :   056D7D0F      
 Ethernet standard  . . . . . . . . :   *ALL          
 Line speed . . . . . . . . . . . . :   *AUTO        
 Current line speed . . . . . . . . :   100M          
 Duplex . . . . . . . . . . . . . . :   *AUTO        
 Current duplex . . . . . . . . . . :   *FULL        
 Maximum frame size . . . . . . . . :   1496          
 Maximum controllers  . . . . . . . :   40  
                                     
Only a few line description parameters affect the ability to communicate with the switch or network. Line speed and Duplex are two of them. There is one rule for setting line speed and duplex: they must match the switch. IBM cannot tell you what to set them to. You are plugging the System i system into a network device (switch), and you need to know how the port on that switch is configured. Because the vast majority of switches auto-negotiate, many users set Line speed and Duplex to *AUTO. Many other users specify explicit values to match their switch. In either case, the values must match the port that the System i system is plugged into on the switch.

If the Current line speed and Current duplex show the line running at 10M and *HALF, the Line speed and Duplex settings are probably misconfigured. A Current line speed of 100M with a Current duplex of *HALF probably also indicates incorrect settings. If the line is not running at the speed and duplex you configured, a resolution can typically be found outside the System i system by swapping the cable, changing the port on the switch, or swapping out the switch entirely.
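As a rough illustration of these checks, the following Python sketch compares the configured and current values taken from the Display Line Description screen. The function name and the finding texts are assumptions made for this example; the rule that anything other than 1G/*FULL or 100M/*FULL is suspicious comes from this document.

```python
# Combinations this document treats as healthy.
GOOD_COMBINATIONS = {("1G", "*FULL"), ("100M", "*FULL")}


def assess_line(cfg_speed, cfg_duplex, cur_speed, cur_duplex):
    """Return a list of findings for one line description.
    An empty list means nothing suspicious was found."""
    findings = []
    # An explicitly configured value should be what the line is running at.
    if cfg_speed != "*AUTO" and cfg_speed != cur_speed:
        findings.append("current line speed does not match the configured value")
    if cfg_duplex != "*AUTO" and cfg_duplex != cur_duplex:
        findings.append("current duplex does not match the configured value")
    # Slow or half-duplex operation suggests a mismatch with the switch port.
    if (cur_speed, cur_duplex) not in GOOD_COMBINATIONS:
        findings.append("line is not running at 1G/*FULL or 100M/*FULL")
    return findings
```

With the values from the sample screen (*AUTO, *AUTO, 100M, *FULL), the function returns an empty list.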

Before a change can be made to the line description, the line must be varied off. Varying off the line description causes all users who use that line to lose their connections. Therefore, make the change at a time when all users can be signed off the system.

The correct procedure for making a change to the line description is as follows:
1. Run the ENDTCP command, or end all TCP/IP interfaces associated with the line you are going to change. Then, run the CFGTCP command and select Option 1. Press F11 to see the status.
2. Run the WRKLIND XXXX command, and select Option 8 to work with the status. Vary off the line.
3. Run the WRKLIND XXXX command, and select Option 2 to change the line. Make your changes, and press the Enter key.
4. Run the WRKLIND XXXX command, and select Option 8 to work with the status. Vary on the line.
5. Run the STRTCP command, or start all TCP/IP interfaces associated with the line you changed. Then, run the CFGTCP command and select Option 1. Press F11 to see the status.

A common reason for session drops or disconnects is an incorrectly configured line description. IBM Software Support cannot tell you how to configure your line because parameters (for example, the Line speed and Duplex) are dependent upon your network hardware. A line running at something other than 1G/*FULL or 100M/*FULL is highly suspicious and is cause for concern.
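The order of the five-step procedure above matters, so it can help to write the sequence down before the maintenance window. The following Python sketch only builds a list of command strings; it does not run anything. It assumes the command-line equivalents of the menu options (ENDTCPIFC, VRYCFG, CHGLINETH, and STRTCPIFC), which this document does not name, and the interface address in the usage note is hypothetical. Verify each command by prompting it with F4 before use.

```python
def line_change_commands(line, interface_addresses, speed, duplex):
    """Return the ordered CL command strings for changing an Ethernet
    line description. Assumption: per-interface ENDTCPIFC/STRTCPIFC is
    used instead of ENDTCP/STRTCP."""
    commands = []
    # Step 1: end every TCP/IP interface associated with the line.
    for address in interface_addresses:
        commands.append(f"ENDTCPIFC INTNETADR('{address}')")
    # Step 2: vary off the line.
    commands.append(f"VRYCFG CFGOBJ({line}) CFGTYPE(*LIN) STATUS(*OFF)")
    # Step 3: change the line description.
    commands.append(f"CHGLINETH LIND({line}) LINESPEED({speed}) DUPLEX({duplex})")
    # Step 4: vary on the line.
    commands.append(f"VRYCFG CFGOBJ({line}) CFGTYPE(*LIN) STATUS(*ON)")
    # Step 5: start the interfaces again.
    for address in interface_addresses:
        commands.append(f"STRTCPIFC INTNETADR('{address}')")
    return commands
```

Calling line_change_commands("SITEETH", ["10.1.1.5"], "100M", "*FULL") yields five commands in the same order as the numbered steps.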

Line Description Is Set Correctly and Sessions Are Disconnecting

If the line description is set correctly and sessions are disconnecting, there are a few potential things to look at. All of the following issues are outside of the System i system.

Receiving Message TCP2617

A common message regarding disconnections is message TCP2617. This is often found by running one of the following commands on the operating system command line:

DSPLOG QHST
DSPMSG QSYSOPR

A screen similar to the following is shown:
                                                                            System:   YourSystem
 Message ID . . . . . . . . . :   TCP2617                                      
 Message file . . . . . . . . :   QTCPMSG                                      
   Library  . . . . . . . . . :     QSYS                                        
                                                                               
 Message . . . . :   TCP/IP connection to remote system &2 closed, reason code  
   &5.                                                                          
 Cause . . . . . :   The TCP/IP connection to remote system &2 has been closed.
   The connection was closed for reason code &5.  Full connection details for  
   the closed connection include:                                              
     - local IP address is &1                                                  
     - local port is &3                                                        
     - remote IP address is &2                                                  
     - remote port is &4                                                        
 Reason codes and their meanings follow:                                        
     1 = TCP connection closed due to expiration of 10 minute FINWAIT2 timer.  
     2 = TCP connection closed due to R2 retry threshold being run.            
     3 = TCP connection closed due to keepalive timeout.  
                                                                                 
Local IP address is &1: This is the TCP/IP address of the System i system that users connect to. You should see this address when running the CFGTCP command and selecting Option 1.
Local port is &3: This is the port that the remote client is connecting to on the System i system. If it is 23, this was a Telnet session; if 21, FTP; and so on.
Remote IP address is &2: This is the TCP/IP address of the client (PC or otherwise) that was connected to the System i system.
Remote port is &4: This is the port that the client was using to communicate (typically an ephemeral port).

A further explanation of reason codes:

Reason code 1: Not much of a reason for concern. The System i system probably received a request to close a connection and sent a FIN to the client as part of the normal shutdown procedure. The client acknowledged that FIN but never sent its own FIN to finish closing the connection. The System i system waits in the FINWAIT2 state for it, never gets it, closes the connection when the 10-minute timer expires, and posts this message. This is a client application issue: either the client is not following protocol or, more likely, the client was shut down in such a way that it was unable to complete the close.
Reason code 2: Typically this means the System i system received a request for data from the specified client. The System i system sends the data but does not receive an ACK (acknowledgement) that the data was received, so it resends the data. The data is resent 16 times (or the number of times shown by prompting the CHGTCPA command with F4), and then the System i system closes the connection. At this point, the System i system is done trying. From the client perspective, the experience may be as follows: A remote user has an iSeries Access for Windows 5250 Telnet session. The user types a command or chooses a menu option, the PC sends the request, and the session sits with an X at the bottom. Meanwhile, the System i system receives the request, sends the data 16 times, and never gets an ACK because the data never reaches the client. The client experiences the session hanging and finally disconnecting because the requested data is not received.

How do you fix this? First, verify that the line description on the System i system is configured correctly (see Configuration of the Line Description earlier in this document). If the line description is configured correctly and message TCP2617 is being posted, it can be surmised that users are having network-related disconnection issues. A communications trace would verify what we already know: the System i system receives a request, which it services. In fact, the System i system sends the data 16 times, and that data never gets acknowledged. The only thing we can say at this point is that the System i system does exactly what it is supposed to do: it sends the data. What happens to the data, or why it does not get to the client, is a good question for the network administrator. A network analyzer is required to determine exactly where these packets are being lost in the network.
Reason code 3: Typically not a huge concern. This indicates the System i system has not received any requests from the specified client for a set amount of time. The System i system sends a keepalive probe (1 byte of previously ACK'd data), which the client should reply to. The System i system does not receive a reply to the keepalive probe and, therefore, closes the connection. Typically this is just a cleanup message: the System i system is cleaning up old connections that were not shut down properly.
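When many TCP2617 messages have to be reviewed, a short script can summarize them. The following Python sketch assumes you already have the first-level message text as a string (for example, copied from DSPLOG output) and that it follows the wording shown in the message description above. The function name, the dictionary keys, and the short guidance strings are assumptions made for this example.

```python
import re

# Short guidance per reason code, paraphrased from this document.
REASONS = {
    1: "FINWAIT2 timer expired; usually a client that did not finish closing",
    2: "R2 retry threshold reached; data was never acknowledged, check the network",
    3: "keepalive timeout; usually cleanup of a connection not shut down properly",
}

# Well-known local ports mentioned in this document.
SERVICES = {21: "FTP", 23: "Telnet"}

# Matches: "... connection to remote system <address> closed, reason code <n>."
PATTERN = re.compile(
    r"connection to remote system (\S+) closed, reason code\s+(\d+)"
)


def triage_tcp2617(message_text, local_port=None):
    """Extract the remote address and reason code from a TCP2617 message.
    Returns None if the text does not look like TCP2617."""
    match = PATTERN.search(message_text)
    if match is None:
        return None
    code = int(match.group(2))
    return {
        "remote_ip": match.group(1),
        "reason_code": code,
        "meaning": REASONS.get(code, "unknown reason code"),
        "service": SERVICES.get(local_port, "other"),
    }
```

A reason code of 2 for many different remote addresses at one site is the pattern to raise with the network administrator.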

Telnet or iSeries Access for Windows Disconnect

When a user connects with Telnet, a virtual device is being used. This device name can be seen in the upper right corner of the session before logging on.

                                  Sign On                                      
                                               System  . . . . . :   YOURSYSTEM  
                                               Subsystem . . . . :   QINTER    
                                               Display . . . . . :   QPADEV007V
                                                                               
                User  . . . . . . . . . . . . . .                              
                Password  . . . . . . . . . . . .                              
                Program/procedure . . . . . . . .                              
                Menu  . . . . . . . . . . . . . .                              
                Current library . . . . . . . . .                              
 
In iSeries Access for Windows, you can specify a device name so that, rather than a generic QPADEVxxxx device, the session uses a name of your choice, such as JEREMY or SESSION. This example uses QPADEV007V. Run the following command:

WRKDEVD QPADEV007V

This shows the device. Then, select Option 8 to work with the status.

Work with Configuration Status                YOURSYSTEM
                                                             12/21/06  16:11:31
 Position to  . . . . .                Starting characters                    
                                                                               
 Type options, press Enter.                                                    
   1=Vary on   2=Vary off   5=Work with job   8=Work with description          
   9=Display mode status    13=Work with APPN status...                        
                                                                               
 Opt  Description       Status                -------------Job--------------  
      QPACTL01          ACTIVE                                                
        QPADEV007V      ACTIVE                QPADEV007V  JSCHULZ     555084  
                         
This shows that the device is active and even shows the interactive job associated with it. This is a very important point. If you have a user who is getting disconnected, have the user record the device name when logging on. When the user gets disconnected, do not have the user reconnect. Check the status of the device. If the device shows active, the System i system did not disconnect the session; it is waiting for the next request to come in. Check the status of the device fairly quickly after the disconnection because, when the Telnet session keepalive (CHGTELNA) timer expires, keepalive probes are sent. These probes attempt to discover whether the remote session is no longer active and, if so, recover the device. Typically, the Telnet keepalive parameter is set to *CALC, which is approximately 10 minutes.

The following scenario is fairly common. A Telnet user issues a command, and the session hangs for 10-15 seconds and then disconnects. If you check the status of the device on the System i system, it is still active. This is evidence that the System i system did not disconnect the session. A communications trace shows that the last request was never received by the System i system. Once again, if the System i line description is configured correctly, concentrate your problem determination efforts outside of the System i system. What happened to the request from the PC? This is a good question to pursue with the network administrator because the request is not getting to the System i Ethernet adapter.

FTP or File Transfer Disconnects

It is a common scenario to have a large file transfer that disconnects.

A valuable tool for seeing what is actually happening during the transfer is NETSTAT. Run the following command:

NETSTAT *CNN

At the top of this listing, you will find entries whose remote address is marked with *. These entries are the servers that are listening on the System i™ system, with status LISTEN. As you page down, you will find actual client addresses under the remote address field.

If you are initiating an FTP session from a PC to the System i system and doing an FTP GET of a large file which eventually disconnects before all the data was received, check this screen for information.

Look for the TCP/IP address of the PC under remote address. You should see an FTP-control and an FTP-data connection for that address, as in the following entries. Select Option 5 on the data connection to display its details, which contain some important information:

9.10.53.138      1091       ftp-con >  000:00:13  Established
9.10.53.138      1358       ftp-data   000:00:13  Established

If you are doing an FTP GET of a large file from the System i system, you will see a high number of "Bytes out" on the second page because the System i™ system is sending a large amount of data. If you continue to page down to the third page, you will see "Retransmission information".
                                                     
 Retransmission information:                            
   Total retransmissions . . . . . . . . . . . . :  479  
   Current retransmissions . . . . . . . . . . . :   0  

While the FTP transmission is active, you can view this screen and press F5 to refresh. Are the retransmissions constantly increasing? If so, this is a cause for concern. It tells us that the System i system sends data, does not receive an ACK (acknowledgement) of the data, and must resend the data (possibly multiple times before the remote system acknowledges it). Remember, it is rare to have a "perfect" network. It is typical to have some level of dropped frames on the network and some level of retransmissions. Retransmissions are of concern when they reach a level that causes performance problems, or when the System i system must resend data 16 times (or the value shown by prompting the CHGTCPA command with F4), at which point the system ends the connection. The System i system sent the data, never received an acknowledgement for it and, therefore, closes the connection.
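If you note the "Total retransmissions" value each time you press F5, the question of whether the count is constantly increasing becomes a simple comparison. The following Python sketch is an illustration under that assumption; the function names and the sample values in the usage note (other than 479, which appears in the screen above) are hypothetical.

```python
def retransmissions_rising(samples):
    """samples: "Total retransmissions" values noted on successive
    refreshes, oldest first. Returns True only if every refresh shows
    a higher count than the one before it."""
    if len(samples) < 2:
        return False  # a single reading cannot show a trend
    return all(later > earlier for earlier, later in zip(samples, samples[1:]))


def retransmissions_added(samples):
    """Return how many retransmissions occurred between the first and
    last reading, or 0 if there are fewer than two readings."""
    if len(samples) < 2:
        return 0
    return samples[-1] - samples[0]
```

Readings of 479, 512, and 560 indicate a rising count, while a count that stays at 479 across refreshes reflects retransmissions that happened earlier and are no longer occurring.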

Once again, one of the few things to check on the System i system is the line description. Is the line configured correctly? If the line looks correct and there is a large and ever-increasing number of retransmissions on large file transfers, this typically is a subject for the network administrator to pursue. The System i system is putting data on the line but is not receiving the acknowledgements and must resend the data, which occasionally results in disconnections.

Slow Sessions and Hangs

Typically when sessions hang or are slow, there are also disconnects. The same principles apply. For example, in a disconnect scenario, we typically see a high number of retransmissions on the connection in NETSTAT *CNN. The sessions disconnect because the retry threshold is reached and the System i system quits trying to send data. Hangs and slowness can typically be attributed to retransmissions.

Once again, find the client TCP/IP address in NETSTAT Option 3. Type 5 to display the details, and watch the retransmission information while the user is actively working and experiencing hangs or slowness. Are retransmissions constantly increasing? Is the line configured correctly? If retransmissions keep increasing and the line is configured correctly, the issue probably needs to be pursued with the network administrator. The System i system must send data multiple times before it is received and acknowledged by the client.

If retransmissions are not being seen, remember that we are seeing only one side of the communications. Possibly the data that the System i system sends is getting to the client, but the client's requests to the System i system are slow. This issue must be worked on from the client side or in the network.

Is only one user or one site experiencing hangs? Do these users connect to the same TCP/IP address as other users who do not experience hangs or slowness? If the answer to both questions is yes, the issue typically does not reside on the System i system.


Historical Number

437196437

Document Information

Modified date:
18 December 2019

UID

nas8N1014590