Troubleshooting
Problem
This document contains a Telnet session drop checklist.
Resolving The Problem
Telnet sessions to the IBM i OS use a TCP/IP conversation to transport data to the System i interactive job. The TCP/IP conversation can be thought of as a "pipe" to transport data. The Telnet session is dependent on the ability of this pipe to reliably transport data. If the pipe closes or is clogged, the Telnet session can fail.
Telnet sessions differ from some other TCP/IP applications in this way. E-mail and HTTP traffic, for example, do not depend on a specific TCP/IP conversation or pipe. They can retry many times or move many files over many different conversations. This is why browsers or e-mail can succeed when other TCP/IP applications (like Telnet) fail. Users tend not to notice if a mail notification arrives several minutes late. However, a TCP/IP connection that is unreliable for only 30 seconds will be noticed by a user trying to use the pipe.
This document contains many items to check on the IBM i OS or the client to aid in problem determination. Understanding the IBM i and the client is the first step in problem determination of TCP/IP networks.
The IBM i Telnet server services any valid Telnet client. Telnet clients run on many different TCP/IP stacks, operating systems, and hardware. The checklist shows commands for a current Microsoft Windows 32-bit TCP/IP stack. If running another stack or if the commands do not work for you, refer to the information for replacement commands for the TCP/IP stack provider.
Time of Drop Tests and Checks:
| Test | Command | Comments |
| PING client to System i | PING name/ipaddress -l 999 | Name or TCP/IP address of the System i (see Note 1). |
| PING System i to client | PING RMTSYS(name/ipaddress) PKTLEN(512) | Name or TCP/IP address of the client (see Note 1). |
| Trace route from client to System i | TRACERT name/ipaddress | Name or TCP/IP address of the System i (see Notes 1 and 2). |
| Trace route from System i to client | TRACEROUTE RMTSYS(name/ipaddress) PKTLEN(2000) | Name or TCP/IP address of the client (see Notes 1 and 2). |
| Status in NETSTAT | WRKTCPSTS *CNN | Find the client TCP/IP address and Port. Check State, Idle Time, and Option 5=Details (especially retransmits). |
| Status of Device | WRKCFGSTS *DEV devicename* | Active or not? |
| Status of Job | WRKJOB jobname | Active or not? |
| Job Log | Option 10 = Joblog - If job is still active. Option 4 if the job has ended and kept its log in a spooled file. | Note any fatal error messages and the from program and to program. |
| Job call stack | Option 11= Call Stack from WRKJOB screen | QT3REQIO is normal state waiting for client to send information. |
| PAL Entries of B600 7004 | STRSST Option 1, and then Option 1 | MA18260 explains codes (see Note 3). |
Notes:
| 1. | This function uses ICMP traffic through the network. Because Telnet uses TCP rather than ICMP, the results might not be important. Routers can be configured to not allow ICMP traffic to pass from one physical network to another. This stops Denial of Service attacks or other possible malicious action. |
| 2. | The route to the IBM i can be different from the route from the IBM i. Routes can change and still pass traffic. As long as every device along the current route is passing the data, the application will work. If a device in the route fails (for any reason), the network can reroute. As long as the reroute is operational, reliable, and completed quickly, the application will still work. For any problems with routes, contact your network administrator or IBM Network Consultants. |
| 3. | These PAL (Product Activity Log) entries are logged only if the TCP attribute Log Protocol Errors is *YES. This option allows silently discarded frames to be noted in the PAL for problem determination purposes. |
For the PING and Trace Routes, compare the results at the time of failure to the same test run during the functional state.
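Because Note 1 points out that ICMP results might not reflect what Telnet's TCP traffic experiences, a TCP-level probe can complement PING. The following is a minimal sketch, not an IBM-supplied tool: it assumes Python is available on the client, that the Telnet server listens on the standard port 23, and the function names `tcp_probe` and `compare_to_baseline` are hypothetical. It times one TCP connection attempt so that the value at the time of failure can be compared with a value recorded during the functional state.

```python
import socket
import time


def tcp_probe(host, port=23, timeout=5.0):
    """Try one TCP connection; return (succeeded, elapsed_seconds)."""
    start = time.monotonic()
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            ok = True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        ok = False
    return ok, time.monotonic() - start


def compare_to_baseline(baseline_seconds, current_seconds, factor=3.0):
    """Return True if the current probe is much slower than the baseline.

    The factor of 3 is an arbitrary illustration, not a documented threshold.
    """
    return current_seconds > baseline_seconds * factor
```

Run `tcp_probe("name-or-ipaddress")` once while sessions are healthy, keep the elapsed time, and run it again when a drop is reported. A failed or much slower connection suggests the same path problem that the PING and trace route comparison looks for.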
Other Telnet-Specific Checks
Session Keep Alive
Run the CHGTELNA command and press F4 to prompt, or specify the TIMMRKTIMO keyword. The Session Keep Alive parameter specifies the number of seconds between connection validations. Keep Alive is a misleading name for this mechanism: it does not keep a Telnet session alive. It is a probe to detect when a Telnet session is dead. Dead Client Timer would be a better name. However, RFC 1122 (the Internet host requirements specification, which covers TCP) uses the term Keep Alive.
Set this value according to your network. *CALC is approximately 600 seconds (or 10 minutes). Setting this value too high delays the recovery of dead sessions (device descriptions, interactive jobs, and other resources). Setting this value too low results in extra frames that can make a network problem worse.
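The trade-off can be made concrete with simple arithmetic. The sketch below is only an approximation under a stated assumption: every idle session is probed once per interval. The function name `probes_per_minute` and the session counts are hypothetical, and the real probe traffic depends on how many sessions are actually idle.

```python
def probes_per_minute(idle_sessions, interval_seconds):
    """Approximate probe frames per minute, assuming one probe per idle
    session per interval (an assumption made for illustration)."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return idle_sessions * 60.0 / interval_seconds


# Compare a few candidate intervals for 1000 idle sessions.
for interval in (60, 300, 600):
    rate = probes_per_minute(1000, interval)
    print(f"interval={interval}s -> about {rate:.0f} probes per minute")
```

A shorter interval detects dead sessions sooner but multiplies the probe traffic, which is the extra load the paragraph above warns about.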
QTVDEVICE Joblog
Run the WRKACTJOB SBS(QSYSWRK) JOB(QTV*) command. The QTVDEVICE job is used to start and end Telnet sessions on the System i. When a session is disconnected, the QTVDEVICE job might log a message to assist with problem determination. (Normally ended Telnet sessions might not log any message for obvious reasons.)
Common Messages
In job QTVDEVICE:
CPF5140 - Session stopped by a request from device &4.
Normal close or close the session after dead client discovery. Usually network problems.
CPF5144 - Error detected on device &4 that can be recovered.
Recovery after drop has occurred. Usually network problems.
TCP2552 - Device &1 associated with client &2 port &3 has been recovered.
Keep Alive timer expired or abnormal close. The System i sent a dead client probe to the Telnet client. The Telnet client did not respond. This was repeated until the Telnet client was determined to be dead. Verify that the network passes Keep Alives and otherwise is reliably passing data.
CPF87D7 - Cannot automatically select virtual device.
CPF87D3 - Internal system error occurred in program &1.
MCHxxxx - Any Machine Check message.
Apply current Telnet PTFs or call your Service provider if this message is listed in QTVDEVICE. If this message is listed in QSYSOPR or QHST, the message is listed independent of any drops that might have occurred.
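When a QTVDEVICE job log is long, it can help to pull out only the message IDs listed above. The following is a minimal sketch that assumes the job log has already been copied into a plain text string; the function name `classify_joblog` and the sample input format are hypothetical. The notes are paraphrased from this checklist, and grouping CPF87D7, CPF87D3, and MCHxxxx under the PTF advice is one reading of the list above.

```python
import re

# Notes paraphrased from the Common Messages list in this checklist.
NETWORK_NOTES = {
    "CPF5140": "Normal close or close after dead client discovery; usually network problems.",
    "CPF5144": "Recovery after a drop has occurred; usually network problems.",
    "TCP2552": "Keep Alive probe went unanswered or abnormal close; check the network.",
}
# Assumption: the PTF advice applies to these IDs and to any MCH message.
PTF_NOTE = "Apply current Telnet PTFs or call your service provider."
FATAL_IDS = {"CPF87D7", "CPF87D3"}

# IBM i message IDs are a three-letter prefix followed by four hex digits.
MESSAGE_ID = re.compile(r"\b(?:CPF|TCP|MCH)[0-9A-F]{4}\b")


def classify_joblog(text):
    """Return (message_id, note) pairs in the order the IDs appear."""
    findings = []
    for msg_id in MESSAGE_ID.findall(text):
        if msg_id in FATAL_IDS or msg_id.startswith("MCH"):
            note = PTF_NOTE
        else:
            note = NETWORK_NOTES.get(msg_id, "Not covered by this checklist.")
        findings.append((msg_id, note))
    return findings
```

For example, `classify_joblog(joblog_text)` returns a list that can be scanned quickly for the fatal IDs before reading the full log.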
Telnet Client Activity
Q1: Was the client active or is inactivity required?
A1: The activity on the client session will help determine the resolution. If the drop occurs while the user is busily using the session, something actively killed the session. If a certain amount of time has to elapse before a drop is noticed (for example, over 10 minutes or 2 hours, and so on), the network has stopped passing frames or timed out the session.
Q2: If a remote user (or users) is dropped, did any local users drop at the same time?
A2: If Telnet client drops are specific to certain sites, the most likely cause is an unreliable network. The IBM i Telnet Server does not distinguish between local and remote Telnet clients.
Q3: Did a single client drop, or did multiple clients drop at the same time? If multiple clients dropped, is there some network-oriented similarity? For example, were only clients using a firewall or proxy disconnected?
A3: Multiple clients dropping at one time can help determine what part of the network is unreliable.
Q4: What was the last screen or action of the user at the time of the drop?
A4: If collecting data, this will help identify where the drop occurred. In addition, this can help characterize the drop. Does the drop require inactivity or activity?
Q5: Is there any Sleep Mode on the client that will stop it from responding to Dead Client Detection?
A5: If the client goes into power saver mode on the Network Adapter, the client might not be able to respond to the dead client detection probe.
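Questions 2 and 3 ask whether several clients dropped together and whether they share a network characteristic. If drop times and client addresses have been collected, that comparison can be sketched as below. This is an illustration only: the record format (timestamp, IPv4 address), the function name `group_drops`, the 60-second window, and the /24 subnet grouping are all assumptions to adjust for your network.

```python
from collections import defaultdict
import ipaddress


def group_drops(drops, window_seconds=60, prefix_len=24):
    """Group (datetime, client_ip) drop records by time window and subnet.

    Returns only the groups with more than one client, because those point
    to a shared network segment rather than a single workstation.
    """
    groups = defaultdict(list)
    for when, ip in drops:
        # Bucket the drop time into fixed windows since the epoch.
        bucket = int(when.timestamp()) // window_seconds
        # strict=False lets a host address stand in for its subnet.
        subnet = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        groups[(bucket, str(subnet))].append(ip)
    return {key: ips for key, ips in groups.items() if len(ips) > 1}
```

Each key in the result is a (time bucket, subnet) pair. Several clients under one key suggest checking the devices that serve that subnet, such as a shared firewall or proxy.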
General Settings to Check
Line Description (DSPLIND linname): Verify that all parameters match network settings, specifically the line speed and duplex for Ethernet line descriptions. If a parameter is set to *CALC, test the network transport reliability by changing it to a specific value. The reverse also applies. Sometimes switches or other devices will change the speed or duplex and cause a drop. Match your line description to your network settings.
TCP/IP Interface (WRKTCPSTS *IFC): Did the Interface change state during the failure? (Changed from ACTIVE to something else?) In addition, F11 will display line information and the Maximum Transmission Unit (MTU). Verify that the MTU for this interface is appropriate for your network.
TCP/IP Routes (WRKTCPSTS *RTE): Did any routes change during the failure? Did the state of a specific route change? Routes of type ICMP might need to be investigated by your Network Administrator.
General Code level (DSPPTF): Is the System i up to date on Program Temporary Fixes (PTFs)? Use the Recommended Fixes utility to get the latest PTFs for TCP/IP and Telnet: http://www-912.ibm.com/supporthome.nsf/document/25304232
Tracing
There are cases where it is appropriate to collect technical data about the drop. Some examples are:
| o | Fatal error messages in QTVDEVICE joblog (the CPF87D7 for example). |
| o | Certain client activity causes the line description or TCP Interface to FAIL on the IBM i. |
| o | Drops are reproducible at the exact same screen or data. |
| o | The Telnet session's NETSTAT Connection status is in an irregular state. |
| o | IBM Service requests the data. |
In other cases, where the data is not successfully traversing the network, other traces will assist in problem determination and resolution. Refer to your network administrator for these. Just because a network tool is plugged into the network and does not identify a problem does not mean that no problem exists. Sometimes, datagrams must be followed through each physical network and network device before the source of the problem is found.
The IBM i OS Software Maintenance Agreement (SWMA) contract covers configuration of the IBM i OS (and its associated products like Access for Windows). However, a SWMA contract does not cover Network Problem Determination. For IBM assistance with Network Problems, contact your local IBM Service provider and ask for a Network Consultant.
Historical Number
30033636
Document Information
Modified date:
18 December 2019
UID
nas8N1016577