IBM Support

FTP timeouts during data transfer for the z/OS FTP client

Product Documentation


Abstract

This describes how to diagnose problems when FTP times out when transfering data to and from the z/OS FTP client.

Content

FTP timeouts can occur during data transfer for the z/OS FTP client. Many times this results in the z/OS FTP server issuing messages EZA2590E and EDC8121I (connection reset), EZA2589e (connection interrupted or timed out), or EZA1721W (server not responding)


Before you begin:
1. Be sure the following z/OS FTP client timer (specified in the client FTP.DATA file) default values are set to the defaults or higher. If you need assistance identifying which FTP.DATA the FTP server is using, the hierarchical search order can be found in the section on TCP/IP configuration data sets in the IP Configuration Reference manual.

      CCONNTIME         30
      DATACTTIME        120
      DCONNTIME         120
      INACTTIME         120
      MYOPENTIME        60
      PROGRESS          10    
2. Make sure your FTP KEEPALIVE option has a value of 60 seconds or greater.

3. If you have a checkpoint firewall, make sure you have turned off the FTP enforce new line setting in the firewall's configuration. Here are instructions on how to do this:
    3.1 Edit the $FWDIR/lib/base.def
    3.2 Change (comment out) the following line:
        #define FTP_ENFORCE_NL
      to:
        //#define FTP_ENFORCE_NL
    3.3 Re-install the security policy

Steps for resolving FTP timeouts during data transfer for the client :
1. Before retrying the failing data transfer, start the packet trace facility. Detailed instructions for doing this can be found in How to Collect PKTTRACE and CTRACE on z/OS. If this is a high-volume system, be sure to reference section 6 on how to filter tracing based on IP addresses and ports.

2. Obtain an z/OS FTP client trace with options BAS, TIM, and FLO. This can be done by issuing the following FTP subcommand before starting the failing data transfer:
      DEBUG FLO TIM BAS

3. If this is an interactive FTP (from the TSO command line or OMVS shell) the output of the trace will be sent to the screen. If this FTP is issued via a batch job, the debug output will be sent to SYSOUT DD.

4. Recreate the problem and deactivate the traces.

5. You will now need to perform trace analysis. Begin this by browsing the client trace and locate the time that the timeout occurred. Obtain the timestamp from the message. Determine the port on which the data connection is being opened by finding the PORT command, if this is an active data connection, or by using the PASV response, if this is a passive connection. The text for both of these messages will be a string in the format of u,v,w,x,y,z. This can be interpreted as follows:
    5.1 The IP address to which the data connection will be opened is u.v.w.x.
    5.2 The port to which it will be opened can be calculated using the following formula:
      (y x 256) + z.
    5.3 At this point, you have identified the timestamp, the IP address, and the port to which the data connection will be opened.

6. Format the packet trace in IPCS by issuing this command:
      CTRACE COMP(SYSTCPDA) LOCAL FULL OPTIONS((SESSION))
This command formats the trace into a summary for each IP connection.

7. Find the summary that represents the failing connection by looking for an FTP connection which encompasses the time stamp from the client. Use the combination of the port, the IP address, and the timestamp to ensure you have the correct summary. Here is a sample of what you will see:

Local Ip Address:                             x.x.x.x
Remote Ip Address:                                   y.y.y.y

Host:                                 Local,          Remote
 Client or Server:                  Unknown,         Unknown
 Port:                                   21,            1095
 Application:                           ftp,
 Link speed (parm):                      10,              10 Megabits/s
                                                                       
Connection:
 First timestamp:                 2007/08/24 10:21:54.688167
 Last timestamp:                  2007/08/24 11:46:16.591480

     
8. Scroll down to the bottom of the summary portion and review the count of different types of packets received during this connection. Pay particular attention to duplicate ACKs, Zero Window sizes, Retransmissions, Out of order Packets, and Fragments.

Packets:                               97,             109
  (x) Untraced Packets:                 3,               0
  (.) In-order data:                   52,              33 (53.60%), (30.27%)
  (a) Acknowledgments:                 20,              40 (20.61%), (36.69%)
  (+) Data and ACK:                    23,              30 (23.71%), (27.52%)
  (u) Duplicate ACKs:                   0,               0 ( 0.00%), ( 0.00%)
  (w) Window size updates:              1,               0 ( 1.03%), ( 0.00%)
  (z) Zero window sizes:                0,               0 ( 0.00%), ( 0.00%)
  (p) Window probes:                    0,               0 ( 0.00%), ( 0.00%)
  (k) Keepalive segments:               0,               0 ( 0.00%), ( 0.00%)
  (r) Retransmissions:                  0,               0 ( 0.00%), ( 0.00%)
  (o) Out-of-order:                     1,               2 ( 1.03%), ( 1.83%)
  (d) Delayed ACKs:                     0,               4 ( 0.00%), ( 3.66%)
  (f) Fragments:                        0,               0 ( 0.00%), ( 0.00%)
  (!) Dropped:                          0,               0 ( 0.00%), ( 0.00%)


9. If any of those have significantly higher percentages for inbound data, then the problem resides on the network, and further diagnostics will be required outside of the mainframe. Even if you see evidence of problems outside of the mainframe, continue with step 6 to investigate possible scenarios.

10. Scroll down past the summary to the individual descriptions and examine the approximate timestamp obtained from the trace.

11. Some of the things you can look for include:
  • A large value in the delta time column.

 TcpHdr   IO F        Seq        Ack RcvWnd  Data Delta Time       TimeStamp 
1      S   I   2499151066          0  64512     0   0.000000 14:23:50.221695 
2   A  S   O   1632413670 2499151067  65535     0   0.000027 14:23:50.221722 
3   A  S   O   1632413670 2499151067  65535     0   2.943323 14:23:53.165045 
4      S   I   2499151066          0  64512     0   0.002726 14:23:53.167771 
5      S   I   2499151066          0  64512     0   5.919115 14:23:59.086886 

This shows a timeout attempting to establish a connection. However, similar sequences may also occur during mid-transfer. In packet 1, we see the remote host attempting to open a connection by sending the local host a SYN packet. The local side correctly sends the ACK SYN response in packet 2. The next packet should be an ACK sent from the remote host. Because this never arrives, the local host retransmits the ACK SYN in packet 3 (2.94 seconds later). Packets four and five are retransmissions of the initial SYN the remote host sent in packet 1. This indicates that the local host's ACK SYNs never arrived at the remote host, and that there may be routing issues between the two hosts, or a firewall/router in between the two hosts that is dropping the responses.

  • A capital R indicating reset in the TcpHdr column.

Example 1
 TcpHdr   IO F        Seq        Ack RcvWnd  Data Delta Time       TimeStamp
1      S   I    965574368          0 368640     0   0.000000 10:44:43.727489
2   A  S   O   2501664216  965574369 368640     0   0.000125 10:44:43.727614
3   A      I u  965574369 2501664217 368640     0   0.000819 10:44:43.728433
4   A      O . 2501664217  965574369 368640  8940   0.268255 10:44:43.996688
5   AP     O . 2501673157  965574369 368640  3695   0.000140 10:44:43.996828
6   AP  F  O ? 2501676852  965574369 368640     0   0.002937 10:44:43.999765
7   AP     O r 2501664217  965574369 368640  8940   2.191677 10:44:46.191442
8   AP     O r 2501664217  965574369 368640  8940   4.438225 10:44:50.629667
9   AP     O r 2501664217  965574369 368640  8940   8.958191 10:44:59.587858
10  AP     O r 2501664217  965574369 368640  8940  17.822049 10:45:17.409907
11  AP     O r 2501664217  965574369 368640  8940  36.668251 10:45:54.078158
12     R   O   2501664217  965574369      0     0  72.829141 10:45:54.078158

Example 1 shows a remote host failing to acknowledge data sent by the local host. The connection is typical up until packet 7. Packet 7 should show an ACK for the 12635 bytes of data, plus the FIN sent on packet 6. Instead, nothing arrives from the remote host, and the data is retransmitted on packets 7 through 11. Eventually, the local host assumes an error has occurred and resets the connection.

Example 2
 TcpHdr   IO F        Seq        Ack RcvWnd  Data Delta Time       TimeStamp
1   A      I . 1428014775   99166319  64856   524   0.015828 10:07:53.533832
2   A      I . 1428015299   99166319  64856    24   4.941343 10:07:58.475175
3   A      O d   99166319 1428015323 262144     0   0.261318 10:07:58.736493
4   A      I . 1428015323   99166319  64856   524   0.055756 10:07:58.792249
5   A      I . 1428015847   99166319  64856   524   0.000005 10:07:58.792254
6   A      I . 1428016371   99166319  64856   292   4.960406 10:08:03.752660
7   A      O d   99166319 1428016663 262144     0   0.261318 10:07:58.736493
8     R    O d   99166319 1428016663 262144     0  73.878330 10:09:17.630990

In Example 2, the connection proceeds normally until packet 8. Note that in packet six, the amount of data sent by the remote host (292 bytes) is lower than the 524 bytes sent on most of the other packets. This usually indicates that there is no more data left to be sent. On packet 7, the local host ACKs the last data sent. However, it can do nothing more until either the remote host sends more data or it sends a FIN packet to indicate that all of the data has been transferred. Because the remote host does neither, it resets the connection in packet 8.

  • A lower-case Z in the F (flag) column indicating that the window size has gone to zero.

 TcpHdr   IO F        Seq        Ack RcvWnd  Data Delta Time       TimeStamp
1   A      I   3809785819 2800163550  27768     0   0.000021 10:36:57.019183
2   A      O   2800164998 3809785819  65535  1448   0.000048 10:36:57.019231
3   A      O   2800173686 3809785819  65535  1448   0.000001 10:36:57.019232
4   A      O   2800175134 3809785819  65535  1448   0.000007 10:36:57.019239
5   A      I a 3809785819 2800191062    256     0   0.000035 10:36:57.019274
6   A      O   2800163550 3809785819  65535  1448   0.000551 10:36:57.019825
7   A      O   2800164998 3809785819  65535    12   0.000005 10:36:57.019830
8   A      I   3809785819 2800191062    256     0   0.000029 10:36:57.019859
9   A      O o 2800191062 3809785819  65535  1448   0.000032 10:36:57.019891
10  A      O . 2800192510 3809785819  65535  1448   0.000203 10:36:57.020094
11  A      I a 3809785819 2800191318      0     0   0.000029 10:36:57.020123
12  A      O . 2800193958 3809785819  65535  1448   0.000032 10:36:57.020155
13  A      I z 3809785819 2800191318      0     0   0.000016 10:36:57.020171
14  A      O . 2800195406 3809785819  65535  1448   0.000128 10:36:57.020299
15  A      I z 3809785819 2800191318      0     0   0.000021 10:36:57.020320
16  A      O r 2800191318 3809785819  65535     1   0.260479 10:36:57.280799
.......lines 17 - 29 removed for brevity
30    R    I z 3809785819 2800191318      0     0  25.544452 10:38:00.887886

This example shows a situation in which the remote side has set its window size to zero. The window size tells the local host how much unacknowledged data it can send at once. In packets five and eight, we see that the remote host has a window size of 256 bytes. However, in packet eleven, this drops to zero. Starting in packet 16, the local host begins sending 1 byte probes to see if the window size has increased. This will continue indefinitely until the remote host increases its window size or the connection is otherwise terminated. In this example, the remote side ultimately RESETs the connection in packet 30.

  • A lower-case O in the F (flag) column indicating that the remote host sent out of order data.

 TcpHdr   IO F        Seq        Ack RcvWnd  Data Delta Time       TimeStamp
1   AP     O a  663518973 1680312427  65535     0   0.000094 21:53:58.470801
2   A      I . 1680312427  663518973  64094  1460   0.000020 21:53:58.470821
3   A      I . 1680313887  663518973  64094  1460   0.000093 21:53:58.470914
4   AP     O a  663518973 1680315347  65535     0   0.000065 21:53:58.470979
5   A      I . 1680315347  663518973  64094  1460   0.000254 21:53:58.471233
6   A      I . 1680316807  663518973  64094  1460   0.000006 21:53:58.471239
7   AP     O a  663518973 1680318267  65535     0   0.000145 21:53:58.471384
8   A      I . 1680318267  663518973  64094  1460   0.000018 21:53:58.471402
9   A      I o 1690049219  663518973  64094  1460   1.064534 21:53:59.535936
10  A      I . 1690050679  663518973  64094  1460   0.000047 21:53:59.535983

In this example, packets are processed correctly on lines 1 through 8. Note that the remote host is sending 1460 byte packets to the local host, who is ACKing them correctly. However, on line nine the remote host sends an out-of-order packet. This can be identified by two methods. First, the F column shows an 'o' flag, meaning out-of-order. This can also be verified by reviewing the sequence numbers (Seq column) of the inbound packets. They progress as follows:

1680318267
1690049219
1690050679

Because the sequence numbers increment by the number of bytes sent on each, they should have progressed as:

1680318267
1680319727
1680321187
<...roughly 6661 more missing packets>
1690046299
1690047759
1690049219
1690050679

  • The remote host stops responding to keep-alive packets.

  TcpHdr   IO F        Seq        Ack RcvWnd  Data Delta Time       
1   AP     O + 3101947168 1513769269  65535    54   0.000318  
2   A      I a 1513769269 3101947222  65535     0   0.166097 
3   AP     O k 3101947221 1513769269  65535     0 185.834001 
4   A      I u 1513769269 3101947222  65535     0   0.001453 
5   AP     O u 3101947221 1513769269  65535     0 190.768437 
6   A      I u 1513769269 3101947222  65535     0   0.001386 
7   AP     O u 3101947221 1513769269  65535     0 190.735614 
8   A      I u 1513769269 3101947222  65535     0   0.001089 
9   AP     O u 3101947221 1513769269  65535     0 190.714878 
10  A      I u 1513769269 3101947222  65535     0   0.002175 
11  AP     O u 3101947221 1513769269  65535     0 190.763630 
12  A      I u 1513769269 3101947222  65535     0   0.001706 
13  AP     O u 3101947221 1513769269  65535     0 190.757661 
14  AP     O u 3101947221 1513769269  65535     0  76.292870 
15  AP     O u 3101947221 1513769269  65535     0  76.299600 
16  AP     O u 3101947221 1513769269  65535     0  76.299781 
17  AP     O u 3101947221 1513769269  65535     0  76.307884 
18  AP     O u 3101947221 1513769269  65535     0  76.296677 
19  AP     O u 3101947221 1513769269  65535     0  75.286330 
20  AP     O u 3101947221 1513769269  65535     0  75.281099 
21  AP     O u 3101947221 1513769269  65535     0  76.294458

This shows the events occurring on a control connection during a long data transfer. The data transfer itself completes normally. In order to keep the control connection from being timed out, the local TCP/IP stack sends keepalive packets. These are seen on lines 3, 5, 7, 9, 11, and 13. The remote stack responds correctly to these on lines 4, 6, 8, 10, and 12. However, it never responds to the keepalive sent on line 13. Instead, this packet is retransmitted with no response on lines 14 through 21. As a result, the local stack assumes that the remote host has dropped the connection, and closes its own connection. Note that no RESET is sent. In this situation, a RESET would be sent only if the remote host attempted to send another packet to the local stack.

12. Once you identify a problematic area in the trace, pay particular attention to both the IO and the Delta Time columns. IO indicates if the packet was inbound or outbound, and Delta Time indicates how much time has passed since the last packet was sent or received. The combination of these two fields can help determine whether or not the problem resides on the local client or on the remote host. For example, an inbound RESET might suggest a problem on the remote server. Conversely, an inbound RESET with a large delta time suggests that the remote server timed out waiting for a response from the client, and further investigation should be performed locally.
    12.1 If you determine that the problem is occurring outside of the mainframe, additional investigation must be done on the remote client itself, or possibly on the network segments between the client and server.
    12.2 If you determine that the problem is occurring on the mainframe, additional diagnostics must be done on the local TCPIP stack and the FTP client. Collect a new set of documentation including:
      A new client trace
      A new SYSTCPDA trace
      A dump of TCPIP and, if possible, the FTP client, with SDATA options
        SDATA=(ALLNUC,CSA,LSQA,SQA,PSA,RGN,SWA,TRT,LPA)
      SYSTCPIP trace with options
        OPTIONS=(TCP,INTERNET,PFS,SOCKET)

Again, you can find instructions for running SYSTCPIP and SYSTCPDA traces in How to Collect PKTTRACE and CTRACE on z/OS. With this set of documentation, contact the IBM support center.

[{"Product":{"code":"SSSN3L","label":"z\/OS Communications Server"},"Business Unit":{"code":"BU054","label":"Systems w\/TPS"},"Component":"All","Platform":[{"code":"PF035","label":"z\/OS"}],"Version":"1.8;1.9;1.10;1.11;1.12;1.13;2.1;2.2","Edition":"All Editions","Line of Business":{"code":"LOB35","label":"Mainframe SW"}}]

Document Information

Modified date:
17 June 2018

UID

swg27011275