Service Bulletin 163: Information about a potential z/TPF problem that might result in a CTL-1 and/or a CTL-571 dump.
RickSchoonmaker
This Service Bulletin contains information about a potential z/TPF problem that might result in a CTL-1 and/or a CTL-571 dump.
The problem is caused by chain corruption in the z/TPF system's TCP/IP native stack. Our investigation shows that the corruption results from a timing problem related to the Open Systems Adapter (OSA) staging queue code in the z/TPF system. The staging queue is used when the system cannot start processing new messages, for example during dump processing, during input list shutdown conditions, and, more recently on PUT 6, when OSA polling is invoked from external interrupt processing. An external interrupt from OSA indicates that new input messages are available and that the z/TPF system should process those messages before the read buffers become full. The external interrupt code is designed to ensure that polling is started even if the z/TPF system cannot get back to the CPU loop in a reasonable amount of time. The staging queue was designed to pull messages out of the OSA read buffers as quickly as possible to avoid packet loss during these conditions. If the OSA read buffers fill up, the OSA card eventually discards newly received packets.
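The role of the staging queue described above can be illustrated with a minimal sketch. This is a simplified model, not z/TPF code: the names `ReadBuffers` and `poll`, and the buffer capacity, are hypothetical and chosen only for illustration.

```python
from collections import deque


class ReadBuffers:
    """Hypothetical model of a fixed number of adapter read buffers."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.packets = deque()
        self.discarded = 0

    def receive(self, packet):
        # When every read buffer is in use, a newly received packet is lost.
        if len(self.packets) >= self.capacity:
            self.discarded += 1
        else:
            self.packets.append(packet)


def poll(buffers, staging_queue, can_process, process):
    """Empty the read buffers; stage packets if they cannot be processed now."""
    while buffers.packets:
        packet = buffers.packets.popleft()
        if can_process:
            process(packet)
        else:
            # Processing is deferred, but the read buffer is freed immediately.
            staging_queue.append(packet)
```

In this model, polling while `can_process` is false still frees the read buffers, so later packets are accepted instead of discarded. The staged packets are processed only after the blocking condition ends.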
When these conditions are detected, packets are not processed immediately; instead, they are added to the staging queue. Part of this processing is to find the socket block entry that is associated with the inbound message and to determine the input priority of that message (high versus regular priority). The address of the socket block entry is saved in the IP message table (IPMT) block that contains the input message so that it can be retrieved easily when the message is processed later. The problem can occur when this message comes off the staging queue and begins processing. The initial processing in CTT6READ does not verify whether the socket block address that is passed in the IPMT block still represents an active socket. If the system cleaned up this socket between the time that the message was added to the staging queue and the time that the message was dequeued and processed, the z/TPF system ends up processing a message for a socket that no longer exists by using its old socket block entry. That socket block entry might have been cleaned up or might have been reused for a new, currently active socket.
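The timing window described above can be sketched with a small model. This is an illustration under stated assumptions, not the CTT6READ implementation: `SocketBlock`, `StagedMessage`, `StackModel`, and the method names are hypothetical, and a Python dictionary stands in for the socket hash lookup.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class SocketBlock:
    """Hypothetical stand-in for a socket block entry."""
    four_tuple: tuple  # (remote IP, local IP, remote port, local port)
    active: bool = True


@dataclass
class StagedMessage:
    """Hypothetical stand-in for an IPMT block on the staging queue."""
    four_tuple: tuple
    payload: bytes
    cached_socket: SocketBlock  # reference saved when the message was staged


class StackModel:
    def __init__(self):
        self.socket_table = {}        # 4-tuple -> active socket block
        self.staging_queue = deque()

    def open_socket(self, four_tuple):
        block = SocketBlock(four_tuple)
        self.socket_table[four_tuple] = block
        return block

    def close_socket(self, four_tuple):
        # Cleanup removes the entry; staged messages still hold the old block.
        self.socket_table.pop(four_tuple).active = False

    def stage(self, four_tuple, payload):
        block = self.socket_table[four_tuple]
        self.staging_queue.append(StagedMessage(four_tuple, payload, block))

    def dequeue_trusting_cache(self):
        # Problem case: use the saved reference without checking it.
        msg = self.staging_queue.popleft()
        return msg, msg.cached_socket

    def dequeue_with_rehash(self):
        # Safer case: look the socket up again from the message's 4-tuple.
        msg = self.staging_queue.popleft()
        return msg, self.socket_table.get(msg.four_tuple)
```

If a socket is closed while its messages are still staged, `dequeue_trusting_cache` hands back a socket block that is no longer active, whereas `dequeue_with_rehash` finds no socket and the message can be rejected. Repeating the lookup costs an extra hash per staged message in exchange for never acting on a stale entry.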
PJ38676 has been opened to correct the issue. If the APAR cannot be applied in a timely manner, IBM strongly recommends applying the following patches to ctt6.cpy within CCTCP1. Note that the offsets identified here might differ on your production system. We recommend changing these values in the core copy of CCTCP1 first and verifying that the change has no adverse effects before making it permanent by changing the file copy.
1. At label CTT6RTCP, the following code needs to change:

    00000846 E340 D108 0004 00000108          28682= LG    R4,CT6RSOCK          LOAD SOCKET (IF EXISTS) ZTPF11
    0000084C B902 0044                        28683= LTGR  R4,R4                DOES A SOCKET EXIST? ZTPF11
    00000850 A774 02FF      00000E4E          28684= JNZ   CTT6RTOK             YES, BRANCH...DON'T HASH AGAIN ZTPF11
    00000854 D203 D134 100C 00000134 0000000C 28685= MVC   CT6RRIP,IPHDRSIP     COPY REMOTE IP ADDRESS @412.029
    0000085A D203 D130 1010 00000130 00000010 28686= MVC   CT6RLIP,IPHDRDIP     COPY LOCAL IP ADDRESS @412.029
    00000860 D201 D13A 3000 0000013A 00000000 28687= MVC   CT6RRPORT,ITCPSPORT  COPY REMOTE PORT @412.029
    00000866 D201 D138 3002 00000138 00000002 28688= MVC   CT6RLPORT,ITCPDPORT  COPY LOCAL PORT @412.029
    0000086C 4170 D130      00000130          28689= LA    R7,CT6RHASH          POINT TO HASH INPUT @412.029

The JNZ at offset x'850' needs to be changed to a no-op (branch mask of zero). This ensures that the socket is hashed again to verify that it still exists. The new instruction opcode should be:

    00000850 A704 02FF

2. Similarly, a short way down from label CTT6IPRT, the same check is made for UDP messages.

    00000660 E3F0 719C 001A 0000019C 28406= ALGF R15,