Timed affinities
Sysplex distributor normally distributes each connection request, as it arrives, to one of the candidate server instances, based on available capacity and the policies in effect when the connection request arrives. In particular, each connection is assumed to be independent of all other existing and future connections with respect to the server instance that will receive the incoming connection request.
Some applications, however, establish an affinity between a client and a particular server instance that needs to span multiple connections. TN3270E Telnet server printer sessions, for example, are based on a connection request from a Telnet client, and that printer session connection request needs to be routed to the same TN3270E Telnet server that is serving the LU2 session. Similarly, web-based applications, such as shopping carts, might need to have all connections from a particular client come to the server instance that has the contents of the shopping cart stored as session state.
The TIMEDAFFINITY parameter on the VIPADISTRIBUTE statement indicates to sysplex distributor that connections to a particular distributed DVIPA need to take into account the client origin. Connections from the same client, as identified by IP address, need to be routed to the same server instance, even when multiple server instances are hosted by a single target stack.
If a nonzero value for the TIMEDAFFINITY parameter is specified on a VIPADISTRIBUTE statement, the first connection from a particular client is routed as normal to a target stack and listening application. At that time, both the sysplex distributor routing stack and the target stack establish an affinity to govern subsequent connection requests from the same client. This affinity maintains a connection count, initially one. As subsequent connection requests for the same distributed DVIPA and port come in from the same client IP address, they are routed to the same server instance and the affinity connection count is incremented. As affinity-based connections, including the first one, are closed, the connection count is decremented.
When the last existing connection is closed and the count gets to zero, a timer of the duration (in seconds) specified by the TIMEDAFFINITY parameter is started. If another connection request is received from the same client to the same distributed DVIPA and port, the timer is stopped and the connection request is routed to the designated server instance. If no connection request is received from that client for the designated distributed DVIPA and port before the timer expires, the affinity is removed from the sysplex distributor routing and target stacks. The next connection request from that client for the distributed DVIPA and port will be routed according to normal sysplex distributor considerations of relative capacity and policies.
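The count-and-timer lifecycle described above can be modeled as a small state machine. The following Python sketch is illustrative only and is not how the stacks implement the function. The names `AffinityTable`, `connect`, `close`, and `expire` are hypothetical, the `pick_target` callback stands in for normal sysplex distributor selection, and time is passed in as a plain number of seconds.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

Key = Tuple[str, str, int]  # (client IP address, distributed DVIPA, port)


@dataclass
class Affinity:
    target: str                         # server instance chosen for the first connection
    count: int = 1                      # open connections that use this affinity
    expires_at: Optional[float] = None  # set only while the count is zero


class AffinityTable:
    """Hypothetical model of timed affinity for illustration."""

    def __init__(self, timed_affinity: int, pick_target: Callable[[], str]) -> None:
        self.timed_affinity = timed_affinity  # seconds; 0 means no affinities
        self.pick_target = pick_target        # stand-in for normal distribution
        self.entries: Dict[Key, Affinity] = {}

    def expire(self, now: float) -> None:
        # Remove affinities whose timer has run out.
        expired = [k for k, a in self.entries.items()
                   if a.expires_at is not None and now >= a.expires_at]
        for key in expired:
            del self.entries[key]

    def connect(self, key: Key, now: float) -> str:
        self.expire(now)
        aff = self.entries.get(key)
        if aff is not None:
            aff.count += 1
            aff.expires_at = None             # a new request stops the timer
            return aff.target
        target = self.pick_target()           # routed as normal
        if self.timed_affinity > 0:
            self.entries[key] = Affinity(target)
        return target

    def close(self, key: Key, now: float) -> None:
        aff = self.entries.get(key)
        if aff is None or aff.count == 0:
            return
        aff.count -= 1
        if aff.count == 0:                    # last connection closed: start the timer
            aff.expires_at = now + self.timed_affinity


# Usage: TIMEDAFFINITY of 200 seconds, two hypothetical targets.
targets = iter(["TARGET_A", "TARGET_B"])
table = AffinityTable(200, lambda: next(targets))
key = ("193.10.1.120", "203.1.10.18", 21)

first = table.connect(key, now=0)      # no affinity yet: normal selection
table.close(key, now=10)               # count reaches zero, timer runs until 210
second = table.connect(key, now=100)   # timer still running: same target
table.close(key, now=120)              # timer runs until 320
third = table.connect(key, now=400)    # affinity expired: normal selection again
```

In this run, `first` and `second` name the same target because the second request arrives while the timer is running, and `third` is chosen afresh because the affinity has been removed.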
Connection requests that map to an existing affinity are always routed to that target regardless of the distribution method being used. This also applies to weighted active connection distribution. Although the established affinities are honored, distribution decisions for connection requests that have no affinity will include the active connections that were established as a result of an affinity relationship.
Using the VARY TCPIP,,OBEYFILE command with a profile containing a VIPADISTRIBUTE statement that specifies the TIMEDAFFINITY value as 0 prevents new affinities from being established. However, an existing affinity remains until all connections using the affinity end and the timer for the affinity expires. The timer value is not changed for an existing affinity by the new statement; instead, it continues to use the configured TIMEDAFFINITY value that was active when the affinity was established. To remove existing affinities, issue the VARY TCPIP,,OBEYFILE command with a profile containing a VIPADISTRIBUTE DELETE statement, followed by a VIPADISTRIBUTE DEFINE statement with a TIMEDAFFINITY value of 0.
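The difference between changing the TIMEDAFFINITY value to 0 and deleting and redefining the distribution can be pictured with a short model. This Python sketch is a simplification under stated assumptions: `RoutingStack` and its methods are hypothetical names, connection counts and timers are omitted, and `normal_choice` stands in for whatever target normal distribution would select.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Affinity:
    target: str
    timer_seconds: int  # captured once, when the affinity is established


class RoutingStack:
    """Hypothetical model of one distributed DVIPA and port."""

    def __init__(self, timed_affinity: int) -> None:
        self.timed_affinity = timed_affinity
        self.affinities: Dict[str, Affinity] = {}

    def route(self, client_ip: str, normal_choice: str) -> str:
        existing = self.affinities.get(client_ip)
        if existing is not None:
            return existing.target            # existing affinities are still honored
        if self.timed_affinity > 0:           # a value of 0 creates no new affinities
            self.affinities[client_ip] = Affinity(normal_choice, self.timed_affinity)
        return normal_choice

    def change_timed_affinity(self, value: int) -> None:
        # Models a changed value: existing entries keep their own timer value.
        self.timed_affinity = value

    def delete_and_redefine(self, value: int) -> None:
        # Models DELETE followed by DEFINE: existing affinities are removed.
        self.affinities.clear()
        self.timed_affinity = value


stack = RoutingStack(timed_affinity=200)
stack.route("193.10.1.118", "TARGET_A")                       # affinity established
stack.change_timed_affinity(0)
kept = stack.route("193.10.1.118", "TARGET_B")                # still TARGET_A
fresh = stack.route("193.10.1.119", "TARGET_B")               # no new affinity created
kept_timer = stack.affinities["193.10.1.118"].timer_seconds   # unchanged
stack.delete_and_redefine(0)
after_delete = stack.route("193.10.1.118", "TARGET_B")        # routed as normal
```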
Note that under some circumstances, a client's affinity with a specific target application server instance might be terminated prior to the specified time interval if key resources needed to satisfy new client TCP connection requests are not available. This includes the following scenarios:
- A target application server instance terminates, is quiesced for DVIPA sysplex distributor workload balancing, or is no longer listening on its specified port. Any affinities to the target application instance are terminated and new connections for that port are no longer routed to this server.
One exception to this scenario occurs for configurations where multiple applications are on the same system and TCP/IP stack, and also listen on the same port (that is, the set of applications comprises a shareport group). In this case, when one of the application instances in the shareport group terminates, is quiesced for DVIPA sysplex distributor workload balancing, or is no longer listening on the specified port, the routing stack continues to maintain any existing affinities, but only to this target system and TCP/IP stack, as long as at least one application instance in the shareport group is active, is not quiesced, and is listening to the specified port. When a new TCP connection is received from a client that previously had an affinity to the server that is no longer active, the request is routed to the same target system and TCP/IP stack. One of the other available servers in the shareport group is selected to process the request and a new affinity is established.
- A target system or TCP/IP stack is no longer available. All affinities associated with applications running on that system and TCP/IP stack are terminated.
- A target TCP/IP stack is no longer reachable across dynamic XCF links (that is, the dynamic XCF link to the target stack is no longer active). Any existing client affinities to applications on that target stack are terminated if new connection requests from these clients are received while the dynamic XCF link is not active. Note that this scenario occurs only if a dynamic XCF link becomes inactive and all attempts to automatically restart the link are unsuccessful.
- A target server is found to be unresponsive to accepting new connection requests. All affinities to the target server are terminated and new connections for that port are no longer routed to that server.
- A target server is unable to support new connection requests. Connections that are protected by IPSec UDP-encapsulated security associations that are negotiated with a peer behind a NAPT must be distributed to a V1R8 or later target. It is not always possible for the distributing stack to detect that a security association is being negotiated with a peer behind a NAPT. If an initial connection is distributed to a V1R7 target and an affinity to that target is established, and during negotiation of a subsequent security association it is determined that the peer is behind a NAPT, the affinity to the target is terminated.
The sysplex distributor's notion of a client is tied exclusively to the source IP address on the connection request. This notion is not new, and is also true of other functions such as Policy Agent and IPSec. However, it can present problems in situations where many different connections from truly different client instances all appear with the same source IP address. Examples include the following situations:
- Proxy applications, such as a web HTTP server proxy, that initiate secondary connections on behalf of a large number of different clients. The secondary connections into sysplex distributor all look as though they come from the same client, and any affinity for those connections directs all connection requests to the same server.
- Network address translation (NAT), which keeps connection tables so that it can map multiple client-side addresses into a single server-side source IP address.
- Clients on z/OS® configured for SOURCEVIPA, or in general, many instances of a client on the same network node with one or very few IP addresses (as far as outbound connection requests are concerned).
- Clients running on the sysplex distributor routing stack, which is a special instance of the previous scenario. The source IP address for a client running on the sysplex distributor routing node is the distributed DVIPA, the same as the destination IP address. This is not new, and is true for any connection request issued for a server on the same stack, if the connection request socket was not bound to a particular IP address before issuing the connect() call.
In cases like these, distribution might be significantly less than optimal, because sysplex distributor has no way to distinguish among connection requests from different real clients that all use the same source IP address. In the worst case, where all source IP addresses are the same (for example, where there is a single proxy instance in a firewall), there will be no load balancing at all as long as an affinity exists.
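The effect of shared source IP addresses on balancing can be seen in a small simulation. This Python sketch is illustrative only: the `distribute` function is hypothetical, round-robin selection is a stand-in for the real distribution method, and affinities are assumed never to expire during the run.

```python
from collections import Counter
from itertools import cycle
from typing import Dict, Iterable, List


def distribute(source_ips: Iterable[str], targets: List[str]) -> Dict[str, int]:
    """Count connections per target when affinity is keyed on source IP address."""
    next_target = cycle(targets)               # stand-in for normal distribution
    affinity: Dict[str, str] = {}
    load = Counter({t: 0 for t in targets})
    for ip in source_ips:
        if ip not in affinity:
            affinity[ip] = next(next_target)   # first connection from this address
        load[affinity[ip]] += 1                # later connections follow the affinity
    return dict(load)


targets = ["T1", "T2", "T3"]

# 90 connections from 90 distinct client addresses.
many_clients = distribute([f"10.0.0.{i}" for i in range(1, 91)], targets)

# 90 connections that all arrive through one proxy or NAT address.
one_proxy = distribute(["10.0.0.1"] * 90, targets)
```

With distinct addresses the 90 connections spread evenly, 30 to each target. With a single source address, all 90 follow the one affinity to `T1` and the other two targets receive nothing.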
Essentially, timed affinity was created to allow connection workload distribution where it was not possible before, because the client/server application model requires a particular client to go to a particular server instance for some period of time. Such applications are not, at present, workload balanced. If the network configuration does not present enough distinct source IP addresses to allow workload balancing, you must implement techniques to partition the work at the source (for example, configuring the clients to go to unique server addresses, or configuring the proxy to distribute the workload among multiple servers, as the WebSphere® HTTP server plug-in does). Where the anticipated number of different source IP addresses and the connection request arrival rate are large enough to provide reasonable balancing and reaction to changing sysplex workloads, timed affinity provides a reasonable solution while respecting the affinities between clients and servers that the application requires.
The main reason for configuring the affinity on the basis of each distributed DVIPA and port pair is that affinity requirements differ from application (port number) to application, and some applications do not need affinity at all. You must take into account your specific network configuration, and specifically the arrival rate of connections from different IP addresses, in determining whether timed affinity with sysplex distributor is appropriate for a particular application in your network.
Affinity information needs to be handled on the sysplex distributor routing stack, backup stacks, and target stacks. If a target stack is not z/OS V1R5 or later, server application instances that are part of a shareport group on that stack will not work properly, and affinities for that target stack will not be communicated to the backup routing stack on failure of a primary routing stack. If a backup routing stack is not z/OS V1R5 or later, affinity information will not be sent to it from surviving target stacks if it takes over from a failed routing stack. Therefore, it is strongly recommended that all TCP/IP stacks participating in distribution for a distributed DVIPA with affinities be at least z/OS V1R5.
You can use the Netstat VCRT/-V report option with the DETAIL modifier to display the affinity-related information for each connection, as follows:
MVS TCP/IP NETSTAT CS V1R8 TCPIP Name: TCPCS 15:10:28
Dynamic VIPA Connection Routing Table:
Dest: 203.1.10.18..21 (1)
  Source: 193.10.1.118..0
  DestXCF: 193.1.1.108
    CfgTimAff: 0200 TimAffCnt: 0000000003 TimAffLft: 0000
Dest: 203.1.10.18..21 (2)
  Source: 193.10.1.118..1026
  DestXCF: 193.1.1.108
    PolicyRule: FTPD1
    PolicyAction: paPRD-SD-7-INTR-SPECIAL
Dest: 203.1.10.18..21 (2)
  Source: 193.10.1.118..1027
  DestXCF: 193.1.1.108
    PolicyRule: FTPD1
    PolicyAction: paPRD-SD-7-INTR-SPECIAL
Dest: 203.1.10.18..21 (2)
  Source: 193.10.1.118..1028
  DestXCF: 193.1.1.108
    PolicyRule: FTPD1
    PolicyAction: paPRD-SD-7-INTR-SPECIAL
Dest: 203.1.10.18..21 (3)
  Source: 193.10.1.119..0
  DestXCF: 193.1.2.108
    CfgTimAff: 0200 TimAffCnt: 0000000001 TimAffLft: 0000
Dest: 203.1.10.18..21 (4)
  Source: 193.10.1.119..1030
  DestXCF: 193.1.2.108
    PolicyRule: FTPD1
    PolicyAction: paPRD-SD-7-INTR-SPECIAL
Dest: 203.1.10.18..21 (5)
  Source: 193.10.1.120..0
  DestXCF: 193.1.1.108
    CfgTimAff: 0200 TimAffCnt: 0000000000 TimAffLft: 0099
Dest: 204.2.10.11..21 (6)
  Source: 193.10.1.199..1031
  DestXCF: 193.1.6.108
    PolicyRule: FTPD1
    PolicyAction: paPRD-SD-7-INTR-SPECIAL
Dest: 205.2.10.11..21 (6)
  Source: 193.10.1.199..1032
  DestXCF: 193.1.6.108
    PolicyRule: *NONE*
    PolicyAction: *NONE*
The following notes apply to the preceding example.
1. Affinity CRT entry for the three regular CRT entries that follow. If there is an affinity entry, it is shown before the regular CRT entries.
2. Three regular CRT entries, associated with the single preceding affinity entry.
3. Affinity CRT entry for the regular CRT entry that follows.
4. Regular CRT entry associated with the previous affinity entry.
5. An affinity CRT entry that has no connection associated with it. The use count is zero. There are 99 seconds of affinity time left before this affinity entry is removed.
6. Regular CRT entry. There is no affinity associated with it.
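If you capture this report as text, the affinity entries can be picked out by the presence of the CfgTimAff line. The following Python sketch is a rough illustration, not a supported interface: `parse_vcrt` and `affinity_entries` are hypothetical helpers, the parser assumes the `Name: value` layout and IPv4 addresses shown in the example above, and the sample text is a shortened copy of that example.

```python
import re
from typing import Dict, List

# Shortened copy of the report shown above (callout numbers removed).
SAMPLE = """\
MVS TCP/IP NETSTAT CS V1R8 TCPIP Name: TCPCS 15:10:28
Dynamic VIPA Connection Routing Table:
Dest: 203.1.10.18..21
Source: 193.10.1.118..0
DestXCF: 193.1.1.108
CfgTimAff: 0200 TimAffCnt: 0000000003 TimAffLft: 0000
Dest: 203.1.10.18..21
Source: 193.10.1.118..1026
DestXCF: 193.1.1.108
PolicyRule: FTPD1
PolicyAction: paPRD-SD-7-INTR-SPECIAL
Dest: 203.1.10.18..21
Source: 193.10.1.120..0
DestXCF: 193.1.1.108
CfgTimAff: 0200 TimAffCnt: 0000000000 TimAffLft: 0099
"""


def parse_vcrt(text: str) -> List[Dict[str, str]]:
    """Split the report into one dictionary per CRT entry."""
    entries: List[Dict[str, str]] = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Dest:"):
            entries.append({})                 # each Dest line starts a new entry
        if not entries:
            continue                           # skip the report header lines
        for name, value in re.findall(r"(\w+):\s+(\S+)", line):
            entries[-1][name] = value
    return entries


def affinity_entries(entries: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only the entries that carry timed affinity fields."""
    return [e for e in entries if "CfgTimAff" in e]


# (source, open connection count, seconds left) for each affinity entry.
summary = [(e["Source"], int(e["TimAffCnt"]), int(e["TimAffLft"]))
           for e in affinity_entries(parse_vcrt(SAMPLE))]
```

For the sample text, `summary` contains two affinity entries: one for 193.10.1.118 with three connections in use, and one for 193.10.1.120 with no connections and 99 seconds left on its timer.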