APAR status
Closed as program error.
Error description
When the Load Balancer is configured for high availability, the backup load balancer may assume the forwarding role for no reason or send gratuitous ARPs, which disrupts forwarded traffic.
Local fix
Problem summary
****************************************************************
* USERS AFFECTED: IBM WebSphere Application Server Load        *
*                 Balancer users with High Availability        *
****************************************************************
* PROBLEM DESCRIPTION: The backup load balancer may issue      *
*                      gratuitous ARPs for clusters            *
*                      disrupting forwarding temporarily.      *
****************************************************************
* RECOMMENDATION:                                              *
****************************************************************
If this problem occurs, packets for some cluster addresses are temporarily sent to the backup load balancer, which does not forward the packets to servers.
ARP tables on routers show the MAC address of the backup load balancer when this issue occurs.
Network traces show gratuitous ARP packets for cluster addresses from the backup load balancer.
Executor logging shows "HA TAKEOVER: heartbeat timeout (hatimeout exceeded)" on the backup load balancer, but the highavailability status shows that the backup is still in the backup state.
No high availability scripts are executed on the backup load balancer when this issue occurs.
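As a rough aid for confirming the log symptom above, the following Python sketch filters lines for the takeover message quoted in this APAR. The function name find_takeover_lines and the sample lines are hypothetical; the sample is placeholder text, not real executor log output, and the actual log location and line format are not stated here.

```python
from typing import Iterable, List

# Message text quoted in this APAR; the surrounding log format is not assumed.
TAKEOVER_MARKER = "HA TAKEOVER: heartbeat timeout (hatimeout exceeded)"


def find_takeover_lines(log_lines: Iterable[str]) -> List[str]:
    """Return every line that contains the takeover message (hypothetical helper)."""
    return [line.rstrip("\n") for line in log_lines if TAKEOVER_MARKER in line]


# Placeholder input for illustration only; these are not real log entries.
sample = [
    "placeholder line 1\n",
    "placeholder prefix " + TAKEOVER_MARKER + "\n",
    "placeholder line 3\n",
]

hits = find_takeover_lines(sample)
print(len(hits))
```

A match on the backup load balancer while its highavailability status still reports the backup state would be consistent with this APAR, but it is not proof on its own.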
Problem conclusion
The load balancer high availability feature sends heartbeat packets between the load balancers to monitor health. Each heartbeat contains a sequence (seq) value and an acknowledgement (ack) value. The ack field represents the last heartbeat received from the partner. The seq field represents the sender's current time value. If the ack value is older than the seq value minus the hatimeout value, a takeover is triggered, which sends gratuitous ARPs for the clusters and return addresses. These gratuitous ARPs inform machines to route traffic to the backup load balancer instead of the active load balancer.

With this problem, the backup load balancer is preparing to send a heartbeat at the same time that another thread processes a received heartbeat from the partner load balancer. The timing is such that the ack value is written by one thread while another thread is reading it, so the reader can obtain an incorrect value (for example, sender writing 12:00:00; received packet contains value 11:59:59; value actually read 11:00:00). This triggers the takeover process in the backup load balancer, and the backup sends gratuitous ARPs.

When the backup load balancer processes the next heartbeat, it obtains the correct timestamp and stops the takeover, so it never completely assumes forwarding. Packet forwarding for the clusters named in the gratuitous ARPs fails until the bad ARP entry expires in the routers.

The Load Balancer has added concurrency control around the heartbeat values to prevent this issue. The fix is targeted for inclusion in 8.5.5.20 and 9.0.5.7.
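The takeover check and the concurrency control described above can be sketched in Python as follows. This is an illustration only, not the Load Balancer's implementation: the names takeover_needed, HeartbeatState, and hms are hypothetical, times are expressed as seconds since midnight, and the hatimeout value of 2 seconds is an assumption made for the example.

```python
import threading


def takeover_needed(seq: float, ack: float, hatimeout: float) -> bool:
    """True when the last partner heartbeat (ack) is older than seq - hatimeout."""
    return ack < seq - hatimeout


class HeartbeatState:
    """Hypothetical holder for the ack value shared by two threads."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._ack = 0.0

    def record_received(self, partner_time: float) -> None:
        # Receive thread: store the partner's heartbeat time under the lock.
        with self._lock:
            self._ack = partner_time

    def read_ack(self) -> float:
        # Send thread: read under the same lock, so a write in progress
        # on the other thread can never be observed half done.
        with self._lock:
            return self._ack


def hms(h: int, m: int, s: int) -> float:
    """Convert a clock time to seconds since midnight."""
    return float(h * 3600 + m * 60 + s)


HATIMEOUT = 2.0  # assumed value in seconds, for illustration only

state = HeartbeatState()
state.record_received(hms(11, 59, 59))

# Consistent read: the ack is 1 second old, so no takeover is needed.
ok = takeover_needed(hms(12, 0, 0), state.read_ack(), HATIMEOUT)

# Incorrect value from the example in this APAR (11:00:00): the check
# fires even though the partner is healthy.
spurious = takeover_needed(hms(12, 0, 0), hms(11, 0, 0), HATIMEOUT)

print(ok, spurious)  # False True
```

Guarding both the write and the read with one lock is only one way to provide the concurrency control; the APAR does not say which mechanism the fix uses.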
Temporary fix
If packets are being forwarded to the backup load balancer, issue the "executor configure" command on the active load balancer to restore forwarding for the cluster.
Comments
APAR Information
APAR number
PH33032
Reported component name
WS EDGE LB IPV4
Reported component ID
5724H8812
Reported release
900
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2021-01-04
Closed date
2021-01-21
Last modified date
2021-01-21
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
WS EDGE LB IPV4
Fixed component ID
5724H8812
Applicable component levels
[{"Line of Business":{"code":"LOB45","label":"Automation"},"Business Unit":{"code":"BU053","label":"Cloud & Data Platform"},"Product":{"code":"SSEQTP","label":"WebSphere Application Server"},"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"900"}]
Document Information
Modified date:
22 January 2021