IBM Support

PH33032: BACKUP LB SENDS GRATUITOUS ARP FOR CLUSTERS; HIGHAVAILABILITY TAKEOVER MAY OCCUR FOR NO REASON


APAR status

  • Closed as program error.

Error description

  • When the Load Balancer is configured for high availability, the
    backup load balancer may assume the forwarding role for no
    reason, or send gratuitous ARPs that disrupt forwarded traffic.
    

Local fix

Problem summary

  • ****************************************************************
    * USERS AFFECTED:  IBM WebSphere Application Server Load       *
    *                  Balancer users with High Availability       *
    ****************************************************************
    * PROBLEM DESCRIPTION: The backup load balancer may issue      *
    *                      gratuitous ARPs for clusters            *
    *                      disrupting forwarding temporarily.      *
    ****************************************************************
    * RECOMMENDATION:                                              *
    ****************************************************************
    If this problem occurs, packets for some cluster addresses
    will be temporarily sent to the backup load balancer (which
    will not forward the packets to servers). ARP tables for
    routers will show the MAC address for the backup load balancer
    when this issue occurs. Network traces will show gratuitous
    ARP packets for cluster addresses from the backup load
    balancer.
    Executor logging will show "HA TAKEOVER: heartbeat timeout
    (hatimeout exceeded)" on the backup load balancer, but the
    highavailability status will show the backup is still in the
    backup state. No highavailability scripts will be executed on
    the backup load balancer when this issue occurs.
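
One way to check for the symptom described above is to search the backup load balancer's executor log for the takeover message. The following is a minimal sketch in Python, not an IBM tool. Only the message text comes from this APAR; the function name, the sample lines, and the assumption that the message appears on a single log line are hypothetical.

```python
from typing import Iterable, List, Tuple

# Message text quoted in this APAR. The surrounding log line format is not
# documented here, so the match is a plain substring test.
TAKEOVER_MARKER = "HA TAKEOVER: heartbeat timeout (hatimeout exceeded)"


def find_takeover_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, text) for every line containing the marker."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if TAKEOVER_MARKER in line:
            hits.append((number, line.rstrip("\n")))
    return hits


# Hypothetical input: placeholder text plus the marker, not real log output.
sample = [
    "placeholder line standing in for unrelated log output\n",
    TAKEOVER_MARKER + "\n",
]
for number, line in find_takeover_lines(sample):
    print(f"line {number}: {line}")
```

A match on the backup, combined with a highavailability status that still reports the backup state, is consistent with the behavior described in this APAR. To scan a real log, pass an open file object in place of the sample list.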
    

Problem conclusion

  • The load balancer highavailability feature sends heartbeat
    packets between the load balancers to monitor health. Each
    heartbeat carries a sequence (seq) value and an acknowledgement
    (ack) value. The ack field represents the last heartbeat
    received from the partner. The seq field represents the
    sender's current time value. If the ack value is older than
    the seq value minus the hatimeout value, a takeover is
    triggered, which sends gratuitous ARPs for the clusters and
    return addresses. These gratuitous ARPs inform machines to
    route traffic to the backup load balancer instead of the
    active load balancer.
    
    With this problem, the backup load balancer is preparing
    to send a heartbeat at the same time that another thread
    processes a heartbeat received from the partner LB. The
    timing is such that the ack value is read while it is still
    being written, so the reader sees an inconsistent value (for
    example, sender writing 12:00:00; received packet contains
    value 11:59:59; value actually read 11:00:00). This triggers
    the takeover process in the backup LB, and the backup sends
    gratuitous ARPs. When the backup LB processes the next
    heartbeat, it obtains the correct timestamp and stops the
    takeover, so it never completely assumes forwarding. Packet
    forwarding for the clusters named in the gratuitous ARPs
    fails until the bad ARP entry expires in the routers.
    
    The Load Balancer has added concurrency control around the
    heartbeat values to prevent this issue. The fix is targeted
    for inclusion in 8.5.5.20 and 9.0.5.7.
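
The timeout comparison and the effect of an inconsistent ack value can be illustrated with a short Python sketch. This is not the Load Balancer's implementation: the function and class names, the use of seconds since midnight as the time value, and the hatimeout of 2 seconds are assumptions made for illustration. The three timestamps are the ones from the example above.

```python
import threading


def hms(hours: int, minutes: int, seconds: int) -> int:
    """Convert a clock time to seconds since midnight (assumed time unit)."""
    return hours * 3600 + minutes * 60 + seconds


def takeover_needed(seq: int, ack: int, hatimeout: int) -> bool:
    """True if the ack value is older than seq minus hatimeout."""
    return ack < seq - hatimeout


class HeartbeatState:
    """Hypothetical shared state, guarded so reads never overlap a write."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._last_received = 0

    def record_received(self, timestamp: int) -> None:
        # Called by the thread that processes heartbeats from the partner.
        with self._lock:
            self._last_received = timestamp

    def build_heartbeat(self, now: int) -> tuple:
        # Called by the sending thread; returns (seq, ack) as one snapshot.
        with self._lock:
            return now, self._last_received


HATIMEOUT = 2  # seconds; assumed value for this sketch

seq = hms(12, 0, 0)
good_ack = hms(11, 59, 59)   # value carried by the received packet
torn_ack = hms(11, 0, 0)     # value actually read during the race

print(takeover_needed(seq, good_ack, HATIMEOUT))  # False
print(takeover_needed(seq, torn_ack, HATIMEOUT))  # True

state = HeartbeatState()
state.record_received(good_ack)
sent_seq, sent_ack = state.build_heartbeat(seq)
print(takeover_needed(sent_seq, sent_ack, HATIMEOUT))  # False
```

With the correct ack value no takeover is triggered. The inconsistent value is an hour old, so the same comparison reports a timeout even though the partner is healthy. Serializing reads and writes of the heartbeat values, as the lock does here, is one form of the concurrency control the fix describes.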
    

Temporary fix

  • If packets are being forwarded to the backup load
    balancer, issue the "executor configure" command on the
    active load balancer to restore forwarding for the cluster.
    

Comments

APAR Information

  • APAR number

    PH33032

  • Reported component name

    WS EDGE LB IPV4

  • Reported component ID

    5724H8812

  • Reported release

    900

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2021-01-04

  • Closed date

    2021-01-21

  • Last modified date

    2021-01-21

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    WS EDGE LB IPV4

  • Fixed component ID

    5724H8812

Applicable component levels

  • Line of Business

    Automation (LOB45)

  • Business Unit

    Cloud & Data Platform (BU053)

  • Product

    WebSphere Application Server (SSEQTP)

  • Platform

    Platform Independent (PF025)

  • Version

    900

Document Information

Modified date:
22 January 2021