IBM Support

Tips on configuring the high availability feature for Load Balancer

Troubleshooting


Problem

With the high availability function for Load Balancer, packet distribution takeovers occur if the primary partner fails or is shut down. To preserve existing connections across a takeover, connection records are passed between the two high availability partners. When the backup partner assumes the packet forwarding function, the cluster address is removed from the machine that was previously forwarding and added to the machine that is taking over. Numerous timing and configuration considerations can affect this takeover operation.

Symptom

The tips listed in this technote help alleviate problems caused by an incorrect high availability configuration, such as:
  • Connections dropped after takeover
     
  • Partner machines are unable to synchronize
     
  • Requests erroneously directed to the backup partner machine

Resolving The Problem

The following six tips are helpful for successful configuration of high availability on your Load Balancer machines.
  1. The position of the high availability commands in your script files can make a significant difference. The configuration is written in the recommended form when you save it by using the Load Balancer "save" function. It is best practice to let Load Balancer create the configuration file after you update the configuration.

    Examples of high availability commands are:

    dscontrol highavailability heartbeat add ...
    dscontrol highavailability backup add ...
    dscontrol highavailability reach add ...


    In most cases, you must position the high availability definitions at the end of the file. The cluster, port, and server statements must be placed before the high availability statements.

    Several problems can occur if the high availability statements are not placed at the end of the configuration. When high availability synchronization occurs, the load balancer looks up the cluster, port, and server in order to process each replication record. If the cluster, port, and server do not exist yet, the connection record is dropped. If a takeover then occurs and the connection record was not replicated to the partner machine, the connection fails. For the "Load Balancer for IPv4 and IPv6", where go scripts are not required, a takeover that occurs before the configuration is complete might not send a gratuitous ARP for all necessary addresses. Forwarding failures then occur because the routers continue to direct traffic to the partner load balancer, which is in standby mode after the takeover.
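
    A hand-edited configuration file can be audited for this ordering with a small script. The sketch below is one possible approach, not a Load Balancer feature: the function name check_ha_order is hypothetical, and it assumes one dscontrol statement per line with the keywords spelled out in full. It does not account for the collocated server case, where the high availability statements are expected earlier in the file.

    ```shell
    # check_ha_order FILE (hypothetical helper, not a Load Balancer command)
    # Returns 0 when every "dscontrol highavailability" line follows the last
    # "dscontrol cluster|port|server" line in FILE, and 1 otherwise.
    check_ha_order() {
        awk '
            # Remember the line number of the last cluster, port, or server statement.
            $1 == "dscontrol" && ($2 == "cluster" || $2 == "port" || $2 == "server") { last_def = NR }
            # Remember the line number of the first high availability statement.
            $1 == "dscontrol" && $2 == "highavailability" && !first_ha { first_ha = NR }
            END { exit ((first_ha && last_def > first_ha) ? 1 : 0) }
        ' "$1"
    }
    ```

    A call such as check_ha_order my_lb.cfg || echo "high availability statements are not last" (with my_lb.cfg standing in for your own file) reports a file whose ordering needs attention.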
  2. The order differs when collocated servers are configured with the MAC forwarding method. In this case, the high availability statements must come before the collocated server statements. Otherwise, Load Balancer receives a request for the collocated server and attempts to load balance it to a new server, which creates a cycle in which the same packet is repeatedly forwarded on the network. When the high availability statements are placed before the collocated server statements, traffic from the standby Load Balancer is not distributed.

    Tips 3, 5, and 6 do not apply to the Load Balancer for IPv4 and IPv6.
  3. (Applies to the Load Balancer for IPv4 only) On z/OS® or OS/390® operating systems, the hypervisor controls the interface and multiplexes the real interface among the guest operating systems. The hypervisor permits only one guest at a time to register itself for an address, and there is an update window. When the cluster address is removed from the backup machine, a delay can be necessary before the cluster address is configured on the primary machine; otherwise, traffic continues to be sent to the wrong Load Balancer.

    To correct this behavior, add a sleep delay in the goActive script. The length of the delay is deployment-dependent. Start with a sleep delay of 10 and adjust it as needed.
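
    The placement of the delay can be sketched as follows. This is an illustrative excerpt, not the shipped goActive sample: the variable name TAKEOVER_DELAY and the function name wait_for_address_release are hypothetical, and the value is interpreted by the shell sleep command as seconds.

    ```shell
    # Excerpt for the top of a goActive script (sketch only).
    # TAKEOVER_DELAY is a hypothetical variable; 10 is the suggested starting
    # value and should be tuned for the deployment.
    TAKEOVER_DELAY="${TAKEOVER_DELAY:-10}"

    wait_for_address_release() {
        # Pause so that the partner's registration of the cluster address can
        # clear before this machine configures the address.
        sleep "$TAKEOVER_DELAY"
    }

    # Call wait_for_address_release immediately before the existing commands
    # in goActive that configure the cluster address.
    ```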
     
  4. High availability partners must be able to communicate with each other and must be on the same subnet.

    By default, the machines attempt to communicate with each other every one-half second and detect a failure after two seconds with no communication received. If the Load Balancer is busy forwarding traffic and cannot answer the heartbeat within the timeout period, a failover occurs. You can increase the amount of time allowed before a failure is declared by issuing:

    dscontrol executor set hatimeout new_timeout_value 

    The executor must be started for this command to be successful.
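
    The same-subnet requirement in this tip can be checked before the partners are configured. The sketch below is a generic IPv4 calculation, not a Load Balancer command: the function names ip_to_int and same_subnet are hypothetical, and the addresses in the usage note are examples only.

    ```shell
    # ip_to_int A.B.C.D: prints the IPv4 address as a 32-bit integer.
    ip_to_int() {
        old_ifs=$IFS
        IFS=.
        set -- $1          # split the dotted address into four fields
        IFS=$old_ifs
        echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
    }

    # same_subnet ADDR1 ADDR2 NETMASK (hypothetical helper)
    # Returns 0 when both addresses fall in the same IPv4 subnet.
    same_subnet() {
        a=$(ip_to_int "$1")
        b=$(ip_to_int "$2")
        m=$(ip_to_int "$3")
        [ $(( a & m )) -eq $(( b & m )) ]
    }
    ```

    For example, same_subnet 10.0.0.1 10.0.0.2 255.255.255.0 succeeds, while same_subnet 10.0.0.1 10.0.1.2 255.255.255.0 fails.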
     
  5. (Applies to the Load Balancer for IPv4 only) When the partners synchronize, all the connection records are sent from the active machine to the backup machine. The synchronization must complete within the default limit of 50 seconds.

    To accomplish this, old connections must not remain in memory for an extended amount of time. In particular, LDAP ports typically have large staletimeout periods (in excess of one day). Setting a large staletimeout period causes old connections to remain in memory, which causes more connection records to be passed at synchronization, and also more memory usage on both machines.

    If the synchronization fails with a reasonable staletimeout period, you can increase the synchronization timeout by issuing:

    e xm 33 5 new_timeout  

    The timeout value is stored in units of one-half second; therefore, the default value of new_timeout is 100 (50 seconds).
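
    Because the unit is easy to get wrong, the conversion from seconds can be scripted. This is a minimal sketch; the function name sync_timeout_units is hypothetical, and 90 seconds is an arbitrary example rather than a recommended value.

    ```shell
    # sync_timeout_units SECONDS (hypothetical helper)
    # Converts a timeout in whole seconds to the half-second units that
    # new_timeout expects.
    sync_timeout_units() {
        echo $(( $1 * 2 ))
    }

    sync_timeout_units 50    # prints 100, the default
    sync_timeout_units 90    # prints 180
    ```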
     
  6. (Applies to the Load Balancer for IPv4 only) When a partner machine takes over the workload, it issues a gratuitous ARP. The gratuitous ARP informs every machine on the same subnet of the new hardware address associated with the cluster address. You must ensure that your routers honor gratuitous ARPs and update their caches; otherwise, requests are sent to the inactive partner.
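
    One way to observe whether a neighboring host picked up the new hardware address is to compare its neighbor cache entry for the cluster address before and after a takeover. The sketch below assumes a Linux host with the iproute2 "ip neigh show" command; routers have their own commands for displaying the ARP cache. The function name mac_for_address and the cluster address 10.0.0.100 are hypothetical.

    ```shell
    # mac_for_address ADDR (hypothetical helper)
    # Reads "ip neigh show" style lines on standard input and prints the
    # link-layer address recorded for ADDR, if there is one.
    mac_for_address() {
        awk -v addr="$1" '
            $1 == addr {
                for (i = 2; i < NF; i++)
                    if ($i == "lladdr") { print $(i + 1); exit }
            }
        '
    }

    # Example use on a host in the same subnet (not run here):
    #   ip neigh show | mac_for_address 10.0.0.100
    ```

    If the printed address is unchanged after the takeover, that host has not updated its cache for the cluster address.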

Document Information

Modified date:
05 February 2020

UID

swg21211427