Troubleshooting: Standby Control not working correctly on DataPower Appliance


Problem

The Ethernet interface IP address, the virtual IP address (VIP), or both become unresponsive when Standby Control is enabled.

Cause

This problem can occur if the Ethernet interfaces within the standby group cannot communicate with each other after a standby control group takeover.

Resolving The Problem


The default standby control takeover method is "IP layer takeover", which uses gratuitous ARP.
Note: The default method is preferred because it does not require the network to support Rapid Spanning Tree, and because takeovers can occur more quickly.
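
When reviewing a packet capture taken during a takeover, it can help to know what a gratuitous ARP looks like: an ARP message whose sender and target protocol addresses are the same IP address (here, the VIP). The sketch below uses the standard 28-byte ARP layout for IPv4 over Ethernet. It is an illustration only, not a statement of exactly what the appliance emits; the function names and the sample addresses are hypothetical.

```python
import socket
import struct

# Standard ARP payload for IPv4 over Ethernet (28 bytes, network byte order).
ARP_FORMAT = "!HHBBH6s4s6s4s"


def build_gratuitous_arp(mac: bytes, ip: str) -> bytes:
    """Build an ARP request payload announcing that `ip` is reachable at `mac`."""
    addr = socket.inet_aton(ip)
    return struct.pack(
        ARP_FORMAT,
        1,            # hardware type: Ethernet
        0x0800,       # protocol type: IPv4
        6, 4,         # hardware and protocol address lengths
        1,            # operation: request (the reply form, 2, is also seen)
        mac, addr,    # sender hardware and protocol addresses
        b"\x00" * 6,  # target hardware address
        addr,         # target protocol address equals the sender's
    )


def is_gratuitous_arp(payload: bytes) -> bool:
    """Return True if the payload is IPv4-over-Ethernet ARP with sender IP == target IP."""
    if len(payload) < 28:
        return False
    htype, ptype, _hlen, _plen, _oper, _sha, spa, _tha, tpa = struct.unpack(
        ARP_FORMAT, payload[:28]
    )
    return htype == 1 and ptype == 0x0800 and spa == tpa


if __name__ == "__main__":
    # Sample MAC and documentation-range IP; both are placeholders.
    frame = build_gratuitous_arp(bytes.fromhex("020000000164"), "192.0.2.10")
    print(frame.hex(), is_gratuitous_arp(frame))
```

If no such messages appear on the LAN after a takeover, or a device on the path drops them, neighbors keep their stale ARP entries, which matches the symptom described above.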

A. If the Ethernet IP address and/or VIP becomes unresponsive with the "IP layer takeover" method (that is, Standby Control MAC Takeover = OFF), verify that the following recommendations are followed:
1 - All network devices on the LAN, including the router, must permit and honor gratuitous ARP in order for IP layer takeover to be effective.

2 - Configure the standby group number with a value higher than 20. This helps avoid the accidental use of duplicate group numbers within a single network, because network routers and switches are very commonly configured with the lower standby group numbers.

3 - Ensure that the Priority set on each appliance is unique.
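
The group-number and priority recommendations above can be checked mechanically once the settings from each appliance are written down. The following is a minimal sketch under assumptions: the `StandbyMember` record and `check_standby_members` function are hypothetical names, and the values must be entered by hand, since nothing here reads them from an appliance.

```python
from collections import Counter
from dataclasses import dataclass

# "Higher than 20" per the recommendation above, so 21 is the lowest accepted value.
MIN_GROUP_NUMBER = 21


@dataclass(frozen=True)
class StandbyMember:
    """One interface taking part in a standby group (hypothetical record)."""
    appliance: str
    interface: str
    group: int
    priority: int


def check_standby_members(members):
    """Return a list of human-readable problems; an empty list means no findings."""
    problems = []
    groups = {}
    for m in members:
        if m.group < MIN_GROUP_NUMBER:
            problems.append(
                f"{m.appliance}/{m.interface}: group {m.group} is not higher than 20"
            )
        groups.setdefault(m.group, []).append(m)

    # Within each group, every member should have its own priority value.
    for group, peers in groups.items():
        counts = Counter(p.priority for p in peers)
        for priority, count in counts.items():
            if count > 1:
                problems.append(
                    f"group {group}: priority {priority} is used by {count} members"
                )
    return problems


if __name__ == "__main__":
    members = [
        StandbyMember("dp1", "eth0", group=100, priority=100),
        StandbyMember("dp2", "eth0", group=100, priority=90),
    ]
    for problem in check_standby_members(members):
        print(problem)
```

The check does not detect a group number that is already in use by a router or switch on the same network; that still has to be confirmed with the network team.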



B. If the Ethernet IP address and/or VIP becomes unresponsive with the "MAC takeover" method (that is, Standby Control MAC Takeover = ON), verify that the following recommendations are followed:

1 - Ensure that Rapid Spanning Tree is enabled on all switches that are connected to the DataPower appliances.

Rapid Spanning Tree should also be enabled on any other switches within the network path of the appliances. It places the switch ports into a forwarding state that allows traffic to pass immediately whenever the VIP moves to the currently active device.

Refer to your switch operation manual for configuring this setting.

2 - Make sure the network can support multiple MAC addresses per Ethernet port for takeover to be effective.

3 - Configure the standby group number with a value higher than 20. This helps avoid the accidental use of duplicate group numbers within a single network, because network routers and switches are very commonly configured with the lower standby group numbers.

4 - Ensure that the Priority set on each interface is unique. It is not recommended to have multiple interfaces set to the same Priority value.

5 - Set Preempt mode to OFF. This simplifies the configuration and helps with debugging if unexpected network delays cause the interfaces to start switching between the active and standby states.

6 - Make sure that Rapid Spanning Tree is properly deployed and that the convergence time on the switch is less than the "hold" time configured on the appliance. The default "hold" time on the appliance is 10 seconds. This allows enough time to complete a failover and helps prevent unnecessary failovers caused by misconfiguration.

To verify that Rapid Spanning Tree is properly deployed, review "How to verify Rapid Spanning Tree is deployed properly for DataPower Standby Control".

If you determine that Rapid Spanning Tree is properly deployed and that convergence takes longer than 10 seconds, you can optionally change the "hello" and "hold" time intervals on the appliance to accommodate the switch by using the 'standby' CLI command.

Note: We do not typically recommend changing the intervals on the appliance, because only in very rare instances have we seen timer values need to be larger than the default setting. Also, keep in mind that larger timers mean that it takes longer to detect when a failover should occur.

For example, issue the following from the CLI:

config
int eth0
standby 100 timers 5 15

In the example above, 100 is the group number, 5 is the hello time in seconds, and 15 is the hold time in seconds.
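
Before changing the timers, the relationship described in item 6 can be checked offline. The sketch below parses a command line of the form shown in the example and compares the hold time with a switch convergence time that you have measured yourself. The function names are hypothetical, and the rule that the hello time should be shorter than the hold time is an assumption added here, not something stated above.

```python
def parse_standby_timers(line: str):
    """Parse 'standby <group> timers <hello> <hold>' into three integers."""
    parts = line.split()
    if len(parts) != 5 or parts[0] != "standby" or parts[2] != "timers":
        raise ValueError(f"unexpected command format: {line!r}")
    group, hello, hold = int(parts[1]), int(parts[3]), int(parts[4])
    return group, hello, hold


def check_timers(hello: int, hold: int, convergence_seconds: float):
    """Return a list of problems; an empty list means no findings."""
    problems = []
    # From item 6: switch convergence must finish within the hold time.
    if convergence_seconds >= hold:
        problems.append(
            f"convergence time {convergence_seconds}s is not less than hold time {hold}s"
        )
    # Assumption (not from the text above): hello should be shorter than hold.
    if hello >= hold:
        problems.append(f"hello time {hello}s is not less than hold time {hold}s")
    return problems


if __name__ == "__main__":
    group, hello, hold = parse_standby_timers("standby 100 timers 5 15")
    # 12 seconds is a placeholder for a convergence time measured on your switch.
    for problem in check_timers(hello, hold, convergence_seconds=12):
        print(f"group {group}: {problem}")
```

With the default hold time of 10 seconds, the same measured convergence time of 12 seconds would be reported as a problem, which is the situation in which the text above suggests adjusting the timers.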


Diagnosing the Problem

If the problem persists after you apply the configuration tips above, collect the data described in "MustGather: Collecting data for Standby Control issues on DataPower Appliance".

Document Information

Modified date: 08 June 2021

UID: swg21420179