Selective fallover caused by network interface failures
When a network interface with a PowerHA® SystemMirror® service IP label fails and there are no other network interfaces available on the node on the same PowerHA SystemMirror network, the affected applications on that node cannot run. If the service network interface is the last one available on the node, the network interface failure triggers a network failure event.
PowerHA SystemMirror distinguishes between two types of network failure, local and global. A local network failure occurs when a node can no longer communicate over a particular network, but the network is still in use by other nodes. A global network failure occurs when all nodes lose the ability to communicate over a network.
PowerHA SystemMirror uses the following formats for local and global network failure events:
- Local Network Failure Event
network_down <node_name> <network_name>
- Global Network Failure Event
network_down -1 <network_name>
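As an illustration of the two formats, the following minimal sketch classifies the arguments of a network_down event as local or global. It assumes only what the formats above state: two arguments, with -1 in the node position marking a global failure. The names parse_network_down and NetworkDownEvent are hypothetical and are not part of PowerHA SystemMirror.

```python
from typing import NamedTuple, Optional, Sequence

# Per the event formats above, "-1" in the node position marks a global failure.
GLOBAL_NODE_MARKER = "-1"


class NetworkDownEvent(NamedTuple):
    """Hypothetical container for a parsed network_down event."""
    scope: str                 # "local" or "global"
    node_name: Optional[str]   # None for a global failure
    network_name: str


def parse_network_down(args: Sequence[str]) -> NetworkDownEvent:
    """Classify the two arguments that follow the network_down event name."""
    if len(args) != 2:
        raise ValueError("expected <node_name> <network_name> or -1 <network_name>")
    node, network = args
    if node == GLOBAL_NODE_MARKER:
        return NetworkDownEvent("global", None, network)
    return NetworkDownEvent("local", node, network)
```

A post-event script could use a check like this to act only on local failures, for example by testing parse_network_down(args).scope == "local" before doing anything else.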
In the case of a local network failure, you can create a post-event that triggers a node_down event. This moves the resource group with the failed resource to another node, as intended, but it also has the undesired effect of moving every other resource group on that node to other nodes.
Selective fallover uses these network failure events to handle network interface failures more precisely, so you do not have to create a post-event to promote a local network failure to a node failure. See "Actions taken for network interface failures" below for more information on how PowerHA SystemMirror handles network interface failures.
Do not promote global network failures to node_down events. A global network failure event applies to all nodes, so promoting it would result in a node_down event for every node.
Actions taken for network interface failures
PowerHA SystemMirror takes the following actions in cases of network interface failures:
- When a network interface with a service IP label fails, and no other network interfaces are available on the same node (so a swap_adapter is not possible), PowerHA SystemMirror moves only the resource group associated with the failed service network interface to another node.
- When a network interface failure can result in an rg_move for the affected resource group, PowerHA SystemMirror checks for available network interfaces. The highest priority node with an available network interface attempts to acquire the resource group.
- PowerHA SystemMirror checks that a network interface is available on the node joining the cluster before releasing the resource group. If no network interfaces are available, the resource group is not released.
These actions assume that the resource group definitions include other nodes that are available.
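The selection rule described above can be modeled with a short sketch. This is an illustrative model only, not PowerHA SystemMirror's implementation: the function choose_takeover_node, its parameters, and the representation of node priority as an ordered list are all assumptions made for the example.

```python
from typing import Dict, Optional, Sequence


def choose_takeover_node(
    priority_order: Sequence[str],      # assumed: nodes listed from highest to lowest priority
    interfaces_up: Dict[str, int],      # assumed: count of available interfaces per node on this network
    failed_node: str,
) -> Optional[str]:
    """Return the highest priority node, other than the failed one,
    that has at least one available network interface."""
    for node in priority_order:
        if node == failed_node:
            continue  # the node whose interface failed cannot host the group
        if interfaces_up.get(node, 0) > 0:
            return node
    # No candidate has an available interface: in this model the
    # resource group is not released and stays where it is.
    return None
```

In this model, a return value of None corresponds to the case where no node has an available network interface, so the resource group is not moved.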
The hacmp.out file contains messages informing you about cluster activity that results from selective fallover actions.
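To review that activity, one option is to filter the log for the event names mentioned in this topic. The sketch below assumes only that the event name appears somewhere in the relevant log lines; it does not rely on any particular hacmp.out line format, and the function name find_event_lines is hypothetical.

```python
from typing import Iterable, Iterator, Sequence, Tuple

# Event names taken from this topic; extend the tuple as needed.
EVENT_NAMES = ("network_down", "swap_adapter", "rg_move", "node_down")


def find_event_lines(
    lines: Iterable[str],
    names: Sequence[str] = EVENT_NAMES,
) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, text) for each line that mentions one of the event names."""
    for number, line in enumerate(lines, start=1):
        if any(name in line for name in names):
            yield number, line.rstrip("\n")
```

To use it, open the hacmp.out file from its location on your system and pass the file object to find_event_lines; because the function iterates line by line, it does not load the whole log into memory.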