node_down events
When all network interfaces on a node are down, or the node does not respond to heartbeats, the cluster managers run a node_down event. Depending on the cluster configuration, the peer nodes then take the actions needed to bring critical applications back up and keep data available.
A node_down event can be initiated in any of the following ways:
- Stopping cluster services and bringing resource groups offline
- Stopping cluster services and moving resource groups to another node
- Stopping cluster services and placing resource groups in an unmanaged state
- Node failure
Stopping cluster services and bringing resource groups offline
When you stop cluster services and bring resource groups offline, PowerHA® SystemMirror® stops on the local node after the node_down_complete event releases the stopped node's resources. The other nodes run the node_down_complete event and do not take over the resources of the stopped node.
Stopping cluster services and moving resource groups
When you stop cluster services and move the resource groups to another node, PowerHA SystemMirror stops after the node_down_complete event on the local node releases its resource groups. The surviving nodes in the resource group node list take over these resource groups.
Stopping cluster services and placing resource groups in an unmanaged state
When you stop cluster services and place resource groups in an unmanaged state, PowerHA SystemMirror software stops immediately on the local node. The node_down_complete event is run on the stopped node. The cluster managers on remote nodes process node_down events, but do not take over any resource groups. The stopped node does not release its resource groups.
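The three stop options differ in two ways: whether the stopped node releases its resource groups, and whether surviving nodes take them over. The following Python sketch encodes those two properties, as described above, in a lookup table so the options can be compared side by side. It is a conceptual model only: `StopMode`, `StopOutcome`, `STOP_OUTCOMES`, and `resource_groups_still_active` are hypothetical names, not part of PowerHA SystemMirror, and the helper assumes that resource groups the stopped node does not release stay active on that node.

```python
from dataclasses import dataclass
from enum import Enum


class StopMode(Enum):
    """The three ways of stopping cluster services described above."""
    BRING_OFFLINE = "bring resource groups offline"
    MOVE = "move resource groups to another node"
    UNMANAGE = "place resource groups in an unmanaged state"


@dataclass(frozen=True)
class StopOutcome:
    local_node_releases_groups: bool  # does the stopped node release its resource groups?
    peers_take_over: bool             # do surviving nodes acquire those resource groups?


# Hypothetical model of the behavior described in the text; not PowerHA code.
STOP_OUTCOMES = {
    StopMode.BRING_OFFLINE: StopOutcome(local_node_releases_groups=True, peers_take_over=False),
    StopMode.MOVE: StopOutcome(local_node_releases_groups=True, peers_take_over=True),
    StopMode.UNMANAGE: StopOutcome(local_node_releases_groups=False, peers_take_over=False),
}


def resource_groups_still_active(mode: StopMode) -> bool:
    """Return True if the stopped node's resource groups remain active somewhere.

    Assumption: groups that are never released stay active on the stopped node.
    """
    outcome = STOP_OUTCOMES[mode]
    # Unmanaged: never released, so still held by the stopped node.
    # Move: released locally, then acquired by a surviving node.
    # Offline: released, and no surviving node acquires them.
    return (not outcome.local_node_releases_groups) or outcome.peers_take_over
```

For example, `resource_groups_still_active(StopMode.BRING_OFFLINE)` returns `False`, because the groups are released and no peer acquires them, while the move and unmanaged options both return `True` for different reasons.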
Node failure
When a node fails, the cluster manager on that node does not have time to generate a node_down event. Instead, the cluster managers on the surviving nodes recognize that the node has gone down when it stops communicating, and they trigger node_down events.
The node_down event initiates a series of subevents that reconfigure the cluster to deal with the failed node. Based on the cluster configuration, surviving nodes in the resource group node list take over the resource groups.
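Because a failed node cannot announce its own departure, detection falls to the surviving nodes. The Python sketch below models that idea with a counter of consecutive heartbeat intervals in which no network interface on a peer answered; a node whose interfaces are all down also stops answering, so one counter covers both conditions named at the start of this section. `SurvivingNodeMonitor`, its `tick` method, and the limit of three missed intervals are hypothetical illustration choices. PowerHA SystemMirror's real failure detection is implemented and tuned elsewhere and is not shown here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Hypothetical threshold: consecutive silent intervals before a peer is declared down.
MISSED_HEARTBEAT_LIMIT = 3


@dataclass
class SurvivingNodeMonitor:
    # Peer name -> consecutive intervals with no heartbeat on any of its interfaces.
    missed: Dict[str, int]
    # node_down events this surviving node has generated on behalf of failed peers.
    events: List[str] = field(default_factory=list)

    def tick(self, answered: Dict[str, Set[str]]) -> None:
        """Process one heartbeat interval.

        answered[peer] is the set of the peer's interfaces that replied;
        a missing key or an empty set means the peer was silent.
        """
        for peer in list(self.missed):  # copy keys so entries can be deleted
            if answered.get(peer):
                self.missed[peer] = 0  # any reply proves the peer is alive
            else:
                self.missed[peer] += 1
                if self.missed[peer] >= MISSED_HEARTBEAT_LIMIT:
                    # The failed node cannot report itself, so the survivor does.
                    self.events.append(f"node_down {peer}")
                    del self.missed[peer]
```

For example, after three calls to `monitor.tick({"nodeA": {"en0"}, "nodeB": set()})` on `SurvivingNodeMonitor(missed={"nodeA": 0, "nodeB": 0})`, `monitor.events` is `['node_down nodeB']`; nodeA keeps answering on one interface, so it is never declared down.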
Sequence of node_down events
The following list describes the default parallel sequence of node_down events:
- node_down
  - This event occurs when a node intentionally leaves the cluster or fails.
  - In some cases, the node_down event receives the forced parameter.
  - All nodes run the node_down event.
  - All nodes run the process_resources script. After the cluster manager evaluates the status of the affected resource groups and the configuration, it initiates a series of subevents to redistribute resources as configured for fallover or fallback.
  - All nodes run the process_resources_complete script.
- node_down_complete
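The ordering in the list above can be sketched as a simple driver that produces a log of which node runs which step. This Python sketch is illustrative only: `NODE_DOWN_SEQUENCE` and `run_node_down` are hypothetical names, the caller passes the nodes that participate (for a failure, the surviving nodes), and each step is treated as a barrier that every node finishes before any node starts the next. That barrier, and running node_down_complete on every participating node, are simplifying assumptions; the list above does not describe the sub-steps of node_down_complete, so it is modeled as a single step.

```python
from typing import List

# Step order taken from the default parallel sequence listed above.
NODE_DOWN_SEQUENCE = [
    "node_down",
    "process_resources",
    "process_resources_complete",
    "node_down_complete",
]


def run_node_down(nodes: List[str], down_node: str, forced: bool = False) -> List[str]:
    """Return a log of each step, as run on every participating node.

    In a real cluster the nodes run each step in parallel; here they are
    logged in list order within each step, which is an assumption for display.
    The text notes that node_down sometimes receives the forced parameter,
    so it is passed to that step only when forced is True.
    """
    log: List[str] = []
    for step in NODE_DOWN_SEQUENCE:
        args = down_node + (" forced" if forced and step == "node_down" else "")
        for node in nodes:
            log.append(f"{node}: {step} {args}")
    return log
```

For example, `run_node_down(["nodeA", "nodeC"], "nodeB")` returns eight entries, starting with `"nodeA: node_down nodeB"` and ending with `"nodeC: node_down_complete nodeB"`.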