Processing cluster events
The two primary cluster events that PowerHA® SystemMirror® software handles are fallover and reintegration.
- Fallover refers to the actions taken by the PowerHA SystemMirror software when a cluster component fails or a node leaves the cluster.
- Reintegration refers to the actions that occur within the cluster when a component that had previously left the cluster returns to the cluster.
Event scripts control both types of actions. During event script processing, cluster-aware application programs see the state of the cluster as unstable.
Fallover
A fallover occurs when a resource group moves from its home node to another node because its home node leaves the cluster.
Nodes leave the cluster either by a planned transition (a node shutdown or stopping cluster services on a node) or by failure. In a planned transition, the Cluster Manager controls the release of resources held by the exiting node and the acquisition of these resources by nodes still active in the cluster. When necessary, you can override the release and acquisition of resources (for example, to perform system maintenance). You can also postpone the acquisition of resources by reintegrating nodes (by setting the delayed fallback timer for custom resource groups).
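The release and acquisition described above can be pictured with a small sketch. It is not PowerHA SystemMirror code. The function name `plan_fallover`, the dictionary layout, and the idea of a priority-ordered node list per resource group are assumptions made for illustration only.

```python
def plan_fallover(resource_groups: dict, exiting: str, active: set) -> dict:
    """Map each resource group held by `exiting` to the node that acquires it.

    resource_groups: {group_name: {"owner": node, "nodes": [priority-ordered nodes]}}
    (hypothetical layout). Returns {group_name: new_owner or None}, where None
    means that no active node is configured to take the group over.
    """
    plan = {}
    for name, group in resource_groups.items():
        if group["owner"] != exiting:
            continue  # groups owned by other nodes are unaffected
        # Keep only nodes that are still active, preserving priority order.
        candidates = [n for n in group["nodes"] if n in active and n != exiting]
        plan[name] = candidates[0] if candidates else None
    return plan
```

In this model, a group whose node list contains only the exiting node maps to `None`, that is, it stays offline until a configured node becomes available.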
Detection of a node failure begins when a node that is monitoring a neighboring node stops receiving keepalive traffic from it for a defined period of time. If the other cluster nodes agree that the failure is a node failure, the failing node is removed from the cluster and its resources are taken over by the active nodes configured to do so.
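A minimal sketch of this detection logic follows. It does not reproduce the product's implementation. The class `KeepaliveMonitor`, the `timeout` value, and the rule that every other node must also report silence before a failure is confirmed are assumptions chosen to illustrate the idea.

```python
from dataclasses import dataclass, field


@dataclass
class KeepaliveMonitor:
    """Tracks when one node last received a keepalive from each peer."""
    timeout: float  # seconds of silence before a peer is suspected (hypothetical)
    last_seen: dict = field(default_factory=dict)

    def record_keepalive(self, peer: str, now: float) -> None:
        self.last_seen[peer] = now

    def suspects(self, peer: str, now: float) -> bool:
        # A peer that was never heard from is treated as silent.
        last = self.last_seen.get(peer)
        return last is None or (now - last) > self.timeout


def confirm_node_failure(monitors: dict, suspect: str, now: float) -> bool:
    """Return True only if every other node also suspects `suspect`.

    monitors: {node_name: KeepaliveMonitor} (assumed layout).
    """
    observers = [m for name, m in monitors.items() if name != suspect]
    return bool(observers) and all(m.suspects(suspect, now) for m in observers)
```

Requiring agreement from the other nodes keeps a single node with a local problem, such as a failed interface, from declaring a healthy peer dead.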
If other components, such as a network interface card, fail, the Cluster Manager runs an event script to switch network traffic to a backup network interface card (if present).
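The interface swap can be sketched in the same spirit. The function `select_backup_interface`, the `role` and `up` fields, and the interface names in the usage note are hypothetical. The sketch only shows the "switch to a backup if one is present" decision, not how an event script actually moves traffic.

```python
from typing import Optional


def select_backup_interface(interfaces: dict, failed: str) -> Optional[str]:
    """Mark `failed` as down and return a working backup interface, if any.

    interfaces: {name: {"role": "service" or "backup", "up": bool}} (assumed layout).
    """
    interfaces[failed]["up"] = False
    for name, info in interfaces.items():
        if name != failed and info["role"] == "backup" and info["up"]:
            return name
    return None  # no backup interface is present
```

For example, with a service interface `en0` and a working backup `en1`, a failure of `en0` selects `en1`. With no backup defined, the function returns `None`.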
Reintegration
A reintegration, or fallback, occurs when a resource group moves to a node that has just joined the cluster. When a node joins a running cluster, the cluster becomes temporarily unstable. The member nodes coordinate the beginning of the join process and then run event scripts to release any resources the joining node is configured to take over. The joining node then runs an event script to take over these resources. Finally, the joining node becomes a member of the cluster. At this point, the cluster is stable again.
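The order of these steps can be modeled as a short sketch. It is an illustration under stated assumptions, not the product's event scripts. The function `reintegrate`, the `cluster` dictionary layout, and the step labels are all hypothetical.

```python
def reintegrate(cluster: dict, joining: str, configured: set) -> list:
    """Model the join sequence and return the ordered list of steps taken.

    cluster: {"members": set of nodes, "holdings": {node: set of resources},
              "stable": bool} (assumed layout).
    configured: resources the joining node is configured to take over.
    """
    steps = []
    cluster["stable"] = False  # the cluster is unstable while the join runs
    steps.append("unstable")

    # Member nodes release what the joining node is configured to take over.
    released = set()
    for node in cluster["members"]:
        giving = cluster["holdings"][node] & configured
        cluster["holdings"][node] -= giving
        released |= giving
    steps.append("released")

    # The joining node takes over only the resources that were released.
    cluster["holdings"][joining] = released
    steps.append("acquired")

    # Finally, the joining node becomes a member and the cluster is stable.
    cluster["members"].add(joining)
    cluster["stable"] = True
    steps.append("stable")
    return steps
```

The sketch keeps release strictly before acquisition, which mirrors the ordering in the paragraph above: a resource is never held by two nodes at once.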