Cluster failure management
If a cluster member fails, you must take different administrative actions, depending on the role of the node in the cluster.
- Failure of the primary master
- Promote a different node to the primary master.
For detailed steps that describe how to promote a different node,
see Promoting a node to master.
You can promote a non-master node to the primary master so that other master nodes in the environment remain for failover purposes.
If there is a secondary master in the environment, you can optionally promote it to primary master. The process for this promotion depends on whether there are tertiary and quaternary masters in the environment:
- If there are tertiary and quaternary masters, you must take either of the following actions at the same time as you promote the secondary master to primary:
- Promote a non-master node to secondary master, or
- Demote the tertiary and quaternary nodes to non-master nodes.
- If you do not have tertiary and quaternary masters, you can promote the secondary master to primary master and the cluster can operate with a single master. However, for high availability purposes, you might also want to promote a non-master node to secondary master.
- Remove the failed node from the cluster. For detailed steps, see Removing an unreachable master node from the cluster.
- Export the signature file from the new master. You must use this signature file when you are adding new nodes to the cluster.
- Failure of a secondary, tertiary, or quaternary master
- On the primary master, demote the failed node.
- Promote a non-master node to replace the failed master.
Note: You might need to complete the demotion and the promotion at the same time to ensure that you maintain a valid combination of master nodes. For more information about valid architectures, see Cluster architecture rules.
- Remove the failed node from the cluster.
- Failure of a non-master node
- On the primary master, unregister the failed node.
- Optionally, you can add a node to the cluster to replace the failed node.
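
The role-dependent actions above can be summarized as a small lookup. The following Python sketch is illustrative only: the names `Role`, `recovery_steps`, and `secondary_promotion_options` are hypothetical and are not part of any product API or command-line interface. It simply encodes the steps from this topic so that they can be printed or checked in a runbook script.

```python
from enum import Enum


class Role(Enum):
    """Role of the failed cluster member (hypothetical names for illustration)."""
    PRIMARY = "primary master"
    SECONDARY = "secondary master"
    TERTIARY = "tertiary master"
    QUATERNARY = "quaternary master"
    NODE = "non-master node"


def recovery_steps(failed_role: Role) -> list[str]:
    """Return the administrative actions for a failed member, in order."""
    if failed_role is Role.PRIMARY:
        return [
            "Promote a different node to the primary master",
            "Remove the failed node from the cluster",
            "Export the signature file from the new master",
        ]
    if failed_role in (Role.SECONDARY, Role.TERTIARY, Role.QUATERNARY):
        # The first two actions might need to be done at the same time
        # to keep a valid combination of master nodes.
        return [
            "Demote the failed node on the primary master",
            "Promote a non-master node to replace the failed master",
            "Remove the failed node from the cluster",
        ]
    return [
        "Unregister the node on the primary master",
        "Optionally, add a node to the cluster to replace the failed node",
    ]


def secondary_promotion_options(has_tertiary_and_quaternary: bool) -> tuple[bool, list[str]]:
    """Describe what accompanies promoting the secondary master to primary.

    Returns (required, options). When required is True, exactly one of the
    options must be performed together with the promotion.
    """
    if has_tertiary_and_quaternary:
        return True, [
            "Promote a non-master node to secondary master",
            "Demote the tertiary and quaternary nodes to non-master nodes",
        ]
    # The cluster can operate with a single master; this step is optional
    # and only recommended for high availability.
    return False, ["Promote a non-master node to secondary master"]


if __name__ == "__main__":
    for number, step in enumerate(recovery_steps(Role.PRIMARY), start=1):
        print(f"{number}. {step}")
```

Calling `recovery_steps(Role.TERTIARY)` returns the three actions for a failed tertiary master, and `secondary_promotion_options(True)` reports that one of the two accompanying actions is mandatory. The actual promotion, demotion, and removal must still be performed through the product's own administration interface.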