Cluster failure management

If a cluster member fails, the administrative actions that you must take depend on the role of the failed node in the cluster.

Failure of the primary master
  1. Promote a different node to primary master. For detailed steps, see Promoting a node to master.

    You can promote a non-master node to primary master so that the other master nodes in the environment remain available for failover.

    Alternatively, if there is a secondary master in the environment, you can promote it to primary master. How you do this depends on whether the environment also has tertiary and quaternary masters:
    • If there are tertiary and quaternary masters, you must take one of the following actions at the same time as you promote the secondary master to primary:
      • Promote a non-master node to secondary master, or
      • Demote the tertiary and quaternary masters to non-master nodes.
      One of these actions is required because a cluster cannot have tertiary and quaternary masters without a secondary master.
    • If there are no tertiary and quaternary masters, you can promote the secondary master to primary master, and the cluster can then operate with a single master. For high availability, however, consider also promoting a non-master node to secondary master.
  2. Remove the failed node from the cluster. For detailed steps, see Removing an unreachable master node from the cluster.
  3. Export the signature file from the new primary master. You must use this signature file when you add new nodes to the cluster.
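
The choice in step 1 between promoting a replacement secondary master and demoting the tertiary and quaternary masters can be modeled as a pure function over the cluster's master assignments. The following is a minimal Python sketch, not a product API: the role-to-node dictionary, the node names, and the functions `is_valid` and `promote_secondary_to_primary` are all hypothetical. `is_valid` encodes only the rules stated in this section (a primary master must exist, and tertiary or quaternary masters require a secondary master) plus an assumed rule that one node holds at most one master role; the authoritative list is in Cluster architecture rules.

```python
from typing import Dict, Optional

# Hypothetical model: each master role maps to the node that holds it,
# or None if the role is unassigned.
Masters = Dict[str, Optional[str]]
ROLES = ("primary", "secondary", "tertiary", "quaternary")


def is_valid(masters: Masters) -> bool:
    """Check only the rules stated in this section (assumed, not exhaustive)."""
    if masters.get("primary") is None:
        return False
    has_lower = masters.get("tertiary") is not None or masters.get("quaternary") is not None
    if has_lower and masters.get("secondary") is None:
        return False
    # Assumption: a node holds at most one master role.
    assigned = [node for node in masters.values() if node is not None]
    return len(assigned) == len(set(assigned))


def promote_secondary_to_primary(masters: Masters, replacement: Optional[str] = None) -> Masters:
    """Compute the new assignment after the primary master fails.

    `replacement` is a non-master node to promote to secondary master in the
    same change. If it is None, the tertiary and quaternary masters (if any)
    are demoted instead, which leaves a valid single-master cluster when no
    lower masters exist.
    """
    if masters.get("secondary") is None:
        raise ValueError("no secondary master to promote")
    new = {role: masters.get(role) for role in ROLES}
    new["primary"] = new["secondary"]
    new["secondary"] = replacement
    if replacement is None:
        # Demote tertiary and quaternary masters to non-master nodes.
        new["tertiary"] = None
        new["quaternary"] = None
    if not is_valid(new):
        raise ValueError(f"invalid master combination: {new}")
    return new


before = {"primary": "node-a", "secondary": "node-b", "tertiary": "node-c", "quaternary": "node-d"}
print(promote_secondary_to_primary(before, replacement="node-e"))
# {'primary': 'node-b', 'secondary': 'node-e', 'tertiary': 'node-c', 'quaternary': 'node-d'}
print(promote_secondary_to_primary(before))
# {'primary': 'node-b', 'secondary': None, 'tertiary': None, 'quaternary': None}
```

Computing the whole new assignment before validating it mirrors the requirement that the promotion and the accompanying action happen at the same time: no intermediate combination is ever checked or applied on its own.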
Failure of a secondary, tertiary, or quaternary master
  1. On the primary master, demote the failed node.
  2. Promote a non-master node to replace the failed master.
    Note: You might need to complete steps 1 and 2 simultaneously to ensure that you maintain a valid combination of master nodes. For more information about valid architectures, see Cluster architecture rules.
  3. Remove the failed node from the cluster.
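
Whether steps 1 and 2 must be done together depends on whether the demote-only intermediate state is itself a valid combination. The following minimal Python sketch makes that check explicit. It assumes a hypothetical dictionary that maps each master role to a node name, assumes that the replacement node takes over the failed node's role, and uses a partial `is_valid` check that encodes only the rules stated in this section plus an assumed one-role-per-node rule. The function names and node names are illustrative only and are not part of any product API.

```python
from typing import Dict, Optional, Tuple

# Hypothetical model: each master role maps to a node name, or None if unassigned.
Masters = Dict[str, Optional[str]]


def is_valid(masters: Masters) -> bool:
    """Partial check: a primary exists, tertiary/quaternary need a secondary,
    and (assumed) no node holds two master roles."""
    if masters.get("primary") is None:
        return False
    if (masters.get("tertiary") or masters.get("quaternary")) and masters.get("secondary") is None:
        return False
    assigned = [node for node in masters.values() if node is not None]
    return len(assigned) == len(set(assigned))


def plan_replacement(masters: Masters, failed_role: str, replacement: str) -> Tuple[Masters, bool]:
    """Return the final assignment and whether demotion (step 1) and
    promotion (step 2) must be applied in a single change."""
    if failed_role not in ("secondary", "tertiary", "quaternary"):
        raise ValueError("this procedure covers secondary, tertiary, and quaternary masters only")
    if replacement in masters.values():
        raise ValueError(f"{replacement} already holds a master role")
    demoted = dict(masters)
    demoted[failed_role] = None           # state after step 1 alone
    final = dict(demoted)
    final[failed_role] = replacement      # state after step 2
    if not is_valid(final):
        raise ValueError(f"invalid master combination: {final}")
    return final, not is_valid(demoted)


current = {"primary": "node-a", "secondary": "node-b", "tertiary": "node-c", "quaternary": "node-d"}
final, together = plan_replacement(current, "secondary", "node-e")
print(final, together)
# {'primary': 'node-a', 'secondary': 'node-e', 'tertiary': 'node-c', 'quaternary': 'node-d'} True
```

In this model, losing the secondary master while tertiary and quaternary masters remain is the case that forces both steps into one change, whereas replacing a failed tertiary master can be done in two separate steps.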
Failure of a non-master node
  1. On the primary master, unregister the failed node.
  2. Optionally, add a node to the cluster to replace the failed node.