Tips: Cluster partitions
Use these tips for cluster partitions.
- The rules that restrict operations within a partition are designed to make merging the partitions feasible. Without these restrictions, reconstructing the cluster would require extensive work.
- If the nodes in the primary partition have been destroyed, special processing might be necessary in a secondary partition. The most common cause of this condition is the loss of the site that made up the primary partition. Take the example in Recovering from partition errors and assume that Partition 1 was destroyed. In this case, the primary node for Cluster Resource Groups B, C, and D must be located in Partition 2. The simplest recovery is to use Change Cluster Node Entry to set both Node A and Node B to failed; see Changing partitioned nodes to failed for more information about how to do this. You can also recover manually by performing these operations:
  - Remove Nodes A and B from the cluster in Partition 2. Partition 2 is now the cluster.
  - Establish any logical replication environments that the new cluster needs, such as starting cluster resource groups with the Start Cluster Resource Group API or CL command.
  Because Nodes A and B have been removed from the cluster definition in Partition 2, an attempt to merge Partition 1 and Partition 2 will fail. To correct the mismatch in cluster definitions, run the Delete Cluster (QcstDeleteCluster) API on each node in Partition 1. Then add the nodes from Partition 1 to the cluster, and re-establish all the cluster resource group definitions, recovery domains, and logical replication. This requires a great deal of work and is prone to errors, so perform this procedure only in a site-loss situation.
- Processing a start node operation depends on the status of the node that is being started:
  The node failed, or an End Node operation ended it:
    - Cluster resource services is started on the node.
    - The cluster definition is copied from an active node in the cluster to the node that is being started.
    - Any cluster resource group that has the node being started in its recovery domain is copied from an active node in the cluster to the node being started. No cluster resource groups are copied from the node that is being started to an active node in the cluster.
  The node is a partitioned node:
    - The cluster definition of an active node is compared to the cluster definition of the node that is being started. If the definitions are the same, the start continues as a merge operation. If the definitions do not match, the merge stops, and user intervention is required.
    - If the merge continues, the node that is being started is set to an active status.
    - Any cluster resource group that has the node being started in its recovery domain is copied from the primary partition of the cluster resource group to the secondary partition of the cluster resource group. Unlike the failed or ended case, cluster resource groups can therefore be copied from the node that is being started to nodes that are already active in the cluster.
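The start node rules can be sketched as a small state model that makes the direction of each copy explicit. This is a minimal Python sketch under stated assumptions, not the IBM i implementation: `Status`, `Crg`, `NodeView`, `MergeStopped`, and `start_node` are hypothetical names, a cluster definition is reduced to a set of member node names, and a single active node stands in for the rest of the cluster or partition. The usage lines at the end also show why, after the manual site-loss recovery, starting a node from Partition 1 stops: its cluster definition still lists Nodes A and B.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    ACTIVE = "active"
    FAILED = "failed"
    ENDED = "ended"              # ended by an End Node operation
    PARTITIONED = "partitioned"


@dataclass(frozen=True)
class Crg:
    """Hypothetical model: a cluster resource group reduced to its recovery domain."""
    name: str
    recovery_domain: frozenset[str]


@dataclass
class NodeView:
    """Hypothetical model of one node's local copy of the cluster state."""
    name: str
    status: Status
    cluster_definition: frozenset[str]            # member node names in the local copy
    crgs: dict[str, Crg] = field(default_factory=dict)
    services_running: bool = False                # stands in for cluster resource services


class MergeStopped(Exception):
    """The cluster definitions differ, so the merge stops and the user must intervene."""


def start_node(node: NodeView, active: NodeView,
               primary_on_starting_side: frozenset[str] = frozenset()) -> None:
    """Apply the start node rules. `active` stands in for the active cluster or partition;
    `primary_on_starting_side` (hypothetical) names the CRGs whose primary partition
    contains `node`."""
    if node.status in (Status.FAILED, Status.ENDED):
        node.services_running = True
        # The cluster definition flows from the active node to the starting node.
        node.cluster_definition = active.cluster_definition
        # One direction only: no CRG is copied back to the active node.
        for name, crg in active.crgs.items():
            if node.name in crg.recovery_domain:
                node.crgs[name] = crg
    elif node.status is Status.PARTITIONED:
        if node.cluster_definition != active.cluster_definition:
            raise MergeStopped(f"{node.name}: cluster definitions do not match")
        node.status = Status.ACTIVE
        # Each CRG with the node in its recovery domain is copied from its primary
        # partition to its secondary partition, so either side can be the source.
        for name in set(node.crgs) | set(active.crgs):
            src, dst = (node, active) if name in primary_on_starting_side else (active, node)
            crg = src.crgs.get(name)
            if crg is not None and node.name in crg.recovery_domain:
                dst.crgs[name] = crg
    else:
        raise ValueError(f"{node.name} is already active")


# Site-loss scenario (hypothetical: Partition 2 holds Nodes C and D and has removed A and B).
node_c = NodeView("C", Status.ACTIVE, frozenset({"C", "D"}))
node_a = NodeView("A", Status.PARTITIONED, frozenset({"A", "B", "C", "D"}))
try:
    start_node(node_a, node_c)
except MergeStopped as err:
    print(err)  # A: cluster definitions do not match
```

In the product, the start node operation performs these steps itself; the sketch only isolates the two decisions that matter when planning a recovery: whether the cluster definitions match, and which side each cluster resource group is copied from.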