Disaster recovery scenarios for CCR

The procedure for recovering from a broken Clustered Configuration Repository (CCR) depends on the failure scenario.

The following scenarios are covered:
Recovering from a single quorum or non-quorum node failure
A node failure might occur when the node is completely corrupted or when the node was rebuilt from scratch. In this scenario, only one quorum node is broken, and enough quorum nodes remain available on which CCR is running without any issue. This procedure also applies when a single non-quorum node must be recovered.

Command to apply to restore the configuration information: mmsdrrestore -p <GOOD_QUORUM_NODE>

For more information, see Recovering from a single quorum or non-quorum node failure.

Recovering from the loss of a majority of quorum nodes
In this case, a majority of quorum nodes are broken but there is still at least one quorum node available with an intact CCR state.

Command to apply to restore the configuration information: mmchnode --noquorum -N <LIST_OF_BROKEN_QUORUM_NODES> --force

For more information, see Recovering from the loss of a majority of quorum nodes.

Recovering from damage or loss of the CCR on all quorum nodes
In this case, the CCR is partially broken on all quorum nodes. Fragments of the CCR state are still available on various quorum nodes, but no quorum node has a complete, intact CCR state.

Command to apply to restore the configuration information: mmsdrrestore --ccr-repair

For more information, see Recovering from damage or loss of the CCR on all quorum nodes.

Recovering from an existing CCR backup
In this case, the CCR state on all quorum nodes is lost; that is, access to /var/mmfs or /var/mmfs/ccr is lost on all quorum nodes. This procedure assumes that a valid CCR backup is still available.

Command to apply to restore the configuration information: mmsdrrestore -F <PATH_TO_CCR_BACKUP_FILE> -a

For more information, see Recovering from an existing CCR backup.
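
The mapping from scenario to command in the list above can be summarized as a small lookup helper. This is only a sketch: the function name ccr_recovery_command and the scenario keywords (single-node, majority-lost, all-damaged, from-backup) are hypothetical labels introduced here for illustration and are not part of the product. The helper prints the command text taken from the list and does not run anything.

```shell
#!/bin/sh
# Hypothetical helper (not part of the product): prints the recovery command
# listed above for a given scenario keyword. It only echoes text; nothing is executed.
# $1 = scenario keyword (illustrative labels), $2 = optional node name, node list, or backup path.
ccr_recovery_command() {
    scenario=$1
    arg=$2
    case "$scenario" in
        single-node)
            # Single quorum or non-quorum node failure
            printf 'mmsdrrestore -p %s\n' "${arg:-<GOOD_QUORUM_NODE>}" ;;
        majority-lost)
            # Majority of quorum nodes lost, at least one intact CCR state remains
            printf 'mmchnode --noquorum -N %s --force\n' "${arg:-<LIST_OF_BROKEN_QUORUM_NODES>}" ;;
        all-damaged)
            # CCR damaged on all quorum nodes, fragments still available
            printf 'mmsdrrestore --ccr-repair\n' ;;
        from-backup)
            # CCR state lost on all quorum nodes, valid backup available
            printf 'mmsdrrestore -F %s -a\n' "${arg:-<PATH_TO_CCR_BACKUP_FILE>}" ;;
        *)
            printf 'unknown scenario: %s\n' "$scenario" >&2
            return 1 ;;
    esac
}
```

For example, ccr_recovery_command single-node nodeA prints mmsdrrestore -p nodeA, where nodeA is a placeholder node name. When the second argument is omitted, the placeholder from the list above is printed unchanged.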

The following sections describe the different recovery cases in more detail with examples.