Repair of cluster configuration information when no CCR backup information is available: mmsdrrestore command

When no CCR backup information is available, you can repair missing or corrupted CCR files with the mmsdrrestore command and the --ccr-repair parameter.

The mmsdrrestore command with the --ccr-repair parameter can repair cluster configuration information in a cluster where no intact CCR state can be found on the quorum nodes and no CCR backup is available from which to recover the cluster configuration information. In this state, the CCR committed directory on every quorum node has corrupted or lost files. For more information, see mmsdrrestore command.

This procedure is not guaranteed to recover the most recent state of every configuration file in the CCR. Instead, it brings the CCR back into a consistent state by using the most recent available version of each configuration file.

For an example of running the mmsdrrestore command with the --ccr-repair parameter, see mmsdrrestore command.

The following events can cause missing or corrupted files in the CCR committed directory of all of the quorum nodes:
  • All the quorum nodes crash or lose power at the same time, and on each quorum node one or more files in the CCR committed directory are corrupted or lost after the quorum nodes start up again.
  • The local disks of all the quorum nodes lose power, and on each quorum node a file in the CCR committed directory is corrupted or lost after the local disks come back.
The following error messages are possible indicators of corrupted or lost files in the CCR committed directory. If a command displays one of these error messages, follow the instructions in the "User response" section of the error message before you try to run the mmsdrrestore --ccr-repair command:
The following error messages are indicators of other CCR problems. Consider reading and following the instructions in the "User response" sections of these error messages before you try to run the mmsdrrestore --ccr-repair command:

Checking for corrupted or lost files in the CCR committed directory

To determine whether the CCR committed directory of a quorum node has corrupted or lost files, issue the following command on that node:
mmccr check -Y -e
In the following example, the next-to-last line of the output (the FC_COMMITTED_DIR record, with an ErrorCode of 5 and a Severity of WARNING) indicates that one or more files are corrupted or lost in the CCR committed directory of the current node:
# mmccr check -Y -e
mmccr::HEADER:version:reserved:reserved:NodeId:CheckMnemonic:ErrorCode:ErrorMsg:
     ListOfFailedEntities:ListOfSucceedEntities:Severity:
mmccr::0:1:::1:CCR_CLIENT_INIT:0:::/var/mmfs/ccr,/var/mmfs/ccr/committed,/var/mmfs/ccr/ccr.nodes,
     Security,/var/mmfs/ccr/ccr.disks:OK:
mmccr::0:1:::1:FC_CCR_AUTH_KEYS:0:::/var/mmfs/ssl/authorized_ccr_keys:OK:
mmccr::0:1:::1:FC_CCR_PAXOS_CACHED:0:::/var/mmfs/ccr/cached,/var/mmfs/ccr/cached/ccr.paxos:OK:
mmccr::0:1:::1:FC_CCR_PAXOS_12:0:::/var/mmfs/ccr/ccr.paxos.1,/var/mmfs/ccr/ccr.paxos.2:OK:
mmccr::0:1:::1:PC_LOCAL_SERVER:0:::c80f5m5n01.gpfs.net:OK:
mmccr::0:1:::1:PC_IP_ADDR_LOOKUP:0:::c80f5m5n01.gpfs.net,0.000:OK:
mmccr::0:1:::1:PC_QUORUM_NODES:0:::192.168.80.181,192.168.80.182:OK:
mmccr::0:1:::1:FC_COMMITTED_DIR:5:Files in committed directory missing or corrupted:1:6:WARNING:
mmccr::0:1:::1:TC_TIEBREAKER_DISKS:0:::1:OK:
In the following example, the next-to-last line of the output (the FC_COMMITTED_DIR record, with an ErrorCode of 0 and a Severity of OK) indicates that none of the files in the CCR committed directory of the current node are corrupted or lost:
# mmccr check -Y -e
mmccr::HEADER:version:reserved:reserved:NodeId:CheckMnemonic:ErrorCode:ErrorMsg:
     ListOfFailedEntities:ListOfSucceedEntities:Severity:
mmccr::0:1:::1:CCR_CLIENT_INIT:0:::/var/mmfs/ccr,/var/mmfs/ccr/committed,/var/mmfs/ccr/ccr.nodes,
     Security,/var/mmfs/ccr/ccr.disks:OK:
mmccr::0:1:::1:FC_CCR_AUTH_KEYS:0:::/var/mmfs/ssl/authorized_ccr_keys:OK:
mmccr::0:1:::1:FC_CCR_PAXOS_CACHED:0:::/var/mmfs/ccr/cached,/var/mmfs/ccr/cached/ccr.paxos:OK:
mmccr::0:1:::1:FC_CCR_PAXOS_12:0:::/var/mmfs/ccr/ccr.paxos.1,/var/mmfs/ccr/ccr.paxos.2:OK:
mmccr::0:1:::1:PC_LOCAL_SERVER:0:::c80f5m5n01.gpfs.net:OK:
mmccr::0:1:::1:PC_IP_ADDR_LOOKUP:0:::c80f5m5n01.gpfs.net,0.000:OK:
mmccr::0:1:::1:PC_QUORUM_NODES:0:::192.168.80.181,192.168.80.182:OK:
mmccr::0:1:::1:FC_COMMITTED_DIR:0::0:7:OK:
mmccr::0:1:::1:TC_TIEBREAKER_DISKS:0:::1:OK:
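When the check has to be repeated on several quorum nodes, reading the FC_COMMITTED_DIR record by eye becomes tedious. The following Python sketch shows one way to pick that record out of captured mmccr check -Y -e output. It is an illustration only and not part of the product: the function names are hypothetical, the field names are taken from the HEADER line shown in the examples above, and the sketch assumes that field values contain no literal colon. Treating "ErrorCode 0 with Severity OK" as the healthy state is also an assumption, based solely on the two example listings.

```python
from typing import Dict, List, Optional


def parse_mmccr_check(output: str) -> List[Dict[str, str]]:
    """Parse colon-delimited `mmccr check -Y` text into one dict per check record."""
    # Re-join wrapped records: in the listings, a long record continues on an
    # indented second line.
    logical: List[str] = []
    for raw in output.splitlines():
        if not raw.strip():
            continue
        if raw[0].isspace() and logical:
            logical[-1] += raw.strip()
        else:
            logical.append(raw.strip())

    header: Optional[List[str]] = None
    records: List[Dict[str, str]] = []
    for line in logical:
        if not line.startswith("mmccr:"):
            continue  # skip the echoed command line and any other text
        fields = line.split(":")
        if len(fields) > 2 and fields[2] == "HEADER":
            header = fields
            continue
        if header is None:
            continue  # data before a header cannot be labeled
        # Pair header names with values; drop unnamed and reserved columns.
        record = {
            name: value
            for name, value in zip(header[3:], fields[3:])
            if name and name != "reserved"
        }
        records.append(record)
    return records


def committed_dir_status(output: str) -> Optional[Dict[str, str]]:
    """Return the FC_COMMITTED_DIR record, or None if the output has none."""
    for record in parse_mmccr_check(output):
        if record.get("CheckMnemonic") == "FC_COMMITTED_DIR":
            return record
    return None


def committed_dir_is_intact(output: str) -> bool:
    """Assumption: ErrorCode 0 and Severity OK mean no corrupted or lost files."""
    record = committed_dir_status(output)
    return (
        record is not None
        and record.get("ErrorCode") == "0"
        and record.get("Severity") == "OK"
    )


# Header and FC_COMMITTED_DIR lines copied from the two example listings.
HEADER = (
    "mmccr::HEADER:version:reserved:reserved:NodeId:CheckMnemonic:ErrorCode:ErrorMsg:\n"
    "     ListOfFailedEntities:ListOfSucceedEntities:Severity:\n"
)
SAMPLE_CORRUPTED = HEADER + (
    "mmccr::0:1:::1:FC_COMMITTED_DIR:5:"
    "Files in committed directory missing or corrupted:1:6:WARNING:\n"
)
SAMPLE_INTACT = HEADER + "mmccr::0:1:::1:FC_COMMITTED_DIR:0::0:7:OK:\n"

if __name__ == "__main__":
    for label, text in (("corrupted", SAMPLE_CORRUPTED), ("intact", SAMPLE_INTACT)):
        print(label, committed_dir_status(text), committed_dir_is_intact(text))
```

The parser only labels fields by position against the HEADER line, so it does not depend on the order of the checks in the output. It reports what the check printed and does not replace the "User response" instructions of any error message.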