Recovering from damage or loss of the CCR on all quorum nodes

This scenario occurs when files in the CCR directory are corrupted or lost on all quorum nodes in the cluster.

Perform the following steps to investigate the issue and recover CCR on all quorum nodes:
  1. Issue the mmccr check command to check the CCR status. In the following example, the FC_COMMITTED_DIR check reports a WARNING with error code 5:
    # mmccr check -Ye
    mmccr::HEADER:version:reserved:reserved:NodeId:CheckMnemonic:ErrorCode:ErrorMsg:ListOfFailedEntities:ListOfSucceedEntities:Severity:
    mmccr::0:1:::1:CCR_CLIENT_INIT:0:::/var/mmfs/ccr,/var/mmfs/ccr/committed,/var/mmfs/ccr/ccr.nodes,Security:OK:
    mmccr::0:1:::1:FC_CCR_AUTH_KEYS:0:::/var/mmfs/ssl/authorized_ccr_keys:OK:
    mmccr::0:1:::1:FC_CCR_PAXOS_CACHED:0:::/var/mmfs/ccr/cached,/var/mmfs/ccr/cached/ccr.paxos:OK:
    mmccr::0:1:::1:FC_CCR_PAXOS_12:0:::/var/mmfs/ccr/ccr.paxos.1,/var/mmfs/ccr/ccr.paxos.2:OK:
    mmccr::0:1:::1:PC_LOCAL_SERVER:0:::node-21.localnet.com:OK:
    mmccr::0:1:::1:PC_IP_ADDR_LOOKUP:0:::node-21.localnet.com,0.000:OK:
    mmccr::0:1:::1:PC_QUORUM_NODES:0:::10.0.100.21,10.0.100.22,10.0.100.23:OK:
    mmccr::0:1:::1:FC_COMMITTED_DIR:5:Files in committed directory missing or corrupted:1:6:WARNING:
    mmccr::0:1:::1:TC_TIEBREAKER_DISKS:0::::OK:
    
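  2. Issue the mmsdrrestore command with the --ccr-repair option to repair the CCR. The command first runs in dry run mode, reports what would change, and prompts for confirmation before it restores the CCR. A sample output is as follows: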
    # mmsdrrestore --ccr-repair
    mmsdrrestore: Checking CCR on all quorum nodes ...
    mmsdrrestore: Invoking CCR restore in dry run mode ...
    
    ccrrestore: +++ DRY RUN: CCR state on quorum nodes will not be restored +++
    ccrrestore:  1/8: Test tool chain successful
    ccrrestore:  2/8: Setup local working directories successful
    ccrrestore:  3/8: Copy Paxos state files from quorum nodes successful
    ccrrestore:  4/8: Getting most recent Paxos state file successful
    ccrrestore:  5/8: Get cksum of files in committed directory successful
    ccrrestore:  6/8: WARNING: Intact ccr.nodes file with version 5 missing in committed directory
    ccrrestore:  6/8: INFORMATION: Intact ccr.disks found (file id: 2 version: 1)
    ccrrestore:  6/8: INFORMATION: Intact mmLockFileDB found (file id: 3 version: 1)
    ccrrestore:  6/8: INFORMATION: Intact genKeyData found (file id: 4 version: 1)
    ccrrestore:  6/8: INFORMATION: Intact genKeyDataNew found (file id: 5 version: 2)
    ccrrestore:  6/8: INFORMATION: Intact mmsdrfs found (file id: 6 version: 23)
    ccrrestore:  6/8: INFORMATION: Intact mmsysmon.json found (file id: 7 version: 1)
    ccrrestore:  6/8: Parsing committed file list successful
    ccrrestore:  7/8: Pulling committed files from quorum nodes successful
    ccrrestore:  8/8: File name: 'ccr.nodes' file state: UPDATED remark: 'OLD (v5, ((n1,e6),103), f20ea9e3)'
    ccrrestore:  8/8: File name: 'ccr.disks' file state: MATCHING remark: 'none'
    ccrrestore:  8/8: File name: 'mmLockFileDB' file state: MATCHING remark: 'none'
    ccrrestore:  8/8: File name: 'genKeyData' file state: MATCHING remark: 'none'
    ccrrestore:  8/8: File name: 'genKeyDataNew' file state: MATCHING remark: 'none'
    ccrrestore:  8/8: File name: 'mmsdrfs' file state: MATCHING remark: 'none'
    ccrrestore:  8/8: File name: 'mmsysmon.json' file state: MATCHING remark: 'none'
    ccrrestore:  8/8: Patching Paxos state successful
    
    mmsdrrestore: Review the dry run report above to see what will be changed and decide if you want to continue the restore or not.  Do you want to continue? (yes/no) yes
    ccrrestore:  1/14: Test tool chain successful
    ccrrestore:  2/14: Test GPFS shutdown successful
    ccrrestore:  3/14: Setup local working directories successful
    ccrrestore:  4/14: Archiving CCR directories on quorum nodes successful
    ccrrestore:  5/14: Kill GPFS mmsdrserv daemon successful
    ccrrestore:  6/14: Copy Paxos state files from quorum nodes successful
    ccrrestore:  7/14: Getting most recent Paxos state file successful
    ccrrestore:  8/14: Get cksum of files in committed directory successful
    ccrrestore:  9/14: WARNING: Intact ccr.nodes file with version 5 missing in committed directory
    ccrrestore:  9/14: INFORMATION: Intact ccr.disks found (file id: 2 version: 1)
    ccrrestore:  9/14: INFORMATION: Intact mmLockFileDB found (file id: 3 version: 1)
    ccrrestore:  9/14: INFORMATION: Intact genKeyData found (file id: 4 version: 1)
    ccrrestore:  9/14: INFORMATION: Intact genKeyDataNew found (file id: 5 version: 2)
    ccrrestore:  9/14: INFORMATION: Intact mmsdrfs found (file id: 6 version: 23)
    ccrrestore:  9/14: INFORMATION: Intact mmsysmon.json found (file id: 7 version: 1)
    ccrrestore:  9/14: Parsing committed file list successful
    ccrrestore: 10/14: Pulling committed files from quorum nodes successful
    ccrrestore: 11/14: File name: 'ccr.nodes' file state: UPDATED remark: 'OLD (v5, ((n1,e6),103), f20ea9e3)'
    ccrrestore: 11/14: File name: 'ccr.disks' file state: MATCHING remark: 'none'
    ccrrestore: 11/14: File name: 'mmLockFileDB' file state: MATCHING remark: 'none'
    ccrrestore: 11/14: File name: 'genKeyData' file state: MATCHING remark: 'none'
    ccrrestore: 11/14: File name: 'genKeyDataNew' file state: MATCHING remark: 'none'
    ccrrestore: 11/14: File name: 'mmsdrfs' file state: MATCHING remark: 'none'
    ccrrestore: 11/14: File name: 'mmsysmon.json' file state: MATCHING remark: 'none'
    ccrrestore: 11/14: Patching Paxos state successful
    ccrrestore: 12/14: Pushing CCR files successful
    ccrrestore: 13/14: Started GPFS mmsdrserv daemon successful
    ccrrestore: 14/14: Ping GPFS mmsdrserv daemon successful
    
  3. Issue the mmccr check command again to verify the status of the CCR. In the following example, every check, including FC_COMMITTED_DIR, reports OK:
    # mmccr check -Ye
    mmccr::HEADER:version:reserved:reserved:NodeId:CheckMnemonic:ErrorCode:ErrorMsg:ListOfFailedEntities:ListOfSucceedEntities:Severity:
    mmccr::0:1:::1:CCR_CLIENT_INIT:0:::/var/mmfs/ccr,/var/mmfs/ccr/committed,/var/mmfs/ccr/ccr.nodes,Security:OK:
    mmccr::0:1:::1:FC_CCR_AUTH_KEYS:0:::/var/mmfs/ssl/authorized_ccr_keys:OK:
    mmccr::0:1:::1:FC_CCR_PAXOS_CACHED:0:::/var/mmfs/ccr/cached,/var/mmfs/ccr/cached/ccr.paxos:OK:
    mmccr::0:1:::1:FC_CCR_PAXOS_12:0:::/var/mmfs/ccr/ccr.paxos.1,/var/mmfs/ccr/ccr.paxos.2:OK:
    mmccr::0:1:::1:PC_LOCAL_SERVER:0:::node-21.localnet.com:OK:
    mmccr::0:1:::1:PC_IP_ADDR_LOOKUP:0:::node-21.localnet.com,0.000:OK:
    mmccr::0:1:::1:PC_QUORUM_NODES:0:::10.0.100.21,10.0.100.22,10.0.100.23:OK:
    mmccr::0:1:::1:FC_COMMITTED_DIR:0::0:7:OK:
    mmccr::0:1:::1:TC_TIEBREAKER_DISKS:0::::OK:
    
Important: The CCR restore script recovers the CCR from the fragments of CCR configuration files that are available on the cluster nodes. The recovered CCR might therefore contain details of an outdated cluster configuration. If a recent backup is available, it might be better to restore from that backup, even if mmsdrrestore --ccr-repair can restore the CCR from the available fragments.
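
The colon-delimited output of mmccr check -Ye in steps 1 and 3 can also be evaluated by a script instead of by eye. The following Python sketch is one possible way to do that and is not part of the product. The function names parse_mmccr_check and failed_checks are hypothetical. The field names are taken from the HEADER line of the sample output, and the sketch assumes that field values contain no unescaped colons, as in the samples above.

```python
def parse_mmccr_check(output):
    """Turn `mmccr check -Ye` output into a list of dicts keyed by header name.

    Assumption: every record starts with "mmccr", the third field of the
    header record is the literal "HEADER", and values hold no raw colons.
    """
    header = None
    rows = []
    for line in output.splitlines():
        fields = line.strip().split(":")
        if len(fields) < 3 or fields[0] != "mmccr":
            continue  # skip the prompt line, blank lines, unrelated text
        if fields[2] == "HEADER":
            header = fields
            continue
        if header is None:
            continue  # a data record before any header cannot be labeled
        # Pair header names with values; drop "reserved" and empty columns.
        rows.append({
            name: value
            for name, value in zip(header[3:], fields[3:])
            if name and name != "reserved"
        })
    return rows


def failed_checks(rows):
    """Return the records whose severity is not OK or whose error code is not 0."""
    return [
        row for row in rows
        if row.get("Severity") != "OK" or row.get("ErrorCode") != "0"
    ]


# Shortened copy of the step 1 sample: header, one passing check, one warning.
SAMPLE = """\
mmccr::HEADER:version:reserved:reserved:NodeId:CheckMnemonic:ErrorCode:ErrorMsg:ListOfFailedEntities:ListOfSucceedEntities:Severity:
mmccr::0:1:::1:PC_LOCAL_SERVER:0:::node-21.localnet.com:OK:
mmccr::0:1:::1:FC_COMMITTED_DIR:5:Files in committed directory missing or corrupted:1:6:WARNING:
"""

if __name__ == "__main__":
    for row in failed_checks(parse_mmccr_check(SAMPLE)):
        print(row["CheckMnemonic"], row["Severity"], row["ErrorCode"], row["ErrorMsg"])
```

Run against the step 1 sample, only the FC_COMMITTED_DIR record is selected. Run against the step 3 sample, the selection is empty, which matches a CCR whose checks all report OK. To use the sketch on a live node, the captured output of mmccr check -Ye would be passed in place of SAMPLE.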