mmsdrrestore command

Restores the latest GPFS system files on the specified nodes.

Synopsis

mmsdrrestore [-p NodeName] [-F mmsdrfsFile] [-R remoteFileCopyCommand]
             [-a | -N {Node[,Node...] | NodeFile | NodeClass}]
or
mmsdrrestore --ccr-repair [-p NodeName] [-F mmsdrfsFile] [-R remoteFileCopyCommand]

Availability

Available on all IBM Storage Scale editions.

Description

The mmsdrrestore command is intended for use by experienced system administrators.

The mmsdrrestore command restores the latest GPFS system files on the specified nodes. If no nodes are specified, the command restores the configuration information only on the node on which the command is issued. If the local GPFS configuration file is missing, the file that is specified with the -F option from the node that is specified with the -p option is used instead.

This command works best when the -F option specifies a backup file that is created by the mmsdrbackup user exit. If the Cluster Configuration Repository (CCR) is enabled, the mmsdrbackup user exit creates a CCR backup file. If the CCR is not enabled, the user exit creates an mmsdrfs backup file. For more information, see mmsdrbackup user exit.

The mmsdrrestore command cannot restore a cluster configuration unless a majority of the quorum nodes in the cluster are accessible. However, this requirement does not apply if the -F option specifies a CCR backup file or if the --ccr-repair option is specified.
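
The accessibility rule above can be sketched as a small shell helper for a pre-restore script. This is an illustrative sketch only, not part of GPFS: the function name restore_precheck and its mode labels (mmsdrfs, ccr-backup, ccr-repair) are hypothetical, the caller is assumed to have already counted the quorum nodes and how many of them are reachable, and "majority" is assumed to mean strictly more than half.

```shell
#!/bin/sh
# Hypothetical helper (not a GPFS command): decide whether the
# quorum-majority requirement allows a restore to proceed.
# usage: restore_precheck MODE TOTAL_QUORUM_NODES REACHABLE_QUORUM_NODES
#   MODE is one of: mmsdrfs | ccr-backup | ccr-repair (labels invented here)
restore_precheck() {
    mode=$1
    total=$2
    reachable=$3
    case $mode in
        ccr-backup|ccr-repair)
            # The majority requirement does not apply in these two cases.
            return 0 ;;
        mmsdrfs)
            # Assumption: a majority is strictly more than half.
            [ $((reachable * 2)) -gt "$total" ] ;;
        *)
            echo "unknown mode: $mode" >&2
            return 2 ;;
    esac
}
```

For example, restore_precheck mmsdrfs 3 2 succeeds, while restore_precheck mmsdrfs 4 2 fails because two of four quorum nodes is not a majority.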

Parameters

-p NodeName
Specifies the node from which to obtain a valid GPFS configuration file. The node must be either the primary configuration server or a node that has a valid backup copy of the mmsdrfs file. If this parameter is not specified, the command uses the configuration file on the node from which the command is issued.
-F mmsdrfsFile
Specifies the path name of the GPFS configuration file for the mmsdrrestore command to use. This configuration file might be the current one on the primary server, or it might be a configuration file that is obtained from the mmsdrbackup user exit. If not specified, /var/mmfs/gen/mmsdrfs is used.

If the configuration file is a CCR backup file, you can restore the entire cluster by using the -a option of the mmsdrrestore command. To restore only the local node, that is, the node on which the command is issued, the -a option is not required. The -N option is not supported when a CCR backup file is used as the configuration file for restoring nodes.

If the configuration file is an mmsdrfs file, you can specify the nodes to be restored with the -N option or you can issue the command from a node that needs to be restored.

-R remoteFileCopyCommand
Specifies the fully qualified path name for the remote file copy program to be used for obtaining the GPFS configuration file. The default is /usr/bin/rcp.
-a
Restores the GPFS configuration files on all nodes in the cluster.
-N {Node[,Node...] | NodeFile | NodeClass}
Restores the GPFS configuration files on a set of nodes.

For general information on how to specify node names, see Specifying nodes as input to GPFS commands.

This option does not support a NodeClass of mount.

In IBM Storage Scale 5.0.4 and later, the -N option is valid not only when the cluster is using server-based backup but also when the cluster has the CCR enabled.

--ccr-repair
Repairs corrupted or lost files in the CCR committed directory of all of the quorum nodes. Use this option only when the CCR committed directories of all of the quorum nodes have corrupted or lost files and other restore methods have failed. For more information, see Example 4 and Repair of cluster configuration information when no CCR backup information is available: mmsdrrestore command.
Note: When you issue the mmsdrrestore command with the --ccr-repair option, you must shut down the GPFS daemon on all the quorum nodes before you run the command.
In IBM Storage Scale 5.0.4 and later, you can use the --ccr-repair option in the following environments:
  • On nodes that are running Linux®, AIX®, or Microsoft Windows
  • In a sudo-wrapper environment on Linux or AIX
The --ccr-repair option requires a version of Python to be installed on the node:
  • If the node is running IBM Storage Scale 5.1.0 or later, Python 3 or later must be installed on the node:
    • On Linux, Python 3 is usually installed automatically.
    • Python 3 is shipped with AIX 7.3.

      On AIX 7.2, you must manually install Python 3.0 or later. For more information, see AIX Toolbox for Linux Applications. One way to install it is to configure Yum with the AIX toolbox. For more information, see Configuring YUM and creating local repositories on IBM AIX.

    • On Windows, manually install Python 3 under the Cygwin environment as described in the following steps. Windows native (non-Cygwin) distributions of Python 3 are not supported.
      1. From http://www.cygwin.com, download and run the Cygwin 64-bit setup program setup-x86_64.exe.
      2. In the "Select Packages" window, click View > Category.
      3. Click All > Python > Python3 and select the latest level.
      4. Follow the instructions to complete the installation.

  • If the node is running IBM Storage Scale 5.0.5.x, Python 2 must be installed on the node:
    • On Linux, Python 2 is usually installed automatically.
    • On AIX, manually install Python 2.7.5 or a later version of Python 2. For more information, see AIX Toolbox for Linux Applications.
    • On Windows, manually install Python 2 under the Cygwin environment as described in the following steps. Windows native (non-Cygwin) distributions of Python 2 are not supported.
      1. From http://www.cygwin.com, download and run the Cygwin 64-bit setup program setup-x86_64.exe.
      2. In the "Select Packages" window, click View > Category.
      3. Click All > Python > Python2 and select the latest level.
      4. Follow the instructions to complete the installation.

Important: The use of the --ccr-repair option does not guarantee the recovery of the most recent state of all the configuration files in the CCR. Instead, the option brings the CCR back into a consistent state with the most recent available version of each configuration file.
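
The Python requirement for the --ccr-repair option can be checked before the command is run. The following is a minimal sketch, not an IBM-supplied tool: the function names required_python_major and have_python are hypothetical, the release string is assumed to be dotted and numeric (for example 5.1.2.0), and the interpreter is assumed to be reachable on the PATH as python2 or python3, which this page does not state.

```shell
#!/bin/sh
# Hypothetical helper: map an IBM Storage Scale release to the Python major
# version that --ccr-repair needs, following the rules listed above.
# Prints 3 for 5.1.0 or later, 2 for 5.0.5.x, and "unknown" otherwise.
required_python_major() {
    # Split a release such as 5.1.2.0 into its numeric fields.
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    major=${1:-0}
    minor=${2:-0}
    mod=${3:-0}
    if [ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "$minor" -ge 1 ]; }; then
        echo 3
    elif [ "$major" -eq 5 ] && [ "$minor" -eq 0 ] && [ "$mod" -eq 5 ]; then
        echo 2
    else
        echo unknown
    fi
}

# Assumption: the interpreter is installed as python2 or python3 on the PATH.
# usage: have_python MAJOR
have_python() {
    command -v "python$1" >/dev/null 2>&1
}
```

A calling script might run have_python "$(required_python_major 5.1.0.0)" and stop with a message when the check fails. Releases that the list above does not cover are reported as unknown rather than guessed.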

Exit status

0
Successful completion.
nonzero
A failure occurred.
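
In a script, the exit status can be tested directly. The following sketch shows one way to do so; the wrapper name run_and_report is hypothetical, and the commented call at the end reuses the node name from Example 1.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command and report its result by using the
# zero / nonzero exit status convention described above.
# usage: run_and_report COMMAND [ARG...]
run_and_report() {
    rc=0
    "$@" || rc=$?
    if [ "$rc" -eq 0 ]; then
        echo "OK: $*"
    else
        echo "FAILED (exit status $rc): $*" >&2
    fi
    return "$rc"
}

# Example call on a cluster node (commented out so that the sketch has no
# side effects when it is sourced):
# run_and_report /usr/lpp/mmfs/bin/mmsdrrestore -p primaryServer
```

The wrapper returns the original exit status, so callers can still branch on it.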

Security

You must have root authority to run the mmsdrrestore command.

The node on which the command is issued must be able to run remote shell commands on any other node in the cluster without the use of a password and without producing any extraneous messages. For more information, see Requirements for administering a GPFS file system.
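
Both requirements can be verified before the command is issued. The following is a sketch under stated assumptions: the function names is_root and remote_shell_is_quiet are hypothetical, and ssh in batch mode is assumed as the remote shell, which might differ from the remote shell command that is configured for the cluster.

```shell
#!/bin/sh
# Succeeds only when the given user ID, or the effective user ID of the
# caller if none is given, is 0 (root).
# usage: is_root [UID]
is_root() {
    [ "${1:-$(id -u)}" -eq 0 ]
}

# Succeeds only when the remote shell can run a trivial command on NODE
# without prompting and without printing anything on stdout or stderr.
# usage: remote_shell_is_quiet NODE [REMOTE_SHELL [OPTION...]]
remote_shell_is_quiet() {
    node=$1
    shift
    # Assumption: default to ssh in batch mode, so that a password prompt
    # becomes an error instead of blocking the script.
    [ $# -gt 0 ] || set -- ssh -o BatchMode=yes
    output=$("$@" "$node" true 2>&1 </dev/null) || return 1
    [ -z "$output" ]
}
```

A login banner or a shell profile that prints a message makes remote_shell_is_quiet fail, which corresponds to the "extraneous messages" condition above.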

Examples

  1. To restore the latest GPFS system files on the local node using the GPFS configuration file /var/mmfs/gen/mmsdrfs from the node that is named primaryServer, issue the following command:
    # mmsdrrestore -p primaryServer 
    A sample output is as follows:
    Tue Jul  3 18:19:53 CDT 2012: mmsdrrestore: Processing node  k164n04.kgn.ibm.com
    mmsdrrestore:  Node k164n04.kgn.ibm.com successfully restored.
  2. To restore the GPFS system files on all nodes in the cluster using the GPFS configuration file /GPFSconfigFiles/mmsdrfs.120605 on the node that is named GPFSarchive, issue the following command from the node that is named localNode:
    # mmsdrrestore -p GPFSarchive -F /GPFSconfigFiles/mmsdrfs.120605 -a
    A sample output is as follows:
    Tue Jul  3 18:29:28 CDT 2012: mmsdrrestore: Processing node k164n04.kgn.ibm.com
    Tue Jul  3 18:29:30 CDT 2012: mmsdrrestore: Processing node k164n05.kgn.ibm.com
    Tue Jul  3 18:29:31 CDT 2012: mmsdrrestore: Processing node k164n06.kgn.ibm.com
    mmsdrrestore: Command successfully completed
  3. The following command restores the GPFS system files from a CCR backup file. The -a option is required. The command restores any nodes in the cluster that need to be restored:
    # mmsdrrestore -F /GPFSbackupFiles/CCRBackup.2015.10.14.10.01.25.tar.gz -a
    A sample output is as follows:
    Restoring CCR backup
    CCR backup has been restored
  4. The following example shows how the mmsdrrestore command can be run with the --ccr-repair option to repair missing or corrupted files in the CCR committed directory of all of the quorum nodes.
    1. Issue a command like the following one to shut down the GPFS daemon on all of the quorum nodes:
      # mmshutdown -a
      Wed Mar 13 01:57:39 EDT 2019: mmshutdown: Starting force unmount of GPFS file systems
      Wed Mar 13 01:57:44 EDT 2019: mmshutdown: Shutting down GPFS daemons
      Wed Mar 13 01:58:07 EDT 2019: mmshutdown: Finished
      
    2. Issue the following command to repair the missing or corrupted CCR files:
      # mmsdrrestore --ccr-repair
      mmsdrrestore: Checking CCR on all quorum nodes ...
      mmsdrrestore. Invoking CCR restore in dry run mode ...
      
      ccrrestore: +++ DRY RUN: CCR state on quorum nodes and tiebreaker disks will not be restored +++
      ccrrestore:  1/10: Test tool chain successful
      ccrrestore:  2/10: Setup local working directories successful
      ccrrestore:  3/10: Read CCR Paxos state from tiebreaker disks successful
      ccrrestore:  4/10: Copy Paxos state files from quorum nodes successful
      ccrrestore:  5/10: Getting most recent Paxos state file successful
      ccrrestore:  6/10: Get cksum of files in committed directory successful
      ccrrestore:  7/10: WARNING: Intact ccr.nodes file missing in committed directory
      ccrrestore:  7/10: INFORMATION: Intact mmsysmon.json found (file id: 3 version: 1)
      ccrrestore:  7/10: INFORMATION: Intact mmsdrfs found (file id: 4 version: 901)
      ccrrestore:  7/10: INFORMATION: Intact mmLockFileDB found (file id: 5 version: 1)
      ccrrestore:  7/10: INFORMATION: Intact genKeyData found (file id: 6 version: 1)
      ccrrestore:  7/10: INFORMATION: Intact genKeyDataNew found (file id: 7 version: 1)
      ccrrestore:  7/10: Parsing committed file list successful
      ccrrestore:  8/10: Get cksum of CCR files successful
      ccrrestore:  9/10: Pulling committed files from quorum nodes successful
      ccrrestore: 10/10: File name: 'ccr.nodes' file state: UPDATED remark: 'OLD (v1, ((n1,e1), 0),
           9a5d4266)'
      ccrrestore: 10/10: File name: 'ccr.disks' file state: MATCHING remark: 'none'
      ccrrestore: 10/10: File name: 'mmsysmon.json' file state: MATCHING remark: 'none'
      ccrrestore: 10/10: File name: 'mmsdrfs' file state: MATCHING remark: 'none'
      ccrrestore: 10/10: File name: 'mmLockFileDB' file state: MATCHING remark: 'none'
      ccrrestore: 10/10: File name: 'genKeyData' file state: MATCHING remark: 'none'
      ccrrestore: 10/10: File name: 'genKeyDataNew' file state: MATCHING remark: 'none'
      ccrrestore: 10/10: Patching Paxos state successful
      
      mmsdrrestore: Review the dry run report above to see what will be changed and decide if you 
           want to continue the restore or not.  Do you want to continue? (yes/no) yes
      ccrrestore:  1/17: Test tool chain successful
      ccrrestore:  2/17: Test GPFS shutdown successful
      ccrrestore:  3/17: Setup local working directories successful
      ccrrestore:  4/17: Archiving CCR directories on quorum nodes successful
      ccrrestore:  5/17: Read CCR Paxos state from tiebreaker disks successful
      ccrrestore:  6/17: Kill GPFS mmsdrserv daemon successful
      ccrrestore:  7/17: Copy Paxos state files from quorum nodes successful
      ccrrestore:  8/17: Getting most recent Paxos state file successful
      ccrrestore:  9/17: Get cksum of files in committed directory successful
      ccrrestore: 10/17: WARNING: Intact ccr.nodes file missing in committed directory
      ccrrestore: 10/17: INFORMATION: Intact mmsysmon.json found (file id: 3 version: 1)
      ccrrestore: 10/17: INFORMATION: Intact mmsdrfs found (file id: 4 version: 901)
      ccrrestore: 10/17: INFORMATION: Intact mmLockFileDB found (file id: 5 version: 1)
      ccrrestore: 10/17: INFORMATION: Intact genKeyData found (file id: 6 version: 1)
      ccrrestore: 10/17: INFORMATION: Intact genKeyDataNew found (file id: 7 version: 1)
      ccrrestore: 10/17: Parsing committed file list successful
      ccrrestore: 11/17: Get cksum of CCR files successful
      ccrrestore: 12/17: Pulling committed files from quorum nodes successful
      ccrrestore: 13/17: File name: 'ccr.nodes' file state: UPDATED remark: 'OLD (v1, ((n1,e1), 0),
           9a5d4266)'
      ccrrestore: 13/17: File name: 'ccr.disks' file state: MATCHING remark: 'none'
      ccrrestore: 13/17: File name: 'mmsysmon.json' file state: MATCHING remark: 'none'
      ccrrestore: 13/17: File name: 'mmsdrfs' file state: MATCHING remark: 'none'
      ccrrestore: 13/17: File name: 'mmLockFileDB' file state: MATCHING remark: 'none'
      ccrrestore: 13/17: File name: 'genKeyData' file state: MATCHING remark: 'none'
      ccrrestore: 13/17: File name: 'genKeyDataNew' file state: MATCHING remark: 'none'
      ccrrestore: 13/17: Patching Paxos state successful
      ccrrestore: 14/17: Pushing CCR files successful
      ccrrestore: 15/17: Started GPFS mmsdrserv daemon successful
      ccrrestore: 16/17: Ping GPFS mmsdrserv daemon successful
      ccrrestore: 17/17: Write CCR Paxos state to tiebreaker disks 
      
    3. Issue a command like the following one to restart the GPFS daemon on all the quorum nodes:
      # mmstartup -a
      Tue Mar 19 22:31:35 EDT 2019: mmstartup: Starting GPFS ...
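
The three steps of Example 4 can be collected in a small script so that they always run in the same order. This is a sketch only: the names run_step, ccr_repair, and DRY_RUN are hypothetical, and the script prints the commands instead of running them unless DRY_RUN is set to 0. The mmsdrrestore --ccr-repair step prompts for confirmation after its own dry run report, so the script is meant to be run from an interactive terminal.

```shell
#!/bin/sh
# Directory that holds the GPFS commands, as given under "Location".
MMFS_BIN=/usr/lpp/mmfs/bin

# Print the command when DRY_RUN is 1 (the default here); otherwise run it.
run_step() {
    if [ "${DRY_RUN:-1}" -eq 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Shut down GPFS, repair the CCR, and restart GPFS, in the order shown in
# Example 4. The chain stops at the first step that returns nonzero.
ccr_repair() {
    run_step "$MMFS_BIN/mmshutdown" -a &&
    run_step "$MMFS_BIN/mmsdrrestore" --ccr-repair &&
    run_step "$MMFS_BIN/mmstartup" -a
}
```

Calling ccr_repair with the default settings only lists the planned commands for review. Setting DRY_RUN=0 runs them, and a failed shutdown or repair prevents the restart step from being attempted.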

Location

/usr/lpp/mmfs/bin