mmrpldisk command
Replaces the specified disk.
Synopsis
mmrpldisk Device DiskName {DiskDesc | -F StanzaFile} [-v {yes | no}]
[-N {Node[,Node...] | NodeFile | NodeClass}]
[--inode-criteria CriteriaFile] [-o InodeResultFile]
[--qos QOSClass]
Availability
Available on all IBM Spectrum Scale editions.
Description
Use the mmrpldisk command to replace an existing disk in the GPFS file system with a new one. All data on the old disk is migrated to the new disk.
To obtain a replacement disk, do one of the following:
- Create a new disk with the mmcrnsd command. In this case, use the rewritten disk stanza file produced by the mmcrnsd command, or create a new disk stanza. When you use the rewritten file, the disk usage and failure group specifications remain the same as specified on the mmcrnsd command.
- Select a disk that no longer belongs to any file system. Issue the mmlsnsd -F command to display the available disks. You can then use that disk to replace a disk in the file system with the mmrpldisk command.
- You cannot replace a disk when it is the only remaining disk in the file system.
- Under no circumstances should you replace a stopped disk. You need to start a stopped disk before replacing it. If a disk cannot be started, delete it using the mmdeldisk command. See the IBM Spectrum Scale: Problem Determination Guide and search for Disk media failure.
- The file system may be mounted when running the mmrpldisk command.
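The two replacement paths described above can be sketched as a short shell script. The file system name fs0, the disks hd27n01 and hd16vsdn10, and the stanza file path are hypothetical placeholders, and the GPFS commands are only echoed so the sketch runs anywhere:

```shell
#!/bin/sh
# Sketch of the two ways to obtain and use a replacement disk.
# All device, disk, and file names below are hypothetical examples.

STANZA=/tmp/newdisk.stanza   # hypothetical NSD stanza file

# Option 1: create a new NSD first. mmcrnsd rewrites the stanza file,
# which can then be passed to mmrpldisk unchanged, so the disk usage
# and failure group specifications carry over.
echo "mmcrnsd -F ${STANZA}"
echo "mmrpldisk fs0 hd27n01 -F ${STANZA}"

# Option 2: reuse a free disk. mmlsnsd -F lists NSDs that do not
# belong to any file system; one of those can replace the old disk.
echo "mmlsnsd -F"
echo "mmrpldisk fs0 hd27n01 hd16vsdn10"
```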
Results
Upon successful completion of the mmrpldisk command, the disk is replaced in the file system and data is copied to the new disk without restriping.
Parameters
- Device
- The device name of the file system where the disk is to be replaced.
File system names need not be fully-qualified. fs0 is
as acceptable as /dev/fs0.
This must be the first parameter.
- DiskName
- The name of the disk to be replaced. To display the names of disks that belong to the file system, issue the mmlsnsd -f, mmlsfs -d, or mmlsdisk command. The mmlsdisk command will also show the current disk usage and failure group values for each of the disks.
- DiskDesc
- A descriptor for the replacement disk. Prior to GPFS 3.5, the disk information for the mmrpldisk command was specified in the form of a disk descriptor, defined as follows (with the second, third, sixth, and seventh fields reserved):
DiskName:::DiskUsage:FailureGroup:::
For backward compatibility, the mmrpldisk command will still accept a traditional disk descriptor as input, but this use is discouraged.
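For example, a traditional descriptor naming a hypothetical replacement disk with dataAndMetadata usage in failure group 4 would be:

```
hd16vsdn10:::dataAndMetadata:4:::
```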
- -F StanzaFile
- Specifies a file containing the NSD stanzas for the replacement
disk. NSD stanzas have this format:
%nsd:
  nsd=NsdName
  usage={dataOnly | metadataOnly | dataAndMetadata | descOnly}
  failureGroup=FailureGroup
  pool=StoragePool
  servers=ServerList
  device=DiskName
where:
- nsd=NsdName
- The name of an NSD previously created by the mmcrnsd command. For a list of available disks, issue the mmlsnsd -F command. This clause is mandatory for the mmrpldisk command.
- usage={dataOnly | metadataOnly | dataAndMetadata | descOnly}
- Specifies the type of data to be stored on the disk:
- dataAndMetadata
- Indicates that the disk contains both data and metadata. This is the default for disks in the system pool.
- dataOnly
- Indicates that the disk contains data and does not contain metadata. This is the default for disks in storage pools other than the system pool.
- metadataOnly
- Indicates that the disk contains metadata and does not contain data.
- descOnly
- Indicates that the disk contains no data and no file metadata.
Such a disk is used solely to keep a copy of the file system descriptor,
and can be used as a third failure group in certain disaster-recovery
configurations. For more information, see the IBM Spectrum Scale: Administration Guide and search for Synchronous mirroring utilizing GPFS replication.
This clause is optional for the mmrpldisk command. If omitted, the new disk will inherit the usage type of the disk being replaced.
- failureGroup=FailureGroup
- Identifies the failure group to which the disk belongs. A failure group
identifier can be a simple integer or a topology vector that consists of up to three comma-separated
integers. The default is -1, which indicates that the disk has no point of failure in common with
any other disk.
GPFS uses this information during data and metadata placement to ensure that no two replicas of the same block can become unavailable due to a single failure. All disks that are attached to the same NSD server or adapter must be placed in the same failure group.
If the file system is configured with data replication, all storage pools must have two failure groups to maintain proper protection of the data. Similarly, if metadata replication is in effect, the system storage pool must have two failure groups.
Disks that belong to storage pools in which write affinity is enabled can use topology vectors to identify failure domains in a shared-nothing cluster. Disks that belong to traditional storage pools must use simple integers to specify the failure group.
- This clause is optional for the mmrpldisk command. If omitted, the new disk will inherit the failure group of the disk being replaced.
- pool=StoragePool
- Specifies the storage pool to which the disk is to be assigned. This clause is ignored by the mmrpldisk command.
- servers=ServerList
- A comma-separated list of NSD server nodes. This clause is ignored by the mmrpldisk command.
- device=DiskName
- The block device name of the underlying disk device. This clause is ignored by the mmrpldisk command.
Note: While it is not absolutely necessary to specify the same parameters for the new disk as the old disk, it is suggested that you do so. If the new disk is equivalent in size to the old disk, and if the disk usage and failure group parameters are the same, the data and metadata can be completely migrated from the old disk to the new disk. A disk replacement in this manner allows the file system to maintain its current data and metadata balance.
If the new disk has a different size, disk usage parameter, or failure group parameter, the operation may leave the file system unbalanced and require a restripe. Additionally, a change in size or in the disk usage parameter may cause the operation to fail, since other disks in the file system may not have sufficient space to absorb more data or metadata. In this case, first use the mmadddisk command to add the new disk, then the mmdeldisk command to delete the old disk, and finally the mmrestripefs command to rebalance the file system.
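Putting the stanza clauses together, a minimal replacement stanza file for mmrpldisk might look like the following. The NSD name, usage, and failure group values are hypothetical, and the pool, servers, and device clauses are left out because mmrpldisk ignores them:

```
%nsd:
  nsd=hd16vsdn10
  usage=dataAndMetadata
  failureGroup=4
```

Leaving out the optional usage and failureGroup clauses instead would make the new disk inherit both values from the disk being replaced.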
- -v {yes | no}
- Verifies that the new disk does not belong to an existing file system. The default is -v yes. Specify -v no only when you want to reuse a disk that is no longer needed for an existing file system. If the command is interrupted for any reason, use the -v no option on the next invocation of the command.
Important: Using -v no on a disk that already belongs to a file system will corrupt that file system. This will not be noticed until the next time that file system is mounted.
- -N {Node[,Node...] | NodeFile | NodeClass}
- Specify the nodes that participate in the migration of data from
the old to the new disk. This command supports all defined node classes.
The default is all or the current value
of the defaultHelperNodes parameter of the mmchconfig command.
For general information on how to specify node names, see Specifying nodes as input to GPFS commands.
- --inode-criteria CriteriaFile
- Specifies the interesting inode criteria, where CriteriaFile is a file that contains one or more of the following flags, one per line:
- BROKEN
- Indicates that a file has a data block with all of its replicas on disks that have been removed.
Note: BROKEN is always included in the list of flags even if it is not specified.
- dataUpdateMiss
- Indicates that at least one data block was not updated successfully on all replicas.
- exposed
- Indicates an inode with an exposed risk; that is, the file has data where all replicas are on suspended disks. This could cause data to be lost if the suspended disks have failed or been removed.
- illCompressed
- Indicates an inode in which file compression or decompression is deferred, or in which a compressed file is partly decompressed to allow the file to be written into or memory-mapped.
- illPlaced
- Indicates an inode with some data blocks that might be stored in an incorrect storage pool.
- illReplicated
- Indicates that the file has a data block that does not meet the setting for the replica.
- metaUpdateMiss
- Indicates that there is at least one metadata block that has not been successfully updated to all replicas.
- unbalanced
- Indicates that the file has a data block that is not well balanced across all the disks in all failure groups.
Note: If a file matches any of the specified interesting flags, all of its interesting flags (even those not specified) will be displayed.
- -o InodeResultFile
- Specifies the output file, which contains a list of the inodes that met the interesting inode criteria specified with the --inode-criteria parameter. The output file contains the following fields:
- INODE_NUMBER
- This is the inode number.
- DISKADDR
- Specifies a dummy address for later tsfindinode use.
- SNAPSHOT_ID
- This is the snapshot ID.
- ISGLOBAL_SNAPSHOT
- Indicates whether or not the inode is in a global snapshot. Files in the live file system are considered to be in a global snapshot.
- INDEPENDENT_FSETID
- Indicates the independent fileset to which the inode belongs.
- MEMO (INODE_FLAGS FILE_TYPE [ERROR])
- Indicates the inode flags and file type that will be printed.
Inode flags: BROKEN, exposed, dataUpdateMiss, illCompressed, illPlaced, illReplicated, metaUpdateMiss, unbalanced
File types: BLK_DEV, CHAR_DEV, DIRECTORY, FIFO, LINK, LOGFILE, REGULAR_FILE, RESERVED, SOCK, *UNLINKED*, *DELETED*
Notes:
- An error message will be printed in the output file if an error is encountered when repairing the inode.
- DISKADDR, ISGLOBAL_SNAPSHOT, and FSET_ID work with the tsfindinode tool (/usr/lpp/mmfs/bin/tsfindinode) to find the file name for each inode. tsfindinode uses the output file to retrieve the file name for each interesting inode.
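A criteria file is a plain text list of the flags above, one per line. A minimal hypothetical example:

```
illReplicated
illPlaced
exposed
```

Pass this file with --inode-criteria and use -o to collect matching inodes; the resulting output file can then be given to the tsfindinode tool to map each interesting inode back to a file name.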
- --qos QOSClass
- Specifies the Quality of Service for I/O operations (QoS) class to which the instance of the command
is assigned. If you do not specify this parameter, the instance of the command is assigned by
default to the maintenance
QoS class. This parameter has no effect unless the QoS service is enabled. For more information, see the topic mmchqos command. Specify one of the following QoS classes:
- maintenance
- This QoS class is typically configured to have a smaller share of file system IOPS. Use this class for I/O-intensive, potentially long-running GPFS commands, so that they contribute less to reducing overall file system performance.
- other
- This QoS class is typically configured to have a larger share of file system IOPS. Use this class for administration commands that are not I/O-intensive.
Exit status
- 0
- Successful completion.
- nonzero
- A failure has occurred.
Security
You must have root authority to run the mmrpldisk command.
The node on which the command is issued must be able to execute remote shell commands on any other node in the cluster without the use of a password and without producing any extraneous messages. For more information, see Requirements for administering a GPFS file system.
Examples
- To replace disk hd27n01 in fs1 with a new disk, hd16vsdn10, allowing the disk usage and failure group parameters to default to the corresponding values of hd27n01, and to have only nodes c154n01, c154n02, and c154n09 participate in the migration of the data, issue this command:
mmrpldisk fs1 hd27n01 hd16vsdn10 -N c154n01,c154n02,c154n09
The system displays information similar to:
Replacing hd27n01 ...
The following disks of fs1 will be formatted on node c155n01.ppd.pok.ibm.com:
    hd16vsdn10: size 17793024 KB
Extending Allocation Map
Checking Allocation Map for storage pool 'system'
   7 % complete on Wed May 16 16:36:30 2007
  18 % complete on Wed May 16 16:36:35 2007
  34 % complete on Wed May 16 16:36:40 2007
  49 % complete on Wed May 16 16:36:45 2007
  65 % complete on Wed May 16 16:36:50 2007
  82 % complete on Wed May 16 16:36:55 2007
  98 % complete on Wed May 16 16:37:00 2007
 100 % complete on Wed May 16 16:37:01 2007
Completed adding disks to file system fs1.
Scanning system storage pool
Scanning file system metadata, phase 1 ...
   2 % complete on Wed May 16 16:37:04 2007
   7 % complete on Wed May 16 16:37:11 2007
  14 % complete on Wed May 16 16:37:18 2007
  20 % complete on Wed May 16 16:37:24 2007
  27 % complete on Wed May 16 16:37:31 2007
  34 % complete on Wed May 16 16:37:37 2007
  50 % complete on Wed May 16 16:37:50 2007
  61 % complete on Wed May 16 16:38:00 2007
  68 % complete on Wed May 16 16:38:06 2007
  79 % complete on Wed May 16 16:38:16 2007
  90 % complete on Wed May 16 16:38:26 2007
 100 % complete on Wed May 16 16:38:32 2007
Scan completed successfully.
Scanning file system metadata, phase 2 ...
Scanning file system metadata for fs1sp1 storage pool
Scan completed successfully.
Scanning file system metadata, phase 3 ...
Scan completed successfully.
Scanning file system metadata, phase 4 ...
Scan completed successfully.
Scanning user file metadata ...
   3 % complete on Wed May 16 16:38:38 2007
  25 % complete on Wed May 16 16:38:47 2007
  53 % complete on Wed May 16 16:38:53 2007
  87 % complete on Wed May 16 16:38:59 2007
  97 % complete on Wed May 16 16:39:06 2007
 100 % complete on Wed May 16 16:39:07 2007
Scan completed successfully.
Done
mmrpldisk: Propagating the cluster configuration data to all affected nodes.
This is an asynchronous process.
- To replace disk vmip3_nsd1 from storage pool GOLD on file system fs2, and to search for any interesting files handled during the mmrpldisk at the same time, issue this command:
mmrpldisk fs2 vmip3_nsd1 -F f --inode-criteria /tmp/crit
The system displays information similar to:
Replacing vmip3_nsd1 ...
GPFS: 6027-531 The following disks of fs2 will be formatted on node vmip1:
    vmip2_nsd3: size 5120 MB
Extending Allocation Map
Checking Allocation Map for storage pool GOLD
  59 % complete on Wed Apr 15 10:52:44 2015
 100 % complete on Wed Apr 15 10:52:49 2015
GPFS: 6027-1503 Completed adding disks to file system fs2.
GPFS: 6027-589 Scanning file system metadata, phase 1 ...
GPFS: 6027-552 Scan completed successfully.
GPFS: 6027-589 Scanning file system metadata, phase 2 ...
Scanning file system metadata for GOLD storage pool
Scanning file system metadata for BRONZE storage pool
GPFS: 6027-552 Scan completed successfully.
GPFS: 6027-589 Scanning file system metadata, phase 3 ...
GPFS: 6027-552 Scan completed successfully.
GPFS: 6027-589 Scanning file system metadata, phase 4 ...
GPFS: 6027-552 Scan completed successfully.
GPFS: 6027-565 Scanning user file metadata ...
   6.47 % complete on Wed Apr 15 10:53:11 2015  (  65792 inodes with total  448 MB data processed)
   6.49 % complete on Wed Apr 15 10:55:01 2015  (  65792 inodes with total  448 MB data processed)
 100.00 % complete on Wed Apr 15 10:55:03 2015  (  65792 inodes with total  448 MB data processed)
GPFS: 6027-552 Scan completed successfully.
GPFS: 6027-3902 Check file '/var/mmfs/tmp/fs2.pit.interestingInodes.12884901928' on vmip1 for inodes \
that were found matching the criteria.
Checking Allocation Map for storage pool GOLD
  56 % complete on Wed Apr 15 10:55:08 2015
 100 % complete on Wed Apr 15 10:55:12 2015
Done
mmrpldisk: 6027-1371 Propagating the cluster configuration data to all affected nodes.
This is an asynchronous process.
#11:57:08# vmip1:/fs2 # cat /tmp/crit
illReplicated
illPlaced
dataUpdateMiss
metaUpdateMiss
exposed
BROKEN
#11:09:24# vmip1:/fs2 # cat /var/mmfs/tmp/fs2.pit.interestingInodes.12884901928
This inode list was generated in the Parallel Inode Traverse on Wed Apr 15 10:55:02 2015
INODE_NUMBER  DISKADDR  SNAPSHOT_ID  ISGLOBAL_SNAPSHOT  FSET_ID  MEMO(INODE_FLAGS FILE_TYPE [ERROR])
50177         0:0       0            1                  0        illplaced REGULAR_FILE
Note: The mmrpldisk command will report any interesting inodes that it finds during routine processing, but the list might not be 100% accurate or complete.