Disk Replacement

This topic describes how to replace a disk.

Note: You cannot run the mmrpldisk command to replace a disk that is stopped. Check the disk availability in the mmlsdisk fs-name -L command output; the disk to be replaced must show an availability of up.
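As a quick sanity check before running mmrpldisk, the availability column of the mmlsdisk output can be filtered with a small script. The helper below is a sketch only: it assumes the mmlsdisk -L column layout shown later in this topic (driver type in field 2, availability in field 8) and prints any disk that is not up.

```shell
# Sketch only: assumes the `mmlsdisk <fs> -L` column layout shown in this
# topic, where field 8 of each "nsd" data row is the availability value
# ("up" or "down"). Prints the names of disks that are not up.
not_up_disks() {
  awk '$2 == "nsd" && $8 != "up" { print $1 }'
}
# Usage: mmlsdisk fs-name -L | not_up_disks
```

If this prints the name of the disk you intend to replace, start it (or choose another replacement strategy) before running mmrpldisk.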

In a production cluster, you can replace physically broken or failed disks with new disks.

  • If you have nonfunctional disks in two failure groups with replica 3, restripe the file system first to restore full replication; otherwise, a third nonfunctional disk in the third failure group can cause data loss.
  • Replacing disks is time-consuming because the whole inode space must be scanned, which generates I/O traffic in the cluster. Therefore, schedule the disk replacement for a time when the cluster is not busy.
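The two-failure-group check above can be scripted. The helper below is a hedged sketch, not an official tool: it assumes the mmlsdisk -L column layout shown later in this topic (failure group in field 4, availability in field 8 for nsd rows) and counts the distinct failure groups that contain a down disk. With replica 3, a count of 2 or more means you should restripe, for example with mmrestripefs, before replacing anything.

```shell
# Hedged sketch: count distinct failure groups that contain a "down" disk.
# Assumes the `mmlsdisk <fs> -L` layout used in this topic: for "nsd" rows,
# field 4 is the failure group and field 8 is the availability.
down_failure_groups() {
  awk '$2 == "nsd" && $8 == "down" && !($4 in fg) { fg[$4] = 1; n++ } END { print n + 0 }'
}
# Usage:
#   groups=$(mmlsdisk sncfs -L | down_failure_groups)
#   [ "$groups" -ge 2 ] && echo "restripe first, for example: mmrestripefs sncfs -r"
```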

The mmrpldisk command replaces one disk in a file system with a new disk; it can handle only one disk per invocation. If you want to replace only one disk, see the mmrpldisk command.

Note: In FPO, the mmrpldisk command sometimes does not migrate all data from the disk that is being replaced to the newly added disk. This issue affects IBM Storage Scale Release 3.5 and later, as the following example shows:
[root@c8f2n03 ~]# mmlsdisk sncfs -L
disk         driver   sector     failure holds    holds               avail-           storage
name         type       size       group metadata data  status        ability disk id  pool      remarks   
------------ -------- ------ ----------- -------- ----- ------------- ------- -------  --------- --------
n03_0        nsd         512           1 Yes      Yes   ready         up            1   system        
n03_1        nsd         512           1 Yes      Yes   ready         up            2   system    desc
n04_0        nsd         512       2,0,0 Yes      Yes   ready         up            3   system    desc
n04_1        nsd         512       2,0,0 Yes      Yes   ready         up            4   system        
n05_1        nsd         512       4,0,0 No       Yes   ready         up            5   system    desc
Number of quorum disks: 3 
Read quorum value:      2
Write quorum value:     2

[root@c8f2n03 ~]# /usr/lpp/mmfs/samples/fpo/tsGetDataBlk /sncfs/log -s 0
File length: 1073741824, Block Size: 1048576
Parameters: startoffset:0, skipfactor: META_BLOCK, length: 1073741824, replicas 0
numReplicasReturned: 2, numBlksReturned: 8, META_BLOCK size: 134217728
Block 0 (offset 0) is located at disks:  2   5 
Block 1 (offset 134217728) is located at disks:  2   3 
Block 2 (offset 268435456) is located at disks:  2   5 
Block 3 (offset 402653184) is located at disks:  2   3 
Block 4 (offset 536870912) is located at disks:  2   5 
Block 5 (offset 671088640) is located at disks:  2   3 
Block 6 (offset 805306368) is located at disks:  2   5 
Block 7 (offset 939524096) is located at disks:  2   3

[root@c8f2n03 ~]# mmrpldisk sncfs n03_1 n03_4
[root@c8f2n03 ~]# mmlsdisk sncfs -L
disk         driver   sector     failure holds    holds             avail-           storage
name         type       size       group metadata data  status      ability disk id  pool      remarks   
------------ -------- ------ ----------- -------- ----- ----------- ------- -------  -------- --------
n03_0        nsd         512           1 Yes      Yes   ready       up            1    system     desc
n04_0        nsd         512       2,0,0 Yes      Yes   ready       up            3    system     desc
n04_1        nsd         512       2,0,0 Yes      Yes   ready       up            4    system        
n05_1        nsd         512       4,0,0 No       Yes   ready       up            5    system     desc
n03_4        nsd         512           1 Yes      Yes   ready       up            6    system        
Number of quorum disks: 3 
Read quorum value:      2
Write quorum value:     2
[root@c8f2n03 ~]# /usr/lpp/mmfs/samples/fpo/tsGetDataBlk /sncfs/log -s 0
File length: 1073741824, Block Size: 1048576
Parameters: startoffset:0, skipfactor: META_BLOCK, length: 1073741824, replicas 0
numReplicasReturned: 2, numBlksReturned: 8, META_BLOCK size: 134217728
Block 0 (offset 0) is located at disks:  6   5 
Block 1 (offset 134217728) is located at disks:  1   3 
Block 2 (offset 268435456) is located at disks:  1   5 
Block 3 (offset 402653184) is located at disks:  1   3 
Block 4 (offset 536870912) is located at disks:  6   5 
Block 5 (offset 671088640) is located at disks:  6   3 
Block 6 (offset 805306368) is located at disks:  1   5 
Block 7 (offset 939524096) is located at disks:  6   3
After n03_1 is replaced with n03_4, part of the data on n03_1 is migrated to n03_4 and
the rest is migrated to n03_0. Therefore, mmrpldisk does not simply copy data from the
disk that is being replaced to the newly added disk. Because mmrpldisk might break data
locality, see Section 9 to restore data locality if needed.

If you want to replace more than one disk, run the mmrpldisk command multiple times. Each invocation triggers a PIT job that scans the whole inode space to migrate data off the disk that is being replaced, so running mmrpldisk multiple times generates repeated I/O traffic and is time-consuming. To speed up the replacement process, see the following subsections to replace more than one disk in the file system.
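One common way to avoid repeated inode scans is to add all replacement disks first with mmadddisk and then remove all old disks in a single mmdeldisk invocation, using the "disk1;disk2" list syntax that mmdeldisk accepts. The helper below is a hypothetical convenience for building that list; the disk names in the example are illustrative only.

```shell
# Hypothetical helper: join disk names with ';', the multi-disk list syntax
# that GPFS commands such as mmdeldisk accept.
join_disks() {
  local IFS=';'
  printf '%s' "$*"
}
# Example (disk names are illustrative):
#   mmdeldisk sncfs "$(join_disks n03_1 n04_1)"
```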