Disk Replacement
This topic describes how to replace a disk.
Before you replace a disk, check its state in the mmlsdisk fs-name -L command output. The to-be-replaced disk must be up in the availability column.
In a production cluster, you might need to replace physically broken disks or failed disks with new disks.
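The availability requirement can be checked from a script before mmrpldisk is run. The following is a minimal sketch, assuming the column order of the mmlsdisk -L output shown in this topic (name, type, size, failure group, metadata, data, status, availability, disk id, pool, remarks). The functions disk_availability and disk_replaceable are hypothetical helpers, not product commands, and they test only the availability column.

```shell
#!/usr/bin/env bash
# Sketch: inspect one disk in `mmlsdisk <fs> -L` output read from stdin.
# Assumed column order (as shown in this topic):
#   name type size failure-group metadata data status availability disk-id pool [remarks]

# Print the availability column ($8) of the named disk; fail if the disk is absent.
disk_availability() {
  awk -v d="$1" '$1 == d { print $8; found = 1 }
                 END { exit (found ? 0 : 1) }'
}

# Succeed only when the named disk is listed and its availability is "up".
disk_replaceable() {
  local avail
  avail=$(disk_availability "$1") || return 1
  [ "$avail" = "up" ]
}
```

For example, mmlsdisk sncfs -L | disk_replaceable n03_1 || echo "n03_1 is not up". Reading the command output from a pipe keeps the parsing separate from the cluster, so it can be tried on saved output.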
- If the file system uses three replicas and disks in two failure groups are non-functional, restripe the file system to protect the data. Otherwise, a non-functional disk in the third failure group could cause data loss.
- Replacing disks is time-consuming because the whole inode space must be scanned, which generates I/O traffic across the cluster. Therefore, schedule disk replacement for a time when the cluster is not busy.
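The failure-group condition in the first item can be screened in the same way. This sketch makes two simplifying assumptions: "non-functional" means an availability other than up, and the whole failure-group field (for example 2,0,0) identifies a group. The names count_failed_groups and check_replica3_safety are hypothetical. The script only reports the situation; the restripe itself is done with the mmrestripefs command.

```shell
#!/usr/bin/env bash
# Sketch: count the failure groups that contain at least one disk whose
# availability is not "up", reading `mmlsdisk <fs> -L` output on stdin.
# Assumptions (illustrative, not from the product documentation):
#   - a data row is one with at least 10 fields and a numeric 9th field (disk id);
#   - "non-functional" means availability ($8) other than "up";
#   - the whole failure-group field ($4, e.g. "2,0,0") identifies the group.
count_failed_groups() {
  awk 'NF >= 10 && $9 ~ /^[0-9]+$/ && $8 != "up" { g[$4] = 1 }
       END { n = 0; for (k in g) n++; print n }'
}

# For replica 3: print a warning and fail when two or more failure groups
# contain non-functional disks; otherwise print "ok".
check_replica3_safety() {
  local n
  n=$(count_failed_groups)
  if [ "$n" -ge 2 ]; then
    echo "non-functional disks in $n failure groups: restripe the file system first"
    return 1
  fi
  echo "ok"
}
```

For example, mmlsdisk sncfs -L | check_replica3_safety prints ok when at most one failure group is affected.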
The mmrpldisk command replaces one disk in a file system with a new disk, and it handles only one disk per invocation. To replace a single disk, see the mmrpldisk command.
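In the following example, disk n03_1 (disk id 2) of file system sncfs is replaced with the new disk n03_4. The disk list and the block locations of the file /sncfs/log are shown before and after the replacement:
[root@c8f2n03 ~]# mmlsdisk sncfs -L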
disk         driver   sector failure     holds    holds               avail-          storage
name         type     size   group       metadata data  status        ability disk id pool      remarks
------------ -------- ------ ----------- -------- ----- ------------- ------- ------- --------- --------
n03_0        nsd      512    1           Yes      Yes   ready         up      1       system
n03_1        nsd      512    1           Yes      Yes   ready         up      2       system    desc
n04_0        nsd      512    2,0,0       Yes      Yes   ready         up      3       system    desc
n04_1        nsd      512    2,0,0       Yes      Yes   ready         up      4       system
n05_1        nsd      512    4,0,0       No       Yes   ready         up      5       system    desc
Number of quorum disks: 3
Read quorum value: 2
Write quorum value: 2
[root@c8f2n03 ~]# /usr/lpp/mmfs/samples/fpo/tsGetDataBlk /sncfs/log -s 0
File length: 1073741824, Block Size: 1048576
Parameters: startoffset:0, skipfactor: META_BLOCK, length: 1073741824, replicas 0
numReplicasReturned: 2, numBlksReturned: 8, META_BLOCK size: 134217728
Block 0 (offset 0) is located at disks: 2 5
Block 1 (offset 134217728) is located at disks: 2 3
Block 2 (offset 268435456) is located at disks: 2 5
Block 3 (offset 402653184) is located at disks: 2 3
Block 4 (offset 536870912) is located at disks: 2 5
Block 5 (offset 671088640) is located at disks: 2 3
Block 6 (offset 805306368) is located at disks: 2 5
Block 7 (offset 939524096) is located at disks: 2 3
[root@c8f2n03 ~]# mmrpldisk sncfs n03_1 n03_4
[root@c8f2n03 ~]# mmlsdisk sncfs -L
disk         driver   sector failure     holds    holds               avail-          storage
name         type     size   group       metadata data  status        ability disk id pool      remarks
------------ -------- ------ ----------- -------- ----- ------------- ------- ------- --------- --------
n03_0        nsd      512    1           Yes      Yes   ready         up      1       system    desc
n04_0        nsd      512    2,0,0       Yes      Yes   ready         up      3       system    desc
n04_1        nsd      512    2,0,0       Yes      Yes   ready         up      4       system
n05_1        nsd      512    4,0,0       No       Yes   ready         up      5       system    desc
n03_4        nsd      512    1           Yes      Yes   ready         up      6       system
Number of quorum disks: 3
Read quorum value: 2
Write quorum value: 2
[root@c8f2n03 ~]# /usr/lpp/mmfs/samples/fpo/tsGetDataBlk /sncfs/log -s 0
File length: 1073741824, Block Size: 1048576
Parameters: startoffset:0, skipfactor: META_BLOCK, length: 1073741824, replicas 0
numReplicasReturned: 2, numBlksReturned: 8, META_BLOCK size: 134217728
Block 0 (offset 0) is located at disks: 6 5
Block 1 (offset 134217728) is located at disks: 1 3
Block 2 (offset 268435456) is located at disks: 1 5
Block 3 (offset 402653184) is located at disks: 1 3
Block 4 (offset 536870912) is located at disks: 6 5
Block 5 (offset 671088640) is located at disks: 6 3
Block 6 (offset 805306368) is located at disks: 1 5
Block 7 (offset 939524096) is located at disks: 6 3
After n03_1 (disk id 2) is replaced with n03_4 (disk id 6), some of the blocks that were on n03_1 are migrated to n03_4 and the rest are migrated to n03_0 (disk id 1). In other words, mmrpldisk does not simply copy the data from the disk being replaced to the newly added disk. Because mmrpldisk might break data locality, see Section 9 to restore data locality if needed.
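To see where the replicas of a file moved, as in the comparison above, the tsGetDataBlk output can be summarized per disk id. This is a minimal sketch, assuming the "Block N (offset X) is located at disks: ..." line format shown in the example; replicas_per_disk is a hypothetical helper. Run it on the output captured before and after mmrpldisk, and match the ids against the disk id column of mmlsdisk -L.

```shell
#!/usr/bin/env bash
# Sketch: count how many block replicas each disk id holds, reading
# tsGetDataBlk output (lines such as
# "Block 0 (offset 0) is located at disks: 2 5") on stdin.
# Assumes that line format; all other lines are ignored.
# Output: one "disk-id count" pair per line, sorted by disk id.
replicas_per_disk() {
  awk '/is located at disks:/ {
         # Strip everything up to the disk list; awk then re-splits the fields.
         sub(/.*is located at disks:/, "")
         for (i = 1; i <= NF; i++) count[$i]++
       }
       END { for (d in count) print d, count[d] }' | sort -n
}
```

For example, /usr/lpp/mmfs/samples/fpo/tsGetDataBlk /sncfs/log -s 0 | replicas_per_disk. For the example above, the result changes from disk 2 with 8 replicas and disks 3 and 5 with 4 each, to disks 1, 3, 5, and 6 with 4 each. That is, the replicas on n03_1 were split between n03_4 (id 6) and n03_0 (id 1).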
To replace more than one disk, you can run the mmrpldisk command multiple times. However, each invocation triggers a parallel inode traversal (PIT) job that scans the whole inode space to migrate data off the disk being replaced, which generates I/O traffic and takes time. To speed up the replacement, see the following subsections, which describe how to replace more than one disk in the file system.
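Running mmrpldisk once per disk can be scripted as a simple loop. This is a sketch under stated assumptions: replace_disks, the OLD:NEW pair syntax, and the DRY_RUN variable are hypothetical conventions of this example, and the positional mmrpldisk form (file system, old disk, new disk) is the one used in the example in this topic. Each iteration still triggers its own inode scan, so the loop does not make the replacement faster.

```shell
#!/usr/bin/env bash
# Sketch: replace several disks one at a time, one mmrpldisk call per pair.
# Hypothetical conventions (not part of the product):
#   - pairs are given as OLD:NEW, for example n03_1:n03_4;
#   - DRY_RUN=1 prints the commands instead of running them.
replace_disks() {
  local fs="$1"; shift
  local pair old new
  for pair in "$@"; do
    old="${pair%%:*}"
    new="${pair#*:}"
    # Reject entries without a colon or with an empty side.
    if [ -z "$old" ] || [ -z "$new" ] || [ "$old" = "$pair" ]; then
      echo "bad pair: $pair" >&2
      return 2
    fi
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "mmrpldisk $fs $old $new"
    else
      # Stop at the first failure so that later replacements do not run
      # after an error.
      mmrpldisk "$fs" "$old" "$new" || return 1
    fi
  done
}
```

For example, DRY_RUN=1 replace_disks sncfs n03_1:n03_4 prints mmrpldisk sncfs n03_1 n03_4 without running it; without DRY_RUN, the command is executed.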