Replacing failed disks in a Power 775 Disk Enclosure recovery group: a sample scenario
The scenario presented here shows how to detect and replace failed disks in a recovery group built on a Power® 775 Disk Enclosure.
Detecting failed disks in your enclosure
Assume a Power 775 Disk Enclosure (serial number 000DE37) on which two recovery groups are defined:
- 000DE37TOP containing the disks in the top set of carriers
- 000DE37BOT containing the disks in the bottom set of carriers
Each recovery group contains the following:
- one log declustered array (LOG)
- four data declustered arrays (DA1, DA2, DA3, DA4)
Each data declustered array has:
- 47 pdisks
- every member pdisk drawn from the same carrier slot
- the default disk replacement threshold value of 2
The replacement threshold of 2 means that GNR will only require disk replacement when two or more disks have failed in the declustered array; otherwise, rebuilding onto spare space or reconstruction from redundancy will be used to supply affected data.
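The replacement threshold is an attribute of the declustered array and can be adjusted after the recovery group is created. As a sketch (the --replace-threshold option is assumed here; verify it against the mmchrecoverygroup documentation for your release), the following would tighten the policy so that a single failed disk in DA3 immediately calls for replacement:
# mmchrecoverygroup 000DE37TOP --declustered-array DA3 --replace-threshold 1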
# mmlsrecoverygroup 000DE37TOP -L
                     declustered
 recovery group       arrays      vdisks   pdisks
 -----------------   -----------  ------   ------
 000DE37TOP                    5       9      192

 declustered    needs                             replace                 scrub      background activity
    array      service  vdisks  pdisks  spares  threshold  free space   duration   task        progress  priority
 -----------   -------  ------  ------  ------  ---------  ----------   --------   -------------------------------
 DA1                no       2      47       2          2    3072 MiB    14 days   scrub            63%  low
 DA2                no       2      47       2          2    3072 MiB    14 days   scrub            19%  low
 DA3               yes       2      47       2          2         0 B    14 days   rebuild-2r       48%  low
 DA4                no       2      47       2          2    3072 MiB    14 days   scrub            33%  low
 LOG                no       1       4       1          1     546 GiB    14 days   scrub            87%  low

                                          declustered
 vdisk               RAID code               array      vdisk size  remarks
 ------------------  ------------------   -----------   ----------  -------
 000DE37TOPLOG       3WayReplication      LOG             4144 MiB  log
 000DE37TOPDA1META   4WayReplication      DA1              250 GiB
 000DE37TOPDA1DATA   8+3p                 DA1               17 TiB
 000DE37TOPDA2META   4WayReplication      DA2              250 GiB
 000DE37TOPDA2DATA   8+3p                 DA2               17 TiB
 000DE37TOPDA3META   4WayReplication      DA3              250 GiB
 000DE37TOPDA3DATA   8+3p                 DA3               17 TiB
 000DE37TOPDA4META   4WayReplication      DA4              250 GiB
 000DE37TOPDA4DATA   8+3p                 DA4               17 TiB

 active recovery group server                     servers
 -----------------------------------------------  -------
 server1                                          server1,server2
The indication that disk replacement is called for in this recovery group is the value of yes in the needs service column for declustered array DA3.
The fact that DA3 (the declustered array on the disks in carrier slot 3) is undergoing rebuild of its RAID tracks that can tolerate two strip failures is by itself not an indication that disk replacement is required; it merely indicates that data from a failed disk is being rebuilt onto spare space. Only if the replacement threshold has been met will disks be marked for replacement and the declustered array marked as needing service.
GNR provides several indications that disk replacement is required:
- entries in the AIX® error report or the Linux® syslog
- the pdReplacePdisk callback, which can be configured to run an administrator-supplied script at the moment a pdisk is marked for replacement (a configuration sketch follows this list)
- the POWER7 cluster event notification TEAL agent, which can be configured to send disk replacement notices to the POWER7 cluster EMS when they occur
- the output from the following commands, which may be performed from the command line on any GPFS cluster node (see the examples that follow):
- mmlsrecoverygroup with the -L flag shows yes in the needs service column
- mmlsrecoverygroup with the -L and --pdisk flags; this shows the states of all pdisks, which may be examined for the replace pdisk state
- mmlspdisk with the --replace flag, which lists only those pdisks that are marked for replacement
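As an illustration of the callback-based notification mentioned above, the following sketch registers an administrator-supplied script for the pdReplacePdisk event with the mmaddcallback command. The script path, callback identifier, mail recipient, and the parameter variables (%myNode, %rgName, %daName, %pdiskName) are assumptions for this example; check the mmaddcallback documentation for the parameters actually available for this event:
# cat /usr/local/bin/gnr-replace-notify.sh
#!/bin/ksh
# Hypothetical notification script; arguments are supplied by the GPFS callback facility.
node=$1; rg=$2; da=$3; pdisk=$4
echo "Pdisk $pdisk in declustered array $da of recovery group $rg is marked for replacement (reported by $node)" |
    mail -s "GNR disk replacement required" storage-admin@example.com

# mmaddcallback gnrReplaceNotify --command /usr/local/bin/gnr-replace-notify.sh \
      --event pdReplacePdisk --parms "%myNode %rgName %daName %pdiskName"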
# mmlsrecoverygroup 000DE37TOP -L --pdisk
                     declustered
 recovery group       arrays      vdisks   pdisks
 -----------------   -----------  ------   ------
 000DE37TOP                    5       9      192

 declustered    needs                             replace                 scrub      background activity
    array      service  vdisks  pdisks  spares  threshold  free space   duration   task        progress  priority
 -----------   -------  ------  ------  ------  ---------  ----------   --------   -------------------------------
 DA1                no       2      47       2          2    3072 MiB    14 days   scrub            63%  low
 DA2                no       2      47       2          2    3072 MiB    14 days   scrub            19%  low
 DA3               yes       2      47       2          2         0 B    14 days   rebuild-2r       68%  low
 DA4                no       2      47       2          2    3072 MiB    14 days   scrub            34%  low
 LOG                no       1       4       1          1     546 GiB    14 days   scrub            87%  low

                    n. active,   declustered                user         state,
 pdisk              total paths     array      free space   condition    remarks
 -----------------  -----------   -----------  ----------   -----------  -------
 [...]
 c014d1             2, 4          DA1              62 GiB   normal       ok
 c014d2             2, 4          DA2             279 GiB   normal       ok
 c014d3             0, 0          DA3             279 GiB   replaceable  dead/systemDrain/noRGD/noVCD/replace
 c014d4             2, 4          DA4              12 GiB   normal       ok
 [...]
 c018d1             2, 4          DA1              24 GiB   normal       ok
 c018d2             2, 4          DA2              24 GiB   normal       ok
 c018d3             2, 4          DA3             558 GiB   replaceable  dead/systemDrain/noRGD/noVCD/noData/replace
 c018d4             2, 4          DA4              12 GiB   normal       ok
 [...]
The preceding output shows that the following pdisks are marked for replacement:
- c014d3 in DA3
- c018d3 in DA3
The naming convention used during recovery group creation indicates that these are the disks in slot 3 of carriers 14 and 18. To confirm the physical locations of the failed disks, use the mmlspdisk command to list information about those pdisks in declustered array DA3 of recovery group 000DE37TOP that are marked for replacement:
# mmlspdisk 000DE37TOP --declustered-array DA3 --replace
pdisk:
   replacementPriority = 1.00
   name = "c014d3"
   device = "/dev/rhdisk158,/dev/rhdisk62"
   recoveryGroup = "000DE37TOP"
   declusteredArray = "DA3"
   state = "dead/systemDrain/noRGD/noVCD/replace"
   .
   .
   .
pdisk:
   replacementPriority = 1.00
   name = "c018d3"
   device = "/dev/rhdisk630,/dev/rhdisk726"
   recoveryGroup = "000DE37TOP"
   declusteredArray = "DA3"
   state = "dead/systemDrain/noRGD/noVCD/noData/replace"
   .
   .
   .
The preceding location code attributes confirm the pdisk naming convention:
Disk | Location code | Interpretation |
---|---|---|
pdisk c014d3 | 78AD.001.000DE37-C14-D3 | Disk 3 in carrier 14 in the disk enclosure identified by enclosure type 78AD.001 and serial number 000DE37 |
pdisk c018d3 | 78AD.001.000DE37-C18-D3 | Disk 3 in carrier 18 in the disk enclosure identified by enclosure type 78AD.001 and serial number 000DE37 |
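Because the naming is purely positional, the carrier and disk numbers can be read straight out of a pdisk name. For example, this small shell sketch (a convenience only, not a GNR command) decodes the name of one of the failed pdisks:
# echo c014d3 | sed 's/c0*\([0-9]*\)d\([0-9]*\)/carrier \1, disk \2/'
carrier 14, disk 3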
Replacing the failed disks in a Power 775 Disk Enclosure recovery group
Replacing a failed disk consists of the following steps:
- Using the mmchcarrier command with the --release flag to suspend use of the other disks in the carrier and to release the carrier.
- Removing the carrier and replacing the failed disk inside it with a new one.
- Using the mmchcarrier command with the --replace flag to resume use of the suspended disks and to begin use of the new disk.
- To release carrier 14 in disk enclosure 000DE37:
# mmchcarrier 000DE37TOP --release --pdisk c014d3
[I] Suspending pdisk c014d1 of RG 000DE37TOP in location 78AD.001.000DE37-C14-D1.
[I] Suspending pdisk c014d2 of RG 000DE37TOP in location 78AD.001.000DE37-C14-D2.
[I] Suspending pdisk c014d3 of RG 000DE37TOP in location 78AD.001.000DE37-C14-D3.
[I] Suspending pdisk c014d4 of RG 000DE37TOP in location 78AD.001.000DE37-C14-D4.
[I] Carrier released.
  - Remove carrier.
  - Replace disk in location 78AD.001.000DE37-C14-D3 with FRU 74Y4936.
  - Reinsert carrier.
  - Issue the following command:
      mmchcarrier 000DE37TOP --replace --pdisk 'c014d3'
Repair timer is running. Perform the above within 5 minutes to avoid pdisks being reported as missing.
GNR issues instructions as to the physical actions that must be taken. Note that disks may be suspended only so long before they are declared missing; therefore the mechanical process of physically performing disk replacement must be accomplished promptly.
Use of the other three disks in carrier 14 has been suspended, and carrier 14 is unlocked. The identify lights for carrier 14 and for disk 3 are on.
- Carrier 14 should be unlatched and removed. The failed disk 3, as indicated by the internal identify light, should be removed, and the new disk with FRU 74Y4936 should be inserted in its place. Carrier 14 should then be reinserted and the latch closed.
- To finish the replacement of pdisk c014d3:
# mmchcarrier 000DE37TOP --replace --pdisk c014d3
[I] The following pdisks will be formatted on node server1:
    /dev/rhdisk354
[I] Pdisk c014d3 of RG 000DE37TOP successfully replaced.
[I] Resuming pdisk c014d1 of RG 000DE37TOP.
[I] Resuming pdisk c014d2 of RG 000DE37TOP.
[I] Resuming pdisk c014d3#162 of RG 000DE37TOP.
[I] Resuming pdisk c014d4 of RG 000DE37TOP.
[I] Carrier resumed.
When the mmchcarrier --replace command returns successfully, GNR has resumed use of the other three disks. The failed pdisk may remain in a temporary form (indicated here by the name c014d3#162) until all data from it has been rebuilt, at which point it is finally deleted. The new replacement disk, which has assumed the name c014d3, will have RAID tracks rebuilt and rebalanced onto it. Notice that only one block device name is mentioned as being formatted as a pdisk; the second path will be discovered in the background.
This can be confirmed with mmlsrecoverygroup -L --pdisk:
# mmlsrecoverygroup 000DE37TOP -L --pdisk
                     declustered
 recovery group       arrays      vdisks   pdisks
 -----------------   -----------  ------   ------
 000DE37TOP                    5       9      193

 declustered    needs                             replace                 scrub      background activity
    array      service  vdisks  pdisks  spares  threshold  free space   duration   task        progress  priority
 -----------   -------  ------  ------  ------  ---------  ----------   --------   -------------------------------
 DA1                no       2      47       2          2    3072 MiB    14 days   scrub            63%  low
 DA2                no       2      47       2          2    3072 MiB    14 days   scrub            19%  low
 DA3               yes       2      48       2          2         0 B    14 days   rebuild-2r       89%  low
 DA4                no       2      47       2          2    3072 MiB    14 days   scrub            34%  low
 LOG                no       1       4       1          1     546 GiB    14 days   scrub            87%  low

                    n. active,   declustered                user         state,
 pdisk              total paths     array      free space   condition    remarks
 -----------------  -----------   -----------  ----------   -----------  -------
 [...]
 c014d1             2, 4          DA1              23 GiB   normal       ok
 c014d2             2, 4          DA2              23 GiB   normal       ok
 c014d3             2, 4          DA3             550 GiB   normal       ok
 c014d3#162         0, 0          DA3             543 GiB   replaceable  dead/adminDrain/noRGD/noVCD/noPath
 c014d4             2, 4          DA4              23 GiB   normal       ok
 [...]
 c018d1             2, 4          DA1              24 GiB   normal       ok
 c018d2             2, 4          DA2              24 GiB   normal       ok
 c018d3             0, 0          DA3             558 GiB   replaceable  dead/systemDrain/noRGD/noVCD/noData/replace
 c018d4             2, 4          DA4              23 GiB   normal       ok
 [...]
Notice that the temporary pdisk c014d3#162 is counted in the total number of pdisks in declustered array DA3 and in the recovery group, until it is finally drained and deleted.
Notice also that pdisk c018d3 is still marked for replacement, and that DA3 still needs service. This is because GNR replacement policy expects all failed disks in the declustered array to be replaced once the replacement threshold is reached. The replace state on a pdisk is not removed when the total number of failed disks goes under the threshold.
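Before repeating the procedure for carrier 18, the pdisks still marked for replacement in DA3 can be listed again with the same mmlspdisk query used earlier; at this point only c018d3 should be reported:
# mmlspdisk 000DE37TOP --declustered-array DA3 --replace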
- Release carrier 18 in disk enclosure 000DE37:
# mmchcarrier 000DE37TOP --release --pdisk c018d3
[I] Suspending pdisk c018d1 of RG 000DE37TOP in location 78AD.001.000DE37-C18-D1.
[I] Suspending pdisk c018d2 of RG 000DE37TOP in location 78AD.001.000DE37-C18-D2.
[I] Suspending pdisk c018d3 of RG 000DE37TOP in location 78AD.001.000DE37-C18-D3.
[I] Suspending pdisk c018d4 of RG 000DE37TOP in location 78AD.001.000DE37-C18-D4.
[I] Carrier released.
  - Remove carrier.
  - Replace disk in location 78AD.001.000DE37-C18-D3 with FRU 74Y4936.
  - Reinsert carrier.
  - Issue the following command:
      mmchcarrier 000DE37TOP --replace --pdisk 'c018d3'
Repair timer is running. Perform the above within 5 minutes to avoid pdisks being reported as missing.
- Unlatch and remove carrier 18, remove and replace failed disk 3, reinsert carrier 18, and close the latch.
- To finish the replacement of pdisk c018d3:
# mmchcarrier 000DE37TOP --replace --pdisk c018d3
[I] The following pdisks will be formatted on node server1:
    /dev/rhdisk674
[I] Pdisk c018d3 of RG 000DE37TOP successfully replaced.
[I] Resuming pdisk c018d1 of RG 000DE37TOP.
[I] Resuming pdisk c018d2 of RG 000DE37TOP.
[I] Resuming pdisk c018d3#166 of RG 000DE37TOP.
[I] Resuming pdisk c018d4 of RG 000DE37TOP.
[I] Carrier resumed.
Running mmlsrecoverygroup again will confirm the recovery:
# mmlsrecoverygroup 000DE37TOP -L --pdisk
                     declustered
 recovery group       arrays      vdisks   pdisks
 -----------------   -----------  ------   ------
 000DE37TOP                    5       9      192

 declustered    needs                             replace                 scrub      background activity
    array      service  vdisks  pdisks  spares  threshold  free space   duration   task        progress  priority
 -----------   -------  ------  ------  ------  ---------  ----------   --------   -------------------------------
 DA1                no       2      47       2          2    3072 MiB    14 days   scrub            64%  low
 DA2                no       2      47       2          2    3072 MiB    14 days   scrub            22%  low
 DA3                no       2      47       2          2    2048 MiB    14 days   rebalance        12%  low
 DA4                no       2      47       2          2    3072 MiB    14 days   scrub            36%  low
 LOG                no       1       4       1          1     546 GiB    14 days   scrub            89%  low

                    n. active,   declustered                user         state,
 pdisk              total paths     array      free space   condition    remarks
 -----------------  -----------   -----------  ----------   -----------  -------
 [...]
 c014d1             2, 4          DA1              23 GiB   normal       ok
 c014d2             2, 4          DA2              23 GiB   normal       ok
 c014d3             2, 4          DA3             271 GiB   normal       ok
 c014d4             2, 4          DA4              23 GiB   normal       ok
 [...]
 c018d1             2, 4          DA1              24 GiB   normal       ok
 c018d2             2, 4          DA2              24 GiB   normal       ok
 c018d3             2, 4          DA3             542 GiB   normal       ok
 c018d4             2, 4          DA4              23 GiB   normal       ok
 [...]
Notice that both temporary pdisks have been deleted. This is because c014d3#162 has finished draining, and because pdisk c018d3#166 had, before it was replaced, already been completely drained (as evidenced by the noData flag). Declustered array DA3 no longer needs service and once again contains 47 pdisks, and the recovery group once again contains 192 pdisks.
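As a final check, querying the recovery group for pdisks marked for replacement should now return nothing. If your release accepts the recovery-group-wide form of the query shown here (without --declustered-array), it can be used directly; otherwise the per-array form used earlier works equally well:
# mmlspdisk 000DE37TOP --replace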