Physical disk procedures

This topic describes the procedures that you can perform to maintain physical disks (pdisks).

  1. Identify the problem disks. Use the following command to list the pdisks that currently have a problem:
    # mmvdisk pdisk list --rg all --not-ok
                                  declustered
    recovery group  pdisk            array     paths  capacity  free space  FRU (type)       state
    --------------  ------------  -----------  -----  --------  ----------  ---------------  -----
    rg_1            n002p001      DA1              0   894 GiB     890 GiB  PX04PMB096       missing/drained
    rg_1            n005p002      DA1              0   894 GiB     890 GiB  PX04PMB096       failing/replace
    
    Note: A "missing" state usually does not indicate a problem with the disk drive itself. It is more likely caused by a disk connection problem or by a network problem on the node, so you need to find the root cause, for example, by reseating the drive or by bringing the node back online. You cannot use the procedure that is described in step 2 to replace a disk whose state is "missing". In that situation, contact IBM® support.
  2. Perform the following steps to replace disks:
    • To identify the pdisks to be replaced in all recovery groups:
      mmvdisk pdisk list --rg all --replace
      The system displays output similar to the following:
      
      recovery group  pdisk         priority  FRU (type)       location
      --------------  ------------  --------  ---------------  --------
      rg_1            n005p003         12.95  00YK014          Enclosure J1005744 Drive 6  
      rg_1            n005p004         12.95  00YK014          Enclosure J1005744 Drive 7  
      
      mmvdisk: A lower priority value means a higher need for replacement.
      Note:
      • If you replace a pdisk that is not on this list, you risk data loss.
      • If the number of disks that need replacement is below the replacement threshold of their declustered array, those disks do not trigger a call home.
      • If you want a call home to occur as early as possible, even when only one disk is failing, set the replacement threshold to 1.
    • To set your replacement threshold to 1:
      mmvdisk rg change --rg RgName --da DaName --replace-threshold 1
    • To replace hot-swappable disk devices on x86_64-based systems:
      1. Issue the following command:
        mmvdisk pdisk replace --prepare --recovery-group RgName --pdisk PdiskName
        The system displays output similar to the following:
        mmvdisk: Suspending pdisk n005p003 of RG rg_1 in location J1005744-6.
        mmvdisk: Location J1005744-6 is Enclosure J1005744 Drive 6.
        mmvdisk: Carrier released.
        mmvdisk: 
        mmvdisk:   - Remove carrier.
        mmvdisk:   - Replace disk in location J1005744-6 with type '00YK014'.
        mmvdisk:   - Reinsert carrier.
        mmvdisk:   - Issue the following command:
        mmvdisk: 
        mmvdisk:   mmvdisk pdisk replace --recovery-group rg_1 --pdisk 'n005p003'
      2. Go to the node and replace the disk in the slot location that is reported for the pdisk.
      3. Issue the following command:
        mmvdisk pdisk replace --recovery-group RgName --pdisk PdiskName
        The system displays output similar to the following:
        
        mmvdisk:
        mmvdisk: mmchcarrier : [I] Preparing a new pdisk for use may take many minutes.
        mmvdisk:
        mmvdisk: The following pdisks will be formatted on node HostName:
        mmvdisk:     // HostName /dev/DevName
        mmvdisk: Pdisk PdiskName of RG RgName successfully replaced.
        mmvdisk: Resuming pdisk PdiskName#nnn of RG RgName.
        mmvdisk: Carrier resumed.
        
      Note: After you insert the new disk in the slot, check whether the volatile write cache is enabled on the new pdisk and, if it is, disable it. For more information, see Volatile write cache detection.
    • To replace hot-swappable disk devices on IBM Z:
      1. Issue the following command:
        mmvdisk pdisk replace --prepare --recovery-group RgName --pdisk PdiskName
        The system displays output similar to the following:
        mmvdisk pd replace --prepare --rg rg_1 --pd n002p001
        mmvdisk: Pdisk n002p001 of RG rg_1 in location 601924ff610426c8-4-8 already suspended.
        mmvdisk: Location 601924ff610426c8-4-8 is Enclosure 601924ff610426c8 Drawer 4 Slot 8.
        mmvdisk: Carrier released.
        mmvdisk:
        mmvdisk: - A message is sent to the Support Element. Contact your HW administrator to plan the device replacement
        mmvdisk: - After the device has been replaced by the IBM Support, issue the following command:
        mmvdisk:
        mmvdisk: mmvdisk pdisk replace --recovery-group rg_1 --pdisk 'n002p001'
      2. Contact your hardware administrator to plan the device replacement, and wait for IBM Support to replace the device.
      3. Issue the following command:
        mmvdisk pdisk replace --recovery-group RgName --pdisk PdiskName
        The system displays output similar to the following:
        
        mmvdisk:
        mmvdisk: mmchcarrier : [I] Preparing a new pdisk for use may take many minutes.
        mmvdisk:
        mmvdisk: The following pdisks will be formatted on node HostName:
        mmvdisk:     // HostName /dev/DevName
        mmvdisk: Pdisk PdiskName of RG RgName successfully replaced.
        mmvdisk: Resuming pdisk PdiskName#nnn of RG RgName.
        mmvdisk: Carrier resumed.
        
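The triage in step 1 can be scripted. The following is a minimal sketch, not an IBM-provided tool. It assumes that the output of `mmvdisk pdisk list --rg all --not-ok` has the layout shown in the step 1 example (recovery group and pdisk name in the first two columns, state in the last column, data rows after the dashed separator line). The function name `triage_pdisks` and its CHECK-CONNECTION, REPLACE, and OTHER labels are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: classify pdisks listed by
# "mmvdisk pdisk list --rg all --not-ok" (read on standard input).
# Assumes the example layout: data rows follow the dashed separator line,
# columns 1 and 2 are recovery group and pdisk, the last column is the state.
triage_pdisks() {
    awk '
        /^[ \t]*-+[ \t]/ { in_rows = 1; next }   # header ends at the dashed separator
        !in_rows || NF < 3 { next }              # skip header and blank lines
        {
            rg = $1; pdisk = $2; state = $NF
            if (state ~ /missing/) {
                # Not eligible for step 2: find the root cause (connection, node) first.
                printf "%s %s CHECK-CONNECTION %s\n", rg, pdisk, state
            } else if (state ~ /replace/) {
                # Candidate for the replacement procedure in step 2.
                printf "%s %s REPLACE %s\n", rg, pdisk, state
            } else {
                printf "%s %s OTHER %s\n", rg, pdisk, state
            }
        }
    '
}

# Typical use on a node where mmvdisk is available:
#   mmvdisk pdisk list --rg all --not-ok | triage_pdisks
```

Pdisks labeled CHECK-CONNECTION are in the "missing" state and must not go through the replacement procedure. REPLACE only marks candidates; confirm each one against `mmvdisk pdisk list --rg all --replace` before you replace it.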
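The x86_64 hot-swap procedure in step 2 can also be wrapped in a small function. This is a sketch under stated assumptions, not an IBM-provided tool: the function name `replace_pdisk` is hypothetical, it must run interactively on a node where `mmvdisk` is available, and it assumes that the `--replace` list uses the column layout shown in step 2. It refuses to continue if the pdisk is not on the replacement list, and it pauses between the `--prepare` command and the final replace command while the disk is physically swapped.

```shell
#!/bin/sh
# Hypothetical wrapper around the hot-swap replacement steps.
# Usage: replace_pdisk RgName PdiskName
replace_pdisk() {
    rg=$1
    pdisk=$2

    # Refuse to continue unless the pdisk is on the replacement list;
    # replacing a pdisk that is not on this list risks data loss.
    if ! mmvdisk pdisk list --rg all --replace |
        awk -v rg="$rg" -v pd="$pdisk" '$1 == rg && $2 == pd { found = 1 } END { exit !found }'
    then
        echo "pdisk $pdisk of RG $rg is not on the replacement list" >&2
        return 1
    fi

    # Suspend the pdisk and release the carrier.
    mmvdisk pdisk replace --prepare --recovery-group "$rg" --pdisk "$pdisk" || return 1

    # The physical swap is manual; wait for the operator to confirm it.
    printf 'Replace the disk in the reported location, then type "yes": '
    read -r answer
    if [ "$answer" != "yes" ]; then
        echo "Aborted before the final replace command." >&2
        return 1
    fi

    # Format the new disk and resume the pdisk.
    mmvdisk pdisk replace --recovery-group "$rg" --pdisk "$pdisk"
}
```

For example, `replace_pdisk rg_1 n005p003`. On IBM Z, the same pause covers the period in which IBM Support replaces the device. After the function completes, check and disable the volatile write cache on the new pdisk as described in the note for x86_64 systems.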