IBM Support

Replacing a SAS RAID (0) Array drive that is part of rootvg in a VIOS environment

How To


Summary

This technote describes the steps to take on the VIOS before replacing a failing pdisk in a SAS RAID (0) Array.

Objective

To avoid undesired complications in the RAID Array definitions caused by missing the subtle steps required when replacing SAS drives.

Environment

VIO servers

Steps

When a part of the RAID system fails, different RAID levels help to recover data in different ways.

If a single drive fails within a redundant array, the array controller can reconstruct the data for the failed disk by using the data stored on the other drives in the same array. The exception is RAID level (0), the “non-redundant Array”, which holds no redundant data from which a failed disk could be reconstructed. In that case, the Array must be deleted before its failed pdisk is replaced.
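
To make the difference concrete, here is a minimal sketch that uses shell arithmetic to mimic XOR parity, the kind of redundancy a RAID 5 array keeps. The byte values are arbitrary illustrations and do not come from a real array.

```shell
#!/bin/sh
# Illustrative values only: two "data blocks" reduced to one byte each.
d0=$(( 0xA5 ))
d1=$(( 0x3C ))

# RAID 5 style: a parity block is stored alongside the data blocks.
parity=$(( d0 ^ d1 ))

# If the drive holding d1 fails, XOR of the surviving blocks rebuilds it.
rebuilt_d1=$(( parity ^ d0 ))
echo "rebuilt d1 = $rebuilt_d1 (original $d1)"

# RAID 0 style: only d0 and d1 exist, with no parity block,
# so nothing remains from which a lost d1 could be recomputed.
```

This is why the RAID 0 array in this procedure has to be deleted and recreated, while its data is protected only by the rootvg mirror on the other hdisk.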

In case of a disk failure:

1. First identify if the failing disk is a member of a volume group.

  • In our example here we have 2 mirrored rootvg hdisks and let us say that hdisk3 is the one that needs replacement:
    $ lspv
    NAME             PVID                                 VG               STATUS
    hdisk0           00f67120dc640b9c                     None
    hdisk1           00c3cb473454c6e5                     None
    hdisk2           00f67120e8aef1ae                     rootvg           active
    hdisk3           00f67120cbaacc90                     rootvg           active
  •  List the configured hdisks to identify their type and status:
    $ lsdev -type disk
    name             status      description
    hdisk0           Available   SAS Disk Drive
    hdisk1           Available   SAS Disk Drive
    hdisk2           Available   SAS RAID 5 Disk Array
    hdisk3           Available   SAS RAID 0 Disk Array

    >Here we can see that hdisk0 and hdisk1 are JBOD "SAS Disk Drives", whereas hdisk2 and the failing hdisk3 are "SAS RAID Disk Arrays". In this lab example hdisk2 and hdisk3 are at different RAID levels. In a typical VIOS both will be RAID 0.
  • Let’s identify the parent device for the failing RAID Array disk so we can take a closer look into how it is configured:
    $ lsdev -dev hdisk3 -parent
    parent
    sas0
    >So the drive is connected to sas0. To find the adapter behind that controller, run:
    $ lsdev -dev sas0 -parent
    parent
    sissas0
  • Now check the RAID Array configuration on sissas0 and identify the pdisk behind hdisk3. As padmin, switch to the root shell and launch the "SAS Disk Array Manager" smitty menus to manage the RAID Arrays:
    $ oem_setup_env
    # smitty sasdam

    Select "List SAS Disk Array Configuration"
    Choose the controller, in our case "sissas0", and hit Enter.
    ------------------------------------------------------------------------
    Name      Resource  State       Description              Size
    ------------------------------------------------------------------------
    sissas0   FFFFFFFF  Available   PCI-X266 Planar 3Gb SAS Adapter
    hdisk2    00FF0000  Optimal     RAID 5 Array             279.2GB
       pdisk0   00000C00  Active      Array Member           139.6GB
       pdisk1   00004900  Active      Array Member           139.6GB
       pdisk2   00000E00  Active      Array Member           139.6GB

    hdisk3    00FF0100  Optimal     RAID 0 Array             283.8GB
       pdisk3   00040000  Active      Array Member           283.8GB  <--------

    pdisk4    00050000  Active      Array Candidate          283.8GB
    pdisk5    00030000  Active      Hot Spare                283.8GB

    >Now that we know we have a RAID (0) configuration and have identified the failing pdisk, we can unmirror the hdisk and reduce it out of the volume group "without running the rmdev command at any step", then delete the RAID Array using "SAS Disk Array Manager". Only then can you safely replace the failing pdisk and recreate the RAID 0 Array.
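
The identification done in this step can also be sketched as a small script. The example below works on text adapted from the "List SAS Disk Array Configuration" listing in this technote, saved by hand into a variable; the function names `raid_level` and `array_members` are hypothetical helpers, not VIOS commands.

```shell
#!/bin/sh
# Sample text adapted from the listing in this technote. On a live system it
# would be copied by hand from the "List SAS Disk Array Configuration" screen.
listing='hdisk2    00FF0000  Optimal     RAID 5 Array             279.2GB
   pdisk0   00000C00  Active      Array Member           139.6GB
   pdisk1   00004900  Active      Array Member           139.6GB
   pdisk2   00000E00  Active      Array Member           139.6GB

hdisk3    00FF0100  Optimal     RAID 0 Array             283.8GB
   pdisk3   00040000  Active      Array Member           283.8GB

pdisk4    00050000  Active      Array Candidate          283.8GB
pdisk5    00030000  Active      Hot Spare                283.8GB'

# raid_level HDISK: print the RAID level of HDISK, or nothing if not an array.
raid_level() {
    printf '%s\n' "$listing" |
        awk -v d="$1" '$1 == d && $4 == "RAID" { print $5 }'
}

# array_members HDISK: print the pdisks listed under HDISK as "Array Member".
array_members() {
    printf '%s\n' "$listing" | awk -v d="$1" '
        $1 ~ /^hdisk/ { current = $1; next }   # start of an array entry
        NF == 0       { current = ""; next }   # blank line ends the entry
        current == d && $4 == "Array" && $5 == "Member" { print $1 }'
}

if [ "$(raid_level hdisk3)" = "0" ]; then
    echo "hdisk3 is RAID 0; array must be deleted before replacing: $(array_members hdisk3)"
fi
```

The parsing relies on the column layout shown in the listing, so treat it as a reading aid rather than a replacement for checking the screen yourself.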

-----------------------------------------
*Before proceeding, it is strongly advised to check the “good” mirror disk using a surface scan to make sure it has no hidden defects:
# dd if=/dev/rhdisk2 of=/dev/null bs=4m
When it completes, check that no errors were logged against hdisk2 in errpt. If there are errors, do not proceed; call IBM software support.
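
The scan and the errpt check can be combined in a small function, sketched below for the root shell. The name `surface_scan_clean` is hypothetical. The `-s` start-time filter is an addition to the technote's check, used so that only entries posted from the start of the scan onward are counted; drop it to see every entry logged against the disk.

```shell
#!/bin/sh
# surface_scan_clean DISK
# Reads the whole raw device, then looks for error-log entries against DISK
# posted since the scan started. Returns 0 only if the read succeeded and no
# such entries were found. Intended for the root shell (after oem_setup_env).
surface_scan_clean() {
    disk=$1
    start=$(date +%m%d%H%M%y)      # errpt -s expects the form mmddhhmmyy

    dd if="/dev/r$disk" of=/dev/null bs=4m || return 1

    # In the errpt summary, field 5 is RESOURCE_NAME; count lines naming DISK.
    hits=$(errpt -N "$disk" -s "$start" |
        awk -v d="$disk" '$5 == d { n++ } END { print n + 0 }')
    [ "$hits" -eq 0 ]
}
```

A call such as `surface_scan_clean hdisk2 && echo "hdisk2 looks clean"` would then gate the rest of the procedure. A non-zero return means stop and review errpt by hand.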

-----------------------------------------

2. Unmirror/reduce hdisk from rootvg:

  • Unmirror:
    $ unmirrorios hdisk3             
  •  Verify failing disk is empty:
    $ lspv -map hdisk3
    Make sure no data is returned. If data is returned, use the migratepv command to move it to hdisk2 (the good disk).
  • Reduce hdisk:
    $ reducevg rootvg hdisk3                                               
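
The order of these three commands matters, so here is a minimal sketch that chains them with a guard. It assumes the VIOS commands resolve by name in the shell where it runs, and it uses this technote's criterion that `lspv -map` prints nothing for an empty disk. The function name `remove_from_rootvg` is hypothetical.

```shell
#!/bin/sh
# remove_from_rootvg FAILING
# Unmirrors rootvg from FAILING, confirms the disk holds no data (lspv -map
# prints nothing), then reduces it out of rootvg. It never calls rmdev.
remove_from_rootvg() {
    failing=$1

    unmirrorios "$failing" || return 1

    map=$(lspv -map "$failing")
    if [ -n "$map" ]; then
        echo "$failing still holds data; move it with migratepv first" >&2
        return 2
    fi

    reducevg rootvg "$failing"
}
```

With the example disks this would be called as `remove_from_rootvg hdisk3`. A return code of 2 means the disk is not yet empty and reducevg was not attempted.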

-------------------------------------------------------------------------
*Don't rmdev the hdisk or the associated pdisk*
-------------------------------------------------------------------------

3. Delete the RAID Array using "SAS Disk Array Manager"

  • As root:
    # smitty sasdam
  • Select "Delete a SAS Disk Array"
  • Select the controller from the list; in our case it is sissas0
  • Select the SAS Disk Array to delete
  • Follow the prompts and hit Enter
  • Once the operation completes successfully, hit F3 to get back to "SAS Disk Array Manager" main menu

------------------------------

4. Replace the failing pdisk:

  • Back in the "SAS Disk Array Manager" main menu, select "Diagnostics and Recovery Options"
  • Select "SCSI and SCSI RAID Hot Plug Manager"
  • Select option "Identify a Device Attached to an SCSI Hot Swap Enclosure Device"
  • Choose the slot corresponding to the failing pdisk. The visual indicator on the device will flash at the Identify rate
  • Select the option "Replace/Remove a Device Attached to an SCSI Hot Swap Enclosure Device"
    >The visual indicator on the device flashes at the Remove rate to help you or the IBM service representative identify the appropriate drive. Locate the disk whose LED is flashing at the Remove rate and physically replace the pdisk
  • Exit "SAS Disk Array Manager" using F10

------------------------------

5. Prepare the new drive:

  • After the pdisk is physically replaced, as padmin run cfgdev to configure the new drive
  • Enter "SAS Disk Array Manager" smitty menu, select "List SAS Disk Array Configuration" and choose the sissas controller
  • It should now list the new drive as either a pdisk "Array Candidate" or an hdisk "SAS Disk Drive":
    >If shown as a pdisk "Array Candidate", skip the formatting steps below and continue to Step 6 to create the RAID 0 Array
    >If shown as an hdisk "SAS Disk Drive", you need to format the disk to the 528-byte RAID block size before it can be used in a RAID configuration, as follows:
  • From the "SAS Disk Array Manager" main menu, select "Create an Array Candidate pdisk and Format to RAID block size"
  • Select the controller from the list
  • Select the new drive to be formatted as an Array Candidate
    >The status bar shows the formatting progress
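
The decision in this step can be summarized as a small lookup. The sketch below takes the description text shown for the new drive and prints the next action; `next_step` is a hypothetical helper, and the strings it matches are the ones shown in the listings in this technote.

```shell
#!/bin/sh
# next_step DESCRIPTION: map how the new drive is listed to the next action.
next_step() {
    case $1 in
        *"Array Candidate"*) echo "go to Step 6: create the RAID 0 array" ;;
        *"SAS Disk Drive"*)  echo "format to 528-byte RAID block size first" ;;
        *)                   echo "unexpected state: review the listing by hand" ;;
    esac
}

next_step "Array Candidate"
next_step "SAS Disk Drive"
```

Any other description falls through to the last branch on purpose, so that an unfamiliar state is reviewed rather than guessed at.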

-----------------------------

6. Create the RAID 0 Array with the newly formatted array candidate pdisk:

  • From the "SAS Disk Array Manager" main menu, select "Create a SAS Disk Array"
  • Select the sissas controller
  • Select the RAID level as (0)
  • Select the Stripe Size as recommended on the menu
  • Select the pdisk to be used to create the RAID 0 Array
    >Once the Array is created, a new hdisk "SAS RAID 0 Disk Array" appears that can be used normally in a volume group

----------------------------

7. Finally, add/mirror the new hdisk to rootvg:

  • List free physical volumes:
    $ lspv -free
    The new hdisk should appear in the output
  • Add the new hdisk to rootvg:
    $ extendvg rootvg <newhdisk>
  • Mirror rootvg:
    $ mirrorios <newhdisk>
  • Confirm mirroring:
    $ lsvg -lv rootvg
    Note that the number of PPs should be double the number of LPs
  • Confirm that bootlist includes the new mirrored disk:
    $ bootlist -mode normal -ls
  • If needed, modify the bootlist with the mirrored disks:
    $ bootlist -mode normal <oldhdisk> <newhdisk>
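
The PPs versus LPs check can be done by eye, or with a short filter such as the sketch below. It reads `lsvg -lv rootvg` output on standard input and prints each logical volume whose PP count is not exactly twice its LP count. The function name `unmirrored_lvs` is hypothetical, and the sample text is illustrative, not captured from a real VIOS.

```shell
#!/bin/sh
# unmirrored_lvs: read "lsvg -lv rootvg" output on stdin and print every
# logical volume whose PP count is not exactly twice its LP count.
unmirrored_lvs() {
    # Line 1 is the VG name, line 2 the column header; data starts at line 3.
    # Field 3 is LPs and field 4 is PPs.
    awk 'NR > 2 && NF >= 4 && $4 != 2 * $3 { print $1 }'
}

# Hypothetical sample, not captured from a real system.
sample='rootvg:
LV NAME             TYPE       LPs     PPs     PVs  LV STATE      MOUNT POINT
hd5                 boot       1       2       2    closed/syncd  N/A
hd6                 paging     4       8       2    open/syncd    N/A
lg_dumplv           sysdump    8       8       1    open/syncd    N/A'

printf '%s\n' "$sample" | unmirrored_lvs
```

On a live system the equivalent would be `lsvg -lv rootvg | unmirrored_lvs`. A name in the output is a prompt to look closer, not necessarily a fault; a dedicated dump logical volume, for instance, is commonly left unmirrored.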

----------------------------

End of procedure...


Document Location

Worldwide


Document Information

Modified date:
05 March 2021

UID

ibm16078766