Recovering by using a reformatted or replacement disk drive

You can recover data from a failed disk drive when you must reformat or replace the failed disk.

Attention: Before you reformat or replace a disk drive, remove all references to non-mirrored file systems from the failing disk and remove the disk from the volume group and system configuration. If you do not, you create problems in the ODM (object data manager) and system configuration databases. Instructions for these essential steps are included in the following procedure, under Before you replace or reformat your failed or failing disk.

The following procedure uses a scenario in which the volume group called myvg contains three disk drives that are called hdisk2, hdisk3, and hdisk4. In this scenario, hdisk3 goes bad. The non-mirrored logical volume lv01 and a copy of the mylv logical volume are contained on hdisk2. The mylv logical volume is mirrored and has three copies, each of which takes up two physical partitions on its disk. The failing hdisk3 contains another copy of mylv and the non-mirrored logical volume called lv00. Finally, hdisk4 contains a third copy of mylv as well as a logical volume called lv02. The following figure shows this scenario.

This procedure is divided into the following key segments:
  • The things that you do to protect data before you replace or reformat your failing disk
  • The procedure that you follow to reformat or replace the disk
  • The things that you do to recover the data after the disk is reformatted or replaced

Before you replace or reformat your failed or failing disk:

  1. Log in with root authority.
  2. If you are not familiar with the logical volumes that are on the failing drive, use an operational disk to view the contents of the failing disk.
    For example, to use hdisk4 to look at hdisk3, type the following on the command line:
    lspv -M -n hdisk4 hdisk3
    The lspv command displays information about a physical volume within a volume group. The output looks similar to the following:
    hdisk3:1        mylv:1
    hdisk3:2        mylv:2
    hdisk3:3        lv00:1
    hdisk3:4-50
    The first column displays the physical partitions on hdisk3, and the second column displays the logical volume and logical partition that each physical partition holds. Physical partitions 4 through 50 are free.
  3. Back up all single-copy logical volumes on the failing device, if possible. For instructions, see Backing up user files or file systems.
  4. If you have single-copy file systems, unmount them from the disk.
    (You can identify single-copy file systems from the output of the lsvg -l command. For a single-copy logical volume, the LPs and PPs columns show the same number; a mirrored logical volume has more physical partitions than logical partitions.) Mirrored file systems do not have to be unmounted.
    In the scenario, lv00 on the failing disk hdisk3 is a single-copy file system. To unmount it, type the following:
    unmount /dev/lv00
    If you do not know the name of the file system, and the /etc/filesystems file is not located only on the failed disk, type mount on the command line to list all mounted file systems and find the name that is associated with your logical volume. You can also use the grep command on the /etc/filesystems file to list only the entries, if any, that refer to your logical volume. The file system name is the stanza heading (the line ending in a colon) above the matching dev entry. For example:
    grep lv00 /etc/filesystems
    The output looks similar to the following example:
    dev             = /dev/lv00   
    log             = /dev/loglv00
    Notes:
    1. The unmount command fails if the file system you are trying to unmount is being used. The unmount command runs only if none of the file system's files are open and no user's current directory is on that device.
    2. Another name for the unmount command is umount. The names are interchangeable.
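  5. Remove all single-copy file systems from the failed physical volume by typing the rmfs command:
    rmfs /FSname
    Where /FSname is the mount point of the file system. The rmfs command also removes the logical volume on which the file system resides.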
  6. Remove the copies of all mirrored logical volumes that are located on the failing disk.
    Note: You cannot use rmlvcopy on the hd5 and hd7 logical volumes from physical volumes in the rootvg volume group. The system does not allow you to remove these copies because each of these logical volumes has only one copy.
    The rmlvcopy command removes copies of each logical partition; the number that you specify is the number of copies that remain afterward. For example, type:
    rmlvcopy mylv 2 hdisk3
    By removing the copy on hdisk3, you reduce the number of copies of each logical partition belonging to the mylv logical volume from three to two (one on hdisk4 and one on hdisk2).
  7. If the failing disk was part of the root volume group and contained logical volume hd7, remove the primary dump device (hd7) by typing the following on the command line:
    sysdumpdev -P -p /dev/sysdumpnull
    The sysdumpdev command changes the primary or secondary dump device location for a running system. The -P flag makes the change permanent, so the primary dump device remains /dev/sysdumpnull after a reboot until you change it again.
    Note: You can choose to dump to a DVD device. For more information on how to configure a DVD to be the dump device, see sysdumpdev.
  8. Remove any paging space on the disk by using the following command:
    rmps PSname
    Where PSname is the name of the paging space to be removed, which is actually the name of the logical volume on which the paging space resides.

    If the rmps command is not successful, you must use the smit chps fast path to deactivate the primary paging space and reboot before you continue with this procedure. The reducevg command in step 10 can fail if there are active paging spaces.

  9. Remove any other logical volumes from the volume group, such as the logical volumes that do not contain a file system, by using the rmlv command.
    For example, type:
    rmlv -f lv00
  10. Remove the failed disk from the volume group by using the reducevg command.
    For example, type:
    reducevg -df myvg hdisk3
    If you cannot run the reducevg command or if the command is unsuccessful, the procedure in step 13 can help clean up the VGDA/ODM information after you reformat or replace the drive.
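
The checks in steps 2, 4, and 10 consist of reading command output, so they can be sketched as small text filters. The following POSIX shell sketch defines four such helpers. The function names are hypothetical, and the parsing assumes the lspv -M layout shown in step 2, the usual column layouts of lsvg -l and lspv, and the stanza format of /etc/filesystems; verify these against your AIX level before you rely on the results. The functions only read standard input, so they change nothing on the system.

```shell
# Hypothetical read-only helpers for steps 2, 4, and 10.
# Each function filters text on standard input; none of them changes the system.

# summarize_lspv_map: from "lspv -M" output, print each logical volume on the
# disk with the number of physical partitions it occupies there.
# Lines with a single field (for example "hdisk3:4-50") are free ranges.
summarize_lspv_map() {
    awk 'NF >= 2 {
             split($2, f, ":")      # f[1] is the logical volume name
             count[f[1]]++
         }
         END { for (lv in count) print lv, count[lv] }' | sort
}

# single_copy_lvs: from "lsvg -l VGName" output, print the logical volumes
# whose LPs and PPs columns are equal, that is, volumes with only one copy.
# The first two lines (volume group name and column headings) are skipped.
single_copy_lvs() {
    awk 'NR > 2 && $3 == $4 { print $1 }'
}

# fs_for_lv LV: from /etc/filesystems, print the stanza name (the mount point)
# of every stanza whose "dev = /dev/LV" attribute matches.
# Stanza headings start in column 1 and end with a colon; "*" lines are comments.
fs_for_lv() {
    awk -v dev="/dev/$1" '
        /^[^ \t*].*:[ \t]*$/ { stanza = $0; sub(/:[ \t]*$/, "", stanza) }
        $1 == "dev" && $2 == "=" && $3 == dev { print stanza }'
}

# disk_vg DISK: from plain "lspv" output, print the volume group column for
# DISK (typically "None" once reducevg has removed the disk).
disk_vg() {
    awk -v d="$1" '$1 == d { print $3 }'
}
```

For example, in the scenario you might run:
    lspv -M -n hdisk4 hdisk3 | summarize_lspv_map
    lsvg -l myvg | single_copy_lvs
    fs_for_lv lv00 < /etc/filesystems
    lspv | disk_vg hdisk3
A logical volume that appears in the output of both of the first two commands is a single-copy logical volume on the failing disk.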

Replacing or reformatting your failed or failing disk:

  11. The next step depends on whether you want to reformat or replace your disk and on what type of hardware you are using:
    • If you want to reformat the disk drive, use the following procedure:
      1. With root authority, type the following SMIT fast path on the command line:
        smit diag
      2. Select Current Shell Diagnostics to enter the AIX® Diagnostics tool.
      3. After you read the Diagnostics Operating Instructions screen, press Enter.
      4. Select Task Selection.
      5. Scroll down through the task list to find and select Format Media.
      6. Select the disk that you want to reformat. When you confirm that you want to reformat the disk, all content on the disk is erased.
      After the disk is reformatted, continue with step 12.
    • If your system supports hot swap disks, use the procedure in Recovering from disk failure while the system remains available, then continue with step 13.
    • If your system does not support hot swap disks, do the following steps:
      • Remove the old drive by using the SMIT fast path smit rmvdsk, and set the KEEP definition in database field to No.
      • Contact your next level of system support to replace the disk drive.

After you replace or reformat your failed or failing disk:

  12. Follow the instructions in Configuring a disk and Making an available disk a physical volume.
  13. If you could not use the reducevg command to remove the disk from the volume group before the disk was reformatted or replaced (step 10), the following procedure can help clean up the VGDA/ODM information.
    1. If the volume group consisted of only one disk that was reformatted, type:
      exportvg VGName
      Where VGName is the name of your volume group.
    2. If the volume group consists of more than one disk, type the following on the command line:
      varyonvg VGName
      The system displays a message about a missing or unavailable disk. Note the physical volume identifier (PVID) of the missing disk, which is listed in the varyonvg message. It is the 16-character string between the name of the missing disk and the label PVNOTFND. For example:
      hdisk3 00083772caa7896e PVNOTFND
      Type:
      varyonvg -f VGName
      The missing disk is now displayed with the PVREMOVED label. For example:
      hdisk3 00083772caa7896e PVREMOVED
      Then, type the command:
      reducevg -df VGName PVID
      Where PVID is the physical volume identifier (in this scenario, 00083772caa7896e).
  14. To add the new disk drive to the volume group, use the extendvg command.
    For example, type:
    extendvg myvg hdisk3
  15. To re-create the single-copy logical volumes on the new (or reformatted) disk drive, use the mklv command.
    For example, type:
    mklv -y lv00 myvg 1 hdisk3
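    This example re-creates the lv00 logical volume on the hdisk3 drive. The 1 is the number of logical partitions to allocate; because no copy count is specified, the logical volume is created with a single copy (not mirrored).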
  16. To re-create the file systems on the logical volume, use the crfs command.
    For example, type:
    crfs -v jfs -d lv00 -m /MountPoint
    Where /MountPoint is the directory on which the original file system was mounted.
  17. To restore single-copy file system data from backup media, see Backing up user files or file systems.
  18. To re-create the mirrored copies of logical volumes, use the mklvcopy command.
    For example, type:
    mklvcopy mylv 3 hdisk3
    This example creates a third copy of the mylv logical volume on hdisk3. The 3 is the total number of copies that the logical volume has afterward.
  19. To synchronize the new mirror with the data on the other mirrors (in this example, hdisk2 and hdisk4), use the syncvg command.
    For example, type:
    syncvg -p hdisk3
As a result, all mirrored file systems should now be restored and up to date. If you were able to back up your single-copy file systems, they are also ready to use, and you can resume normal system use.
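
Two parts of the recovery can also be sketched as read-only shell helpers. The first extracts the PVID from the varyonvg message in step 13, relying on the layout described there (disk name, 16-character PVID, then the PVNOTFND label). The second prints the commands from steps 14 through 19 for review instead of running them, using the same commands and flags as the examples in those steps. Both function names are hypothetical, as is the /MountPoint placeholder in the usage lines, and the parsing assumes the message layout shown in step 13.

```shell
# Hypothetical read-only helpers for steps 13 through 19. The first one parses
# text on standard input; the second one only prints commands for review.

# missing_pvids: from varyonvg messages, print the PVID (second field) of
# every line whose last field is PVNOTFND, for example
# "hdisk3 00083772caa7896e PVNOTFND" -> 00083772caa7896e
missing_pvids() {
    awk '$NF == "PVNOTFND" { print $2 }'
}

# print_recovery_plan VG DISK LV LPS MOUNTPOINT MIRROR_LV COPIES
# Prints, in order, the commands from steps 14, 15, 16, 18, and 19 for one
# single-copy JFS logical volume and one mirrored logical volume. Step 17
# (restoring data from backup media) is not scripted here.
print_recovery_plan() {
    printf 'extendvg %s %s\n'          "$1" "$2"            # step 14: add the disk to the VG
    printf 'mklv -y %s %s %s %s\n'     "$3" "$1" "$4" "$2"  # step 15: LPS partitions, one copy
    printf 'crfs -v jfs -d %s -m %s\n' "$3" "$5"            # step 16: file system on the LV
    printf 'mklvcopy %s %s %s\n'       "$6" "$7" "$2"       # step 18: COPIES is the new total
    printf 'syncvg -p %s\n'            "$2"                 # step 19: sync the new mirror
}
```

For example, in the scenario:
    varyonvg myvg 2>&1 | missing_pvids
    print_recovery_plan myvg hdisk3 lv00 1 /MountPoint mylv 3
Printing the commands rather than running them lets you check each value, especially the partition count and the number of copies, against your own configuration before you change the volume group.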