IBM Support

Drive maintenance - Servers

Troubleshooting


Problem

Drive maintenance consists of the following topics: obtaining drive status information, reading results of a hard disk drive failure, replacing defunct drives, and redefining the space in an array by replacing logical drives.

Resolving The Problem

Drive maintenance consists of the following topics:

- Obtaining drive status information
- Reading results of a hard disk drive failure
- Replacing defunct drives
- Redefining the space in an array by replacing logical drives

Obtaining drive status
To see the ID, capacity, and other information about each of the hard disk drives attached to the RAID adapter, perform the following steps:

  1. Start the RAID configuration program by inserting the IBM RAID Adapter Option Diskette into the primary diskette drive and turning on the system. If the system is already turned on, press Ctrl+Alt+Del.
  2. Select Start RAID Configuration Program from the PC DOS start-up menu; then press Enter.
  3. Select Drive information from the Main Menu. A screen similar to the following appears:

    [Figure: drive information screen]

  4. Use the Up Arrow key or the Down Arrow key to highlight each of the drives shown in the Bay/Array selection list. As a drive is highlighted, the information for that drive appears at the bottom of the screen.

Press Esc to return to the Main Menu.

Note
The status of the hard disk drive determines the status of the logical drives in the array in which the hard disk is grouped.

Bay/Array Selection List
The status of the drives in the Bay/Array selection list is defined as follows:

  Status  Meaning

  CDR     CD-ROM drive installed.

  DDD     Defunct. The drive is an online or hot-spare drive that does
          not respond to commands. (If a RDY drive is defunct or powered
          down, it shows as an empty bay (a blank status), not a DDD
          status.)

  FMT     Format. The drive is being reformatted.

  HSP     Hot spare. The drive will replace a similar drive that becomes
          defunct in real time. At that time, its status changes to ONL,
          and its array association appears.

  OFL     Offline. The drive is a good drive that has replaced a defunct
          drive in a RAID level 1 or level 5 array. It is associated with
          an array, but does not contain any valid data. The drive state
          remains OFL during the rebuild phase.

  ONL     Online. The drive is part of an array. If this drive fails,
          logical drives defined in the array in which this drive is
          grouped will have a status of Offline (if the logical drive is
          assigned RAID level 0 with a good status) or Critical (if the
          logical drive is assigned RAID level 1 or level 5 with a good
          status).

  RDY     Ready. The drive is recognized by the adapter and is available
          for definition.

  TAP     Tape drive installed.

  UFM     Unformatted. The drive requires a low-level format before it
          can be used in an array.
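If status readouts are recorded in a script or inventory tool, the codes above can be held in a small lookup table. The following Python sketch is illustrative only: the names DRIVE_STATUS and describe are assumptions made for this example and are not part of the RAID configuration program. A blank status is modeled as an empty string.

```python
# Hypothetical lookup of the Bay/Array status codes listed in the table above.
DRIVE_STATUS = {
    "CDR": "CD-ROM drive installed",
    "DDD": "Defunct",
    "FMT": "Format",
    "HSP": "Hot spare",
    "OFL": "Offline",
    "ONL": "Online",
    "RDY": "Ready",
    "TAP": "Tape drive installed",
    "UFM": "Unformatted",
}


def describe(code: str) -> str:
    """Return the short meaning of a status code.

    An empty or whitespace-only code stands for a blank status area.
    """
    code = code.strip().upper()
    if not code:
        return "Blank (no status reported for this bay)"
    return DRIVE_STATUS.get(code, "Unknown status code")
```

For example, describe("hsp") returns "Hot spare", and a code that is not in the table is reported as unknown rather than guessed.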

Blank Status
Any of the following circumstances can cause the status area to be blank:

- No hard disk drive is installed in that bay.
- The bay contains a hard disk drive, but the drive is not inserted correctly.
- An array was deleted and a defunct drive is still in the bay.
- A new drive was installed and the configuration program has not been restarted. (The status will change to RDY when the RAID configuration program is restarted.)



Results of a Hard Disk Drive Failure
Depending on the circumstances, there can be several possible results from a drive failure.

Scenario 1:
- Only one hard disk drive fails.
- A hot-spare drive is defined that is the same size as the failed drive.
- The logical drives in the array are assigned RAID level 1, level 5, or a combination of these two levels.

The hot spare will take over immediately.


Note
Hot-spare drive capability does not apply to logical drives assigned RAID level 0. Data for logical drives assigned RAID levels 1 and 5 is not lost, even though the drives function with reduced performance.

Scenario 2:
- Only one hard disk drive fails.
- A hot-spare drive is not defined or is a different size from the failed drive.
- The logical drives in the array are assigned RAID levels 1, 5, or a combination of these two levels.

No data will be lost, but the system will operate at reduced performance until the defunct drive is replaced and rebuilt.

Scenario 3:
If more than one drive in an array fails, all the data is lost in all of the logical drives of the array. For this reason, it is important that you replace and rebuild a defunct drive as soon as possible.
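The three scenarios can be summarized as a small decision function. This is a minimal sketch under stated assumptions: the Outcome enum, the failure_outcome function, and its parameters are hypothetical names, drive sizes are plain integers (for example, megabytes), and the sketch covers only arrays whose logical drives are all RAID level 1 or level 5, as in Scenarios 1 and 2.

```python
from enum import Enum


class Outcome(Enum):
    NO_FAILURE = "no drive has failed"
    HOT_SPARE_TAKES_OVER = "hot spare takes over immediately"
    REDUCED_PERFORMANCE = "no data lost; reduced performance until replace and rebuild"
    ALL_DATA_LOST = "all data in the logical drives of the array is lost"


def failure_outcome(failed_drives, raid_levels,
                    failed_drive_size=None, hot_spare_size=None):
    """Classify a drive failure according to Scenarios 1 to 3.

    hot_spare_size is None when no hot-spare drive is defined.
    """
    # Assumption of this sketch: RAID level 0 logical drives are out of scope,
    # because hot-spare capability does not apply to them.
    if not set(raid_levels) <= {1, 5}:
        raise ValueError("sketch covers only RAID level 1 and level 5 logical drives")
    if failed_drives == 0:
        return Outcome.NO_FAILURE
    if failed_drives > 1:
        return Outcome.ALL_DATA_LOST          # Scenario 3
    if hot_spare_size is not None and hot_spare_size == failed_drive_size:
        return Outcome.HOT_SPARE_TAKES_OVER   # Scenario 1
    return Outcome.REDUCED_PERFORMANCE        # Scenario 2
```

Calling failure_outcome(1, [1, 5], failed_drive_size=1000, hot_spare_size=2000) falls under Scenario 2, because the hot spare is a different size from the failed drive.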

Logical and Hard Disk Drive Status Indications
The status of the hard disk drive determines the status of the logical drives in the array in which the hard disk is grouped.

  • A single hard disk drive failure (indicated by a DDD status in the Bay/Array selection list) causes logical drives in that array that are assigned levels 1 and 5 to have a Critical status. Data remains in logical drives with a Critical status, but you must replace the one defunct hard disk drive promptly, because if two hard disk drives were to fail, all of the data in the array would be lost.

    After you install a new hard disk drive, the Replace process changes the drive status from DDD to OFL if there is a Critical logical drive. After the Rebuild process, the hard disk drive status changes from OFL to ONL.

  • A single or multiple hard disk drive failure causes logical drives in that array that are assigned level 0 to have an Offline status. Data in logical drives with an Offline status is lost. In addition, when more than one defunct drive is part of the same array, all logical drives in that array have an Offline status. This means that data is lost in all the logical drives in that array, regardless of which RAID level is assigned.
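The relationship between failed hard disk drives and logical drive status can be written as one short function. This is a sketch only; logical_drive_status and its parameters are names chosen for this example, and the status strings follow the Good, Critical, and Offline values used by the configuration program.

```python
def logical_drive_status(raid_level: int, failed_drives: int) -> str:
    """Return the logical drive status for a given RAID level and number of
    defunct hard disk drives in the same array."""
    if raid_level not in (0, 1, 5):
        raise ValueError("this sketch handles RAID levels 0, 1, and 5 only")
    if failed_drives < 0:
        raise ValueError("failed_drives cannot be negative")
    if failed_drives == 0:
        return "Good"
    # Level 0 has no redundancy, and two or more failures in one array
    # take every logical drive offline regardless of RAID level.
    if raid_level == 0 or failed_drives > 1:
        return "Offline"
    # Exactly one failure with level 1 or level 5: data remains available.
    return "Critical"
```

A Critical result corresponds to the case where the defunct drive must be replaced promptly, because a second failure in the same array would move the logical drive to Offline.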

Replacing a Faulty Drive
To replace a faulty drive:

Note
The hard disk drive indicator light blinks when the drive has failed and needs to be replaced (DDD status only).

  1. Start the RAID configuration program by inserting the IBM RAID Adapter Option Diskette into the primary diskette drive and powering on the system. If the system is already powered on, press Ctrl+Alt+Del.
  2. Select Start RAID Configuration Program from the PC DOS start-up menu. If the drive failed while the system was powered down, a screen appears the next time the system is powered on showing you which drive is defunct.
  3. If the drive is not damaged (for example, it is not inserted correctly):
    1. Power off the system.
    2. Correct the problem.
    3. Make sure that the cables to the power supply and the SCSI-2 controller are connected correctly. Check the SCSI-2 controller and the SCSI-2 connector on the RAID Adapter.
    4. Restart the system.
  4. If the drive is defunct:
    1. Press Y (Yes) to reconfigure the system.
    2. Press Ctrl+Alt+Del when instructed to restart the system. The Main Menu appears.
      At this point, the drive status indicates DDD.

      Attention:
      Removing the wrong hard disk drive can cause loss of all data in the array.

    3. Replace the defunct drive.
    4. After you have replaced the drive, press Enter. The system will be reconfigured to include the drive, and the drive's status will change to OFL.
      When you see the configuration completion message, select Rebuild drive.
    5. Use the Up Arrow key or the Down Arrow key to highlight the OFL (offline) drive you want to rebuild; then press Enter. Information and status messages about each stage of the rebuilding process appear on the screen.
    6. When the rebuilding process is completed, press Esc to return to the Main Menu. The new configuration will be saved automatically.
    7. Back up the new configuration (see "Backing Up the Disk-Array Configuration").
    8. Select Exit to end the RAID configuration program.
    9. Remove the diskette and press Ctrl+Alt+Del to restart the system.
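During this procedure the hard disk drive status moves from ONL to DDD when the drive fails, from DDD to OFL when the drive is replaced, and from OFL back to ONL when the rebuild completes. The following Python sketch models only those transitions; the action names "fail", "replace", and "rebuild" and the next_status function are assumptions made for illustration.

```python
# Status transitions described for replacing a defunct drive.
# The DDD -> OFL step applies when the array has a Critical logical drive.
TRANSITIONS = {
    ("ONL", "fail"): "DDD",
    ("DDD", "replace"): "OFL",
    ("OFL", "rebuild"): "ONL",
}


def next_status(status: str, action: str) -> str:
    """Return the drive status after an action, or raise ValueError if the
    action does not apply to a drive in that status."""
    try:
        return TRANSITIONS[(status, action)]
    except KeyError:
        raise ValueError(f"action {action!r} does not apply to status {status!r}")


def replay(status: str, actions) -> str:
    """Apply a sequence of actions in order and return the final status."""
    for action in actions:
        status = next_status(status, action)
    return status
```

For example, replay("ONL", ["fail", "replace", "rebuild"]) ends at ONL, while attempting a rebuild on a drive that is still DDD is rejected.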

Advanced Functions
You can select several utility programs from the Advanced Functions menu. These include:

- Back up configuration to diskette
- Restore configuration from diskette
- Change the write policy
- Change the RAID parameters
- Format a drive

Backing Up the Disk-Array Configuration
The RAID adapter maintains a record of the disk-array configuration information in its electrically erasable programmable read-only memory (EEPROM) module. The disk-array configuration is vital information. To protect this information, back up the information to diskette as soon as you have completed the configuration tasks. You need a blank, formatted, 3.5-inch diskette.

To back up the disk-array configuration information to diskette, perform the following steps:

  1. Label a blank diskette Disk Array Configuration Backup, and date it.
  2. Start the RAID configuration program by inserting the IBM RAID Adapter Option Diskette into the primary diskette drive and powering on the system. If the system is already powered on, press Ctrl+Alt+Del.
  3. Select Start RAID Configuration Program from the PC DOS start-up menu and press Enter.
  4. Select Advanced functions from the Main Menu.
  5. Select Backup config. to diskette.
  6. Remove the RAID Adapter Option Diskette from the drive and insert the blank diskette.
  7. Follow the instructions on the screen.

Restoring the Disk-Array Configuration
To restore the disk-array configuration information in the RAID adapter EEPROM module, use the RAID Adapter Option Diskette and an up-to-date Disk Array Configuration Backup diskette.

Note
Because dynamic changes in the configuration of the disk array occur due to hot-spare drive replacement or other drive maintenance activity, the configuration backup information on the diskette might be different from that in the adapter. It is important that you back up the disk-array configuration information frequently, to keep the backup information on the diskette current.

To restore the RAID configuration information, perform the following steps:

  1. Start the RAID configuration program by inserting the IBM RAID Adapter Option Diskette into the primary diskette drive and powering on the system. If the system is already powered on, press Ctrl+Alt+Del.
  2. Select Start RAID Configuration Program from the PC DOS start-up menu and press Enter.
  3. Select Advanced functions from the Main Menu.
  4. Select Restore config. from diskette.
  5. Follow the instructions on the screen.

Using the Advanced Functions
This section gives the procedures for using the advanced functions, such as changing the write policy, changing the RAID parameters, and formatting a drive.

Warnings appear throughout this section to alert you to potential loss of data. Read these warnings carefully before answering yes to the confirmations requested by the RAID configuration program.

Changing the Write Policy
When you configure a logical drive, the RAID adapter automatically sets the write policy to write-through (WT) mode, where the completion status is sent after the data is written to the hard disk drive. To improve performance, you can change this write policy to write-back (WB) mode, where the completion status is sent after the data is copied to cache memory, but before the data is actually written to the storage device.

Although you gain performance with write-back mode, it creates a greater risk of losing data due to a power failure. This is because the system gets a completion status message when the data reaches cache memory, but before data is actually written to the storage device.

Attention
If you change the write policy to write-back, wait at least 10 seconds after the last operation before you power off the server. It takes that long for the system to move the data from the cache memory to the storage device. Failure to follow this practice can result in lost data.
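The difference between the two write policies can be illustrated with a toy model. This is a simplified sketch, not a description of the adapter firmware: the CachedDrive class and its methods are hypothetical, the cache and disk are plain Python lists, and flush stands in for the adapter moving cached data to the storage device.

```python
class CachedDrive:
    """Toy model of write-through (WT) and write-back (WB) completion."""

    def __init__(self, write_back: bool = False):
        self.write_back = write_back
        self.cache = []  # blocks acknowledged but not yet on the disk
        self.disk = []   # blocks actually written to the storage device

    def write(self, block) -> str:
        if self.write_back:
            # WB: completion is reported once the block is in cache memory.
            self.cache.append(block)
        else:
            # WT: completion is reported only after the disk write.
            self.disk.append(block)
        return "complete"

    def flush(self) -> None:
        """Move cached blocks to the disk."""
        self.disk.extend(self.cache)
        self.cache.clear()

    def power_failure(self) -> list:
        """Return the acknowledged blocks that were lost from the cache."""
        lost = list(self.cache)
        self.cache.clear()
        return lost
```

In this model a write-through drive never loses an acknowledged block on power failure, whereas a write-back drive loses whatever is still in the cache, which is why the wait before powering off matters.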

To change the write policy, perform the following steps:

  1. Start the RAID configuration program by inserting the IBM RAID Adapter Option Diskette into the primary diskette drive and turning on the system. If the system is already turned on, press Ctrl+Alt+Del.
  2. Select Start RAID Configuration Program from the PC DOS start-up menu and press Enter.
  3. Select Advanced functions from the Main Menu.
  4. Select Change write policy from the Advanced Functions menu. The cursor will be active in the Logical Drive list.
  5. Select the logical drive whose write policy you want to change. A screen similar to the following appears:


    [Figure: disk array configuration screen]


    Note
    The information might be different from that shown in this screen.

    The Logical Drive list contains the logical drive ID, the size in megabytes of each logical drive, the RAID level you assigned to that logical drive, and the date you created it.

    The status of the logical drive is also shown. Good means that all is well with the drive; Critical means that you must replace the hard disk drive and rebuild the logical drive. (You will have received a message telling you what has happened to the drive.) Offline means that the logical drive is unrecoverable; the data in that drive is lost.
  6. Locate the Wrt pol (Write Policy) field in the Logical Drive list. The write policy is shown as either WT (write-through, which is the default setting) or WB (write-back).
  7. Use the Up Arrow key or the Down Arrow key to select the logical drive whose write policy you want to change.
    Attention
    If you change the write policy to write-back, wait at least 10 seconds after the last operation before you power off the server. It takes that long for the system to move the data from the cache memory to the storage device. Failure to follow this practice can result in lost data.

  8. Press Enter to change the write policy. Notice that WT changes to WB. You can press Enter to alternate between WT and WB.
  9. When you have made your choice, press Esc to return to the Advanced Functions menu.
  10. Select Exit. The Confirm pop-up window appears asking you to confirm your action.
  11. To return the setting to its original state, select No. To save the changes, select Yes.
  12. Back up the disk-array configuration information to diskette. Refer to "Backing Up the Disk-Array Configuration" for more information.

Formatting Drives
You can perform a low-level format on drives with RDY (ready), OFL (offline), or UFM (unformatted) status.

Note
The Format drive choice on the Advanced Functions menu provides a low-level format. If you install a new hard disk drive that requires a standard format, use the Format command provided by the operating system.

The Format program is provided in the IBM RAID configuration program so that you can perform a low-level format on a drive controlled by the RAID adapter.
To perform a low-level format:

  1. Start the RAID configuration program by inserting the IBM RAID Adapter Option Diskette into the primary diskette drive and powering on the system. If the system is already powered on, press Ctrl+Alt+Del.
  2. Select Start RAID Configuration Program from the PC DOS start-up menu; the Main Menu appears.
  3. Select Advanced functions from the Main Menu.

    Note
    A low-level format erases all data and programs from the hard disk drive. Before proceeding, back up any data and programs that you want to save.
  4. Select Format drive. The low-level format program starts.
  5. Follow the instructions on the screen.

You can perform a low-level format on more than one drive at a time.

Changing the RAID Parameters
You can change the RAID parameters using the advanced functions by selecting Change RAID parameters.


[Figure: disk array configuration screen]


The default settings are:

  • Stripe unit size - 8K

    Attention
    Once the stripe unit is chosen and data is stored in the logical drives, the stripe unit cannot be changed without destroying data in the logical drives.

    The stripe unit size is the amount of data written on a given disk before writing on the next disk. To maximize the overall performance, choose the stripe unit such that the stripe-unit size is close to the size of the system I/O request. The default is set to 8K data bytes.
  • Rebuild priority - Equal

    Rebuild priority can be set to equal, high, or low. When the rebuild priority is set to equal, the rebuild I/O request and system I/O request get equal priority in the execution order.

    When the rebuild priority is set to high, the rebuild I/O request will get a higher priority than a system I/O request. In a heavily loaded system (with a high rate of system I/O requests), the high-priority rebuild can significantly reduce the disk rebuild time at the expense of degraded handling of I/O requests.

    When the rebuild priority is set to low, the rebuild I/O requests can execute only if no system I/O requests are pending. In a moderately to heavily loaded system, low rebuild priority will increase the disk rebuild time significantly but provide better system performance.

    Note
    Rebuild priority can be changed without affecting data in the logical drives.
  • Parity placement - RA

    Attention
    Once a parity placement scheme is chosen and data stored, it cannot be changed without destroying data.

    Parity placement defines how parity is placed in the disk array with respect to data. The following illustration shows both the Left Symmetric (LS) and Right Asymmetric (RA) parity placement in a four-drive disk array. Here AAA, BBB, and CCC are the data stripe units, and PP0 is the corresponding parity. Similarly DDD, EEE, and FFF are the data stripe units, and PP1 is the corresponding parity.

      Right Asymmetric (RA)          Left Symmetric (LS)

      Disk  Disk  Disk  Disk         Disk  Disk  Disk  Disk
       1     2     3     4            1     2     3     4

      PP0   AAA   BBB   CCC          AAA   BBB   CCC   PP0
      DDD   PP1   EEE   FFF          EEE   FFF   PP1   DDD
      GGG   HHH   PP2   III          III   PP2   GGG   HHH
      JJJ   KKK   LLL   PP3          PP3   JJJ   KKK   LLL

    In some situations you may want to try LS parity placement to improve performance. The default parity placement is RA.
  • Read ahead - On

    Normally the IBM SCSI-2 Fast/Wide PCI-Bus RAID Adapter transfers data from disk to its local cache in steps of stripe-unit size. This provides excellent overall performance when workloads tend to be sequential. However, if the workload is random and system I/O requests are smaller than stripe-unit size, reading ahead to the end of the stripe unit wastes SCSI bus bandwidth and disk utilization. When read-ahead is set to Off, the size of data transfer from the disk to local cache is equal to the system I/O request size, and no read-ahead to the end of the stripe unit is performed.

    Notes
    1. The read-ahead setting can be changed without destroying data in a logical drive.
    2. When the configuration is saved on a diskette, the RAID parameters are saved also.
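The two parity placement schemes described under Parity placement can be reproduced with a short Python sketch. The rotation rules below are inferred from the four-drive illustration, not taken from adapter documentation, and the function names right_asymmetric, left_symmetric, and layout are assumptions made for this example. Data stripe units are labeled AAA, BBB, and so on, and the parity for row n is labeled PPn.

```python
def right_asymmetric(row: int, n_disks: int):
    """RA: parity moves one disk to the right per row; data stripe units
    fill the remaining disks from left to right."""
    parity = row % n_disks
    data_disks = [d for d in range(n_disks) if d != parity]
    return parity, data_disks


def left_symmetric(row: int, n_disks: int):
    """LS: parity starts on the last disk and moves left per row; data
    stripe units start just after the parity disk and wrap around."""
    parity = (n_disks - 1 - row) % n_disks
    data_disks = [(parity + 1 + k) % n_disks for k in range(n_disks - 1)]
    return parity, data_disks


def layout(scheme, n_disks: int, rows: int):
    """Return a rows x n_disks table of stripe-unit labels for a scheme."""
    table = []
    unit = 0  # running index of the next data stripe unit (0 -> AAA)
    for row in range(rows):
        parity, data_disks = scheme(row, n_disks)
        cells = [None] * n_disks
        cells[parity] = f"PP{row}"
        for disk in data_disks:
            cells[disk] = chr(ord("A") + unit) * 3
            unit += 1
        table.append(cells)
    return table


if __name__ == "__main__":
    for name, scheme in (("RA", right_asymmetric), ("LS", left_symmetric)):
        print(name)
        for cells in layout(scheme, n_disks=4, rows=4):
            print("  " + "  ".join(cells))
```

Run as a script, the sketch prints the same two four-row layouts as the illustration, which makes it easy to check how the placement would look for a different number of drives.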

Document Location

Worldwide

Operating System

System x:All operating systems listed

Older System x:Operating system independent / None


Document Information

Modified date:
28 January 2019

UID

ibm1DDSE-44RNXL