Fault isolation methodology

This section describes the basic methodology used to locate faults within a storage system and to identify the affected customer-replaceable units (CRUs).

Overview

Basic steps:
  • Gather fault information, including the state of the system LEDs.
  • Determine where in the system the fault is occurring.
  • Review logs from the ClevOS Manager event console.
  • If required, isolate the fault to a data path component or configuration as described in Isolate the fault.

Gather fault information

When a fault occurs, it is important to gather as much information as possible. Doing so will help you determine the correct action needed to remedy the fault.

Begin by reviewing the reported fault:
  • Is the fault related to an internal data path or an external data path?
  • Is the fault related to a hardware component such as a disk drive module, controller module, or power supply unit?

By isolating the fault to one of the components within the storage system, you will be able to determine the necessary corrective action more quickly.

Determine where the fault is occurring

When a fault occurs, the Module Fault LED, located in the lower left corner of the enclosure front panel, illuminates. See also Front panel LEDs. Check the status of the other front panel LEDs. Also check the LEDs on the back and top panels of the enclosure (viewing the top panel LEDs requires removing a lid) to narrow the fault to a CRU, a connection, or both.

The LEDs help you identify the location of a CRU reporting a fault.

Isolate the fault

Occasionally, it might become necessary to isolate a fault. This is particularly true of data paths because of the number of components they comprise. For example, a host-side data error could be caused by any component in the data path: the controller node HBA, a cable, an IOM, or a disk enclosure.
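
The isolation approach above can be pictured as checking each data path component in turn until one fails. The following Python sketch is illustrative only: the function name isolate_fault and the per-component check callables are hypothetical and are not part of ClevOS Manager or any enclosure tooling. Each check stands in for a manual test, such as retesting after swapping in a known-good part.

```python
from typing import Callable, Dict, Optional

# Data path components named in the text, in the order the text lists them.
DATA_PATH = ["controller node HBA", "cable", "IOM", "disk enclosure"]


def isolate_fault(checks: Dict[str, Callable[[], bool]]) -> Optional[str]:
    """Return the first component whose check fails, or None if all pass.

    `checks` maps a component name to a hypothetical callable that returns
    True when that component is found to be healthy.
    """
    for component in DATA_PATH:
        check = checks.get(component)
        if check is None:
            continue  # no check supplied for this component; skip it
        if not check():
            return component
    return None


# Stand-in checks for illustration: only the cable check fails.
example_checks = {
    "controller node HBA": lambda: True,
    "cable": lambda: False,
    "IOM": lambda: True,
    "disk enclosure": lambda: True,
}
suspect = isolate_fault(example_checks)
```

Checking one component at a time keeps each test independent, so a failure points at a single CRU or connection rather than at the data path as a whole.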

If the enclosure does not initialize

It may take up to two minutes for all enclosures to initialize. If an enclosure does not initialize:
  • Power cycle the system.
  • Make sure the power cord is properly connected, and check the power source to which it is connected.
  • Check the ClevOS Manager event console for errors.
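
The two-minute initialization window mentioned above can be expressed as a simple polling loop with a timeout. This is a minimal sketch under stated assumptions: is_initialized is a hypothetical callable that you would supply (for example, one that reflects what you observe on the enclosure LEDs or in the event console), and the five-second poll interval is an arbitrary choice, not a documented value.

```python
import time
from typing import Callable


def wait_for_initialization(
    is_initialized: Callable[[], bool],
    timeout_s: float = 120.0,       # "up to two minutes" from the text
    poll_interval_s: float = 5.0,   # assumed interval, not a documented value
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the enclosure reports initialized or the timeout expires.

    Returns True if initialization was observed in time, otherwise False,
    at which point the troubleshooting steps listed above apply.
    """
    deadline = clock() + timeout_s
    while True:
        if is_initialized():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval_s)
```

The sleep and clock parameters exist only so the loop can be exercised without actually waiting two minutes.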