SIP3150

Use this procedure to complete serial attached SCSI (SAS) fabric problem isolation.

Before you begin

Considerations:
  • To prevent hardware damage, power off the system, partition, or card slot, as appropriate, before you connect or disconnect cables or devices.
  • Some systems have a disk enclosure or removable media enclosure that is integrated into the system with no cables. In these configurations, the SAS connections are integrated onto the system boards, so a failed connection can be the result of a failed system board or a failed integrated device enclosure.
  • Some systems have SAS RAID adapters that are integrated onto the system backplane and that use a cache RAID and dual IOA enablement card to enable the storage adapter write cache and dual storage I/O adapter (IOA) mode. In these configurations, replacing the cache RAID and dual IOA enablement card is unlikely to solve a SAS-related problem because the SAS interface logic is on the system backplane.
Attention: When SAS fabric problems exist, observe the following precautions and obtain assistance from your hardware service provider:
  • Do not replace RAID adapters without assistance from your service provider. The adapter might contain nonvolatile write cache data and configuration data for the attached disk arrays, so replacing it can create additional problems.
  • Follow the appropriate service procedures when you replace the cache RAID and dual IOA enablement card. Incorrect removal can result in data loss or in a nondual storage IOA mode of operation.
  • Do not remove functioning disk units from a disk array without assistance from your service provider. Removing functioning disk units might leave the disk array unprotected, cause it to fail, or create additional problems in the array.

Procedure

  1. Was the SRC xxxx3020 or SRC xxxx8130?
    No:
    Go to step 3.
    Yes:
    Go to step 2.
  2. Determine which of the following problems caused your error and take the corresponding action.
    The possible causes for SRC xxxx3020 are:
    • More devices are connected to the adapter than the adapter supports. Change the configuration to reduce the number of devices to no more than the adapter supports.
    • A SAS device was incorrectly moved from one location to another. Either return the device to its original location or move the device while the adapter is powered off.
    • A SAS device was incorrectly replaced by a SATA device. A SAS device must be used to replace a SAS device.
    The possible causes for SRC xxxx8130 are:
    • One or more SAS devices were moved from a PCIe3 adapter to a PCIe adapter. In this case, the Detail Data section of the hardware error log lists Payload CRC Error as the reason for failure. You can ignore the error; the problem is resolved when the devices are moved back to a PCIe3 adapter or are formatted on the PCIe adapter.
    • For all other causes, go to step 3.
  3. Determine the status of the disk units in the array by doing the following steps:
    1. Access the product activity log and display the SRC that sent you here.
    2. Press the F9 key to display address information, and record the adapter address.
    3. Return to the SST or DST main menu.
    4. Select Work with disk units > Display disk configuration > Display disk configuration status.
    5. On the Display disk configuration status screen, look for the devices that are attached to the adapter that was identified.
    Is there a device that has a status of RAID 5/Unknown, RAID 6/Unknown, RAID 5/Failed, or RAID 6/Failed?
    No:
    Go to step 5.
    Yes:
    Go to step 4.
  4. Other errors related to the degraded protection of the disk array might have occurred. Take action on these errors to replace the failed disk unit and restore the disk array to a fully protected state. This ends the procedure.
  5. Have other errors occurred at the same time as this error?
    No:
    Go to step 7.
    Yes:
    Go to step 6.
  6. Take action on the other errors that occurred at the same time as this error. This ends the procedure.
  7. Was the SRC xxxxFFFE?
    No:
    Go to step 10.
    Yes:
    Go to step 8.
  8. Check for the latest program temporary fixes (PTFs) for the device, device enclosure, and adapter, and apply any that you find. If you need assistance finding PTFs, contact your next level of support. Did you find and apply a PTF?
    No:
    Go to step 10.
    Yes:
    Go to step 9.
  9. This ends the procedure.
  10. Identify the adapter and adapter port that are associated with the problem by examining the product activity log. Perform the following steps:
    1. Access SST or DST.
    2. Access the product activity log and display the SRC that sent you here. Record the adapter address and the adapter port by completing one of the following actions:
      • If the SRC is xxxxFFFE, press the F9 key for address information. The adapter address is the bus information. The port is shown in the I/O bus field. Convert the port value from decimal to hexadecimal.
      • Press the F9 key for address information. The adapter address is the bus information. Press F12 to cancel and return to the previous screen, and then press the F4 key to view additional information, if available. This information is the unit address. Go to SAS address and physical location information and use the unit address to determine the controller port.
      • Go to Hexadecimal product activity log data to obtain the address information. The adapter address is the bus information. The controller port is contained in the unit address. Go to SAS address and physical location information and use the unit address to determine the controller port.
  11. Perform the following steps:
    1. Select Start a Service Tool > Hardware Service Manager > Logical Hardware Resources > System Bus Resources.
    2. Enter the adapter bus address and use the Associated packaging resource(s) option to display the type, model, and unit ID.
    3. Record the type, model, and unit ID of the enclosure in which the adapter is located.
    4. Use the type, model, unit ID, and adapter address to find the location of the adapter. See Addresses to find the location, and then go to Part locations and location codes.
    5. The logical port number was identified in step 10. Logical port numbers are indicated on the physical connector labels that are located on the tailstock of the adapter. To locate the device or device enclosure that is experiencing the problem, use the logical port number to determine the physical connector to which the device or device enclosure is attached.
    Note: For more information about unit address formats, see SAS address and physical location information.
  12. Because the problem persists, corrective action is needed.

    Perform only one of the following corrective actions (listed in the order of preference). If one of the corrective actions was previously attempted, proceed to the next one in the list.

    • Reseat the cables, if present, on the adapter and the device enclosure. Perform the following steps:
      1. Use adapter concurrent maintenance to power off the adapter slot, or power off the system or partition.
      2. Reseat the cables.
      3. Use adapter concurrent maintenance to power on the adapter slot, or power on the system or partition.
    • Replace the cables, if present, that connect the adapter to the device enclosure. Perform the following steps:
      1. Use adapter concurrent maintenance to power off the adapter slot, or power off the system or partition.
      2. Replace the cables.
      3. Use adapter concurrent maintenance to power on the adapter slot, or power on the system or partition.
    • Replace the device.
      Note: If multiple devices have a path that is not Operational, the problem is unlikely to be caused by a device.
    • Replace the internal device enclosure, or, for an external expansion unit, see the service documentation for that unit. Perform the following steps:
      1. Power off the system or partition. If the enclosure is external, use adapter concurrent maintenance instead to power off the adapter slot.
      2. Replace the device enclosure.
      3. Power on the system or partition. If the enclosure is external, use adapter concurrent maintenance instead to power on the adapter slot.
    • Replace the adapter. For the replacement procedure, see Adapters.
    • Contact your service provider.
  13. Does the problem still occur after you complete the corrective action?
    No:
    This ends the procedure.
    Yes:
    Go to step 12.
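The branching in steps 1, 3, 5, 7, and 8 can be summarized as a small decision function. The following Python sketch is illustrative only: the function names, the parameters, and the convention of comparing only the last four characters of the SRC are assumptions made for this example and are not part of any IBM service tool.

```python
from typing import Iterable

# Hypothetical helpers for illustration; this is not an IBM service tool or API.
# Status values from step 3 that indicate degraded disk array protection.
DEGRADED_STATUSES = frozenset(
    {"RAID 5/Unknown", "RAID 6/Unknown", "RAID 5/Failed", "RAID 6/Failed"}
)


def src_suffix(src: str) -> str:
    """Return the last four characters of an SRC in uppercase.

    Assumption: the "xxxx" prefix is treated as a wildcard, so only the
    suffix (for example 3020, 8130, or FFFE) is compared.
    """
    return src.strip().upper()[-4:]


def first_step(src: str) -> int:
    """Step 1: SRC xxxx3020 or xxxx8130 goes to step 2; any other SRC to step 3."""
    return 2 if src_suffix(src) in {"3020", "8130"} else 3


def step_after_checks(
    src: str, statuses: Iterable[str], other_errors: bool, ptf_applied: bool
) -> int:
    """Steps 3, 5, 7, and 8: return the step number the procedure moves to."""
    if any(status in DEGRADED_STATUSES for status in statuses):
        return 4  # Step 3, Yes: restore the degraded disk array.
    if other_errors:
        return 6  # Step 5, Yes: act on the other errors first.
    if src_suffix(src) != "FFFE":
        return 10  # Step 7, No: identify the adapter and adapter port.
    return 9 if ptf_applied else 10  # Step 8: an applied PTF ends the procedure.


if __name__ == "__main__":
    print(first_step("xxxx8130"))
    print(step_after_checks("xxxxFFFE", [], other_errors=False, ptf_applied=True))
```

Step 2 sends an SRC xxxx8130 with any cause other than a PCIe3-to-PCIe move on to step 3, so `step_after_checks` applies to that case as well.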
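For SRC xxxxFFFE, step 10 asks you to convert the port value shown in the I/O bus field from decimal to hexadecimal. A minimal Python sketch of that conversion follows. It assumes that the field shows a plain, non-negative decimal integer; `port_to_hex` is a hypothetical name, and no zero-padding is applied.

```python
from typing import Union


def port_to_hex(io_bus_field: Union[int, str]) -> str:
    """Convert the decimal port value from the I/O bus field to uppercase hex.

    Hypothetical helper; assumes the field holds a non-negative decimal integer.
    """
    value = int(str(io_bus_field).strip(), 10)
    if value < 0:
        raise ValueError("port value must be non-negative")
    return format(value, "X")


if __name__ == "__main__":
    for shown in ("0", "10", "17"):
        print(shown, "->", port_to_hex(shown))
```

For example, a displayed value of 10 converts to A, and a displayed value of 17 converts to 11.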
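Steps 12 and 13 form a loop: perform one corrective action, check whether the problem persists, and if it does, move on to the most preferred action that has not yet been attempted. The following Python sketch models that loop. The action labels are paraphrased from step 12, and `problem_persists` is a hypothetical callback that stands in for the check in step 13; neither is part of any service tool.

```python
from typing import Callable, Iterable, List, Optional, Sequence

# Corrective actions from step 12, in order of preference (labels paraphrased).
CORRECTIVE_ACTIONS = (
    "Reseat the cables on the adapter and the device enclosure",
    "Replace the cables from the adapter to the device enclosure",
    "Replace the device",
    "Replace the device enclosure",
    "Replace the adapter",
    "Contact your service provider",
)


def next_corrective_action(
    attempted: Iterable[str], actions: Sequence[str] = CORRECTIVE_ACTIONS
) -> Optional[str]:
    """Step 12: return the most preferred action not yet attempted, or None."""
    tried = set(attempted)
    for action in actions:
        if action not in tried:
            return action
    return None


def escalate(problem_persists: Callable[[str], bool]) -> List[str]:
    """Steps 12 and 13: perform one action at a time until the problem clears.

    problem_persists is a hypothetical callback: it receives the action just
    performed and returns True if the problem still occurs (step 13, Yes).
    Returns the actions performed, in order.
    """
    performed: List[str] = []
    while True:
        action = next_corrective_action(performed)
        if action is None:
            break  # Every action in the list has been attempted.
        performed.append(action)
        if not problem_persists(action):
            break  # Step 13, No: this ends the procedure.
    return performed
```

If replacing the device resolves the problem, `escalate(lambda action: action != "Replace the device")` returns the first three actions. The sketch does not reorder the list for the note in step 12 about multiple devices with a path that is not Operational, so that judgment remains with the person performing the procedure.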