POWER7 information

SIP3150

Use this procedure to perform serial attached SCSI (SAS) fabric problem isolation.

Considerations:
Attention: When SAS fabric problems exist, obtain assistance from your hardware service provider:
  • When SAS fabric problems exist, do not replace RAID adapters without assistance from your service provider. Because the adapter might contain nonvolatile write cache data and configuration data for the attached disk arrays, additional problems can be created by replacing an adapter.
  • Follow appropriate service procedures when replacing the Cache RAID and dual IOA enablement card. Incorrect removal can result in data loss or a nondual storage IOA mode of operation.
  • Do not remove functioning disk units in a disk array without assistance from your service provider. A disk array might become unprotected or might fail if functioning disk units are removed. The removal of functioning disk units might also result in additional problems in the disk array.
  1. Was the SRC xxxx3020 or SRC xxxx8130?
    No:
    Go to step 3.
    Yes:
    Go to step 2.
  2. Determine which of the following is the cause of your specific error and take the appropriate actions listed.
    The possible causes for SRC xxxx3020 are:
    • More devices are connected to the adapter than the adapter supports. Change the configuration so that the number of devices does not exceed what the adapter supports.
    • A SAS device has been incorrectly moved from one location to another. Either return the device to its original location or move the device while the adapter is powered off.
    • A SAS device has been incorrectly replaced by a SATA device. A SAS device must be used to replace a SAS device.
    The possible causes for SRC xxxx8130 are:
    • One or more SAS devices were moved from a PCIe2 or PCIe3 adapter to a PCI-X or PCIe adapter. In this case, the Detail Data section of the hardware error log shows Payload CRC Error as the reason for failure. The error can be ignored, and the problem is resolved if the devices are moved back to a PCIe2 or PCIe3 adapter or if the devices are formatted on the PCI-X or PCIe adapter.
    • For all other causes, go to step 3.
  3. Determine the status of the disk units in the array by doing the following steps:
    1. Access the product activity log and display the SRC that sent you here.
    2. Press the F9 key for address information. This is the adapter address.
    3. Return to the SST or DST main menu.
    4. Select Work with disk units > Display disk configuration > Display disk configuration status.
    5. On the Display disk configuration status screen, look for the devices attached to the adapter that was identified.
    Is there a device that has a status of RAID 5/Unknown, RAID 6/Unknown, RAID 5/Failed, or RAID 6/Failed?
    No:
    Go to step 5.
    Yes:
    Go to step 4.
  4. Other errors related to the degraded protection of the disk array should also have occurred. Take action on these errors to replace the failed disk unit and restore the disk array to a fully protected state. This ends the procedure.
  5. Have other errors occurred at the same time as this error?
    No:
    Go to step 7.
    Yes:
    Go to step 6.
  6. Take action on the other errors that have occurred at the same time as this error. This ends the procedure.
  7. Was the SRC xxxxFFFE?
    No:
    Go to step 10.
    Yes:
    Go to step 8.
  8. Check for the latest PTFs for the device, device enclosure, and adapter and apply them. Did you find and apply a PTF?
    No:
    Go to step 10.
    Yes:
    Go to step 9.
  9. This ends the procedure.
  10. Identify the adapter and adapter port associated with the problem by examining the product activity log. Perform the following steps:
    1. Access SST or DST.
    2. Access the product activity log and display the SRC that sent you here. Record the adapter address and the adapter port by doing one of the following:
      • If the SRC is xxxxFFFE, press the F9 key for address information. The adapter address is the bus, board, card information. The port is shown in the I/O bus field. Convert the port value from decimal to hexadecimal.
      • Press the F9 key for address information. The adapter address is the bus, board, card information. Press F12 to cancel and return to the previous screen, then press the F4 key to view the additional information, if available. The adapter port is characters 1 and 2 of the unit address. For example, if the unit address is 123456FF, the port would be 12.
      • Go to Hexadecimal product activity log data to obtain the address information. The adapter address is the bus, board, card information. The adapter port is characters 1 and 2 of the unit address. For example, if the unit address is 123456FF, the port would be 12.
  11. Perform the following steps:
    1. Select Start a Service Tool > Hardware Service Manager > Logical Hardware Resources > System Bus Resources.
    2. Enter the adapter bus address and use the Associated packaging resource(s) option to display the type, model, and unit ID.
    3. Record the type, model, and unit ID of the enclosure in which the adapter is located.
    4. Use the type, model, unit ID, and adapter address to find the location of the adapter (see Addresses to find the location and then go to System FRU locations).
    5. The logical port number was identified in step 10. Logical port numbers are indicated on the physical connector labels located on the tailstock of the adapter. To locate the device or device enclosure that is experiencing the problem, use the logical port number to determine the physical connector to which the device or device enclosure is attached.
  12. Because the problem persists, corrective action is needed to resolve it. Proceed as follows:

    Perform only one of the following corrective actions (listed in the order of preference). If one of the corrective actions has previously been attempted, proceed to the next one in the list.

    • Reseat the cables, if present, on the adapter and device enclosure. Perform the following steps:
      1. Use adapter concurrent maintenance to power off the adapter slot, or power off the system or partition.
      2. Reseat the cables.
      3. Use adapter concurrent maintenance to power on the adapter slot, or power on the system or partition.
    • Replace the cable, if present, from the adapter to the device enclosure. Perform the following steps:
      1. Use adapter concurrent maintenance to power off the adapter slot, or power off the system or partition.
      2. Replace the cables.
      3. Use adapter concurrent maintenance to power on the adapter slot, or power on the system or partition.
    • Replace the device.
      Note: If there are multiple devices with a path that is not Operational, the problem is not likely to be with a device.
    • Replace the internal device enclosure or see the service documentation for an external expansion unit. Perform the following steps:
      1. Power off the system or partition. If the enclosure is external, use adapter concurrent maintenance instead to power off the adapter slot.
      2. Replace the device enclosure.
      3. Power on the system or partition. If the enclosure is external, use adapter concurrent maintenance instead to power on the adapter slot.
    • Replace the adapter. The procedure to replace the adapter can be found in PCI adapter.
    • Contact your service provider.
  13. Does the problem still occur after performing the corrective action?
    • No: This ends the procedure.
    • Yes: Go to step 12.
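
The adapter port derivation described in step 10 can be sketched in a few lines of Python. This is only an illustrative aid, not part of any service tool: the function names are hypothetical, the unit address is assumed to be eight hexadecimal characters (as in the 123456FF example), and padding the converted port to two hexadecimal digits is an assumption made so that both methods return values in the same form.

```python
import re


def port_from_unit_address(unit_address: str) -> str:
    """Return the adapter port: characters 1 and 2 of the unit address.

    Assumption: the unit address is eight hexadecimal characters,
    as in the example 123456FF from step 10.
    """
    addr = unit_address.strip().upper()
    if not re.fullmatch(r"[0-9A-F]{8}", addr):
        raise ValueError(f"unexpected unit address format: {unit_address!r}")
    return addr[:2]


def port_from_io_bus_field(decimal_value: int) -> str:
    """Convert the decimal port value from the I/O bus field to hexadecimal.

    Applies to the SRC xxxxFFFE case in step 10. The two-digit,
    zero-padded result is an assumption for consistency with
    port_from_unit_address, not something the procedure specifies.
    """
    if decimal_value < 0:
        raise ValueError("port value must not be negative")
    return format(decimal_value, "02X")


if __name__ == "__main__":
    print(port_from_unit_address("123456FF"))  # 12
    print(port_from_io_bus_field(10))          # 0A
```

The resulting port value is the logical port number that step 11 matches against the connector labels on the adapter tailstock.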



Last updated: Thu, July 23, 2015