SIP3152

Use this procedure to resolve possible failed connection problems.

For more information about failing part numbers, location codes, or removal and replacement procedures, see Part locations and location codes. Select your machine type and model number to see applicable procedures for your system.

This procedure is used to resolve the following problems:

  • Multipath redundancy level is worse (SRC xxxx4060)
  • Device bus fabric error (SRC xxxx4100)
  • Temporary device bus fabric error (SRC xxxx4101)
The possible causes are:
  • A failed connection caused by a failing component in the serial-attached SCSI (SAS) fabric between, and including, the adapter and device enclosure.
  • A failed connection caused by a failing component within the device enclosure, including the device itself.
Note: For SRC xxxx4060, the failed connection was previously working, and might have already recovered.
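The SRC-to-problem mapping above can be expressed as a small lookup, for example when triaging exported log entries. The following is a minimal Python sketch and not part of any system tooling: the names `SRC_PROBLEMS`, `describe_src`, and `may_have_recovered` are hypothetical, and it assumes that an SRC is supplied as an eight-character string whose last four characters identify the problem.

```python
# Hypothetical lookup: last four characters of an SRC mapped to the
# problem descriptions listed in this procedure.
SRC_PROBLEMS = {
    "4060": "Multipath redundancy level is worse",
    "4100": "Device bus fabric error",
    "4101": "Temporary device bus fabric error",
}


def describe_src(src):
    """Return the problem description for an SRC, or None if this procedure does not cover it."""
    code = src.strip().upper()
    if len(code) != 8:  # assumption: SRCs are written as eight characters
        raise ValueError(f"expected an 8-character SRC, got {src!r}")
    return SRC_PROBLEMS.get(code[-4:])


def may_have_recovered(src):
    """True for xxxx4060, where the connection was previously working and might already have recovered."""
    return src.strip().upper().endswith("4060")
```

For example, `describe_src("xxxx4101")` returns the temporary device bus fabric error description, and an SRC that ends in any other four characters returns `None`.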
Considerations:
  • Power off the system, partition, or card slot before connecting and disconnecting cables or devices, as appropriate, to prevent hardware damage.
  • Some systems have SAS, PCI-X, and PCIe bus interface logic integrated onto the system boards and use a pluggable RAID enablement card (a non-PCI form factor card) for these SAS, PCI-X, and PCIe buses. For these configurations, replacement of the RAID enablement card is unlikely to solve a SAS-related problem because the SAS interface logic is on the system board.
  • Some systems have the disk enclosure or removable media enclosure integrated in the system with no cables. For these configurations, the SAS connections are integrated onto the system boards, and a failed connection can be the result of a failed system board or a failed integrated device enclosure.
  • Some systems have SAS RAID adapters integrated onto the system backplane and use a cache RAID and dual IOA enablement card to enable the storage-adapter write cache and dual-storage I/O adapter (IOA) mode. For these configurations, replacement of the cache RAID and dual IOA enablement card is unlikely to solve a SAS-related problem because the SAS interface logic is on the system backplane.
  • Some configurations involve a SAS adapter connecting to internal SAS disk enclosures within a system that uses a cable card. When the procedure refers to a device enclosure, it could be referring to the internal SAS disk slots or media slots. Also, when the procedure refers to a cable, it could include a cable card.
  • When using SAS adapters in a dual storage IOA configuration, ensure that the actions taken in this procedure are against the primary adapter (not the secondary adapter).
Attention:
  • When SAS fabric problems exist, do not replace RAID adapters without assistance from your service provider. Because the adapter might contain nonvolatile, write-cache data and configuration data for the attached disk arrays, additional problems can be created by replacing an adapter.
  • Follow appropriate service procedures when replacing the cache RAID and dual IOA enablement card. Incorrect removal can result in data loss or in the adapter no longer operating in dual storage IOA mode.
  • Do not remove functioning disk units in a disk array without assistance from your service provider. A disk array might become unprotected or might fail if functioning disk units are removed. The removal of functioning disk units might also result in additional problems in the disk array.
  1. Determine the resource name of the adapter that reported the problem by performing the following steps:
    1. Access SST or DST.
    2. Access the product activity log and record the resource name that this error is logged against. If the resource name is an adapter resource name, use it and continue with the next step. If the resource name is a disk-unit resource name, use the Hardware Service Manager to determine the resource name of the adapter that is controlling this disk unit. The logical bus number of the disk-unit logical resource might be useful in determining the adapter resource name.
  2. Determine whether a problem still exists for the DCxx adapter resource that logged this error by examining the SAS connections. See Viewing SAS fabric path information. Do all expected devices appear in the list and are all paths marked as Operational?
    • No: Continue with the next step.
    • Yes: The error condition has been recovered. If the error condition has been recovered more than once, go to step 5. Otherwise, the error condition is not a persistent problem and no further service action is necessary. This ends the procedure.
  3. Perform the following steps to cause the adapter to rediscover the devices and connections:
    Note: Performing this step causes the system partition to hang temporarily. Wait until the partition resumes normal operation before continuing.
    1. Use the logical resources IO debug option in Hardware Service Manager to perform another IPL of the virtual I/O processor that is associated with this adapter.
    2. Vary on any other resources that are attached to the virtual I/O processor.
  4. To determine whether the problem still exists for the adapter that logged this error, examine the SAS connections by performing the actions in step 2 again. Do all expected devices appear in the list and are all paths marked as Operational?
    • No: Continue with the next step.
    • Yes: The error condition no longer exists. This ends the procedure.
  5. Perform only one of the following corrective actions (listed in the order of preference). If one of the corrective actions has previously been attempted, proceed to the next one in the list.
    • Reseat cables, if present, on the adapter, device enclosure, and any additional device enclosures connected to the device enclosure. Perform the following steps:
      1. Using Hardware Service Manager packaging resources, perform adapter concurrent maintenance to power off the adapter slot, or power off the system or partition.
      2. Reseat the cables.
      3. Using Hardware Service Manager packaging resources, perform adapter concurrent maintenance to power on the adapter slot, or power on the system or partition.
    • Replace the cable, if present, from the adapter to the device enclosure, and any cables between the device enclosure and additional device enclosures connected to it. Perform the following steps:
      1. Using Hardware Service Manager packaging resources, perform adapter concurrent maintenance to power off the adapter slot, or power off the system or partition.
      2. Replace the cables.
      3. Using Hardware Service Manager packaging resources, perform adapter concurrent maintenance to power on the adapter slot, or power on the system or partition.
    • Replace the device.
      Note: If there are multiple devices with a path that is not Operational, the problem is not likely to be with a device.
    • Replace the internal device enclosure or see the service documentation for an external expansion unit. Perform the following steps:
      1. Power off the system or partition. If the enclosure is external, adapter concurrent maintenance can be used instead to power off the adapter slot.
      2. Replace the device-enclosure failing items. See SASEXP and DEVBPLN for possible failing items to replace.
      3. Power on the system or partition. If the enclosure is external, adapter concurrent maintenance can be used instead to power on the adapter slot.
    • Replace the adapter. For the procedure to replace the adapter, see PCI adapter.
    • Contact your service provider.
  6. To determine whether the problem still exists for the adapter that logged this error, examine the SAS connections by performing the actions in step 2 again. Do all expected devices appear in the list and are all paths marked as Operational?
    • No: Go to step 5.
    • Yes: The error condition has been recovered. If the error condition has been recovered more than once, go to step 5. Otherwise, the error condition is not a persistent problem and no further service action is necessary. This ends the procedure.
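
The check that is repeated in steps 2, 4, and 6, and the ordering of the corrective actions in step 5, can be sketched as follows. This is an illustrative Python model only. The data shapes (a collection of expected device names, and a mapping from device name to a list of path-status strings) and the names `fabric_is_healthy`, `devices_with_failed_paths`, `CORRECTIVE_ACTIONS`, and `next_corrective_action` are assumptions made for the example; they are not output formats or interfaces of Hardware Service Manager.

```python
# Corrective actions from step 5, in the order of preference.
CORRECTIVE_ACTIONS = [
    "Reseat cables",
    "Replace cables",
    "Replace the device",
    "Replace the device enclosure",
    "Replace the adapter",
    "Contact your service provider",
]


def fabric_is_healthy(expected_devices, paths_by_device):
    """True only if every expected device is listed and every listed path is Operational."""
    if not set(expected_devices) <= set(paths_by_device):
        return False  # an expected device is missing from the list
    return all(
        status == "Operational"
        for statuses in paths_by_device.values()
        for status in statuses
    )


def devices_with_failed_paths(paths_by_device):
    """Return, sorted, the devices that have at least one path that is not Operational."""
    return sorted(
        device
        for device, statuses in paths_by_device.items()
        if any(status != "Operational" for status in statuses)
    )


def next_corrective_action(already_attempted):
    """Return the first step 5 action not yet attempted, or None when all have been tried."""
    for action in CORRECTIVE_ACTIONS:
        if action not in already_attempted:
            return action
    return None
```

If `devices_with_failed_paths` returns more than one device, the note in step 5 applies: the problem is not likely to be with a device, so replacing a device is unlikely to help.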



Last updated: Wed, June 19, 2019