SIP4052

Use this procedure to resolve possible failed connection problems.

This procedure is used to resolve the following problems:

Multipath redundancy level is worse (SRC xxxx4060)
Device bus fabric error (SRC xxxx4100)
Temporary device bus fabric error (SRC xxxx4101)
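
As an illustration only, the SRC list above can be expressed as a small lookup keyed on the last four characters of the SRC. This is a sketch, not an IBM tool or API: the function name `sip4052_problem`, the table name, and the assumption that an SRC is an eight-character string are all hypothetical.

```python
# Hypothetical lookup table: maps the last four characters of an SRC
# to the problem descriptions covered by this procedure.
SIP4052_SRCS = {
    "4060": "Multipath redundancy level is worse",
    "4100": "Device bus fabric error",
    "4101": "Temporary device bus fabric error",
}


def sip4052_problem(src):
    """Return the problem description if this procedure covers the SRC, else None.

    Assumes (for illustration) that an SRC is written as eight characters,
    for example 'xxxx4060', where 'xxxx' identifies the reporting adapter.
    """
    src = src.strip().upper()
    if len(src) != 8:
        return None
    return SIP4052_SRCS.get(src[-4:])
```

For example, any eight-character SRC ending in `4101` maps to the temporary device bus fabric error, while an SRC with a different suffix returns `None` and is outside the scope of this procedure.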

The possible causes are:

A failed connection caused by a failing component in the serial attached SCSI (SAS) fabric between, and including, the adapter and device enclosure.
A failed connection caused by a failing component within the device enclosure, including the device itself.

Note: For SRC xxxx4060, the failed connection was previously working, and might have already recovered.

Considerations:

Power off the system, partition, or card slot before connecting and disconnecting cables or devices, as appropriate, to prevent hardware damage.
Some systems have SAS, PCI-X, and PCIe bus interface logic integrated onto the system boards and use a pluggable RAID enablement card (a non-PCI form factor card) for these SAS, PCI-X, and PCIe buses. For these configurations, replacement of the RAID enablement card is unlikely to solve a SAS-related problem because the SAS interface logic is on the system board.
Some systems have the disk enclosure or removable media enclosure integrated in the system with no cables. For these configurations, the SAS connections are integrated onto the system boards. A failed connection can be the result of a failed system board or integrated device enclosure.
When using SAS adapters in a dual storage IOA configuration, ensure that the actions in this procedure are performed against the primary adapter (not the secondary adapter).

Attention:

When SAS fabric problems exist, do not replace RAID adapters without assistance from your service provider. Because the adapter might contain nonvolatile write cache data and configuration data for the attached disk arrays, additional problems can be created by replacing an adapter.
Follow appropriate service procedures when replacing the Cache RAID and dual IOA enablement card. Incorrect removal can result in data loss or a nondual storage IOA mode of operation.
Do not remove functioning disk units in a disk array without assistance from your service provider. A disk array might become unprotected or might fail if functioning disk units are removed. The removal of functioning disk units might also result in additional problems in the disk array.

1. Determine the resource name of the adapter that reported the problem by performing the following:
   a. Access SST or DST.
   b. Access the product activity log and record the resource name that this error is logged against. If the resource name is an adapter resource name, use it and continue with the next step. If the resource name is a disk unit resource name, use the Hardware Service Manager to determine the resource name of the adapter that is controlling this disk unit.
2. Determine whether a problem still exists for the DCxx adapter resource that logged this error by examining the SAS connections. See Viewing SAS fabric path information. Do all expected devices appear in the list and are all paths marked as Operational?
   - No: Continue with the next step.
   - Yes: The error condition has been recovered. If the error condition has been recovered more than one time, go to step 5. Otherwise, the error condition is not a persistent problem and no further service action is necessary. This ends the procedure.
3. Perform the following steps to cause the adapter to rediscover the devices and connections:
   Note: Performing this step causes the system partition to hang temporarily. Wait until the system recovers from the temporary hang.
   a. Use the logical resources IO debug option in Hardware Service Manager to perform another IPL of the virtual I/O processor that is associated with this adapter.
   b. Vary on any other resources attached to the virtual I/O processor.
4. To determine if the problem still exists for the adapter that logged this error, examine the SAS connections by performing the actions in step 2 again. Do all expected devices appear in the list and are all paths marked as Operational?
   - No: Continue with the next step.
   - Yes: The error condition no longer exists. This ends the procedure.
5. Go to SAS fabric identification. Then continue with the next step.
6. To determine if the problem still exists for the adapter that logged this error, examine the SAS connections by performing the actions in step 2 again. Do all expected devices appear in the list and are all paths marked as Operational?
   - No: Go to step 5.
   - Yes: The error condition has been recovered. If the error condition has been recovered more than one time, go to step 5. Otherwise, the error condition is not a persistent problem and no further service action is necessary. This ends the procedure.
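
The repeated check in the steps above ("Do all expected devices appear in the list and are all paths marked as Operational?") and the decision that follows the first check can be sketched as follows. This is a minimal illustration under stated assumptions, not part of any IBM service tool: the `SasPath` record, the function names, and the returned action strings are hypothetical, and the path data would in practice be read manually from the SAS fabric path information display.

```python
from dataclasses import dataclass


@dataclass
class SasPath:
    """Hypothetical record for one path shown in the SAS fabric path view."""
    device: str   # device resource name
    status: str   # path status as displayed, for example "Operational"


def fabric_healthy(expected_devices, paths):
    """True only if every expected device is listed and every path is Operational."""
    seen = {p.device for p in paths}
    all_present = set(expected_devices) <= seen
    all_operational = all(p.status == "Operational" for p in paths)
    return all_present and all_operational


def action_after_first_check(healthy, times_recovered):
    """Decision taken after the first examination of the SAS connections.

    times_recovered is the number of times this error condition has
    recovered, including the current occurrence (an assumed bookkeeping
    value supplied by the person performing the procedure).
    """
    if not healthy:
        return "rediscover"             # continue with the rediscovery step
    if times_recovered > 1:
        return "fabric_identification"  # go to SAS fabric identification
    return "done"                       # not persistent; procedure ends
```

A device with two paths appears as two `SasPath` entries; if either entry is not `Operational`, or an expected device is missing from the list, `fabric_healthy` returns `False` and the procedure continues.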

Last updated: Wed, June 19, 2019