Use this MAP to resolve the following problem: Multipath
redundancy level got worse (SRN nnnn - 4060) for
a PCI-X or PCIe controller.
The possible causes follow:
- A failed connection caused by a failing component in the SAS fabric
between, and including, the adapter and device enclosure.
- A failed connection caused by a failing component within the device
enclosure, including the device itself.
Considerations:
- Remove power from the system before connecting and disconnecting
cables or devices, as appropriate, to prevent hardware damage or erroneous
diagnostic results.
- Some systems have SAS and PCI-X or PCIe bus interface logic integrated
onto the system boards and use a pluggable RAID enablement card (a
non-PCI form factor card) for such integrated-logic buses. See the
feature comparison tables for PCIe and PCI-X cards.
For these configurations, replacement of the RAID enablement card
is unlikely to solve a SAS-related problem because the SAS interface
logic is on the system board.
- Some systems have the disk enclosure or removable media enclosure
integrated in the system with no cables. For these configurations,
the SAS connections are integrated onto the system boards and a failed
connection can be the result of a failed system board or integrated
device enclosure.
- Some systems have SAS RAID adapters integrated onto the system
boards and use a Cache RAID - Dual IOA Enablement Card (for example,
FC5662) to enable storage adapter Write Cache and Dual Storage IOA
(HA RAID mode). For these configurations, replacement of the Cache
RAID - Dual IOA Enablement Card is unlikely to solve a SAS-related
problem because the SAS interface logic is on the system board. Additionally,
follow the appropriate service procedures when replacing the
Cache RAID - Dual IOA Enablement Card, because removing this card
incorrectly can cause data loss and can also result in a
non-Dual Storage IOA (non-HA) mode of operation.
- Some configurations involve a SAS adapter connecting to internal
SAS disk enclosures within a system using a FC3650 or FC3651 cable
card. Keep in mind that when the MAP refers to a device enclosure,
it could be referring to the internal SAS disk slots or media slots.
Also, when the MAP refers to a cable, it could include a FC3650 or
FC3651 cable card.
- Some adapters, known as RAID and SSD adapters, contain SSDs that
are integrated on the adapter. See the feature comparison tables for PCIe cards.
For these configurations, FRU replacement to solve SAS-related problems
is limited to replacing either the adapter or the integrated SSDs
because the entire SAS interface logic is contained on the adapter.
- When using SAS adapters in either an HA two-system RAID or HA
single-system RAID configuration, ensure that the actions taken in
this MAP are against the Primary adapter and not the Secondary adapter.
- Before performing the system verification action in this MAP, reconstruct
any degraded disk arrays if possible. Doing so helps avoid potential
data loss resulting from the adapter reset that is performed during the
system verification action in this MAP.
Attention: When SAS fabric problems exist, obtain
assistance from your hardware service provider before performing any
of the following actions:
- Obtain assistance before you replace a RAID adapter because the
adapter might contain nonvolatile write cache data and configuration
data for the attached disk arrays, and additional problems might be
created by replacing an adapter.
- Obtain assistance before you remove functioning disks in a disk
array because the disk array might become degraded or might fail,
and additional problems might be created if functioning disks are
removed from a disk array.
Step 3153-1
Determine whether the problem
still exists for the adapter that logged this error by examining the
SAS connections as follows:
- Start the IBM® SAS Disk Array Manager.
- Start Diagnostics and select Task Selection on
the Function Selection display.
- Select .
- Select .
Do all expected devices appear in the list and are all paths
marked as Operational?
- No
- Go to Step 3153-2.
- Yes
- Go to Step 3153-6.
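The question in this step combines two conditions: no expected device is missing from the list, and no listed path has a status other than Operational. The following is a minimal Python sketch of that check, assuming the path list has been transcribed by hand into a mapping from device name to a list of path status strings. The function name, the device names, and the "Failed" status string are hypothetical and are not output of the IBM SAS Disk Array Manager.

```python
def fabric_is_healthy(expected_devices, observed_paths):
    """Return (ok, missing, bad_paths) for a transcribed path list.

    expected_devices: iterable of device names that should be present.
    observed_paths:   dict of device name -> list of path status strings
                      (hypothetical layout, filled in by the person
                      reading the display).
    """
    # A device is missing if it is absent or has no paths listed at all.
    missing = sorted(d for d in expected_devices if not observed_paths.get(d))

    # Collect every path whose status is anything other than "Operational".
    bad_paths = {}
    for device, statuses in observed_paths.items():
        not_ok = [s for s in statuses if s != "Operational"]
        if not_ok:
            bad_paths[device] = not_ok

    ok = not missing and not bad_paths
    return ok, missing, bad_paths


# Example with made-up device names: one path on pdisk1 is not Operational.
ok, missing, bad = fabric_is_healthy(
    ["pdisk0", "pdisk1"],
    {"pdisk0": ["Operational", "Operational"],
     "pdisk1": ["Operational", "Failed"]},
)
```

In this example the answer to the question is No, so the procedure would continue with Step 3153-2.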
Step 3153-2
Run diagnostics
in system verification mode on the adapter to rediscover the devices
and connections.
- Start Diagnostics and select Task Selection on
the Function Selection display.
- Select Run Diagnostics.
- Select the adapter resource.
- Select System Verification.
Note: Disregard any trouble found for now, and continue with
the next step.
Step 3153-3
Determine whether the problem
still exists for the adapter that logged this error by examining
the SAS connections as follows:
- Start the IBM SAS Disk Array Manager.
- Start Diagnostics and select Task Selection on
the Function Selection display.
- Select .
- Select .
- Select a device with a path that is not marked as Operational,
if one exists, to obtain additional details about the full path from
the adapter port to the device. See Viewing SAS fabric path information for
an example of how this additional detail can be used to help isolate
where in the path the problem exists.
Do all expected devices appear in the list and are all paths
marked as Operational?
- No
- Go to Step 3153-4.
- Yes
- Go to Step 3153-6.
Step 3153-4
Because the problem
persists, corrective action is needed to resolve it.
Do the following steps:
- Power off the system or logical partition.
- Perform only one of the following corrective actions, which are
listed in order of preference. If a corrective action
has already been attempted, proceed to the next action in the list.
Note: Before
replacing parts, consider completely powering down the entire
system, including any external device enclosures, to reset all possible
failing components. This action might correct the problem without
replacing parts.
- Power on the system or logical partition.
Note: In some situations,
it might be acceptable to unconfigure and reconfigure the adapter
instead of powering off and powering on the system or logical partition.
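Because Step 3153-5 can return to this step, it helps to track which corrective actions have already been tried so that exactly one new action is performed on each pass. The following is a minimal Python sketch of that bookkeeping. The function name is hypothetical, and the action labels are placeholders, not the actual corrective actions for this MAP.

```python
def next_corrective_action(ordered_actions, attempted):
    """Return the first action, in order of preference, not yet attempted.

    ordered_actions: list of action labels, most preferred first.
    attempted:       set of labels already performed on earlier passes.
    Returns None when every action in the list has been attempted.
    """
    for action in ordered_actions:
        if action not in attempted:
            return action
    return None


# Placeholder labels only; substitute the actions listed for this step.
actions = ["action-1", "action-2", "action-3"]
attempted = {"action-1"}
choice = next_corrective_action(actions, attempted)  # "action-2"
```

A return value of None means that every listed action has been attempted. Obtaining assistance from your hardware service provider, as described in the Attention notice, is the appropriate next step in that case.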
Step 3153-5
Determine whether the problem
still exists for the adapter that logged this error by examining the
SAS connections as follows:
- Start the IBM SAS Disk Array Manager.
- Start Diagnostics and select Task Selection on
the Function Selection display.
- Select .
- Select .
- Select a device with a path that is not marked as Operational,
if one exists, to obtain additional details about the full path from
the adapter port to the device. See Viewing SAS fabric path information for
an example of how this additional detail can be used to help isolate
where in the path the problem exists.
Do all expected devices appear in the list and are all paths
marked as Operational?
- No
- Go to Step 3153-4.
- Yes
- Go to Step 3153-6.
Step 3153-6
When the problem is resolved, see the removal and replacement
procedures topic for the system unit on which you are working and
do the "Verifying the repair" procedure.
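
The branching among Steps 3153-1 through 3153-6 can be summarized as a small transition table. The following Python sketch encodes only the "Go to" targets stated in the steps above. The names DECISION_STEPS, ACTION_STEPS, and next_step are hypothetical and are used for illustration only.

```python
# Decision steps: (target when the answer is No, target when it is Yes).
DECISION_STEPS = {
    "3153-1": ("3153-2", "3153-6"),
    "3153-3": ("3153-4", "3153-6"),
    "3153-5": ("3153-4", "3153-6"),
}

# Action steps have no question; they always continue to the next step.
ACTION_STEPS = {
    "3153-2": "3153-3",
    "3153-4": "3153-5",
}


def next_step(step, all_paths_operational=None):
    """Return the step that follows `step`, or None after Step 3153-6.

    all_paths_operational is the Yes/No answer (True/False) to "Do all
    expected devices appear in the list and are all paths marked as
    Operational?" and is required only for decision steps.
    """
    if step in DECISION_STEPS:
        if all_paths_operational is None:
            raise ValueError(f"Step {step} requires a Yes/No answer")
        no_target, yes_target = DECISION_STEPS[step]
        return yes_target if all_paths_operational else no_target
    if step in ACTION_STEPS:
        return ACTION_STEPS[step]
    if step == "3153-6":
        return None  # End of the MAP: verify the repair.
    raise KeyError(f"Unknown step: {step}")
```

The table makes the loop explicit: a No answer in Step 3153-5 returns to Step 3153-4, where the next corrective action in the list is performed.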