1627 The cluster has insufficient redundancy in its controller connectivity.
Explanation
This error can occur when a clustered system does not have sufficient redundancy in its connections to the disk controllers. In this case, another failure in the SAN could result in loss of access to the application data. Typically, a clustered system in a SAN environment has redundant connections to every disk controller. This redundancy allows operation to continue when one of the SAN components fails.
To provide recommended redundancy, configure a clustered system so that the following statements are true:
- Each node can access each disk controller through two or more different initiator ports on the node.
- Each node can access each disk controller through two or more different controller target ports.
  Note: Some disk controllers provide only a single target port.
- Each node can access each disk controller target port through at least one initiator port on the node.
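As an illustration only, the three conditions above can be expressed as a check over a list of logins. The following Python sketch is not product code: the tuple layout, the function name check_redundancy, and the node, port, and controller names are hypothetical. It also assumes that the set of target ports known for a controller is the union of the ports that any node can see.

```python
from collections import defaultdict


def check_redundancy(nodes, logins, single_target_controllers=frozenset()):
    """Return a list of (node, controller, problem) tuples.

    logins: iterable of (node, initiator_port, controller, target_port).
    single_target_controllers: controllers that provide only one target port,
    which are exempt from the two-target-port condition.
    """
    initiators = defaultdict(set)        # (node, controller) -> initiator ports used
    targets = defaultdict(set)           # (node, controller) -> target ports reached
    controller_ports = defaultdict(set)  # controller -> every target port seen

    for node, initiator, controller, target in logins:
        initiators[(node, controller)].add(initiator)
        targets[(node, controller)].add(target)
        controller_ports[controller].add(target)

    problems = []
    for controller in sorted(controller_ports):
        for node in nodes:
            key = (node, controller)
            # Condition 1: two or more initiator ports on the node.
            if len(initiators[key]) < 2:
                problems.append((node, controller, "fewer than two initiator ports"))
            # Condition 2: two or more controller target ports.
            if len(targets[key]) < 2 and controller not in single_target_controllers:
                problems.append((node, controller, "fewer than two target ports"))
            # Condition 3: every target port reachable from this node.
            missing = controller_ports[controller] - targets[key]
            if missing:
                problems.append(
                    (node, controller, "cannot reach target ports: " + ", ".join(sorted(missing)))
                )
    return problems


# Hypothetical sample: node2 reaches ctrlA through one initiator and one target port.
sample_logins = [
    ("node1", "i1", "ctrlA", "t1"),
    ("node1", "i2", "ctrlA", "t2"),
    ("node2", "i1", "ctrlA", "t1"),
]
problems = check_redundancy(["node1", "node2"], sample_logins)
for problem in problems:
    print(problem)
```

In this sample, node1 satisfies all three conditions and node2 fails each of them, so every reported tuple names node2.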
If no higher-priority errors are reported, this error usually indicates a problem with the SAN design, the SAN zoning, or the disk controller.
If there are unfixed higher-priority errors that relate to the SAN or to disk controllers, fix those errors before you resolve this error because they might indicate the reason for the lack of redundancy. Error codes that must be fixed first are:
- 1210 Local FC port excluded
- 1230 Login has been excluded
The 1627 error code is reported for a number of different error IDs. The error ID indicates the area where there is a lack of redundancy. The data that is reported in an event log entry indicates where the condition was found.
The meaning of the error IDs is shown in the following text. For each error ID, the most likely reason for the condition is given. If the problem is not found in the suggested areas, check the configuration and state of all of the SAN components (switches, controllers, disks, cables, and clustered system) to determine where there is a single point of failure.
010040 A disk controller is only accessible from a single node port.
- A node detected that it is connected to the disk controller through exactly one initiator port, although more than one initiator port is operational.
- The error data indicates the device WWNN and the WWPN of the connected port.
- A zoning issue or a Fibre Channel connection hardware fault might cause this condition.
010041 A disk controller is only accessible from a single port on the controller.
- A node detected that it is connected to exactly one target port on a disk controller, although more than one target port connection is expected.
- The error data indicates the WWPN of the disk controller port that is connected.
- A zoning issue or a Fibre Channel connection hardware fault might cause this condition.
010042 Only a single port on a disk controller is accessible from every node in the clustered system.
- Only a single port on a disk controller is accessible to every node when there are multiple ports on the controller that could be connected.
- The error data indicates the WWPN of the disk controller port that is connected.
- A zoning issue or a Fibre Channel connection hardware fault might cause this condition.
010043 A disk controller is accessible through only half, or less, of the previously configured controller ports.
- Multiple ports on the disk controller might still be accessible, but a hardware component of the controller or one of the SAN fabrics might have failed, reducing the operational system configuration to a single point of failure.
- The error data indicates a port on the disk controller that is still connected, and also lists controller ports that are expected but that are not connected.
- A disk controller issue, switch hardware issue, zoning issue, or cable fault might cause this condition.
010044 A disk controller is not accessible from a node.
- A node detected that it has no access to a disk controller. The controller is still accessible from the partner node in the I/O group, so its data is still accessible to the host applications.
- The error data indicates the WWPN of the missing disk controller.
- A zoning issue or a cabling error might cause this condition.
010117 A disk controller is not accessible from a node that is allowed to access the device by site policy.
- A node that site policy allows to access the disk controller cannot access it. If a disk controller has multiple WWNNs, the disk controller might still be accessible to the node through one of the other WWNNs.
- The error data indicates the WWNN of the inaccessible disk controller.
- A zoning issue or a Fibre Channel connection hardware fault might cause this condition.
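For scripted triage, the error IDs above can be kept in a small lookup table. The following Python sketch only restates the descriptions and likely causes that are listed above. The names ERROR_IDS and likely_causes are hypothetical and are not part of any product interface.

```python
# Error ID -> (condition, likely causes), transcribed from the list above.
ERROR_IDS = {
    "010040": (
        "A disk controller is only accessible from a single node port.",
        ("zoning issue", "Fibre Channel connection hardware fault"),
    ),
    "010041": (
        "A disk controller is only accessible from a single port on the controller.",
        ("zoning issue", "Fibre Channel connection hardware fault"),
    ),
    "010042": (
        "Only a single port on a disk controller is accessible from every node "
        "in the clustered system.",
        ("zoning issue", "Fibre Channel connection hardware fault"),
    ),
    "010043": (
        "A disk controller is accessible through only half, or less, of the "
        "previously configured controller ports.",
        ("disk controller issue", "switch hardware issue", "zoning issue", "cable fault"),
    ),
    "010044": (
        "A disk controller is not accessible from a node.",
        ("zoning issue", "cabling error"),
    ),
    "010117": (
        "A disk controller is not accessible from a node that is allowed to "
        "access the device by site policy.",
        ("zoning issue", "Fibre Channel connection hardware fault"),
    ),
}

# Fallback advice when the ID is unknown, following the general guidance:
# look for a single point of failure across all SAN components.
ALL_COMPONENTS = ("switches", "controllers", "disks", "cables", "clustered system")


def likely_causes(error_id):
    """Return the likely causes for an error ID, or the components to check."""
    entry = ERROR_IDS.get(error_id)
    if entry is None:
        return ALL_COMPONENTS
    return entry[1]


print(likely_causes("010043"))
```

Returning the full component list for an unknown ID is a design choice of this sketch. It mirrors the advice to check every SAN component when the suggested areas do not reveal the problem.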
User response
1. Check the error ID and data for a more detailed description of the error.
2. Determine whether there was an intentional change to the SAN zoning or to a disk controller configuration that reduces the clustered system's access to the indicated disk controller. If either action occurred, continue with step 8.
3. Use the GUI or the CLI command lsfabric to ensure that all disk controller WWPNs are reported as expected.
4. Ensure that all disk controller WWPNs are zoned for use by the clustered system.
5. Check for any unfixed errors on the disk controllers.
6. Ensure that all of the Fibre Channel cables are connected to the correct ports at each end.
7. Check for failures in the Fibre Channel cables and connectors.
8. After you resolve the issues, use the GUI or the CLI command detectmdisk to rescan the Fibre Channel network for changes to the managed disks. Do not attempt to detect managed disks unless you are sure that all problems are fixed. Detecting managed disks prematurely might mask an issue.
9. Mark the error that you repaired as fixed. The clustered system revalidates the redundancy and reports another error if there is still not sufficient redundancy.
10. Run the fix procedure from the event log menu in the GUI.
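The lsfabric check in the steps above can be partly automated by comparing the WWPNs that you expect against saved command output. The following Python sketch assumes comma-delimited output with a header row that contains the columns remote_wwpn, node_name, state, and type, and the values active and controller. Verify those names against the output of your own code level. The WWPNs, node names, and function names are hypothetical.

```python
import csv
import io


def connected_controller_wwpns(lsfabric_text, delimiter=","):
    """Map each active controller WWPN to the set of nodes that are logged in to it."""
    reader = csv.DictReader(io.StringIO(lsfabric_text), delimiter=delimiter)
    seen = {}
    for row in reader:
        # Assumed column names and values; adjust them to match the real output.
        if row["type"] == "controller" and row["state"] == "active":
            seen.setdefault(row["remote_wwpn"], set()).add(row["node_name"])
    return seen


def missing_wwpns(expected_wwpns, lsfabric_text, nodes):
    """Return {wwpn: [nodes without an active login]} for the expected WWPNs."""
    seen = connected_controller_wwpns(lsfabric_text)
    report = {}
    for wwpn in expected_wwpns:
        absent = sorted(set(nodes) - seen.get(wwpn, set()))
        if absent:
            report[wwpn] = absent
    return report


# Hypothetical saved output, reduced to the columns that this sketch reads.
SAMPLE = """remote_wwpn,node_name,state,type
50000000000000A1,node1,active,controller
50000000000000A1,node2,active,controller
50000000000000A2,node1,active,controller
50000000000000A2,node2,inactive,controller
"""

report = missing_wwpns(
    ["50000000000000A1", "50000000000000A2", "50000000000000A3"],
    SAMPLE,
    ["node1", "node2"],
)
for wwpn, absent_nodes in report.items():
    print(wwpn, "is not active on", ", ".join(absent_nodes))
```

A WWPN that appears in the report is a candidate for the zoning, cabling, and controller checks in the remaining steps. The sketch reports only what is missing and does not identify the cause.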
Possible Cause-FRUs or other:
- None