Controller functions

Consider these factors when using multi-initiator and HA functions.

Use of the multi-initiator and HA functions requires controller and Linux® software support. Controller support is shown in the Feature comparison of SAS RAID cards table. Look for controllers that have HA two-system RAID, HA two-system JBOD, or HA single-system RAID marked as Yes for the configuration that you want. The Linux software levels required for multi-initiator support are identified in Versions of iprconfig for SAS functions.

Some controllers are intended to be used only in an HA two-system RAID or HA single-system RAID configuration. Use the Feature comparison of SAS RAID cards table to look for controllers that have Requires HA RAID configuration marked as Yes. This type of controller cannot be used in an HA two-system JBOD or a stand-alone configuration.

Controllers connected in a RAID configuration must have the same write cache size, if they support write cache. A configuration error is logged if the controllers' write caches are not the same size.

When reconfiguring a controller that was previously configured in a different HA configuration, it is recommended that you set the High-Availability Mode of the controllers to RAID or JBOD before attaching the SAS cables.

For all HA RAID configurations, one controller functions as the primary controller and performs management of the physical devices, such as creating a disk array or downloading disk microcode. The other controller functions as the secondary controller and is not capable of physical device management.
Note: On two-system configurations, you might need to stop using the disk array from the secondary controller before some actions can be performed from the primary controller.

If the secondary controller detects the primary controller going offline, it switches roles to become the primary controller. When the original primary controller comes back online, it becomes the secondary controller. The exception is when the original primary controller was previously designated as the “preferred” primary controller; in that case, it resumes the primary role after it becomes active.
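The role-switching rule described above can be sketched as a small state model. This is a minimal illustration in Python, not controller firmware behavior or an iprconfig interface; the `Controller` and `HaPair` classes, their fields, and their method names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Controller:
    """Hypothetical model of one controller in an HA RAID pair."""
    name: str
    online: bool = True
    preferred_primary: bool = False


class HaPair:
    """Tracks which controller currently holds the primary role."""

    def __init__(self, primary: Controller, secondary: Controller) -> None:
        self.primary = primary
        self.secondary = secondary

    def _swap_roles(self) -> None:
        self.primary, self.secondary = self.secondary, self.primary

    def controller_offline(self, ctrl: Controller) -> None:
        ctrl.online = False
        # The secondary takes over when it detects the primary going offline.
        if ctrl is self.primary and self.secondary.online:
            self._swap_roles()

    def controller_online(self, ctrl: Controller) -> None:
        ctrl.online = True
        if ctrl is not self.secondary:
            return
        # A returning controller stays secondary unless it was designated
        # as the preferred primary (or the current primary is offline).
        if ctrl.preferred_primary or not self.primary.online:
            self._swap_roles()
```

In this sketch, a controller without the preferred designation that goes offline and returns remains the secondary, while a preferred primary takes the primary role back when it returns.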

Both controllers are capable of performing direct I/O accesses to the disk arrays for purposes of read and write operations, but at any given time only one controller in the pair is “optimized” for the disk array. The controller that is optimized for a disk array is the one that directly accesses the physical devices for I/O operations. The controller that is non-optimized for a disk array will forward read and write requests through the SAS fabric to the optimized controller. See HA asymmetric access optimization for more information on setting and viewing disk array optimization.
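The difference between the optimized and non-optimized paths can be illustrated with a short sketch. It only models the routing described above; the `DiskArray` class, the `io_path` function, and the controller names are hypothetical and do not correspond to any real tool or API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DiskArray:
    """Hypothetical disk array record; `optimized_on` names a controller."""
    name: str
    optimized_on: str


def io_path(array: DiskArray, issuing_controller: str) -> List[str]:
    """Return the hops a read or write takes to reach the physical devices."""
    if issuing_controller == array.optimized_on:
        # The optimized controller accesses the physical devices directly.
        return [issuing_controller, "physical devices"]
    # A non-optimized controller forwards the request through the SAS
    # fabric to the optimized controller.
    return [issuing_controller, "SAS fabric", array.optimized_on,
            "physical devices"]
```

For an array optimized on a controller named `ctrl_a`, a request issued on `ctrl_a` takes the two-hop direct path, while a request issued on the other controller takes the longer forwarded path.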

The primary controller logs most errors related to problems with a disk array. Some disk array errors might also be logged on the secondary controller if the disk array was optimized on the secondary controller at the time the error occurred.

Typical reasons for the primary and secondary controllers to switch roles from what was expected or preferred are as follows:
  • Controllers switch roles because of asymmetric conditions, for example, when one controller detects more disk drives than the other. If the secondary controller is able to find devices that are not found by the primary controller, an automatic transition (failover) occurs. The controllers communicate with each other, compare device information, and switch roles.
  • Powering off the primary controller or the system that contains the primary controller causes an automatic transition (failover) to occur.
  • Failure of the primary controller or the system that contains the primary controller causes an automatic transition (failover) to occur.
  • If the preferred primary controller is delayed in becoming active, the other controller assumes the role of primary controller. After the preferred primary controller becomes active, an automatic transition (failover) occurs.
  • If the primary controller loses contact with the disks that are also accessible by the secondary controller, an automatic transition (failover) occurs.
  • Downloading controller microcode might cause an automatic transition (failover) to occur. This transition occurs because the controller resets itself to activate the new microcode and is temporarily offline until the reset is complete. Failover to the other controller can prevent disruption of disk access.
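
The asymmetric case in the first item of the list above comes down to comparing the device sets that each controller reports. The following is a minimal sketch of that comparison only; the function name and the device identifiers are hypothetical, and the real controllers exchange and evaluate device information internally.

```python
from typing import Set


def should_fail_over(primary_devices: Set[str],
                     secondary_devices: Set[str]) -> bool:
    """Return True when the secondary finds devices the primary does not."""
    # Set difference: devices visible to the secondary but not the primary.
    only_on_secondary = secondary_devices - primary_devices
    return bool(only_on_secondary)
```

This simplification looks only at devices missing from the primary controller's view. It does not model the other triggers in the list, such as power-off, controller failure, or a microcode download.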

Users and their applications are responsible for ensuring orderly read and write operations to the shared disks or disk arrays, for example, by using device reservation commands (persistent reservation is not supported).