Fast I/O Failure for Fibre Channel devices

AIX® supports Fast I/O Failure for Fibre Channel (FC) devices after link events in a switched environment.

If the FC adapter driver detects a link event, such as a lost link between a storage device and a switch, the FC adapter driver waits a short period, approximately 15 seconds, so that the fabric can stabilize. At that point, if the FC adapter driver detects that the device is not on the fabric, it begins failing all I/Os at the adapter driver. Any new I/O or future retries of the failed I/Os are failed immediately by the adapter until the adapter driver detects that the device rejoined the fabric.

Fast Failure of I/O is controlled by a new fscsi device attribute, fc_err_recov. The default setting for this attribute is delayed_fail, which is the I/O failure behavior seen in previous versions of AIX. To enable Fast I/O Failure, set this attribute to fast_fail, as shown in the following example:
chdev -l fscsi0 -a fc_err_recov=fast_fail
In this example, the fscsi device instance is fscsi0. Fast fail logic is invoked when the adapter driver receives a Registered State Change Notification (RSCN) from the switch indicating a link event with a remote storage device port.
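The chdev command above changes one fscsi instance at a time. The following sketch wraps it in a small shell function so that several instances can be handled in one pass. The function name set_fc_err_recov and the APPLY variable are hypothetical, not part of AIX. By default the function only prints the chdev commands it would run; it calls chdev only when APPLY=1 is set.

```shell
# set_fc_err_recov VALUE DEVICE...
# Hypothetical helper: prints (default) or runs (APPLY=1) the chdev command
# that sets the fc_err_recov attribute on each named fscsi instance.
set_fc_err_recov() {
    mode=$1
    shift
    # Accept only the two values described in this section.
    case $mode in
        fast_fail|delayed_fail) ;;
        *) echo "invalid fc_err_recov value: $mode" >&2; return 1 ;;
    esac
    for dev in "$@"; do
        if [ "${APPLY:-0}" = 1 ]; then
            chdev -l "$dev" -a fc_err_recov="$mode" || return 1
        else
            # Dry run: show the command instead of executing it.
            echo "chdev -l $dev -a fc_err_recov=$mode"
        fi
    done
}
```

For example, APPLY=1 set_fc_err_recov fast_fail fscsi0 fscsi1 applies the change to two instances, and lsattr -El fscsi0 -a fc_err_recov then displays the current value for fscsi0. Keeping the dry run as the default makes it possible to review the command list before anything is changed.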

Fast I/O Failure is useful when multipathing software is used. Setting the fc_err_recov attribute to fast_fail can decrease I/O failure times caused by link loss between the storage device and the switch, which supports faster failover to alternate paths.

In single-path configurations, especially configurations with a single path to a paging device, the delayed_fail default setting is recommended.
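The multipath and single-path guidance above can be expressed as a simple heuristic. The sketch below is an assumption-laden illustration, not an IBM tool: the function name recommend_fc_err_recov is hypothetical, and it assumes input in the "status device parent" layout that lspath -l hdiskN prints (for example "Enabled hdisk2 fscsi0"). It counts paths reported as Enabled and suggests fast_fail only when more than one is available.

```shell
# recommend_fc_err_recov
# Hypothetical heuristic: reads lspath-style lines ("status device parent")
# on stdin and prints fast_fail when the disk has more than one Enabled path,
# otherwise delayed_fail (the single-path recommendation).
recommend_fc_err_recov() {
    paths=$(awk '$1 == "Enabled" { n++ } END { print n + 0 }')
    if [ "$paths" -gt 1 ]; then
        echo fast_fail
    else
        echo delayed_fail
    fi
}
```

A possible usage is lspath -l hdisk2 | recommend_fc_err_recov. Because fc_err_recov is set per fscsi instance rather than per disk, a suggestion of delayed_fail for any disk behind an adapter, especially a paging device, should take precedence for that adapter.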

Fast I/O Failure requires the following:
  • A switched environment. It is not supported in arbitrated loop environments, including public loop.
  • FC 6227 adapter firmware, level 3.22A1 or higher.
  • FC 6228 adapter firmware, level 3.82A1 or higher.
  • FC 6239 adapter firmware, all levels.
  • All subsequent FC adapter releases support Fast I/O Failure.
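The firmware requirements in the list above can be checked with a short function. This is a sketch under stated assumptions: the name ff_firmware_ok is hypothetical, and it assumes the firmware level string uses the same fixed-width form as the listed minimums (for example 3.22A1), so that a C-locale string sort orders levels correctly. It treats every feature code other than 6227 and 6228 as supported, which matches the list for 6239 and later adapters. It does not check the switched-environment requirement.

```shell
# ff_firmware_ok TYPE LEVEL
# Hypothetical check: succeeds if an adapter with feature code TYPE at
# firmware LEVEL meets the minimum level listed for Fast I/O Failure.
ff_firmware_ok() {
    case $1 in
        6227) min=3.22A1 ;;
        6228) min=3.82A1 ;;
        *) return 0 ;;   # 6239 (all levels) and subsequent adapters
    esac
    # Sort the minimum and the actual level; if the minimum sorts first
    # (or they are equal), the actual level is at least the minimum.
    lowest=$(printf '%s\n%s\n' "$min" "$2" | LC_ALL=C sort | head -n 1)
    [ "$lowest" = "$min" ]
}
```

For example, ff_firmware_ok 6227 3.22A1 succeeds and ff_firmware_ok 6228 3.81A1 fails. On AIX, the installed firmware level is typically reported in the ROS Level and ID field of lscfg -vl fcs0 output; confirm the field name on your system before relying on it.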

If any of these requirements is not met, the fscsi device logs an error of type INFO indicating that a requirement is not met and that Fast I/O Failure is not enabled.
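To look for the INFO entry described above, the errpt command can filter the error log by type (-T) and resource name (-N). The sketch below wraps that in a hypothetical function, has_info_errors, and assumes errpt prints a header line followed by one line per matching entry. Any INFO entry for the device is reported, so the entries still need to be read to confirm they concern Fast I/O Failure. The ERRPT variable exists only so the function can be exercised without AIX.

```shell
# has_info_errors DEVICE
# Hypothetical helper: succeeds if errpt lists any INFO-type entries logged
# against DEVICE (for example fscsi0). ERRPT may name a substitute command.
has_info_errors() {
    # Skip the header line; exit 0 only if at least one entry follows it.
    ${ERRPT:-errpt} -T INFO -N "$1" | awk 'NR > 1 { found = 1 } END { exit !found }'
}
```

A possible usage is: if has_info_errors fscsi0; then errpt -a -T INFO -N fscsi0; fi, which prints the detailed entries only when some exist.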

Some FC devices support enabling and disabling Fast I/O Failure while the device is in the Available state. To verify whether a device supports changing this setting dynamically, use the lsattr command. On supporting devices, the Fast I/O Failure setting can be changed without unconfiguring and reconfiguring the device or cycling the link. Request such changes only when the storage area network (SAN) fabric is stable; a request fails if error recovery is active in the SAN at the time of the request.
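Because a dynamic change fails while SAN error recovery is active, a script may want to retry the request after a pause. The following is a minimal sketch, assuming that a later attempt can succeed once the fabric stabilizes; the function name retry_fc_err_recov, its default of 3 attempts 30 seconds apart, and the CHDEV override variable are all hypothetical choices, not AIX defaults. The function does not detect fabric stability itself; it only retries chdev.

```shell
# retry_fc_err_recov DEVICE VALUE [ATTEMPTS] [DELAY_SECONDS]
# Hypothetical helper: retries a dynamic fc_err_recov change a few times,
# pausing between attempts in case SAN error recovery is still active.
retry_fc_err_recov() {
    dev=$1
    value=$2
    attempts=${3:-3}
    delay=${4:-30}
    i=1
    while [ "$i" -le "$attempts" ]; do
        if ${CHDEV:-chdev} -l "$dev" -a fc_err_recov="$value"; then
            return 0
        fi
        echo "attempt $i of $attempts failed for $dev" >&2
        i=$((i + 1))
        # Sleep only if another attempt remains.
        if [ "$i" -le "$attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}
```

For example, retry_fc_err_recov fscsi0 fast_fail 5 60 makes up to five attempts one minute apart. A nonzero return means every attempt failed, and the cause should be investigated before trying again.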