Setting defective channel paths offline automatically

Control the removal of a defective channel path through the path_threshold and path_interval sysfs attributes. A channel path that does not work correctly is removed from normal operation, provided that other channel paths are available.

About this task

A channel control check (CCC) is caused by any machine malfunction that affects channel-subsystem controls. An interface control check (IFCC) indicates that an incorrect signal occurred on the channel path. Usually, these errors can be recovered automatically.

However, if IFCC or CCC errors occur frequently on a particular channel path, these errors indicate a failure of this channel path. Such a failure leads to performance degradation due to error recovery processing. If other channel paths are available, it might help the overall device performance to exclude the malfunctioning channel path from I/O.

The channel-path error recovery feature applies to devices for which multiple channel paths are operational. By default, the error threshold is 256 and the reset interval is 300 s (5 minutes). Accordingly, a channel path is set offline when its error count reaches 256. If 300 seconds elapse without an error, the error count is reset to 0.
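The interplay of threshold and reset interval can be illustrated with a small shell sketch. This is a toy model of the behavior as described here, not the kernel's implementation; the function name simulate_path_errors is hypothetical, and treating a gap of exactly the interval length as a reset is an assumption.

```shell
# Toy model of the error counter described above; not the kernel's code.
# Usage: simulate_path_errors <threshold> <interval_s> <error_time_s>...
# Error times are seconds since an arbitrary start, in ascending order.
simulate_path_errors() (
    threshold=$1
    interval=$2
    shift 2

    # A threshold of 0 disables detection of defective paths.
    if [ "$threshold" -eq 0 ]; then
        echo "detection disabled"
        return 0
    fi

    count=0
    last=""
    for t in "$@"; do
        # Assumption: a gap of at least <interval> seconds resets the count.
        if [ -n "$last" ] && [ $((t - last)) -ge "$interval" ]; then
            count=0
        fi
        count=$((count + 1))
        last=$t
        if [ "$count" -ge "$threshold" ]; then
            echo "offline at t=$t (count=$count)"
            return 0
        fi
    done
    echo "online (count=$count)"
)
```

For example, simulate_path_errors 3 300 0 10 20 reports the path as offline at the third error, whereas simulate_path_errors 3 300 0 10 400 410 leaves it online because the 390-second gap resets the count.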

You can set different values through the path_threshold and path_interval sysfs attributes of the device.

Procedure

To exclude a channel path from I/O after a certain number of IFCC or CCC errors within a certain time frame, specify both path_threshold and path_interval.
  • To specify the number of errors that must occur before the channel path is taken offline, issue a command of this form:
    # echo <no_of_errors> > /sys/bus/ccw/devices/<device_bus_id>/path_threshold

    where /sys/bus/ccw/devices/<device_bus_id> represents the device in sysfs and <no_of_errors> is an integer that specifies the error threshold.

    To disable detecting defective paths, and to suppress messages about IFCC or CCC errors, set <no_of_errors> to 0.

  • To specify the time that must elapse without errors for the counter to reset, issue a command of this form:
    # echo <time> > /sys/bus/ccw/devices/<device_bus_id>/path_interval

    where <time> is the reset interval in seconds.
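Because both attributes are meant to be set together, the two commands can be wrapped in a small helper. The following is a minimal sketch, not part of any shipped tool: the function name set_path_limits and the CCW_SYSFS_ROOT variable are hypothetical, and the input check only tests for non-negative integers. The kernel might still reject values that pass this check.

```shell
# Hypothetical override so that the helper can be tried against a scratch
# directory; by default it points at the real sysfs location.
CCW_SYSFS_ROOT="${CCW_SYSFS_ROOT:-/sys/bus/ccw/devices}"

# Usage: set_path_limits <device_bus_id> <no_of_errors> <time>
set_path_limits() {
    if [ "$#" -ne 3 ]; then
        echo "usage: set_path_limits <device_bus_id> <no_of_errors> <time>" >&2
        return 1
    fi
    dev_dir="$CCW_SYSFS_ROOT/$1"

    # Accept only non-negative integers for both values.
    for value in "$2" "$3"; do
        case "$value" in
            ''|*[!0-9]*)
                echo "error: '$value' is not a non-negative integer" >&2
                return 1
                ;;
        esac
    done

    # Both attributes must exist and be writable (typically requires root).
    if [ ! -w "$dev_dir/path_threshold" ] || [ ! -w "$dev_dir/path_interval" ]; then
        echo "error: attributes not writable in $dev_dir" >&2
        return 1
    fi

    echo "$2" > "$dev_dir/path_threshold" &&
        echo "$3" > "$dev_dir/path_interval"
}

# Example call (device bus-ID taken from the example in this topic):
#   set_path_limits 0.0.4711 512 360
```

Passing 0 as <no_of_errors> to this helper disables the detection of defective paths, as described for path_threshold.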

Examples

Setting 512 for the error threshold and 6 minutes (360 s) for the reset interval:
# echo 512 > /sys/bus/ccw/devices/0.0.4711/path_threshold
# echo 360 > /sys/bus/ccw/devices/0.0.4711/path_interval
In this example, a channel path is automatically removed when the count of IFCC or CCC errors reaches 512. Any 6-minute interval without an IFCC or CCC error resets the counter to zero.
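To confirm which values are in effect, the attributes can be read back. The sketch below assumes that path_threshold and path_interval are readable as well as writable; the function name show_path_limits and the CCW_SYSFS_ROOT variable are hypothetical.

```shell
# Hypothetical override for trying the function outside a real system.
CCW_SYSFS_ROOT="${CCW_SYSFS_ROOT:-/sys/bus/ccw/devices}"

# Usage: show_path_limits <device_bus_id>
# Prints one "<attribute>=<value>" line per attribute.
show_path_limits() {
    for attr in path_threshold path_interval; do
        # Assumption: the attribute can be read with cat.
        value=$(cat "$CCW_SYSFS_ROOT/$1/$attr") || return 1
        printf '%s=%s\n' "$attr" "$value"
    done
}

# Example call:
#   show_path_limits 0.0.4711
```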

What to do next

After you repair the faulty channel path, set it online again by using the tunedasd command with the -p option. See tunedasd - Adjust low-level DASD settings for details.