Setting defective channel paths offline automatically

Applies to: Red Hat Enterprise Linux 9.2, LPAR mode, z/VM guest, KVM guest

Control the automatic removal of defective channel paths through the path_threshold, path_interval, and path_autodisable sysfs attributes.

About this task

A channel control check (CCC) is caused by any machine malfunction that affects channel-subsystem controls. An interface control check (IFCC) indicates that an incorrect signal occurred on the channel path. Usually, the system recovers from these errors automatically.

However, if IFCC or CCC errors occur frequently on a particular channel path, they indicate that this channel path is failing. Error recovery processing on defective channel paths can degrade performance. If at least one other operational channel path remains, excluding the defective channel path from I/O might improve overall device performance.

By default, automatic path removal is enabled with an error threshold of 256 and a reset interval of 300 s (5 minutes). Accordingly, a channel path is set offline automatically when its error count reaches 256, provided that at least one other channel path remains. If 300 seconds elapse without an error, the error count is reset to 0.
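The threshold and reset-interval rule can be sketched as a small bash function that replays a list of error timestamps for one channel path. This is a simplified model of the behavior described above, not the DASD device driver's implementation. The function name `simulate_path_errors`, its arguments, and the choice to reset the counter when exactly `<interval>` seconds have passed since the previous error are assumptions for illustration.

```shell
# Simplified model of the documented path-error rule (NOT the kernel code).
# Usage: simulate_path_errors <threshold> <interval_s> <other_paths> <t1> <t2> ...
#   <other_paths> : number of other operational channel paths
#   <t1> <t2> ... : error timestamps in whole seconds, in ascending order
# Prints "offline at <t>" if the path would be set offline, otherwise "online".
simulate_path_errors() {
    local threshold=$1 interval=$2 other_paths=$3
    shift 3
    local count=0 last="" t

    # A threshold of 0 disables defective-path detection entirely.
    if [ "$threshold" -eq 0 ]; then
        echo "online"
        return 0
    fi

    for t in "$@"; do
        # Assumption: the counter resets once <interval> seconds have
        # passed without an error (checked when the next error arrives).
        if [ -n "$last" ] && [ $((t - last)) -ge "$interval" ]; then
            count=0
        fi
        count=$((count + 1))
        last=$t
        # The path goes offline only if another path remains in use.
        if [ "$count" -ge "$threshold" ] && [ "$other_paths" -ge 1 ]; then
            echo "offline at $t"
            return 0
        fi
    done
    echo "online"
}
```

With the defaults, `simulate_path_errors 256 300 1 $(seq 1 256)` prints `offline at 256`. By contrast, `simulate_path_errors 256 300 1 $(seq 1 255) 555` prints `online`, because the 300-second gap before the last error resets the count.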

You can change the error threshold and reset interval, or you can prevent automatic removal of channel paths altogether.

Procedure

  • To specify the number of errors that must occur before the channel path is taken offline, issue a command of this form:
    # echo <no_of_errors> > /sys/bus/ccw/devices/<device_bus_id>/path_threshold

    where /sys/bus/ccw/devices/<device_bus_id> represents the device in sysfs and <no_of_errors> is an integer that specifies the error threshold.

    To disable the detection of defective paths and suppress messages about IFCC or CCC errors, set <no_of_errors> to 0.

  • To specify the time that must elapse without errors to trigger a counter reset, issue a command of this form:
    # echo <time> > /sys/bus/ccw/devices/<device_bus_id>/path_interval
    where <time> is the reset interval in seconds.
  • To control whether defective paths are set offline automatically, issue a command of this form:
    # echo <flag> > /sys/bus/ccw/devices/<device_bus_id>/path_autodisable
    where <flag> can be 1 to enable automatic path removal, or 0 to prevent automatic path removal. By default, automatic path removal is enabled.
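The three procedure steps all follow the same pattern: write a decimal value to one attribute under the device's sysfs directory. The following bash function is a minimal sketch of a wrapper that checks the input before writing and reads the attribute back afterward. The function name `set_path_attr` and the `DASD_SYSFS_ROOT` override (used only so the function can be tried against a scratch directory) are hypothetical. The read-back check assumes that each attribute reports the value that was just written.

```shell
# Hypothetical helper: set one path attribute of a CCW device.
# Usage: set_path_attr <device_bus_id> <attribute> <value>
# DASD_SYSFS_ROOT (hypothetical) overrides /sys/bus/ccw/devices for testing.
set_path_attr() {
    local dev=$1 attr=$2 value=$3
    local root=${DASD_SYSFS_ROOT:-/sys/bus/ccw/devices}
    local file="$root/$dev/$attr"
    local now

    # Accept only the three attributes described in this procedure.
    case $attr in
        path_threshold|path_interval) ;;
        path_autodisable)
            case $value in
                0|1) ;;
                *) echo "path_autodisable takes 0 or 1" >&2; return 1 ;;
            esac
            ;;
        *) echo "unsupported attribute: $attr" >&2; return 1 ;;
    esac

    # Values must be non-negative decimal integers.
    case $value in
        ''|*[!0-9]*) echo "not a non-negative integer: $value" >&2; return 1 ;;
    esac

    if [ ! -e "$file" ]; then
        echo "no such attribute: $file" >&2
        return 1
    fi

    # Equivalent to: echo <value> > <file>. A rejected write returns non-zero.
    printf '%s\n' "$value" > "$file" || return 1

    # Read the attribute back to confirm the new setting.
    now=$(cat "$file")
    if [ "$now" != "$value" ]; then
        echo "$attr is $now, expected $value" >&2
        return 1
    fi
}
```

For example, `set_path_attr 0.0.4711 path_threshold 512` performs the same write as the first procedure step and returns a non-zero status if the device, the attribute, or the value is not valid.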

Examples

  • Setting 512 for the error threshold and 6 minutes (360 s) for the reset interval:
    # echo 512 > /sys/bus/ccw/devices/0.0.4711/path_threshold
    # echo 360 > /sys/bus/ccw/devices/0.0.4711/path_interval
    According to this example, a channel path is removed automatically when its count of IFCCs or CCCs reaches 512. Any 6-minute interval without IFCCs or CCCs resets the counter to zero.
  • Preventing automatic removal of defective channel paths:
    # echo 0 > /sys/bus/ccw/devices/0.0.4711/path_autodisable
    In this example, messages about defective paths are issued according to the settings for the error threshold and the reset interval, but defective paths are not removed automatically.
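To confirm the effect of commands such as those in the examples above, you can read the three attributes back. The following bash function is a minimal sketch that prints them on one line per device. The function name `show_path_settings` and the `DASD_SYSFS_ROOT` override are hypothetical, and the output format is an illustrative choice.

```shell
# Hypothetical helper: show the path-error settings of one or more devices.
# Usage: show_path_settings <device_bus_id> [<device_bus_id> ...]
# DASD_SYSFS_ROOT (hypothetical) overrides /sys/bus/ccw/devices for testing.
show_path_settings() {
    local root=${DASD_SYSFS_ROOT:-/sys/bus/ccw/devices}
    local dev attr out
    for dev in "$@"; do
        out=$dev
        for attr in path_threshold path_interval path_autodisable; do
            if [ -r "$root/$dev/$attr" ]; then
                # $(cat ...) drops the trailing newline of the attribute.
                out="$out $attr=$(cat "$root/$dev/$attr")"
            else
                # Attribute missing, e.g. the device is not a DASD.
                out="$out $attr=n/a"
            fi
        done
        echo "$out"
    done
}
```

After both examples are applied to device 0.0.4711, `show_path_settings 0.0.4711` would be expected to print a line such as `0.0.4711 path_threshold=512 path_interval=360 path_autodisable=0`, assuming that the attributes read back as plain decimal numbers.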

What to do next

After you repair the faulty channel path, set it online again by using the tunedasd command with the -p option. For details, see "tunedasd - Adjust low-level DASD settings".