When you define your I/O configuration, many devices share common hardware components (such as channels, channel cards, switches, control unit ports, control unit adapter cards, and fiber-optic links). For example, all devices for a specific control unit definition share the same hardware components since they share the same channels and control unit ports. Therefore, when a hardware-related error occurs on a channel path, multiple devices are affected.
When an error occurs on a channel path, the system performs path recovery, which consists of issuing one or more recovery-related I/O operations to test whether the channel path is still usable. If path recovery determines that the channel path is no longer usable, the path is removed (varied offline) from the affected device. Otherwise, the channel path remains online to the device.
Path recovery is typically performed one device at a time. When an error occurs on one device, only that device is processed. Errors on other devices are processed independently, even if the devices share common hardware components. This can affect application performance because the application is delayed while the system performs path recovery and then retries the original I/O request. If the application uses multiple devices that share a failing or malfunctioning hardware component, each device encounters its own errors and the delays accumulate.
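To make the accumulation of delays concrete, the following Python sketch estimates the total delay when every device is recovered independently. It is only a toy model: the function name and the timing values are hypothetical, and real recovery and retry times depend on the hardware and the type of error.

```python
def total_delay_seconds(devices_in_use, recovery_seconds, retry_seconds):
    """Toy estimate of application delay under device-by-device recovery.

    Assumption (not from the product documentation): every device that
    shares the failing component hits the error once, and each error costs
    one recovery pass plus one retry of the original I/O request.
    """
    if devices_in_use < 0 or recovery_seconds < 0 or retry_seconds < 0:
        raise ValueError("inputs must be non-negative")
    return devices_in_use * (recovery_seconds + retry_seconds)


if __name__ == "__main__":
    # Hypothetical numbers: 8 devices, 2.0 s per recovery, 0.5 s per retry.
    print(total_delay_seconds(8, 2.0, 0.5))  # prints 20.0
```

In this model the delay grows linearly with the number of devices that share the failing component, which is the cost that recovery at the control unit level is meant to reduce.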
Additionally, certain types of path errors can be intermittent. That is, an error occurs, but path recovery is successful, so the path is not removed from the device. This also affects performance because applications may encounter errors multiple times. If this occurs, you may need to manually remove the bad path or paths from the affected devices to stop the errors from occurring.
The PATH_SCOPE option, along with the PATH_THRESHOLD and PATH_INTERVAL options, is available on the RECOVERY statement in the IECIOSxx parmlib member and on the SETIOS RECOVERY command. These options allow you to reduce the elapsed time that the system takes to recover from channel path-related errors, and help prevent the system performance problems that can occur when a significant amount of time is spent in repetitive channel path error recovery. For more information on the syntax of the RECOVERY statement in IECIOSxx, see z/OS MVS Initialization and Tuning Reference. For more information on the syntax of the SETIOS RECOVERY command, see SETIOS command.
Specify a PATH_SCOPE of either CU or DEVICE to enable path recovery either for all devices attached to the control unit (CU) or on a device-by-device basis (DEVICE). The default is PATH_SCOPE=DEVICE.
If PATH_SCOPE=DEVICE is specified, path recovery is performed on a device-by-device basis and intermittent errors are not monitored. The PATH_INTERVAL and PATH_THRESHOLD keywords cannot be specified with PATH_SCOPE=DEVICE.
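The keyword rules above can be sketched as a small pre-check that you might run against planned settings before you update IECIOSxx. This is an illustrative Python sketch, not part of z/OS: the function name is hypothetical, and it checks only the combinations described here, not the valid numeric ranges, which are documented in z/OS MVS Initialization and Tuning Reference.

```python
def validate_recovery_options(path_scope="DEVICE",
                              path_interval=None,
                              path_threshold=None):
    """Check a planned combination of RECOVERY path options.

    Encodes only the rules stated in this topic:
      - PATH_SCOPE is CU or DEVICE, and DEVICE is the default.
      - PATH_INTERVAL and PATH_THRESHOLD cannot accompany PATH_SCOPE=DEVICE.
    Numeric ranges are deliberately not checked (not covered here).
    """
    if path_scope not in ("CU", "DEVICE"):
        raise ValueError("PATH_SCOPE must be CU or DEVICE")
    if path_scope == "DEVICE" and (path_interval is not None
                                   or path_threshold is not None):
        raise ValueError(
            "PATH_INTERVAL and PATH_THRESHOLD cannot be specified "
            "with PATH_SCOPE=DEVICE")
    return {
        "PATH_SCOPE": path_scope,
        "PATH_INTERVAL": path_interval,
        "PATH_THRESHOLD": path_threshold,
    }


if __name__ == "__main__":
    # The numeric values below are placeholders, not recommended settings.
    print(validate_recovery_options("CU", path_interval=5, path_threshold=20))
```

A call such as `validate_recovery_options("DEVICE", path_threshold=20)` raises `ValueError`, which mirrors the restriction that the monitoring keywords apply only when PATH_SCOPE=CU.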
When PATH_SCOPE=CU is specified and the system internally varies the path offline to all devices on a control unit, the system does not remove the last path to a device if the device is online, allocated, reserved, or in use by a system component. However, if the path becomes not operational because of a link threshold condition, the last path is taken offline.
A link threshold condition, also known as a flapping links condition, occurs when a channel path transitions between not operational and operational multiple times within a short period of time. This is usually a sign of a hardware problem. Each transition causes the system to perform path-related recovery, which delays applications until the recovery completes. If the channel path transitions too many times within a short period of time, the channel subsystem keeps the channel path offline to prevent further path recovery.
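The idea behind a link threshold condition can be illustrated with a sliding-window counter. The following Python sketch is a conceptual model only: the class name, the limit of three transitions, and the ten-second window are hypothetical, and the actual limits and algorithm used by the channel subsystem are not described in this topic.

```python
from collections import deque


class LinkFlapMonitor:
    """Toy model of flapping-link detection for one channel path."""

    def __init__(self, max_transitions, window_seconds):
        self.max_transitions = max_transitions  # hypothetical limit
        self.window_seconds = window_seconds    # hypothetical window
        self._times = deque()                   # recent transition times
        self.kept_offline = False

    def record_transition(self, timestamp):
        """Record one operational/not-operational transition.

        Returns True once the path is being kept offline.
        """
        if self.kept_offline:
            return True
        self._times.append(timestamp)
        # Discard transitions that fall outside the sliding window.
        while timestamp - self._times[0] > self.window_seconds:
            self._times.popleft()
        # Too many transitions inside the window: stop further recovery.
        if len(self._times) > self.max_transitions:
            self.kept_offline = True
        return self.kept_offline


if __name__ == "__main__":
    monitor = LinkFlapMonitor(max_transitions=3, window_seconds=10.0)
    for t in (0.0, 1.0, 2.0, 3.0):
        print(t, monitor.record_transition(t))
```

In this model, four transitions within ten seconds exceed the limit and the path stays offline, whereas the same number of transitions spread over a longer period never trips the threshold.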