Missing interrupts

At predefined intervals, the operating system checks devices of a specific type to determine if expected I/O interrupts have occurred. If an expected interrupt has not occurred across two of these checks, that interrupt is considered missing. The operating system then issues message IOS071I or IOS076E, writes a logrec data set error record, and tries to correct the problem. For recurring missing interrupts, the operating system issues message IOS075E together with message IOS076E or IOS077E to indicate the recurring condition on a particular device.
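The two-check rule described above can be illustrated with a small simulation. This is a minimal sketch, not the actual IOS implementation: the `Device` and `MihScanner` classes, the `pending_io` field, and the device number `0180` are all hypothetical names introduced for illustration. It only models the idea that an interrupt is treated as missing when the same I/O is still outstanding at two consecutive checks.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Device:
    """Hypothetical model of a device with at most one outstanding I/O."""
    devnum: str
    pending_io: Optional[int] = None  # id of the I/O awaiting an interrupt, if any


@dataclass
class MihScanner:
    """Illustrative scanner (an assumption, not how IOS is coded)."""
    # Maps a device number to the I/O id that was pending at the previous check.
    seen_last_check: Dict[str, int] = field(default_factory=dict)

    def check(self, devices: List[Device]) -> List[str]:
        """Run one interval check and return devices with a missing interrupt."""
        missing = []
        for dev in devices:
            if dev.pending_io is None:
                # The interrupt arrived (or nothing was started): forget the device.
                self.seen_last_check.pop(dev.devnum, None)
            elif self.seen_last_check.get(dev.devnum) == dev.pending_io:
                # Same I/O still pending across two checks: treat as missing.
                missing.append(dev.devnum)
            else:
                # First time this I/O is seen pending: only record it.
                self.seen_last_check[dev.devnum] = dev.pending_io
        return missing


scanner = MihScanner()
dasd = Device("0180", pending_io=1)
first = scanner.check([dasd])   # first check only notes the pending I/O
second = scanner.check([dasd])  # second check sees the same I/O still pending
```

In this sketch the first check reports nothing and the second reports device `0180`. If the interrupt arrives between checks (modeled by setting `pending_io` to `None`), the device is cleared and never reported.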

A feature of the IBM® 3990-6 and 9340 attached devices allows MVS/ESA to automatically identify a system in a multisystem environment that is holding a reserve. After every start pending MIH condition, the system attempts to determine whether the device is not responding because of a reserve to another system. If the device is reserved to another system, message IOS431I is issued to identify the system by its central processor serial number. If the system holding the reserve is a member of the same sysplex as the system detecting the MIH condition, message IOS431I includes the system name and the LPAR ID, if there is one.

For JES2 systems, when the reserve is held by a system in the same sysplex, the system attempts to obtain information about the job causing the reserve by routing a D GRS,DEV=devnum command to that system. JES2 systems that have JES3 installed must have JES2 started with the NOJES3 option (CON=(xx,NOJES3)) in order to identify the job holding the reserve. Message ISG020I identifies the jobs holding the reserve on the failing system. The installation can use this information to determine what to do.

Some causes of missing interrupts are:

The intervals used by the operating system to determine whether an expected interrupt is missing vary from 15 seconds for DASD to 12 minutes for 3330 Disk Storage. An installation can define in the IECIOSxx parmlib member the time intervals for all devices in the I/O configuration. These intervals override the IBM-supplied defaults.

Note:
  1. During IOS recovery processing, the system will override your time interval specification and may issue MIH messages and MIH logrec error records at this IOS-determined interval.
  2. During IPL (if the device is defined to be ONLINE) or during the VARY ONLINE process, some devices may present their own MIH timeout values, via the primary/secondary MIH timing enhancement, contained in the self-describing data for the device. The primary MIH timeout value is used for most I/O commands; however, the secondary MIH timeout value may be used for special operations such as long-busy conditions or long-running I/O operations. Any time a user specifically sets a device or device class to have an MIH timeout value that is different from the IBM-supplied default for the device class, the value will override the device-established primary MIH time value. This implies that if an MIH time value that is equal to the MIH default for the device class is explicitly requested, IOS will not override the device-established primary MIH time value. To override the device-established primary MIH time value, you must explicitly set a time value that is not equal to the MIH default for the device class.

    Note that overriding the device-supplied primary MIH timeout value may adversely affect MIH recovery processing for the device or device class.

    Please refer to the specific device's reference documentation to determine if the device supports self-describing MIH time values.
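The override rule in note 2 can be written out as a short decision function. This is a sketch under stated assumptions, not IOS code: the function name `effective_primary_mih_time`, its parameters, and the sample values (a 15-second class default and a 30-second device-established value) are hypothetical and chosen only to illustrate the rule.

```python
from typing import Optional


def effective_primary_mih_time(class_default: int,
                               device_primary: Optional[int],
                               user_value: Optional[int]) -> int:
    """Return the primary MIH timeout, in seconds, under the rule in note 2.

    class_default  -- IBM-supplied default for the device class
    device_primary -- value the device presents in its self-describing data,
                      or None if the device does not present one
    user_value     -- value explicitly set by the installation, or None
    """
    # An explicit value that differs from the class default overrides
    # the device-established primary value.
    if user_value is not None and user_value != class_default:
        return user_value
    # An explicit value equal to the class default does not override,
    # so the device-established value (if any) stays in effect.
    if device_primary is not None:
        return device_primary
    # No device-established value: the class default applies.
    return class_default


# Hypothetical values: class default 15 s, device presents 30 s.
kept = effective_primary_mih_time(15, 30, 15)        # explicit value equals default
overridden = effective_primary_mih_time(15, 30, 20)  # explicit value differs
```

Here `kept` is 30, because explicitly requesting the class default leaves the device-established value in place, while `overridden` is 20, because any other explicit value replaces it.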

Note: If there are missing interrupts on the devices that contain the system residence (SYSRES) or the page volumes, the operator may not receive any message, because the needed operating system routines are pageable. The operator can learn about the missing interrupts by initiating restart reason 1.

See z/OS MVS Initialization and Tuning Reference for the IECIOSxx member.