A system hang or wait can occur gradually as a resource contention problem or abruptly when a disabled wait state is loaded for a critical software-detected error.
You might notice the following symptoms:
- A disabled coded wait state is loaded
- A hang during IPL or system initialization
- Locked consoles
- Contention for system resources
- Looping system code
When there is a system failure or outage, a stand-alone dump must be taken for problem diagnosis. OPERLOG, SYSLOG, and EREP reports from the time frame of the system outage are also important.
This section discusses only system hangs and waits. If a job or subsystem is hung, see Diagnosing a job or subsystem hang.
Symptoms of a wait or hang
The system enters a wait or the entire system hangs. The terms hang and wait are used synonymously in these procedures. Some symptoms of a hang:
- No response occurs on the user's or system operator's console.
- No communication with the system through the console occurs.
- No response from subsystems (TSO/E, CICS, IMS, DB2, and others) occurs.
- The system does not issue or receive messages on the console.
- A series of messages indicates waits followed by bursts of activity.
- A message indicating a wait appears on the system console.
- The program status word (PSW) contains X'070E0000 00000000'.
- The job entry subsystem does not respond to any commands. For example, in a JES2 system, you enter a $DI1 command and JES2 does not respond.
There are two types of wait states: enabled and disabled.
- Enabled wait: The system stops processing without issuing a wait state code because the dispatcher did not find any work to dispatch.
A special type of enabled wait is called a no work wait or a dummy wait. An indication of a dummy wait or no work wait is a PSW of X'070E0000 00000000' and GPRs containing all zeros. Diagnosis is required for this type of wait only when the system does not resume processing.
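The no work wait indication described above can be expressed as a simple check. The following Python sketch is illustrative only: the function name and the input format (a PSW as a hexadecimal string and the 16 GPRs as integers) are assumptions, not part of any system service or dump formatting tool.

```python
from typing import Sequence

# PSW value that the text gives for a no work (dummy) wait.
NO_WORK_WAIT_PSW = "070E000000000000"


def is_no_work_wait(psw: str, gprs: Sequence[int]) -> bool:
    """Return True when the PSW and GPRs match the no work wait pattern.

    psw  -- hypothetical input: the PSW as hex digits, blanks allowed
    gprs -- hypothetical input: the 16 general purpose registers
    """
    normalized = psw.replace(" ", "").upper()
    return (
        normalized == NO_WORK_WAIT_PSW
        and len(gprs) == 16
        and all(register == 0 for register in gprs)
    )
```

A match only identifies the pattern. As the text notes, diagnosis is required only when the system does not resume processing.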
The most common causes of an enabled wait are that the system is waiting for:
- Work – the system has no active jobs to process or all active jobs are swapped out.
- Action – an operator reply or other action.
- Missing interrupts – the system is waiting for a critical device, which is busy, not ready, reserved by another system, or has a mount pending. If the system residence (SYSRES) or paging (PAGE) volumes have missing interrupts, the operator might not get a message.
- System resource – work is waiting for a resource, which can be a lock, queue, input/output (I/O) device, page, or device allocation.
- Disabled wait with a wait state code: The system issues a wait state code and stops. The operator can see the wait state code on the system console. This wait is called a coded wait state or a disabled wait. There are two types of disabled wait state codes:
- Restartable wait state: You can restart the system.
A restartable wait is one of the following:
- An attempt by the operating system to communicate with the operator. When the system cannot send a message to a console, the system can use a restartable wait state to contact the operator and obtain a response.
- A way to preempt processing. For a SLIP trap with an action of wait, the system issues a message, then enters a restartable wait.
- A symptom of another problem.
- Non-restartable wait state: You cannot restart the system. After capturing a stand-alone dump, you must reIPL the system.
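The difference between an enabled wait and a disabled wait is visible in the PSW itself. The following Python sketch classifies the first word of a PSW by using the architected bit positions for the I/O mask (bit 6), the external mask (bit 7), and the wait state (bit 14). The function names are hypothetical. Treating a PSW as disabled only when both the I/O and external masks are off is a simplifying assumption, and the value 000A0000 in the usage example is an illustrative disabled value that does not come from this section.

```python
# Bit positions in the first 32-bit word of the PSW (bit 0 is the leftmost bit).
IO_MASK_BIT = 6
EXTERNAL_MASK_BIT = 7
WAIT_BIT = 14


def psw_bit(word: int, bit: int) -> bool:
    """Return True when the given bit of a 32-bit word is on."""
    return bool((word >> (31 - bit)) & 1)


def classify_wait(psw_first_word_hex: str) -> str:
    """Classify a PSW as 'enabled wait', 'disabled wait', or 'not waiting'.

    psw_first_word_hex -- hypothetical input: first PSW word as 8 hex digits
    """
    word = int(psw_first_word_hex, 16)
    if not psw_bit(word, WAIT_BIT):
        return "not waiting"
    # Assumption: enabled means open to I/O or external interruptions.
    if psw_bit(word, IO_MASK_BIT) or psw_bit(word, EXTERNAL_MASK_BIT):
        return "enabled wait"
    return "disabled wait"
```

For example, `classify_wait("070E0000")` returns `"enabled wait"` and `classify_wait("000A0000")` returns `"disabled wait"`. The sketch cannot tell whether a disabled wait is restartable; that depends on the wait state code.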