z/OS problem management


Overview of a loop


A loop is a repetitive set of instructions executed by a job or unit of work. A looping job or function can appear to be hung, or it can consume a large amount of CP resource and lock out other work from getting service.

The three types of loops are:
Disabled loop
A disabled loop is repetitive execution, typically in system-level code, with I/O and external (EXT) interrupts prevented by a PSW mask of X'x4' in the high-order byte of the PSW. A disabled loop is bound to one CP in the system. In a multiprocessor environment where resources are held, a spin loop is detected. On a uniprocessor, a disabled loop results in a system outage.
Enabled loop
An enabled loop occurs under a unit of work (TCB or SRB) that is executing on behalf of a job or function. The unit of work runs with a PSW that is enabled for I/O and external interrupts, with a mask of X'x7' in the high-order byte. A unit of work that is looping enabled is interrupted periodically for I/O, EXT, or CLKC type interrupts, which are traced in the system trace table.

Spin loop
A spin loop is a timed disabled loop in system code, controlled by the installation through specifications in the EXSPATxx (excessive spin condition actions) parmlib member. In a multiprocessor environment, the system can spin, or loop disabled, while waiting for a resource such as a lock to be released by another CP. See z/OS MVS Initialization and Tuning Reference for more information about the EXSPATxx parmlib member.
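The X'x4' and X'x7' masks described for disabled and enabled loops differ only in the two low-order bits of the PSW's high-order byte. The following Python sketch shows how the two values can be told apart, assuming the z/Architecture PSW layout in which (numbering bits 0-7 from the left) bit 6 is the I/O mask and bit 7 is the external mask. The function name classify_psw_byte is hypothetical; this is an illustration, not a z/OS or IPCS facility.

```python
# Sketch: classify a PSW's high-order byte by its I/O and external masks.
# Assumption: z/Architecture layout, bits numbered 0-7 from the left,
# bit 6 = I/O mask, bit 7 = external mask. classify_psw_byte is a
# hypothetical helper, not part of any z/OS interface.

IO_MASK = 0x02   # bit 6 of the high-order byte
EXT_MASK = 0x01  # bit 7 of the high-order byte


def classify_psw_byte(high_byte: int) -> str:
    """Return 'disabled', 'enabled', or 'partially enabled'."""
    if not 0 <= high_byte <= 0xFF:
        raise ValueError("expected a single byte (0x00-0xFF)")
    masks = high_byte & (IO_MASK | EXT_MASK)
    if masks == 0:
        return "disabled"  # like X'x4': I/O and EXT interrupts prevented
    if masks == IO_MASK | EXT_MASK:
        return "enabled"   # like X'x7': I/O and EXT interrupts allowed
    return "partially enabled"


if __name__ == "__main__":
    # The "x" nibble does not affect the result; only bits 6 and 7 are tested.
    for value in (0x04, 0x07, 0x44, 0x47):
        print(f"X'{value:02X}' -> {classify_psw_byte(value)}")
```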
Symptoms of a loop:
Disabled loop symptoms
Disabled loops are easier to identify than enabled loops. Symptoms include:
  • System CP usage increases for unexplained reasons.
  • There is no communication with the system through the master and alternate consoles.
  • Console communications are locked out. To check for communication with the console, enter the DISPLAY T command; the system does not respond.
Enabled loop symptoms

Enabled loops permit some or all interrupts. The loops are typically caused by an error in an application program. All or most of the loop is in code running in problem state, but the loop can include system code if any instructions in the loop request system services. An enabled loop can run on more than one central processor. The loop consumes resources without doing useful work and might take over all system operation.

Additional symptoms include:
  • A bottleneck: the system slows down periodically, creating a performance problem.
  • A job stays in the system for a long time without changing status or ending.
  • Low priority work slows down or stops running (a result of a higher priority enabled loop).
  • System CP usage increases for unexplained reasons or CP usage of an address space is much higher than typical.
Spin loop symptoms

A spin loop occurs when one processor in a multiprocessor environment is unable to communicate with another processor or requires a resource currently held by another processor. The processor that has attempted communication is the detecting or spinning processor. The processor that has failed to respond is the failing processor.

The detecting processor repeatedly attempts to communicate with the failing processor until either:
  • It is successful.
  • A specified time interval has passed.

If communication does not succeed within this interval, an excessive spin loop timeout exists. The detecting processor then initiates recovery processing for the condition.
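The detect-and-time-out behavior described above can be pictured with a small user-space Python analogy. It assumes that a threading.Lock stands in for a resource held by the failing processor and that a wall-clock deadline stands in for the spin loop timeout interval; spin_until is a hypothetical helper. Real MVS spinning happens in disabled system code, so this illustrates only the control flow, not how the system implements it.

```python
import threading
import time


def spin_until(try_once, interval_s: float) -> bool:
    """Repeatedly call try_once() until it succeeds or interval_s elapses.

    Returns True on success, False on an (analogous) excessive spin timeout.
    """
    deadline = time.monotonic() + interval_s
    while time.monotonic() < deadline:
        if try_once():
            return True   # communication or resource acquisition succeeded
    return False          # interval passed: the caller starts recovery


if __name__ == "__main__":
    lock = threading.Lock()
    lock.acquire()  # another "processor" holds the resource and never releases it
    ok = spin_until(lambda: lock.acquire(blocking=False), interval_s=0.05)
    print("acquired" if ok else "excessive spin: begin recovery")
```

Returning False rather than raising leaves the choice of recovery action with the caller, much as the detecting processor hands off to recovery processing once the interval expires.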

MVS processing for excessive spin-loop conditions can provide recovery without any operator prompts or actions. The recovery actions can follow the default settings or be specified in the EXSPATxx parmlib member:
SPIN
Continue spinning for another interval to permit the event to complete.
ABEND
End the current unit of work on the failing processor but permit the recovery routines to retry.
TERM
End the current unit of work on the failing processor and do not permit the recovery routines to retry.
ACR
Invoke alternate CP recovery (ACR) to take the failing processor offline.
  • The system chooses the appropriate action without requiring any operator decision or action. If an action taken in response to an excessive spin loop does not resolve the condition, the system takes the next action when the next excessive spin loop timeout occurs. The default order in which the system takes the actions is SPIN, ABEND, TERM, and ACR.
  • An installation can change the order in which the system takes the actions, except for the first action.
  • For hardware-related errors that formerly caused message IEA490A, the system immediately initiates ACR processing without working through the sequence of actions and without requiring any intervention.
  • There is a default spin loop timeout interval. You can change this interval by specifying a parameter in the EXSPATxx parmlib member and then entering the SET command.
  • To avoid unnecessary recovery actions, system functions that can validly exceed the interval are exempt from excessive spin-loop processing and do not cause any recovery actions. If they exceed the timeout interval, however, these system functions still cause an excessive spin loop record to be written to the logrec data set.
  • The installation can still control excessive spin loop recovery through operator actions.
See EXSPATxx (excessive spin condition actions) in z/OS MVS Initialization and Tuning Reference.
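The escalation rules in the list above can be modeled as a simple ordered sequence. This Python sketch is hypothetical: build_order and action_for_timeout are illustrative names, not EXSPATxx syntax or a system interface. It assumes, from the text, that SPIN is the fixed first action, that the installation may reorder only the actions after it, and that hardware-related errors go directly to ACR. What the system does after the last action in the sequence is not described here, so the sketch simply returns None.

```python
# Hypothetical model of excessive spin loop escalation; not an MVS interface.
DEFAULT_ORDER = ("SPIN", "ABEND", "TERM", "ACR")
VALID_ACTIONS = {"SPIN", "ABEND", "TERM", "ACR"}


def build_order(custom_after_first=None):
    """Return the action sequence; the first action is always SPIN."""
    if custom_after_first is None:
        return list(DEFAULT_ORDER)
    unknown = set(custom_after_first) - VALID_ACTIONS
    if unknown:
        raise ValueError(f"unknown actions: {sorted(unknown)}")
    # Only the actions after the first one are installation-controlled.
    return ["SPIN", *custom_after_first]


def action_for_timeout(n, order, hardware_error=False):
    """Action for the nth (0-based) consecutive excessive spin timeout."""
    if hardware_error:
        return "ACR"      # hardware-related errors skip the sequence
    if n < len(order):
        return order[n]
    return None           # sequence exhausted; not modeled here


if __name__ == "__main__":
    order = build_order()
    print([action_for_timeout(i, order) for i in range(len(order))])
    custom = build_order(["TERM", "ACR"])
    print([action_for_timeout(i, custom) for i in range(len(custom))])
    print(action_for_timeout(0, order, hardware_error=True))
```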




Copyright IBM Corporation 1990, 2010