Discriminating between system and application problems

The first step in determining what actions to take is to decide whether the problem lies with the global resource serialization complex or with the applications it serves. Several kinds of problems can affect global resource serialization processing:
  • Tuning:

    A poorly tuned system can lengthen the processing of global resource serialization requests (ENQ and DEQ). One example is a ring complex where some of the systems have too high a RESMIL value. Another is a star complex where the lock structure is too small, causing excessive false contention. Tuning the complex alleviates these problems.

  • Intersystem communication breakdown:

    Global resource serialization relies on intersystem communication through XCF communication facilities, which can be either CTCs or coupling facility signalling structures. Communication failures or delays might cause global resource serialization to take recovery actions, which can delay or lengthen ENQ and DEQ request processing.

  • Coupling facility availability:

    The loss of a coupling facility might cause problems with a global resource serialization ring. In star mode, if the ISGLOCK structure fails or the containing coupling facility is lost, the systems in the sysplex cooperate to rebuild the structure.

  • Software:

    Global resource serialization occasionally encounters situations that the software does not recognize or cannot correct automatically. If it detects a problem that could cause a resource allocation integrity error (for example, more than one exclusive owner of a resource), global resource serialization takes action to ensure that the error does not occur. These actions include fencing a set of resources from being allocated or partitioning a system from the complex.

  • Resource allocation:

    Even if your global resource serialization complex is well tuned, a combination of applications, system utilities, and online users can impede workload progress because of the way they use resources. For example, a long running job or utility can hold data sets exclusively, effectively blocking other jobs and users from proceeding. In more extreme cases, a set of requests can cause a deadlock, in which each user in a set waits for a resource that another user in the same set holds, so that none of them can proceed. This situation can only be remedied by breaking the deadlock, usually by cancelling one or more of the jobs in the deadlock.
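
The difference between a job that is merely blocked and a set of jobs that is deadlocked can be illustrated with a wait-for graph. The following Python sketch is not part of global resource serialization and does not read any system output. The holders and waiters tables, the job names, the data set names, and the functions build_wait_for and find_deadlock are all hypothetical, and the sketch assumes that you have already collected which jobs own and which jobs wait for each resource. A cycle in the graph indicates a deadlock. A chain that ends at a job that is not waiting indicates ordinary contention.

```python
from typing import Dict, List, Optional, Set


def build_wait_for(holders: Dict[str, Set[str]],
                   waiters: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Map each waiting job to the jobs that hold a resource it wants."""
    graph: Dict[str, Set[str]] = {}
    for resource, waiting_jobs in waiters.items():
        owners = holders.get(resource, set())
        for job in waiting_jobs:
            # A job never blocks itself, so drop it from its own blocker set.
            graph.setdefault(job, set()).update(owners - {job})
    return graph


def find_deadlock(graph: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Return one cycle of jobs found by depth-first search, or None."""
    IN_PROGRESS, DONE = 1, 2
    state: Dict[str, int] = {}
    path: List[str] = []

    def visit(job: str) -> Optional[List[str]]:
        state[job] = IN_PROGRESS
        path.append(job)
        for blocker in sorted(graph.get(job, ())):
            if state.get(blocker) == IN_PROGRESS:
                # The blocker is already on the current path: a cycle.
                return path[path.index(blocker):]
            if blocker not in state:
                cycle = visit(blocker)
                if cycle is not None:
                    return cycle
        path.pop()
        state[job] = DONE
        return None

    for job in sorted(graph):
        if job not in state:
            cycle = visit(job)
            if cycle is not None:
                return cycle
    return None


# Hypothetical case 1: a long running job blocks two others (no deadlock).
blocked = build_wait_for(
    holders={"PAYROLL.MASTER": {"LONGJOB"}},
    waiters={"PAYROLL.MASTER": {"JOBC", "JOBD"}},
)

# Hypothetical case 2: JOBA and JOBB each wait for what the other holds.
deadlocked = build_wait_for(
    holders={"PAYROLL.MASTER": {"JOBA"}, "PAYROLL.HISTORY": {"JOBB"}},
    waiters={"PAYROLL.MASTER": {"JOBB"}, "PAYROLL.HISTORY": {"JOBA"}},
)

blocked_result = find_deadlock(blocked)        # None: contention only
deadlock_result = find_deadlock(deadlocked)    # ["JOBA", "JOBB"]
```

In the first case, waiting for LONGJOB to finish (or cancelling it) lets the other jobs proceed. In the second case, no amount of waiting helps, and one of the jobs in the returned cycle has to be cancelled.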