XRF concepts and terminology
High availability with only short interruptions in end-user service can be difficult to achieve for IMS failures. Before restart can take place, termination processing might need to be performed and the cause of failure determined. Restart itself can be time consuming if regions must be restarted and terminal sessions reestablished.
An XRF complex allows you to rapidly resume end-user service when a failure occurs. The configuration consists of a primary subsystem (called the active IMS), usually operating in a preferred processor, and an alternate subsystem (called the alternate IMS), tracking the activities of the active IMS concurrently in the same or in a different processor.
The active IMS processes the IMS workload in the same way that it would without XRF. It does not do any more work than its normal processing and logging. Through the log, it informs the alternate IMS of its activity. A combination of surveillance mechanisms that use the log, the restart data set (RDS), and an ISC link alerts the alternate IMS to problems in the active IMS. The active IMS is not aware of the activities of the alternate IMS.
The alternate IMS does not process any of the IMS transactions that are entered in the active IMS; it monitors the active IMS for an indication of trouble. The alternate IMS records resource changes in the active IMS and updates its control blocks and buffers, continuously changing its status to match that of the active IMS. This alternate IMS is in a state of readiness so that work can quickly shift from the active IMS to the alternate IMS without waiting for dependent regions to be started, data sets to be allocated and opened, and sessions to be established. This shift of the workload from the active IMS to the alternate IMS is called a takeover. After a takeover has occurred, you can establish a new alternate IMS.
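The tracking-and-takeover behavior described above can be illustrated with a toy model. The following Python sketch is not IMS code and does not reflect IMS internals; the class name `AlternateTracker`, the `missed_limit` threshold, and the rule that a takeover starts after a fixed number of missed surveillance signals are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class AlternateTracker:
    """Toy model of an alternate system that mirrors an active system.

    Everything here is hypothetical: real XRF surveillance combines the log,
    the restart data set (RDS), and an ISC link, and is configured differently.
    """
    missed_limit: int = 3                 # hypothetical threshold, not an IMS parameter
    state: dict = field(default_factory=dict)
    missed: int = 0
    role: str = "alternate"

    def apply_log_record(self, resource: str, value: str) -> None:
        # Mirror a resource change reported through the active system's log.
        self.state[resource] = value
        # Assumption: a fresh log record also counts as a sign of life.
        self.missed = 0

    def surveillance_tick(self, signal_seen: bool) -> None:
        # Count consecutive missed signals; take over once the limit is reached.
        if signal_seen:
            self.missed = 0
            return
        self.missed += 1
        if self.role == "alternate" and self.missed >= self.missed_limit:
            self.role = "active"          # the workload shifts: a takeover


tracker = AlternateTracker(missed_limit=2)
tracker.apply_log_record("DB1", "open")
tracker.surveillance_tick(False)
tracker.surveillance_tick(False)
print(tracker.role, tracker.state)
```

Because the model keeps its state current while it tracks, it has nothing to rebuild at the moment it changes role, which is the point of keeping the alternate IMS in a state of readiness.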
When an IMS failure occurs, or when failures that affect the z/OS® operating system or entire central processor complex (CPC) occur, the alternate IMS assumes the workload of the active IMS. If end users are using appropriate terminals, they experience little disruption when a takeover occurs. The effect is of a single-system image; the end user does not need to know which IMS is being used and might not even know that a switch in the IMS host system occurred.
A takeover can occur when the active IMS abends. It can also occur for some of the other common causes of outages at an IMS installation, such as:
- Surveillance-detectable IMS failures
- Surveillance-detectable z/OS failures, loops, or wait states
- CPC failures
- VTAM® failures that result in a TPEND exit
- Internal Resource Lock Manager (IRLM) failures that result in a STATUS exit
You can also invoke XRF to introduce planned changes into a production environment with minimal disruption to end users.
One of the major service elements is the central processor complex (CPC). The CPC is a physical collection of hardware that consists of main storage, one or more central processors, timers, and channels. A CPC runs under the control of a single operating system. It can be either a uniprocessor or a multiprocessor (including a dyadic processor).
This information uses the term 37x5 Communication Controller to refer to a 3705, 3725, or 3745 Communication Controller, which is required when your XRF complex uses USERVAR.
- Use the Communication Controller for Linux® on System z® (CCL) as a replacement for the 3745 controller to allow continued use of XRF
- Migrate from XRF to VTAM Generic Resources (VGR), which requires a parallel sysplex environment
- Use IMS Multi-Node Persistent Sessions (MNPS) support for XRF. Due to performance limitations, this is not a recommended solution.
The following terms are specific to XRF:
- recoverable service element (RSE)
- The active IMS system and the alternate IMS system that work as a unit in an XRF complex.
The two IMS systems in an RSE share an RSENAME and either an MNPS ACB name or USERVAR tables, depending on which method you choose to manage terminal sessions for XRF. You can start either IMS system in an RSE as the active IMS system (subject to operational restrictions), and then start the other as the alternate IMS system. Both identify themselves to the availability manager (AVM) component of z/OS.
- dependent service element (DSE)
- An element of the active IMS system that has a counterpart in the alternate IMS system but cannot trigger a takeover on its own.
A DSE depends on IMS to recognize when it fails and to request a takeover on its behalf. The CPC, z/OS, VTAM, and IRLM are DSEs.
- takeover
- A shift of workload from the active IMS system to the alternate IMS system. Two kinds of takeovers are:
- planned takeover
- An intentional shift of the IMS workload to the alternate IMS system to allow maintenance of the active IMS system.
- unplanned takeover
- A shift of the IMS workload to the alternate IMS system due to a failure of the active IMS system, or of the CPC, z/OS operating system, or VTAM associated with the active IMS system.
- DASD service element
- The DASD associated with an XRF complex.
The DASD service element is neither recoverable nor dependent; it is shared by both the active IMS system and the alternate IMS system. A major failure in the DASD service element might terminate service to end users. This is also true of the 37x5 Communication Controller, if your XRF complex uses USERVAR.
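The service-element terms defined above can be summarized as a small lookup table. The following Python sketch is only a reading aid: the names `ElementKind`, `ELEMENTS`, and `may_terminate_service` are hypothetical, and grouping the 37x5 Communication Controller with the DASD service element is an assumption based solely on the statement that a major failure of either might terminate service.

```python
from enum import Enum


class ElementKind(Enum):
    RECOVERABLE = "RSE"       # the active and alternate IMS systems working as a unit
    DEPENDENT = "DSE"         # relies on IMS to request a takeover on its behalf
    OTHER = "neither"         # neither recoverable nor dependent


# Classification taken from the definitions above; the keys are illustrative labels.
ELEMENTS = {
    "IMS": ElementKind.RECOVERABLE,
    "CPC": ElementKind.DEPENDENT,
    "z/OS": ElementKind.DEPENDENT,
    "VTAM": ElementKind.DEPENDENT,
    "IRLM": ElementKind.DEPENDENT,
    "DASD": ElementKind.OTHER,
    # Assumption: listed here only because a major failure might terminate
    # service, and only relevant when the XRF complex uses USERVAR.
    "37x5": ElementKind.OTHER,
}


def may_terminate_service(element: str) -> bool:
    """Return True if a major failure of this element might end user service."""
    return ELEMENTS[element] is ElementKind.OTHER


for name, kind in ELEMENTS.items():
    print(f"{name}: {kind.value}")
```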