Recovery in an IMSplex

Recovery in an IMSplex is performed differently for the member IMS systems, Repository Server (RS), and Common Service Layer (CSL).

The following topics provide an overview of each type of recovery:

Recovery of IMS systems in an IMSplex

The recovery procedures for a failed IMS system within an IMSplex are the same as the procedures for recovering a stand-alone IMS system. For example, the procedures for an IMS system that terminates because of a z/OS® failure or a hardware power failure are the same regardless of whether the IMS system is a member of an IMSplex: you must IPL z/OS and then restart IMS using an /ERESTART command.
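For example, an emergency restart entered from the master terminal after the z/OS IPL might look like the following. The OVERRIDE keyword is shown only as an illustration; it is typically needed when IMS could not record its own abnormal termination (such as after a power failure), so verify against the messages issued at restart:

```
/ERESTART OVERRIDE.
```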

Recovery of CSL in an IMSplex

If an entire CSL fails, each CSL manager must be restarted. If OM fails and you have only one OM defined for an IMSplex, you lose your type-2 command interface capability and cannot enter commands through a SPOC. You can still enter type-1 commands through supported sources. OM can be restarted by the z/OS Automatic Restart Manager (ARM), as defined on the OM execution parameter, by starting it as a started procedure, or by submitting JCL.
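As an illustration, restarting OM as a started procedure might look like the following. The procedure name OM1 is a placeholder for your installation's OM startup procedure, and the ARMRST= value is an assumed example of the execution parameter that controls ARM restart:

```
S OM1        <- z/OS START command; OM1 is a placeholder procedure name
```

For ARM-controlled restart, the OM execution parameters would instead include a setting such as ARMRST=Y.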

Similarly, if RM or SCI fails, each must be restarted, either by the z/OS Automatic Restart Manager, by starting a started procedure, or by submitting JCL. If no RMs are available, the IMSplex continues functioning, but all RM functions cease until an RM is restarted. If all RMs, resource structures, coupling facilities, or CQS address spaces within an IMSplex fail or are inaccessible, global resource management does not function until they are restarted. If every SCI fails, the CSL cannot operate because all communication is down. An SCI failure also affects automatic RECON loss notification. SCI must be restarted to bring the CSL back into operation.

Recovery of an RS in an IMSplex

The IMSRSC repository is managed by a single master RS, which manages all of the repositories defined to it. One or more subordinate RSs can also be defined in the sysplex.

The subordinate RSs wait in an initialization state until they detect that the master RS has shut down, at which point they all attempt to complete the startup process and become the master RS. One subordinate RS becomes the new master RS; the others remain subordinate RSs.

The master and subordinate RSs must belong to the same z/OS cross-system coupling facility (XCF) group. When you start RSs in different XCF groups, they are independent of one another.

When a subordinate RS attempts to become the master RS, the FRP2003I message is issued. When the subordinate RS successfully becomes the master RS, the FRP2002I and FRP2025I messages are issued.

Subordinate RSs are designed to help the repository accept new connections as quickly as possible if the master RS terminates. However, a subordinate RS does not shadow the master RS. RS clients pass a registration exit to the RS during registration. The registration exit is driven with the RS unavailable event when the master RS goes down, and with the RS available event when a subordinate RS becomes the master. When the new master RS is available, the client must reregister with the new RS and reconnect to any repositories that were in use.
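The client-side flow described above can be sketched as pseudocode. The event and routine names here are illustrative only and are not the actual RS client interface:

```
/* Hedged pseudocode sketch of an RS client's registration exit.    */
/* Names are illustrative; the real interface is defined by the RS  */
/* client registration services.                                    */
registration_exit(event):
    if event is RS-unavailable:          /* the master RS terminated */
        mark in-use repositories as inaccessible
        wait for the RS-available event
    else if event is RS-available:       /* a subordinate RS became master */
        reregister with the new master RS
        reconnect to each repository that was in use
```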