Introduction to resource recovery

Most customers maintain computer resources that are essential to the survival of their businesses. When these resources are updated in a controlled and synchronized manner, they are said to be protected resources or recoverable resources. These resources can all reside locally (on the same system) or be distributed (across nodes in the network). The protocols and mechanisms for regulating the updating of multiple protected resources in a consistent manner are provided in z/OS® by z/OS Resource Recovery Services (RRS).

Participants in resource recovery

As shown in the following figure, the Resource Recovery environment is composed of three participants:

  • Sync-point manager
  • Resource managers
  • Application program

RRS is the sync-point manager, also known as the coordinator. The sync-point manager controls the commitment of protected resources by coordinating the commit request (or backout request) with the resource managers, the participating owners of the updated resources. These resource managers are known as participants in the sync-point process. IMS participates as a resource manager for DL/I, Fast Path, and Db2 for z/OS data if this data has been updated in such an environment.

The final participant in this resource recovery protocol is the application program, the program accessing and updating protected resources. The application program decides whether the data is to be committed or backed out and communicates this decision to the sync-point manager. The sync-point manager then coordinates the actions in support of this decision among the resource managers.
Figure 1. Participants in resource recovery
Begin figure description. RRS Sync-Point Manager interacts with resource managers and application programs. Application programs interact with resource managers and the RRS Sync-Point Manager. End figure description.

Two-phase commit protocol

As shown in the following figure, the two-phase commit protocol is a process in which the sync-point manager and the participating resource managers ensure that the updates made to a set of resources by a third participant, the application program, either all occur or do not occur at all. In simple terms, the application program decides to commit its changes to some resources and issues the commit request to the sync-point manager, which then polls all of the resource managers as to the feasibility of the commit. This is the prepare phase, often called phase one. Each resource manager votes yes or no to the commit.

After the sync-point manager has gathered all the votes, phase two begins. If all votes are to commit the changes, then the phase two action is commit. Otherwise, phase two becomes a backout. System failures, communication failures, resource manager failures, or application failures are not barriers to the completion of the two-phase commit process.

The work done by various resource managers is called a unit of recovery (UOR) and spans the time from one consistent point of the work to another consistent point, usually from one commit point to another. It is the unit of recovery that is the object of the two-phase commit process.
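The voting logic described above can be illustrated with a minimal sketch in Python. It is not the RRS interface: the class names, method names, and the in-memory dictionaries are all assumptions made for illustration, and the sketch leaves out the logging and restart processing that a real sync-point manager needs to survive failures.

```python
from dataclasses import dataclass, field
from enum import Enum


class Vote(Enum):
    YES = "yes"
    NO = "no"


@dataclass
class ResourceManager:
    """Hypothetical participant that owns some protected resources."""
    name: str
    can_commit: bool = True
    pending: dict = field(default_factory=dict)    # updates in the current unit of recovery
    committed: dict = field(default_factory=dict)  # state as of the last commit point

    def update(self, key, value):
        self.pending[key] = value

    def prepare(self) -> Vote:
        # Phase one: vote on whether the pending updates can be made permanent.
        return Vote.YES if self.can_commit else Vote.NO

    def commit(self):
        # Phase two (commit): make the pending updates permanent.
        self.committed.update(self.pending)
        self.pending.clear()

    def backout(self):
        # Phase two (backout): discard the pending updates.
        self.pending.clear()


class SyncPointManager:
    """Hypothetical coordinator; stands in for the role that RRS plays."""

    def __init__(self, resource_managers):
        self.resource_managers = list(resource_managers)

    def commit(self) -> str:
        # Phase one: poll every resource manager for its vote.
        votes = [rm.prepare() for rm in self.resource_managers]
        # Phase two: commit only if every vote was yes; otherwise back out.
        if all(v is Vote.YES for v in votes):
            for rm in self.resource_managers:
                rm.commit()
            return "committed"
        for rm in self.resource_managers:
            rm.backout()
        return "backed out"
```

In this sketch, a single no vote from any resource manager causes the pending updates of every resource manager to be discarded, which is the all-or-nothing property that the unit of recovery relies on.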

Figure 2. Two-phase commit process with one resource manager
Begin figure description. This figure is described in the surrounding text. End figure description.

Notes:

  1. The application and IMS make a connection.
  2. IMS expresses protected interest in the work started by the application. This tells RRS that IMS will participate in the two-phase commit process.
  3. The application makes a read request to an IMS resource.
  4. Control is returned to the application following its read request.
  5. The application updates a protected resource.
  6. Control is returned to the application following its update request.
  7. The application requests that the update be made permanent by way of the SRRCMIT call.
  8. RRS calls IMS to do the prepare (phase 1) process.
  9. IMS returns to RRS with its vote to commit.
  10. RRS calls IMS to do the commit (phase 2) process.
  11. IMS informs RRS that it has completed phase 2.
  12. Control is returned to the application following its commit request.
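
The sequence in the notes can be traced with a small Python simulation. This is only a sketch under stated assumptions: the classes, methods, and the "part" data item are hypothetical, ordinary method calls stand in for the real interfaces between the application, IMS, and RRS, and the numbers in the trace refer to the numbered notes. Steps 4, 6, and 12 (control returning to the application) correspond to the method calls returning.

```python
class SyncPointManager:
    """Hypothetical stand-in for the role RRS plays in the notes."""

    def __init__(self, trace):
        self.trace = trace
        self.interested = []  # resource managers that take part in the commit

    def express_interest(self, rm):
        self.trace.append("2 express protected interest")
        self.interested.append(rm)

    def commit(self):
        # Hypothetical stand-in for the commit request of step 7.
        self.trace.append("7 commit requested")
        votes = [rm.prepare() for rm in self.interested]
        if all(votes):
            for rm in self.interested:
                rm.commit()
            return True  # step 12: control returns to the application
        for rm in self.interested:
            rm.backout()
        return False


class ResourceManager:
    """Hypothetical stand-in for the role IMS plays in the notes."""

    def __init__(self, spm, trace):
        self.spm = spm
        self.trace = trace
        self.committed = {"part": 10}  # invented sample data
        self.pending = {}

    def connect(self):
        self.trace.append("1 connect")
        self.spm.express_interest(self)

    def read(self, key):
        self.trace.append("3 read")
        return self.pending.get(key, self.committed[key])  # step 4

    def update(self, key, value):
        self.trace.append("5 update")
        self.pending[key] = value  # step 6 when this method returns

    def prepare(self):
        self.trace.append("8 prepare")
        self.trace.append("9 vote yes")
        return True

    def commit(self):
        self.trace.append("10 commit")
        self.committed.update(self.pending)
        self.pending.clear()
        self.trace.append("11 phase 2 complete")

    def backout(self):
        self.pending.clear()


def run_application():
    """Play the application's part: connect, read, update, commit."""
    trace = []
    rrs = SyncPointManager(trace)
    ims = ResourceManager(rrs, trace)
    ims.connect()
    quantity = ims.read("part")
    ims.update("part", quantity - 1)
    ok = rrs.commit()
    return trace, ims.committed, ok
```

Calling `run_application()` returns the ordered trace together with the committed data, so the order of the prepare and commit calls can be compared against the notes. Only resource managers that expressed interest are called during the commit.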

Local versus distributed

The residence of the participants involved in the recovery process determines whether that recovery is considered local or distributed. In a local recovery scenario, all the participants reside on the same single system. In a distributed recovery scenario, the participants are scattered over multiple systems. The following figure shows the communication between resource manager participants in a distributed resource recovery. There is no conceptual difference between local and distributed recovery in the functions provided by RRS. However, to extend the original sync-point manager's function to involve remote sync-point managers, a special resource manager is required. The APPC communications resource manager provides this support in the distributed environment.

Figure 3. Distributed resource recovery
Begin figure description. Recovery participants on different operating systems communicate with one another through communications resource managers on each operating system. End figure description.
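
The role of the communications resource manager can be sketched in Python as follows. This is an illustration only: every class and method name is an assumption, and a direct method call stands in for the network exchange between systems. The point of the sketch is that the local sync-point manager treats the communications resource manager like any other resource manager, while that resource manager forwards each phase to the remote sync-point manager.

```python
class Resource:
    """Hypothetical resource manager that owns one protected resource."""

    def __init__(self, name, vote=True):
        self.name = name
        self.vote = vote
        self.state = "active"

    def prepare(self):
        return self.vote

    def commit(self):
        self.state = "committed"

    def backout(self):
        self.state = "backed out"


class SyncPointManager:
    """Hypothetical coordinator for the participants on one system."""

    def __init__(self, participants):
        self.participants = list(participants)

    def prepare(self):
        # Phase one: poll every participant; yes only if all vote yes.
        return all([p.prepare() for p in self.participants])

    def commit(self):
        for p in self.participants:
            p.commit()

    def backout(self):
        for p in self.participants:
            p.backout()

    def sync_point(self):
        # Called on the system where the application requests the commit.
        if self.prepare():
            self.commit()
            return "committed"
        self.backout()
        return "backed out"


class CommunicationsResourceManager:
    """Looks like a resource manager locally; forwards to a remote coordinator."""

    def __init__(self, remote_sync_point_manager):
        self.remote = remote_sync_point_manager

    def prepare(self):
        return self.remote.prepare()

    def commit(self):
        self.remote.commit()

    def backout(self):
        self.remote.backout()
```

For example, a local sync-point manager built as `SyncPointManager([local_db, CommunicationsResourceManager(remote_spm)])` commits the local and remote resources together, and backs both out if the remote side votes no.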