Recovery during the IMSRSC repository data set update process

The Repository Server (RS) uses a duplex pair of data sets for each IMSRSC repository, so that the RS can always recover data sets to the last completed and verified write activity.

The duplex pair consists of the primary repository index data set (RID) and repository member data set (RMD), known as COPY1, and the secondary RID and RMD, known as COPY2. The RS requires both the primary and the secondary data set pairs to be available; otherwise, the repository is stopped.

The repository can have an optional third defined data set pair, the spare RID and RMD (SPARE), which is used during the SPARE recovery process.

The RS writes to the repository in two phases:
  1. In the first phase, the COPY1 RID and RMD are updated. When this update completes (the crossover point), the change is considered completed and a successful return code is sent to the requesting client.
  2. In the second phase, the same updates are written to the COPY2 RID and RMD. The request is completed.
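
The two phases above can be sketched as a small model. This is an illustration only, not RS code: the `DataSetPair` class, the `two_phase_update` function, and the `notify_client` callback are hypothetical names, and a dictionary stands in for the RID and RMD contents.

```python
from dataclasses import dataclass, field


@dataclass
class DataSetPair:
    """Hypothetical stand-in for one RID/RMD pair (COPY1 or COPY2)."""
    name: str
    records: dict = field(default_factory=dict)

    def write(self, key, value):
        self.records[key] = value


def two_phase_update(copy1, copy2, key, value, notify_client):
    # Phase 1: update COPY1. Completing this write is the crossover
    # point, where the change is considered completed and the client
    # receives a successful return code.
    copy1.write(key, value)
    notify_client("success")

    # Phase 2: write the same update to COPY2, which completes the request.
    copy2.write(key, value)


copy1 = DataSetPair("COPY1")
copy2 = DataSetPair("COPY2")
events = []
two_phase_update(copy1, copy2, "member-a", "v1", events.append)
```

In this model the client sees success before COPY2 is written, which is why an error in the second phase does not fail the request.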

The following sequence shows how the RS recovers the data sets if an I/O error occurs during the first phase of the update process (the write to the COPY1 data set):
  1. The request in progress fails with an error.
  2. The repository becomes unavailable. Any client connection exits are driven with the Repository unavailable status.
  3. If the RS determines that no valid spare data set pair is available, the repository is stopped and the client is notified that the repository is stopped. An administrator must delete the data set in error, define a new data set, and restart the repository. During repository start processing, data is copied from the valid data set to the newly added data set.
  4. If the RS verifies that a valid spare data set pair is available and the other conditions for recovery are met, then spare recovery processing is started:
    1. Any client connection exits are driven with the Repository recovery started status.
    2. During recovery processing, data is copied from the COPY2 data set pair to the SPARE data set pair, and the status of the SPARE data set pair is changed to COPY1. When recovery processing is completed, the repository becomes available, and any client connection exits are driven with the Repository recovery ended successfully status.
    3. If an error occurs during recovery processing, any client connection exits are driven with the Repository recovery error status.
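
The order of client-visible events in the sequence above can be summarized with a short sketch. The function name and its parameters are hypothetical. The status strings follow the wording of this topic, except "Repository stopped", which paraphrases the stop notification. The "other conditions for recovery" are not modeled.

```python
def phase_one_failure_events(spare_valid, recovery_succeeds=True):
    """Return the ordered events after an I/O error on the COPY1 write."""
    events = ["request failed", "Repository unavailable"]

    if not spare_valid:
        # No usable spare pair: the repository is stopped and an
        # administrator must intervene.
        events.append("Repository stopped")
        return events

    # A valid spare pair exists, so spare recovery processing starts.
    events.append("Repository recovery started")
    if recovery_succeeds:
        events.append("Repository recovery ended successfully")
    else:
        events.append("Repository recovery error")
    return events
```

For example, `phase_one_failure_events(spare_valid=False)` ends with the stop notification and never reports that recovery started.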

If an I/O error occurs during the second phase of the update process (the write to the COPY2 data set), the RS performs the same tasks, except that data is copied from the COPY1 data set pair to the SPARE data set pair and the status of the SPARE data set pair is changed to COPY2. Because the first phase of the update process has already completed, the request in progress is considered completed and a successful return code is returned to the caller.
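
The difference between the two failure cases comes down to which copy is the source and which role the spare takes over. The sketch below captures that mapping under the assumption that a valid spare pair exists; `spare_recovery_plan` and its dictionary keys are hypothetical names chosen for illustration.

```python
def spare_recovery_plan(failed_copy):
    """Describe spare recovery for a failure on COPY1 or COPY2."""
    if failed_copy not in ("COPY1", "COPY2"):
        raise ValueError("failed_copy must be 'COPY1' or 'COPY2'")

    # The surviving copy is the source of the data.
    source = "COPY2" if failed_copy == "COPY1" else "COPY1"

    return {
        "copy_from": source,
        "copy_to": "SPARE",
        # The spare pair takes over the role of the failed pair.
        "spare_becomes": failed_copy,
        # A phase-one failure fails the request; a phase-two failure
        # happens after the crossover point, so the request completed.
        "request_outcome": "failed" if failed_copy == "COPY1" else "completed",
    }
```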

If the automated SPARE recovery process fails, the repository might be left in CLOSED state.

If a repository remains in CLOSED state at the end of SPARE recovery processing, perform the following steps to make the repository available:
  1. Re-allocate the discarded repository data sets.
  2. Issue the LIST FRPBATCH command to check whether the repository is in a STOP state. If it is not, issue the STOP FRPBATCH command to change the state to STOP.
  3. Issue the DSCHANGE FRPBATCH command to change the re-allocated discarded data sets back to a SPARE state.
  4. Issue the START FRPBATCH command to start the repository.

    This resumes the data set recovery process.
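
The only conditional step in the procedure above is the STOP command. A minimal sketch of that decision follows. It lists command verbs only and does not show FRPBATCH syntax or parameters. The `reopen_steps` function is hypothetical, and its `reported_state` argument stands for whatever the LIST command reports.

```python
def reopen_steps(reported_state):
    """Return the ordered actions for a repository left in CLOSED state."""
    steps = ["re-allocate discarded data sets", "LIST"]

    # STOP is needed only when LIST shows the repository is not
    # already in a STOP state.
    if reported_state != "STOP":
        steps.append("STOP")

    # Return the re-allocated data sets to SPARE, then start the repository.
    steps += ["DSCHANGE", "START"]
    return steps
```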

For automated recovery to take place, either the primary RID and RMD or the secondary RID and RMD must be valid and a SPARE repository data set pair must be available.
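
Stated as a predicate, the condition reads as follows. This is a sketch with a hypothetical function name. It covers only the condition in the preceding sentence and does not model any other conditions for recovery.

```python
def automated_recovery_possible(copy1_valid, copy2_valid, spare_available):
    """True when at least one copy is valid and a spare pair is available."""
    return (copy1_valid or copy2_valid) and spare_available
```

For instance, a valid secondary pair without a spare pair is not enough: `automated_recovery_possible(False, True, False)` evaluates to `False`.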

If a SPARE repository data set pair is not defined when the failure occurs, the repository is stopped. You must define new data sets to replace the failed primary or secondary data sets and then start the repository. Perform the following procedure:
  1. Allocate new repository data sets to take the place of the failed primary or secondary data sets and to create a SPARE repository data set pair.
  2. Issue the DSCHANGE FRPBATCH command to discard the repository data sets in error.
  3. Issue the UPDATE FRPBATCH command to replace the failed data sets with the newly allocated data sets for the existing repository.
  4. Issue the START FRPBATCH command to start the repository.

    This resumes the data set recovery process.

You can also use the equivalent F reposervername,ADMIN commands instead of the FRPBATCH commands.

The following topics describe the recovery procedures after the failure of both the primary and secondary RIDs and RMDs.