Deciding whether to provide recovery

MVS™ does all that it can to ensure the availability of programs, and to protect the integrity of system resources. However, MVS cannot provide effective recovery for every individual application program, so programs need recovery routines of their own.

To decide whether you need to provide recovery for a particular program, and the amount of recovery to provide, you should:

- Determine what the consequences will be if the program encounters an error and ends.
- Compare the cost of tolerating those consequences to the cost of providing recovery.

In general, if you have a large, complex program upon which a great number of users depend, such as a subsystem, a database manager, or any application that provides an important service to many other programs or end users, you will almost certainly want to provide recovery. For small, simple programs upon which very few users depend, you might not get enough return on your investment. Between these two extremes is a whole spectrum of possibilities.

Consider the following points in making your decision. Providing recovery:

Increases your program's availability.
Depending on the nature of the error, your recovery routine might successfully correct the error and allow your program to continue processing normally. Maintaining maximum availability is one of the major objectives of providing recovery.
Is a way to protect both system and application resources.
In general, recovery routines should clean up any resources your program is holding that might be requested by another program, or another user of your program. The purpose of cleanup is to:
- Allow your program to run again successfully without requiring a re-IPL.
- Allow the system to continue to run other work (consider especially other work related to the failing program).
System locks, ENQs, and latches are examples of important resources shared by other programs. A program should provide for the release of these resources if an error occurs so that other programs can access them. Releasing resources is especially important if your program is a service routine. A service routine must release resources before returning to its caller, so the caller does not end up holding resources that it did not request.

Note: Locks, ENQs, and latches are all used for serialization. See Serialization for more information about serialization.

Another resource a program should release is any virtual storage it obtained, so that the storage becomes available to other programs. Note that the most important storage to release is common storage.

Recovery routines should also ensure the integrity of any data being accessed. Consider the case of a database application that is responsible for protecting its database resources. The application must ensure the integrity and consistency of the data in the event an error occurs. Data changes that were made prior to the error might have to be backed out from the database.
Provides for communication between different processes.
For example, one task might send a request to another task. If the second task encounters an error, a recovery routine could inform the first task that its request will not be fulfilled.

When dealing with a multi-tasking environment, you must plan your recovery in terms of the multiple tasks involved. You must have a cohesive scheme that provides recovery for the set of tasks rather than thinking only in terms of a single task.
Is a way to help you determine what went wrong when an error occurs in your program.
Recovery routines can do such things as save serviceability data, request recording of an error in the logrec data set, and request dumps. Each of these actions helps you determine what went wrong in your program, and each is explained in greater detail later in this information. Note that the recovery routine must provide whatever serviceability data it wants the system to record.
Facilitates validity checking of user parameters.
Consider the case of a program that must verify input from its callers. The program validates its parameters, but might not catch every invalid variation. For example, the caller might pass the address of an input data area that appears to be valid; however, the caller does not have access to that storage. When the program attempts to update the data area, a protection exception occurs. A recovery routine could intercept this error, and allow the program to pass back a return code to the caller indicating the input was not valid.

Providing recovery in a case like this improves the reliability of your program.
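
The parameter-validation case above can be illustrated with a small, language-neutral analogy. The following Python sketch is not an MVS interface: the function name update_area and the return code values are hypothetical, and a read-only buffer stands in for storage that cannot be updated. It shows only the general shape of the idea: validate what can be checked up front, then intercept the error that validation could not predict and turn it into a return code.

```python
RC_OK = 0          # hypothetical return code: request completed
RC_BAD_INPUT = 8   # hypothetical return code: caller's input was not valid


def update_area(area, value: int) -> int:
    """Store value in the first byte of a caller-supplied area and return a code."""
    # Up-front validation catches the obvious problems.
    if not isinstance(area, memoryview) or len(area) == 0:
        return RC_BAD_INPUT
    if not 0 <= value <= 255:
        return RC_BAD_INPUT

    # The area looks valid, but the update can still fail: a read-only
    # buffer plays the role of storage that cannot be updated.
    try:
        area[0] = value
    except TypeError:
        # "Recovery": intercept the error and report it to the caller
        # instead of letting the service end abnormally.
        return RC_BAD_INPUT
    return RC_OK
```

A caller that passes memoryview(bytearray(4)) gets RC_OK, while a caller that passes memoryview(bytes(4)), which is read-only, gets RC_BAD_INPUT rather than an unhandled error.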

If you do not provide recovery for your program, and your program encounters an error, MVS handles the problem to some extent, but the result is that your program ends before you expected it to, and application resources might not be cleaned up.
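
The cleanup, data integrity, and communication points discussed in this topic can be sketched together in the same spirit. This is again a Python analogy under stated assumptions, not MVS code: handle_request, the dictionary standing in for a database, and the reply queue standing in for the requesting task are all hypothetical names chosen for illustration. On an error, the sketch backs out partial changes, releases the serialization resource, and tells the requester that the request was not fulfilled.

```python
import queue
import threading


def handle_request(db: dict, lock: threading.Lock, updates, reply: queue.Queue) -> None:
    """Apply (key, amount) updates to db; on error, back out and notify the requester."""
    with lock:                      # the lock is released even if an error occurs
        snapshot = dict(db)         # copy of the data used for back-out
        try:
            for key, amount in updates:
                db[key] = db.get(key, 0) + amount
        except Exception as err:
            # Back out changes made before the error occurred.
            db.clear()
            db.update(snapshot)
            # Inform the requester that the request will not be fulfilled.
            reply.put(("failed", repr(err)))
            return
    reply.put(("ok", None))
```

If one update in a request is not valid (for example, an amount that is not a number), earlier updates from the same request are removed, the lock is free for other work, and the requester receives a "failed" message rather than waiting indefinitely.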