Troubleshooting for recovery processing

Problems that occur in a data processing system can be failures in communication protocols, data sets, programs, or hardware. These problems are potentially more severe in online systems than in batch systems, because online systems process data from many different sources in an unpredictable sequence.

Online applications therefore require special mechanisms for recovery and restart that batch systems do not require. These mechanisms ensure that each resource associated with an interrupted online application returns to a known state so that processing can restart safely. Together with suitable operating procedures, these mechanisms should provide automatic recovery from failures and allow the system to restart with minimal disruption.
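
As a rough illustration of what "returns to a known state" means, the following minimal sketch keeps a before-image of every change and restores those images if the work is interrupted. It is a generic teaching example, not the way CICS implements recovery: the RecoverableStore class, its write, commit, and backout methods, and the transfer function are all hypothetical names invented for this sketch.

```python
class RecoverableStore:
    """Hypothetical key-value resource that can back out an interrupted unit of work."""

    def __init__(self, initial=None):
        self.data = dict(initial or {})
        self._undo_log = []  # (key, existed, before-image) tuples

    def write(self, key, value):
        # Record the before-image first, then change the resource.
        self._undo_log.append((key, key in self.data, self.data.get(key)))
        self.data[key] = value

    def commit(self):
        # Changes become permanent; the before-images are no longer needed.
        self._undo_log.clear()

    def backout(self):
        # Undo changes in reverse order to reach the last committed state.
        while self._undo_log:
            key, existed, before = self._undo_log.pop()
            if existed:
                self.data[key] = before
            else:
                self.data.pop(key, None)


def transfer(store, source, target, amount):
    """Move `amount` between two accounts as a single unit of work."""
    try:
        store.write(source, store.data[source] - amount)
        if target not in store.data:
            raise KeyError(f"unknown account: {target}")
        store.write(target, store.data[target] + amount)
        store.commit()
        return True
    except KeyError:
        # The failure interrupted the work part way through: restore the known state.
        store.backout()
        return False
```

In this sketch, a transfer to an account that does not exist fails after the source account has already been debited. The backout step restores the source balance, so the data is left exactly as it was at the last commit and the work can be retried safely.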

The two main recovery requirements of an online system are:

  • To maintain the integrity and consistency of data
  • To minimize the effect of failures

To meet these two requirements, CICS® provides a facility called the recovery manager. The CICS recovery manager provides the recovery and restart functions that are needed in an online system.