Emergency restart
IMS requires an emergency restart whenever it terminates without a controlled shutdown. When restarted, IMS restarts from the point of failure. You can initiate an emergency restart using the /ERESTART command.
With an emergency restart, IMS:
- Backs out DL/I in-flight units of work
- Applies committed but unwritten DEDB changes to the database
- Retains any in-doubt units of work for IMS subsystems connected to a CCTL or to Recoverable Resource Management Services (RRMS), the z/OS® syncpoint manager
- Resolves any unfinished External Subsystem Attach Facility (ESAF) units of work with external subsystems
- Releases locked messages in a shared-queues environment
- Uses checkpoint and X'22' log records to recover MODLBKS resources
Before starting IMS and entering the emergency restart command, you need to know the following:
- Whether system data sets must be reallocated and formatted during restart.
- Whether IMS has cleaned up its resources. You can check for message DFS627I or DFS627W to determine the previous ending status of the IMS resource termination manager. These messages tell you whether IMS successfully completed resource cleanup processing. Successful resource cleanup guarantees IMS restart without requiring a z/OS restart.
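As a rough illustration of the check described above, the following Python sketch scans captured console or job log text for the two message IDs. The function name, the sample lines, and the assumption that the log is available as plain text lines are all hypothetical; the sketch only locates the messages and does not interpret their contents.

```python
import re

# Message IDs taken from the text above. The surrounding line layout is an
# assumption; real message text is not reproduced here.
CLEANUP_MESSAGE = re.compile(r"\b(DFS627[IW])\b")


def find_cleanup_messages(log_lines):
    """Return (line_number, message_id, line) for each DFS627I/DFS627W hit."""
    hits = []
    for number, line in enumerate(log_lines, start=1):
        match = CLEANUP_MESSAGE.search(line)
        if match:
            hits.append((number, match.group(1), line.rstrip()))
    return hits


# Hypothetical sample input; the message bodies are placeholders.
sample = [
    "UNRELATED CONSOLE LINE",
    "DFS627I <message text elided>",
    "DFS627W <message text elided>",
]

for number, message_id, line in find_cleanup_messages(sample):
    print(f"line {number}: {message_id}")
```

Read the full text of any message that is found to learn the actual cleanup status before you decide how to restart.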
After an IMS failure, you can use the ESAF In-Doubt Notification exit routine (DFSFIDN0) to determine in-doubt units of work during restart of the failed IMS. You can resolve the work outside of IMS.
During an emergency restart, IMS restores the system to the status it had at the last system checkpoint. Databases and areas reflect their content at the last sync point for each dependent region. The local message queues reflect their content at the time of the failure, and the system has just taken a new checkpoint. When you restart the dependent regions, IMS reschedules the programs that were processing at the time of the failure. IMS also releases any transactions on the suspend queue.
If the backout of any database fails during an emergency restart, IMS identifies the database by issuing DFS981I, and the database remains stopped after restart completes. After IMS is running again, you might be able to complete the backout by entering a /START DB or UPDATE DB START(ACCESS) command; however, depending on the root cause of the backout failure, you might need to take additional steps to correct the problem.
After an IMS failure when XRF is used with VTAM® MNPS, VTAM continues to maintain persistent sessions until the timer specified by the PSTIMER= keyword (specified in the DFSDCxxx member of IMS.PROCLIB) expires. If there is no XRF takeover, then VTAM might continue to maintain active persistent sessions when the MNPS ACB is opened after emergency restart. If this occurs, IMS issues the command SETLOGON OPTCD=NPERSIST, which tells VTAM to drop all persistent sessions being maintained. This is consistent with emergency restart processing when the XRF USERVAR= keyword is used, which also has no session recovery. IMS then issues the command SETLOGON OPTCD=PERSIST to start the normal session persistence for all new sessions on the MNPS ACB.
If an emergency restart fails, you do not have to cold start the entire IMS subsystem. The following table shows the various emergency restart cold start commands and how they affect an IMS system.
| Emergency restart command | Effect on the IMS DB or DBCTL subsystem | Effect on the IMS TM or DCCTL subsystem |
|---|---|---|
| /ERESTART COLDBASE | Cold restart | Emergency restart |
| /ERESTART COLDCOMM | Emergency restart | Cold restart |
| /ERESTART COLDSYS | Cold restart | Cold restart |
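The table can also be expressed as a small lookup structure, which might be convenient in an operations runbook or automation script. The following Python sketch is illustrative only: the dictionary name, the `db` and `tm` keys, and the helper function are hypothetical, and the values simply restate the table.

```python
# Restates the table above. Keys "db" and "tm" are hypothetical short names
# for the IMS DB/DBCTL and IMS TM/DCCTL columns.
COLD_START_EFFECTS = {
    "/ERESTART COLDBASE": {"db": "cold restart", "tm": "emergency restart"},
    "/ERESTART COLDCOMM": {"db": "emergency restart", "tm": "cold restart"},
    "/ERESTART COLDSYS": {"db": "cold restart", "tm": "cold restart"},
}


def commands_that_cold_start(component):
    """List the commands that cold start the given component ('db' or 'tm')."""
    return sorted(
        command
        for command, effects in COLD_START_EFFECTS.items()
        if effects[component] == "cold restart"
    )


print(commands_that_cold_start("db"))
print(commands_that_cold_start("tm"))
```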
- The /ERE COLDBASE command performs a cold start of the DB portion of an IMS DB/DC subsystem.
If you use this command, you are responsible for the recovery of the databases. IMS does not redo committed DEDB updates, and does not back out in-flight updates for DL/I databases. IMS identifies and stops databases that have in-doubt data or that need backout or recovery. You can back out in-flight DL/I data by running the Database Batch Backout utility. Before you run the utility, close (and, optionally, archive) the OLDS.
- The /ERE COLDCOMM command performs a cold start of the TM portion of an IMS DB/DC subsystem.
This command initializes the message queues, recovers DEDBs, reloads MSDBs, and backs out in-flight changes to DL/I databases. At the same time, IMS maintains all existing in-doubt data.
- The /ERE COLDSYS command allows you to cold start both DB and TM.
This command is essentially a combination of both /ERE COLDBASE and /ERE COLDCOMM, but it does not read the OLDS. Therefore, closing the OLDS is essential. The processing described for both the /ERE COLDBASE and /ERE COLDCOMM commands also pertains to /ERE COLDSYS.
Of the three forms of emergency restart command, you will likely use /ERE COLDCOMM most often.
If an emergency restart (/ERE) fails and a subsequent emergency restart (/ERE COLDBASE or /ERE COLDCOMM) also fails, you must issue the /ERE COLDSYS command to restart IMS. You must also close (and optionally, archive) the last OLDS that was used online before performing this /ERE COLDSYS restart because the log must be available in case database recovery is necessary.
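The escalation order described above can be sketched as a small decision function. This Python example is a minimal illustration, not an IMS interface: the function name and the idea of passing in a list of failed commands are hypothetical, and the choice between /ERE COLDBASE and /ERE COLDCOMM is left to the operator because it depends on which part of the system needs the cold start.

```python
PLAIN = "/ERE"
PARTIAL_COLD = ("/ERE COLDCOMM", "/ERE COLDBASE")
FULL_COLD = "/ERE COLDSYS"


def next_restart_options(failed_commands):
    """Suggest the next restart command(s), given the commands that failed.

    Hypothetical helper: it encodes only the escalation order from the text.
    """
    failed = set(failed_commands)
    if PLAIN not in failed:
        # No plain emergency restart has failed yet, so try that first.
        return [PLAIN]
    if failed.isdisjoint(PARTIAL_COLD):
        # /ERE failed; a partial cold start avoids cold starting everything.
        return list(PARTIAL_COLD)
    # /ERE and a partial cold start both failed. Close (and optionally
    # archive) the last OLDS used online before issuing this command.
    return [FULL_COLD]


print(next_restart_options([]))
print(next_restart_options(["/ERE"]))
print(next_restart_options(["/ERE", "/ERE COLDBASE"]))
```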
If an IMS database encounters an I/O error and the Extended Specified Task Abnormal Exit (ESTAE) routine is not able to execute, then IMS does not create an EEQE and does not lock the error block. Consequently, an IMS emergency restart does not detect the bad block. In this situation, you might need to use the OVERRIDE keyword on the /ERE command or set the abnormal termination flag in the RECON data set for the failing subsystem.