Example of recovery using data set backup

This example simulates the loss of a volume by varying the volume offline. The failed data sets are recovered onto another volume without first recovering the failed volume.

This example involves two data sets, RLSADSW.VF04D.DATAENDB and RLSADSW.VF04D.TELLCTRL. The data sets are being updated in RLS mode by many CICS® AORs at the time the volume is taken offline. The CICS file names used for these data sets are F04DENDB and F04DCTRL. To recover the data sets this way, you must know which data sets are on the volume at the time of the failure. The alternative, restoring the volume before forward recovering the data sets, is described in Example of recovery using volume backup. The procedure for this example is as follows:
  1. Simulate the volume failure using the MVS command:
    ROUTE *ALL,VARY 4186,OFFLINE,FORCE
    The loss of the volume caused I/O errors and transaction abends, producing messages on the MVS system log such as these:
         DFHFC0157 ADSWA04B 030
         TT1P 3326 CICSUSER An I/O error has occurred on base data set
         RLSADSW.VF04D.TELLCTRL accessed via file F04DCTRL component code
         X'00'.
         DFHFC0158 ADSWA04B 031
         96329,13154096,0005EDC00000,D,9S4186,A04B    ,CICS
         ,4186,DA,F04DCTRL,86- OP,UNKNOWN COND.  ,000000A5000403,VSAM
     
         DFHFC0157 ADSWA03C 301
         DE1M 0584 CICSUSER An I/O error has occurred on base data set
         RLSADSW.VF04D.DATAENDB accessed via file F04DENDB component code
         X'00'.
         DFHFC0158 ADSWA03C 031
         …
    As a result of the transaction abends, CICS attempts to back out the in-flight UOWs. The backouts fail because CICS cannot access the data sets on the lost volume, and CICS reports the associated backout failures:
      +DFHFC4701 ADSWA03A 336
       11/24/96 13:15:48 ADSWA03A Backout failed for transaction DE1H, VSAM
       file F04DENDB, unit of work X'ADD18C07DCB70A05', task 46752, base
       RLSADSW.VF04D.DATAENDB, path RLSADSW.VF04D.DATAENDB, failure code
       X'24'.
     
      +DFHFC0152 ADSWA03A 339
       11/24/96 13:15:49 ADSWA03A ???? DE1H An attempt to retain locks for
       data set within unit of work X'ADD18C07DCB70A05' failed.  VSAM return
       code X'00000008' reason code X'000000A9'.
      +DFHME0116 ADSWA03A 340
       (Module: DFHMEME) CICS symptom string for message DFHFC0152 is
       PIDS/565501800 LVLS/510 MS/DFHFC0152 RIDS/DFHFCCA PTFS/UN92873
       REGS/GR15 VALU/00000008 PCSS/IDARETLK PRCS/000000A9
      +DFHFC0312 ADSWA03A Message DFHFC0152 data set RLSADSW.VF04D.DATAENDB
    Use the command CEMT INQUIRE UOWDSNFAIL IOERROR to display the UOWs that were shunted as a result of the I/O errors. For example, on the CICS region ADSWA01D the command showed the following shunted UOWs:
         INQUIRE UOWDSNFAIL IOERROR
         STATUS:  RESULTS
          Dsn(RLSADSW.VF04D.TELLCTRL                      ) Dat Ioe
             Uow(ADD18C2DA4D5FC03)                         Rls
          Dsn(RLSADSW.VF04D.DATAENDB                      ) Dat Ioe
             Uow(ADD18C2E693C7401)                         Rls
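    When many regions report shunted UOWs, it can help to collect the display into structured records. The following Python sketch parses output laid out like the listing above. It is a minimal illustration, not a CICS interface: the function name parse_uowdsnfail and the dictionary keys (cause and reason for the two abbreviated columns) are assumptions made for this example, and the sample text is copied from the listing above.

```python
import re

# Matches a line such as "Dsn(RLSADSW.VF04D.TELLCTRL      ) Dat Ioe".
DSN_LINE = re.compile(r"Dsn\(\s*(\S+)\s*\)\s+(\w+)\s+(\w+)")
# Matches a line such as "Uow(ADD18C2DA4D5FC03)      Rls".
UOW_LINE = re.compile(r"Uow\(\s*([0-9A-F]+)\s*\)")

def parse_uowdsnfail(text):
    """Return one dict per Dsn/Uow pair found in the display text."""
    failures = []
    current = None
    for line in text.splitlines():
        dsn = DSN_LINE.search(line)
        if dsn:
            # Key names are labels chosen for this sketch (assumption).
            current = {"dsn": dsn.group(1), "cause": dsn.group(2),
                       "reason": dsn.group(3), "uow": None}
            failures.append(current)
            continue
        uow = UOW_LINE.search(line)
        if uow and current is not None:
            current["uow"] = uow.group(1)
    return failures

SAMPLE = """\
INQUIRE UOWDSNFAIL IOERROR
STATUS:  RESULTS
 Dsn(RLSADSW.VF04D.TELLCTRL                      ) Dat Ioe
    Uow(ADD18C2DA4D5FC03)                         Rls
 Dsn(RLSADSW.VF04D.DATAENDB                      ) Dat Ioe
    Uow(ADD18C2E693C7401)                         Rls
"""

for failure in parse_uowdsnfail(SAMPLE):
    print(failure["dsn"], failure["uow"], failure["reason"])
```

    The same function could be applied to the later UOWDSNFAIL displays in this example, because they use the same two-line layout.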
  2. Stop the I/O errors by closing the RLS-mode files that were open against the failed data sets. In this example, file F04DENDB was open against data set RLSADSW.VF04D.DATAENDB, and file F04DCTRL was open against data set RLSADSW.VF04D.TELLCTRL.

    The usual way of closing RLS-mode files across a sysplex is to quiesce the data set using the command CEMT SET DSNAME QUIESCED in one CICS region. However, the quiesce operation requires access to the data set, and fails if the data set cannot be accessed. The alternative is to issue the SET FILE(F04DENDB) CLOSED and SET FILE(F04DCTRL) CLOSED commands, using CICSPlex® SM to send the command to all the relevant regions. Without CICSPlex SM, issue the CEMT SET FILE CLOSED command to each CICS region individually, either from the MVS console or from a CICS terminal.
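
    To show how much work the per-region alternative involves, the following Python sketch builds the command text for every region and file combination. The region list is an assumption: it is a subset of the region names that appear in the messages in this example, not a complete inventory. The helper name close_commands is hypothetical. The sketch only builds the command strings; how each one reaches its region (MVS console or CICS terminal) is not modeled.

```python
from itertools import product

# Files that were open against the failed data sets (from this example).
FILES = ["F04DENDB", "F04DCTRL"]

# Assumption: an illustrative subset of regions taken from the messages
# in this example. A real list would come from your own configuration.
REGIONS = ["ADSWA01D", "ADSWA03A", "ADSWA03C", "ADSWA04B"]

def close_commands(regions, files):
    """Return (region, command) pairs, one per region and file."""
    return [(region, f"CEMT SET FILE({name}) CLOSED")
            for region, name in product(regions, files)]

for region, command in close_commands(REGIONS, FILES):
    print(f"{region}: {command}")
```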

  3. To enable CICS VSAM Recovery to recover the failed data sets, delete the catalog entries for the two affected data sets using the IDCAMS DELETE command:
    DELETE RLSADSW.VF04D.TELLCTRL NOSCRATCH
    DELETE RLSADSW.VF04D.DATAENDB NOSCRATCH
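    Because this procedure depends on knowing exactly which data sets were on the failed volume, the DELETE statements are usually produced from a list. The following Python sketch generates them and rejects names that break the basic z/OS data set naming rules (44 characters at most, qualifiers of 1 to 8 characters). The function names are hypothetical, and the check is deliberately simple; it does not consult the catalog.

```python
import re

# One qualifier: 1-8 characters, the first alphabetic or national (#, @, $),
# the rest alphabetic, numeric, national, or hyphen.
QUALIFIER = re.compile(r"[A-Z#@$][A-Z0-9#@$-]{0,7}")

def is_valid_dsname(name):
    """Check basic data set naming rules (not catalog existence)."""
    if not 1 <= len(name) <= 44:
        return False
    return all(QUALIFIER.fullmatch(q) for q in name.split("."))

def delete_noscratch_statements(dsnames):
    """Build one DELETE ... NOSCRATCH statement per data set name."""
    statements = []
    for name in dsnames:
        if not is_valid_dsname(name):
            raise ValueError(f"not a valid data set name: {name!r}")
        statements.append(f"DELETE {name} NOSCRATCH")
    return statements

for statement in delete_noscratch_statements(
        ["RLSADSW.VF04D.TELLCTRL", "RLSADSW.VF04D.DATAENDB"]):
    print(statement)
```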
  4. The impact of the recovery process is greater if there are in-flight tasks updating RLS-mode files. For this reason, quiesce the data sets that are being accessed in RLS mode on other volumes before terminating the SMSVSAM servers. To determine which data sets are being accessed in RLS mode by a CICS region, use the SHCDS LISTSUBSYSDS subcommand. For example, the following command lists the data sets that are being accessed in RLS mode by CICS region ADSWA01D:
    SHCDS LISTSUBSYSDS('ADSWA01D')
    Note: You can issue SHCDS subcommands as a TSO command or from a batch job.
  5. Stop the SMSVSAM servers using the MVS command:
    ROUTE *ALL,VARY SMS,SMSVSAM,TERMINATESERVER
    You receive message IGW572I on each MVS image confirming that the servers are stopping:
    IGW572I REQUEST TO TERMINATE SMSVSAM
            ADDRESS SPACE IS ACCEPTED:
            SMSVSAM SERVER TERMINATION SCHEDULED.
    In this example, stopping the servers causes abends of all in-flight tasks that were updating RLS-mode data sets. This, in turn, causes backout failures and shunted UOWs, which are reported by CICS messages. For example, the effect in CICS region ADSWA03C is shown by the following response to an INQUIRE UOWDSNFAIL command for data set RLSADSW.VF01D.DATASET1:
    INQUIRE UOWDSNFAIL DSN(RLSADSW.VF01D.DATASET1)
    STATUS:  RESULTS
     Dsn(RLSADSW.VF01D.DATASET1                      ) Dat Ope
        Uow(ADD19B8166268E02)                         Rls
     Dsn(RLSADSW.VF01D.DATASET1                      ) Rls Com
        Uow(ADD19B9D93DE1200)                         Rls

    After the SMSVSAM servers stop, CICS automatically closes all RLS-mode files and further RLS access is prevented.

  6. When you are sure that all servers are down, delete the IGWLOCK00 lock structure with the MVS command:
    VARY SMS,SMSVSAM,FORCEDELETELOCKSTRUCTURE

    Follow with the response FORCEDELETELOCKSTRUCTURESMSVSAMYES to allow the lock structure deletion to continue.

    Successful deletion of the lock structure is indicated by the following message:
    IGW527I SMSVSAM FORCE DELETE LOCK STRUCTURE PROCESSING IS NOW COMPLETE
  7. At this point you can restart the SMSVSAM servers with the MVS command:
    ROUTE *ALL,VARY SMS,SMSVSAM,ACTIVE
    Initialization of the SMSVSAM servers results in the creation of a new lock structure, shown by the following message:
    IGW453I SMSVSAM ADDRESS SPACE HAS SUCCESSFULLY
            CONNECTED TO DFSMS LOCK STRUCTURE IGWLOCK00
            STRUCTURE VERSION: ADD1A77F0420E001 SIZE: 35072K bytes
            MAXIMUM USERS: 32 REQUESTED:32
            LOCK TABLE ENTRIES: 2097152 REQUESTED: 2097152
            RECORD TABLE ENTRIES: 129892 USED: 0
    The SMSVSAM server reports that there are no longer any retained locks but that instead there are data sets in the lost locks condition:
    IGW414I SMSVSAM SERVER ADDRESS SPACE IS NOW ACTIVE.
    IGW321I No retained locks
    IGW321I 45 spheres in Lost Locks
    CICS is informed during dynamic RLS restart about the data sets for which it must perform lost locks recovery. In this example, CICS issues messages to tell you that lost locks recovery is required on one or more data sets:
    DFHFC0555 ADSWA04A One or more data sets are in lost locks status.
              CICS will perform lost locks recovery.
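    If you capture the system log during the restart, the sphere count in the IGW321I message gives a quick measure of how much lost locks recovery is outstanding. The following Python sketch extracts that count from log text. It assumes the message wording shown above; the function name spheres_in_lost_locks is hypothetical.

```python
import re

# Assumes the IGW321I wording shown in this example.
LOST_LOCKS = re.compile(r"IGW321I\s+(\d+)\s+spheres in Lost Locks")

def spheres_in_lost_locks(log_text):
    """Return the sphere count from an IGW321I message, or None if absent."""
    match = LOST_LOCKS.search(log_text)
    return int(match.group(1)) if match else None

SAMPLE_LOG = """\
IGW414I SMSVSAM SERVER ADDRESS SPACE IS NOW ACTIVE.
IGW321I No retained locks
IGW321I 45 spheres in Lost Locks
"""

print(spheres_in_lost_locks(SAMPLE_LOG))
```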
  8. If you had quiesced data sets before terminating the servers, you would unquiesce those data sets before continuing with the recovery.

    If there were many data sets in lost locks, it would take some time for lost locks recovery to complete. Error responses are returned on open requests issued by any CICS region that was not sharing the data set at the time the SMSVSAM servers were terminated, and on RLS access requests issued by any new UOWs in CICS regions that were sharing the data set. Also, it might be necessary to explicitly open files that suffer open failures during lost locks recovery.

    Each data set in a lost locks state is protected from new updates until all CICS regions have completed lost locks recovery for the data set. This means that all shunted UOWs must be resolved before the data set is available for new work. Assuming that all CICS regions are active, and there are no indoubt UOWs, lost locks processing, for all data sets except the ones on the failed volume, should complete quickly.

  9. In this example, the command CEMT INQUIRE UOWDSNFAIL on CICS region ADSWA01D now shows UOW failures only for the RLSADSW.VF04D.TELLCTRL and RLSADSW.VF04D.DATAENDB data sets:
    INQUIRE UOWDSNFAIL
    STATUS:  RESULTS
     Dsn(RLSADSW.VF04D.TELLCTRL                      ) Dat Ope
        Uow(ADD18C2DA4D5FC03)                         Rls
     Dsn(RLSADSW.VF04D.DATAENDB                      ) Dat Ope
        Uow(ADD18C2E693C7401)                         Rls
    The command INQUIRE DSN(RLSADSW.VF04D.DATAENDB) on the same region shows that the lost locks status for the data set is Recoverlocks. This value means that the data set has suffered lost locks and that CICS region ADSWA01D has recovery work to complete:
    INQUIRE DSN(RLSADSW.VF04D.DATAENDB)
    RESULT - OVERTYPE TO MODIFY
      Dsname(RLSADSW.VF04D.DATAENDB)
      Accessmethod(Vsam)
      Action(              )
      Filecount(0001)
      Validity(Valid)
      Object(Base)
      Recovstatus(Fwdrecovable)
      Backuptype()
      Frlog(00)
      Availability( Available )
      Lostlocks(Recoverlocks)
      Retlocks(Retained)
      Quiescestate()
      Uowaction(              )
      Basedsname(RLSADSW.VF04D.DATAENDB)
      Fwdrecovlsn(ADSW.CICSVR.F04DENDB)
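    To check the lost locks status on several regions, the display can be reduced to a dictionary of field names and values. The following Python sketch does this for text laid out like the listing above. The function names are hypothetical, and only the two Lostlocks values mentioned in this example (Recoverlocks and Nolostlocks) are interpreted.

```python
import re

# Matches a line of the form "Name(value)", allowing blanks inside the parentheses.
FIELD = re.compile(r"^\s*(\w+)\(\s*(.*?)\s*\)\s*$")

def parse_inquire_dsn(text):
    """Return a dict of field name to value from an INQUIRE DSN display."""
    fields = {}
    for line in text.splitlines():
        match = FIELD.match(line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields

def region_has_recovery_work(fields):
    """True if this region still has lost locks recovery to complete."""
    return fields.get("Lostlocks") == "Recoverlocks"

SAMPLE = """\
RESULT - OVERTYPE TO MODIFY
  Dsname(RLSADSW.VF04D.DATAENDB)
  Action(              )
  Recovstatus(Fwdrecovable)
  Availability( Available )
  Lostlocks(Recoverlocks)
  Retlocks(Retained)
  Fwdrecovlsn(ADSW.CICSVR.F04DENDB)
"""

fields = parse_inquire_dsn(SAMPLE)
print(fields["Dsname"], fields["Lostlocks"], region_has_recovery_work(fields))
```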
  10. At this point, all data sets are available for new work except the two data sets on the failed volume. It is now possible to recover these using CICS VSAM Recovery for z/OS®. For details, see CICS VSAM Recovery for z/OS.
  11. All CICS regions are automatically notified when CICS VSAM Recovery processing for a data set is complete. CICS VSAM Recovery preserves the lost locks state for the recovered data set and CICS disallows all new update requests until all CICS regions have completed lost locks recovery. When all CICS regions have informed SMSVSAM that they have completed their lost locks recovery, the data set lost locks state changes to Nolostlocks.
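    The rule in this step can be pictured as a set of regions that still owe recovery work: the data set stays protected until the set is empty. The following Python sketch is a toy model of that rule only. It is not how SMSVSAM or CICS implements it, and the class and method names are hypothetical.

```python
class LostLocksTracker:
    """Toy model: a data set leaves lost locks when no region has work left."""

    def __init__(self, dsname, regions):
        self.dsname = dsname
        # Regions that have not yet completed lost locks recovery.
        self.pending = set(regions)

    def region_completed(self, region):
        """Record that one region has finished its recovery for the data set."""
        self.pending.discard(region)

    def in_lost_locks(self):
        return bool(self.pending)

    def new_updates_allowed(self):
        # New update requests are disallowed until every region is done.
        return not self.in_lost_locks()

tracker = LostLocksTracker("RLSADSW.VF04D.DATAENDB", ["ADSWA01D", "ADSWA03A"])
tracker.region_completed("ADSWA01D")
print(tracker.new_updates_allowed())  # still one region outstanding
tracker.region_completed("ADSWA03A")
print(tracker.new_updates_allowed())
```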
  12. At this point recovery is complete. Re-enable the recovered data sets for general access by issuing the following CEMT commands (in this example, through the CICSPlex SM LOCFILES view):
    SET FILE(F04DENDB) ENABLED
    SET FILE(F04DCTRL) ENABLED

    These commands are issued to each CICS AOR that requires access.

  13. All data sets are now available for general access. You can confirm their availability using the SHCDS subcommand LISTSUBSYS(ALL), which shows that no CICS region has lost locks recovery outstanding.

If you follow this example, but find that a CICS region still has a data set in lost locks, you can investigate the UOW failures on that particular CICS region using the CEMT commands INQUIRE UOWDSNFAIL and INQUIRE UOW. For indoubt UOWs that have updated a data set that is in a lost locks condition, CICS waits for indoubt resolution before allowing general access to the data set. In such a situation you can still release the locks immediately, using the SET DSNAME command, although in most cases you will lose data integrity. See Lost locks recovery for more information about resolving indoubt UOWs following lost locks processing.