Example of recovery using data set backup
This example simulates the loss of a volume by varying the volume offline. The failed data sets are recovered onto another volume without first recovering the failed volume.
- Simulate the volume failure using the MVS command:
ROUTE *ALL,VARY 4186,OFFLINE,FORCE
The loss of the volume caused I/O errors and transaction abends, producing messages on the MVS system log such as these:
DFHFC0157 ADSWA04B 030 TT1P 3326 CICSUSER An I/O error has occurred on base data set RLSADSW.VF04D.TELLCTRL accessed via file F04DCTRL component code X'00'.
DFHFC0158 ADSWA04B 031 96329,13154096,0005EDC00000,D,9S4186,A04B ,CICS ,4186,DA,F04DCTRL,86- OP,UNKNOWN COND. ,000000A5000403,VSAM
DFHFC0157 ADSWA03C 301 DE1M 0584 CICSUSER An I/O error has occurred on base data set RLSADSW.VF04D.DATAENDB accessed via file F04DENDB component code X'00'.
DFHFC0158 ADSWA03C 031 …
As a result of the transaction abends, CICS attempted to back out in-flight UOWs. The backouts failed because CICS could not access the data sets on the lost volume. CICS reported the associated backout failures:
+DFHFC4701 ADSWA03A 336 11/24/96 13:15:48 ADSWA03A Backout failed for transaction DE1H, VSAM file F04DENDB, unit of work X'ADD18C07DCB70A05', task 46752, base RLSADSW.VF04D.DATAENDB, path RLSADSW.VF04D.DATAENDB, failure code X'24'.
+DFHFC0152 ADSWA03A 339 11/24/96 13:15:49 ADSWA03A ???? DE1H An attempt to retain locks for data set within unit of work X'ADD18C07DCB70A05' failed. VSAM return code X'00000008' reason code X'000000A9'.
+DFHME0116 ADSWA03A 340 (Module: DFHMEME) CICS symptom string for message DFHFC0152 is PIDS/565501800 LVLS/510 MS/DFHFC0152 RIDS/DFHFCCA PTFS/UN92873 REGS/GR15 VALU/00000008 PCSS/IDARETLK PRCS/000000A9
+DFHFC0312 ADSWA03A Message DFHFC0152 data set RLSADSW.VF04D.DATAENDB
Use the command CEMT INQUIRE UOWDSNFAIL IOERROR to display the UOWs that were shunted as a result of the I/O errors. For example, on the CICS region ADSWA01D the command showed the following shunted UOWs:
INQUIRE UOWDSNFAIL IOERROR
STATUS: RESULTS
Dsn(RLSADSW.VF04D.TELLCTRL ) Dat Ioe Uow(ADD18C2DA4D5FC03) Rls
Dsn(RLSADSW.VF04D.DATAENDB ) Dat Ioe Uow(ADD18C2E693C7401) Rls
- Stop the I/O errors by closing the RLS-mode files that were open against the failed data sets. In this example, file F04DENDB was open against data set RLSADSW.VF04D.DATAENDB, and file F04DCTRL was open against data set RLSADSW.VF04D.TELLCTRL.
The usual way of closing RLS-mode files across a sysplex is to quiesce the data set using the command CEMT SET DSNAME QUIESCED in one CICS region. However, the quiesce operation requires access to the data set and fails if the data set cannot be accessed. The alternative is to issue the commands SET FILE(F04DENDB) CLOSED and SET FILE(F04DCTRL) CLOSED, using CICSPlex® SM to send them to all the relevant regions. Without CICSPlex SM, issue the CEMT SET FILE CLOSED command to each CICS region individually, either from the MVS console or from a CICS terminal.
- To enable CICS VSAM Recovery to
recover the failed data sets, delete the catalog entries for the two
affected data sets using the IDCAMS DELETE command:
DELETE RLSADSW.VF04D.TELLCTRL NOSCRATCH
DELETE RLSADSW.VF04D.DATAENDB NOSCRATCH
- The impact of the recovery process is greater if there are in-flight tasks updating RLS-mode files. For this reason, quiesce the data sets that are being accessed in RLS mode on other volumes before terminating the SMSVSAM servers. To determine which data sets a CICS region is accessing in RLS mode, use the SHCDS LISTSUBSYSDS subcommand. For example, the following command lists the data sets that CICS region ADSWA01D is accessing in RLS mode:
SHCDS LISTSUBSYSDS('ADSWA01D')
Note: You can issue SHCDS subcommands as a TSO command or from a batch job.
- Stop the SMSVSAM servers using the MVS command:
ROUTE *ALL,VARY SMS,SMSVSAM,TERMINATESERVER
You receive message IGW572I on each MVS image confirming that the servers are stopping:
IGW572I REQUEST TO TERMINATE SMSVSAM ADDRESS SPACE IS ACCEPTED: SMSVSAM SERVER TERMINATION SCHEDULED.
In this example, stopping the servers causes abends of all in-flight tasks that were updating RLS-mode data sets. This, in turn, causes backout failures and shunted UOWs, which CICS reports in messages. For example, the effect in CICS region ADSWA03C is shown by the following response to an INQUIRE UOWDSNFAIL command for data set RLSADSW.VF01D.DATASET1:
INQUIRE UOWDSNFAIL DSN(RLSADSW.VF01D.DATASET1)
STATUS: RESULTS
Dsn(RLSADSW.VF01D.DATASET1 ) Dat Ope Uow(ADD19B8166268E02) Rls
Dsn(RLSADSW.VF01D.DATASET1 ) Rls Com Uow(ADD19B9D93DE1200) Rls
After the SMSVSAM servers stop, CICS automatically closes all RLS-mode files and prevents further RLS access.
- When you are sure that all servers are down, delete the IGWLOCK00
lock structure with the MVS command:
VARY SMS,SMSVSAM,FORCEDELETELOCKSTRUCTURE
Follow with the response FORCEDELETELOCKSTRUCTURESMSVSAMYES to allow the lock structure deletion to continue.
Successful deletion of the lock structure is indicated by the following message:
IGW527I SMSVSAM FORCE DELETE LOCK STRUCTURE PROCESSING IS NOW COMPLETE
- At this point you can restart the SMSVSAM servers with the MVS command:
ROUTE *ALL,VARY SMS,SMSVSAM,ACTIVE
Initialization of the SMSVSAM servers results in the creation of a new lock structure, shown by the following message:
IGW453I SMSVSAM ADDRESS SPACE HAS SUCCESSFULLY CONNECTED TO DFSMS LOCK STRUCTURE IGWLOCK00
STRUCTURE VERSION: ADD1A77F0420E001 SIZE: 35072K bytes
MAXIMUM USERS: 32 REQUESTED:32
LOCK TABLE ENTRIES: 2097152 REQUESTED: 2097152
RECORD TABLE ENTRIES: 129892 USED: 0
The SMSVSAM server reports that there are no longer any retained locks, but that there are instead data sets in the lost locks condition:
IGW414I SMSVSAM SERVER ADDRESS SPACE IS NOW ACTIVE.
IGW321I No retained locks
IGW321I 45 spheres in Lost Locks
During dynamic RLS restart, CICS is informed about the data sets for which it must perform lost locks recovery. In this example, CICS issues messages to tell you that lost locks recovery is required on one or more data sets:
DFHFC0555 ADSWA04A One or more data sets are in lost locks status. CICS will perform lost locks recovery.
- If you quiesced any data sets before terminating the servers, unquiesce them before continuing with the recovery.
If many data sets are in lost locks, lost locks recovery can take some time to complete. Until it completes, error responses are returned on open requests issued by any CICS region that was not sharing the data set at the time the SMSVSAM servers were terminated, and on RLS access requests issued by any new UOWs in CICS regions that were sharing the data set. You might also need to explicitly open files that suffer open failures during lost locks recovery.
Each data set in a lost locks state is protected from new updates until all CICS regions have completed lost locks recovery for the data set. This means that all shunted UOWs must be resolved before the data set is available for new work. Assuming that all CICS regions are active and there are no indoubt UOWs, lost locks processing should complete quickly for all data sets except those on the failed volume.
- In this example, the command CEMT INQUIRE UOWDSNFAIL issued on CICS region ADSWA01D now shows UOW failures only for the RLSADSW.VF04D.TELLCTRL and RLSADSW.VF04D.DATAENDB data sets:
INQUIRE UOWDSNFAIL
STATUS: RESULTS
Dsn(RLSADSW.VF04D.TELLCTRL ) Dat Ope Uow(ADD18C2DA4D5FC03) Rls
Dsn(RLSADSW.VF04D.DATAENDB ) Dat Ope Uow(ADD18C2E693C7401) Rls
The command INQUIRE DSN(RLSADSW.VF04D.DATAENDB) on the same region shows that the lost locks status for the data set is Recoverlocks. This value means that the data set has suffered lost locks and that CICS region ADSWA01D has recovery work to complete:
INQUIRE DSN(RLSADSW.VF04D.DATAENDB)
RESULT - OVERTYPE TO MODIFY
Dsname(RLSADSW.VF04D.DATAENDB)
Accessmethod(Vsam)
Action( )
Filecount(0001)
Validity(Valid)
Object(Base)
Recovstatus(Fwdrecovable)
Backuptype()
Frlog(00)
Availability( Available )
Lostlocks(Recoverlocks)
Retlocks(Retained)
Quiescestate()
Uowaction( )
Basedsname(RLSADSW.VF04D.DATAENDB)
Fwdrecovlsn(ADSW.CICSVR.F04DENDB)
- At this point, all data sets are available for new work except the two data sets on the failed volume. You can now recover these two data sets using CICS VSAM Recovery for z/OS®. For details, see CICS VSAM Recovery for z/OS.
- All CICS regions are automatically notified when CICS VSAM Recovery processing for a data set is complete. CICS VSAM Recovery preserves the lost locks state for the recovered data set, and CICS rejects all new update requests until all CICS regions have completed lost locks recovery. When all CICS regions have informed SMSVSAM that they have completed their lost locks recovery, the lost locks state of the data set changes to Nolostlocks.
- At this point recovery is complete. Re-enable the files for the recovered data sets for general access by issuing the following CEMT commands through the CICSPlex SM LOCFILES view:
SET FILE(F04DENDB) ENABLED
SET FILE(F04DCTRL) ENABLED
Issue these commands to each CICS AOR that requires access.
- All data sets are now available for general access. You can confirm their availability using the SHCDS subcommand LISTSUBSYS(ALL), which shows that no CICS region has lost locks recovery outstanding.
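The lost locks rule that runs through these steps, where a data set refuses new updates until every sharing CICS region has finished lost locks recovery and only then moves to Nolostlocks, can be modelled as a small state tracker. The following Python sketch is a hypothetical illustration for reasoning about that rule, not an interface to CICS or SMSVSAM; the class, its methods, and the region set are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class LostLocksDataSet:
    """Hypothetical model of one data set in the lost locks condition.

    pending_regions holds the sharing CICS regions that have not yet told
    SMSVSAM they have completed lost locks recovery for this data set.
    This is not a CICS or SMSVSAM API; all names are illustrative.
    """

    dsname: str
    pending_regions: set[str] = field(default_factory=set)

    def __post_init__(self) -> None:
        # Copy so that later changes do not alias the caller's set.
        self.pending_regions = set(self.pending_regions)

    def has_recovery_work(self, region: str) -> bool:
        # True corresponds to INQUIRE DSN showing Lostlocks(Recoverlocks) on that region.
        return region in self.pending_regions

    def accepts_new_updates(self) -> bool:
        # New update requests are refused until every region has finished.
        return not self.pending_regions

    def region_completed(self, region: str) -> None:
        # The region has resolved its shunted UOWs for this data set and reported completion.
        self.pending_regions.discard(region)

    @property
    def state(self) -> str:
        # Nolostlocks is reached only when no region has recovery work outstanding.
        return "Nolostlocks" if not self.pending_regions else "lost locks"
```

Keeping the outstanding regions in a set makes the Nolostlocks transition a simple emptiness check, and a repeated completion report from the same region is harmless because discard ignores members that are already gone.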
If you follow this example but find that a CICS region still has a data set in lost locks, you can investigate the UOW failures on that particular CICS region using the CEMT commands INQUIRE UOWDSNFAIL and INQUIRE UOW. For indoubt UOWs that have updated a data set that is in a lost locks condition, CICS waits for indoubt resolution before allowing general access to the data set. In such a situation, you can still release the locks immediately by using the SET DSNAME command, although in most cases you will lose data integrity. See Lost locks recovery for more information about resolving indoubt UOWs following lost locks processing.
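When investigating a region in this way, it can help to pull the data set names and UOW identifiers out of captured INQUIRE UOWDSNFAIL output, for example to look at each UOW in turn with INQUIRE UOW. The following Python sketch assumes the display has been captured as plain text in the layout shown in this example (one Dsn(...) ... Uow(...) entry per result, with 16-hex-digit UOW identifiers like those above); the function name shunted_uows and the capture mechanism are hypothetical, and real CEMT screens may wrap or abbreviate differently.

```python
import re
from collections import defaultdict

# Matches one result such as:
#   Dsn(RLSADSW.VF04D.TELLCTRL ) Dat Ioe Uow(ADD18C2DA4D5FC03) Rls
# Assumes the layout shown in this example; CEMT pads the Dsn value with blanks.
# The uppercase DSN(...) operand on the command line itself is not matched.
_ENTRY = re.compile(r"Dsn\(\s*([^)\s]+)\s*\)(.*?)Uow\(([0-9A-F]{16})\)")


def shunted_uows(display: str) -> dict[str, list[tuple[str, str]]]:
    """Group (UOW id, status flags) pairs by data set name (hypothetical helper)."""
    result: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for dsn, flags, uow in _ENTRY.findall(display):
        # flags holds the text between Dsn(...) and Uow(...), e.g. "Dat Ioe".
        result[dsn].append((uow, flags.strip()))
    return dict(result)
```

For example, applied to the INQUIRE UOWDSNFAIL IOERROR display captured on ADSWA01D, the function returns one UOW per data set, ADD18C2DA4D5FC03 for RLSADSW.VF04D.TELLCTRL and ADD18C2E693C7401 for RLSADSW.VF04D.DATAENDB, each with the flags Dat Ioe.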