Case 1: Damaged CDS, full journal

It's a cold Monday morning when Rick, a storage administrator, arrives at work to find the following note on his desk:

Figure 1. Handwritten note from Steve to Rick. The note reads: "DFSMShsm stopped with an ARC0026E journaling disabled message. I tried to reach you by phone, but wasn't successful. To get DFSMShsm running, I did what the Messages manual said, and deleted and reallocated the journal. Have a good day!" Signed, Steve.

What does Rick do after he regains consciousness?

The first thing Rick does is review the DFSMShsm command activity log to see what caused the ARC0026E journaling error. He sees that the ARC0026E message specifies an end-of-volume (EOV) problem: the journal is full. Rick then examines the backup activity log to find out why the journal filled up. He sees that control data set (CDS) backups fail each time the BACKVOL CDS command is issued. Because the BACKVOL CDS command is failing, the journal is not being backed up. Rick also sees that IDCAMS has issued error messages indicating out-of-sequence records in the backup control data set (BCDS). The failures recorded in the backup activity log indicate a “broken” BCDS.

To review the backup activity log, Rick uses the DFSMShsm RELEASE HARDCOPY command. HARDCOPY is an optional parameter of the RELEASE command that directs DFSMShsm to close any of the four activity logs that contain data, and either print the logs or send them to DASD. The DFSMShsm SETSYS ACTLOGTYPE command determines whether the logs are printed or sent to DASD.

When the activity logs are closed, new copies of the logs are allocated for use.

Once Steve had reset the journal, DFSMShsm was able to operate because DFSMShsm addresses the virtual storage access method (VSAM) control intervals directly by key. Backup of the CDS failed because a valid VSAM sequence set is required to read CDSs in key sequence.

What happened to the BCDS? Two MVS™ images, both having DFSMShsm installed, updated the shared BCDS concurrently. When this happened, duplicate records were created in the BCDS, and the index structure of the BCDS was damaged.
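
The condition that IDCAMS reported can be pictured with a small sketch. The Python below is illustrative only and does not reflect how IDCAMS or VSAM work internally; `find_sequence_errors` is a hypothetical helper that scans keys in the order the records are read and flags any key that is not strictly greater than the key immediately before it, which covers both duplicate and out-of-sequence keys.

```python
from typing import List, Optional, Tuple


def find_sequence_errors(keys: List[bytes]) -> List[Tuple[int, bytes, str]]:
    """Report keys that break strict ascending order.

    Each key is compared only with the key immediately before it.
    Returns (position, key, reason) tuples, where reason is
    "duplicate" or "out of sequence".
    """
    errors: List[Tuple[int, bytes, str]] = []
    previous: Optional[bytes] = None
    for position, key in enumerate(keys):
        if previous is not None:
            if key == previous:
                errors.append((position, key, "duplicate"))
            elif key < previous:
                errors.append((position, key, "out of sequence"))
        previous = key
    return errors
```

A key-sequenced read of a healthy CDS would produce an empty list; any entry in the result corresponds to the kind of record that makes a key-sequence backup fail.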

After Rick figures out what happened and what damage was done, he plans his salvage operation. Although the journal records were lost when Steve deleted and reallocated the journal, the same information still exists in the BCDS. The BCDS, however, has duplicate records, so Rick must first delete the duplicate records in the BCDS and back up the CDSs. To do so, Rick performs the following tasks:

  1. Stops DFSMShsm on all images.
  2. Uses the IDCAMS REPRO command to copy the data component of the BCDS to a sequential file. This can be performed by one of the following methods:
    1. Direct the IDCAMS REPRO command to make a copy of the data component of the BCDS, as in the following example:
      REPRO IDS(HSM.BCDS.DATA) ODS(RICK.BCDS.DATA)
    2. Direct the IDCAMS REPRO command to make a copy of the BCDS by relative byte address, as in the following example:
      REPRO IDS(HSM.BCDS) ODS(RICK.BCDS.DATA) FADDR(0)
  3. Uses DFSORT to perform the following:
    1. Sort the sequential file by key, eliminating records with duplicate keys. Where there are duplicates, the most recently updated record is retained.
    2. Reload the BCDS. The damaged BCDS must be deleted and a new BCDS defined.
  4. Starts DFSMShsm, and issues the DFSMShsm BACKVOL CDS command to create a good copy of the BCDS and nullify the journal.

Rick used the following JCL to accomplish the preceding steps:

//RECOVER JOB
//*******************************************************************
//* PLEASE CHANGE ALL THE FOLLOWING TO THE APPROPRIATE VALUES:
//* ?USERID   - TO YOUR USER ID
//* ?DFSMSHSM - TO THE DFSMSHSM PREFIX FOR YOUR BCDS
//*******************************************************************
//* THE FIRST STEP WILL CREATE A COPY OF ALL OF THE RECORDS CONTAINED
//* IN THE DATA COMPONENT OF THE BCDS. THESE RECORDS WILL NOT BE IN
//* ORDER, AND MAY CONTAIN MULTIPLE RECORDS WITH THE SAME KEY. IN
//* ADDITION, A RECOVERY BCDS WILL BE CREATED IN THIS STEP. THE USER
//* MUST SUPPLY THE PROPER SPACE INFORMATION, VOLUME INFORMATION, AND
//* DATA SET INFORMATION FOR THE DEFINE CLUSTER.
//*******************************************************************
//STEP1 EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//DDIN1 DD DISP=SHR,DSN=?DFSMSHSM.BCDS.DATA
//DDOUT1 DD DISP=(,CATLG),UNIT=SYSDA,
// DCB=(LRECL=2052,BLKSIZE=20524,RECFM=VB),
// SPACE=(CYL,(50,10),RLSE),DSN=?USERID.BCDS.RECORDS
//SYSIN DD *
 REPRO INFILE(DDIN1) OUTFILE(DDOUT1) FADDR(0)
 DEFINE CLUSTER (NAME(?USERID.RECOVER.BCDS) -
   CYL(004) SPEED BUFFERSPACE(530432) -
   VOL(XXXXXX) -
   CISZ(12288) -
   FREESPACE(0 0) -
   KEYS(44 0) -
   RECORDSIZE(2040 2040)) -
   DATA(NAME(?USERID.RECOVER.BCDS.DATA)) -
   INDEX(NAME(?USERID.RECOVER.BCDS.INDEX))
/*
//*******************************************************************
//* THE NEXT TWO STEPS USE DFSORT AS FOLLOWS:
//* - STEP2 SORTS THE RECORDS EXTRACTED FROM THE BCDS DATA
//*   COMPONENT. THEY ARE SORTED BY THEIR VSAM KEYS, AND BY THE
//*   TIME STAMP OF THE LAST UPDATE MADE TO THE RECORD. THE KEYS
//*   ARE SORTED IN ASCENDING ORDER, AND THE UPDATE TIME IN
//*   DESCENDING ORDER; THE MOST CURRENT UPDATED RECORD IS FIRST
//*   IN THE OUTPUT DATA SET.
//* - STEP3 SORTS THE RECORDS BY THEIR VSAM KEYS. BECAUSE EQUALS
//*   IS SPECIFIED, THE ORDER ESTABLISHED BY STEP2, WHERE THE MOST
//*   RECENTLY UPDATED VERSION OF A DUPLICATE RECORD IS PLACED
//*   FIRST IN THE FILE, IS MAINTAINED. SUM FIELDS=(NONE) DISCARDS
//*   THE DUPLICATE RECORDS THAT ARE NOT THE MOST RECENTLY UPDATED
//*   VERSION. THE OUTPUT IS LOADED INTO THE RECOVERY COPY OF THE
//*   BCDS THAT WAS DEFINED IN THE FIRST STEP.
//*******************************************************************
//STEP2 EXEC PGM=SORT,REGION=4096K,COND=(0,NE)
//SYSOUT DD SYSOUT=*
//SORTIN DD DISP=SHR,DSN=?USERID.BCDS.RECORDS
//SORTOUT DD DISP=(,PASS),DSN=&&TFILE1,
// UNIT=SYSDA,SPACE=(CYL,(50,10),RLSE),
// DCB=(LRECL=2052,BLKSIZE=0,RECFM=VB,DSORG=PS)
//* SORT THE OLD BCDS BY ASCENDING VSAM KEY AND
//* DESCENDING TIME STAMP OF LAST UPDATE.
//SYSIN DD *
 SORT FIELDS=(5,44,CH,A,53,8,BI,D)
/*
//STEP3 EXEC PGM=SORT,REGION=4096K,COND=(0,NE)
//SYSOUT DD SYSOUT=*
//SORTIN DD DISP=SHR,DSN=&&TFILE1
//SORTOUT DD DISP=OLD,DSN=?USERID.RECOVER.BCDS
//* LOAD ONLY THE MOST RECENTLY UPDATED RECORDS INTO THE NEW BCDS
//SYSIN DD *
 SORT FIELDS=(5,44,CH,A),EQUALS
 SUM FIELDS=(NONE)
/*
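
The logic of STEP2 and STEP3 can be illustrated outside of DFSORT. The following Python sketch is not part of Rick's procedure. It assumes that each record has already been split into its 44-byte key, its 8-byte last-update time stamp, and the remaining data; the `Record` type and the `keep_latest` function are hypothetical names used only for illustration. The sketch orders the records by ascending key and descending time stamp, then keeps the first record for each key.

```python
from typing import List, NamedTuple


class Record(NamedTuple):
    """Hypothetical stand-in for one BCDS record extracted by REPRO."""
    key: bytes      # VSAM key (SORT field 5,44,CH in the JCL)
    updated: bytes  # time stamp of last update (SORT field 53,8,BI)
    data: bytes     # rest of the record, carried along unchanged


def keep_latest(records: List[Record]) -> List[Record]:
    """Return one record per key, choosing the most recently updated."""
    # Like STEP2: key ascending, time stamp descending.
    # Python's sort is stable, so sorting by the minor field first and
    # the major field second yields the combined order.
    ordered = sorted(records, key=lambda r: r.updated, reverse=True)
    ordered.sort(key=lambda r: r.key)

    # Like STEP3: keep only the first record seen for each key.
    result: List[Record] = []
    for rec in ordered:
        if not result or result[-1].key != rec.key:
            result.append(rec)
    return result


# Small usage example with made-up records (keys shortened for clarity).
sample = [
    Record(b"B", b"\x00\x01", b"old B"),
    Record(b"A", b"\x00\x05", b"only A"),
    Record(b"B", b"\x00\x09", b"new B"),
]
print([r.data for r in keep_latest(sample)])  # [b'only A', b'new B']
```

Comparing `bytes` values in Python is an unsigned byte-by-byte comparison, which is why it serves as a reasonable stand-in for the CH and BI sort formats in this sketch.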