Recovering from disk failure
When a disk hardware failure occurs and an entire unit is lost, you can recover from this situation.
Symptoms
No I/O activity occurs for the affected disk address. Databases and tables that reside on the affected unit are unavailable.
Resolving the problem
Operator response:
- Ensure that no incomplete I/O requests exist for the failing device.
One way to do this is to force the volume offline by issuing the following z/OS® command,
where xxx is the unit address:
VARY xxx,OFFLINE,FORCE
To check disk status, issue the following command:
D U,DASD,ONLINE
The following console message is displayed after you force a volume offline:
UNIT TYPE STATUS VOLSER VOLSTATE
4B1  3390 O-BOX  XTRA02 PRIV/RSDNT
The disk unit is now available for service.
If you previously set the I/O timing interval for the device class, the I/O timing facility terminates all requests that are incomplete at the end of the specified time interval, and you can proceed to the next step without varying the volume offline. You can set the I/O timing interval either through the IECIOSxx z/OS parameter library member or by issuing the following z/OS command:
SETIOS MIH,DEV=devnum,IOTIMING=mm:ss
- Issue (or request that an authorized operator issue) the following Db2 command
to stop all databases and table spaces that reside on the affected
volume:
-STOP DATABASE(database-name) SPACENAM(space-name)
If the disk unit must be disconnected for repair, stop all databases and table spaces on all volumes in the disk unit.
- Select a spare disk pack, and use ICKDSF to initialize from scratch
a disk unit with a different unit address (yyy)
and the same volume serial number (VOLSER).
//         Job
//ICKDSF   EXEC PGM=ICKDSF
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  REVAL UNITADDRESS(yyy) VERIFY(volser)
If you initialize a 3380 or 3390 volume, use REVAL with the VERIFY parameter to ensure that you initialize the intended volume, or to revalidate the home address of the volume and record 0. Alternatively, use ISMF to initialize the disk unit.
- Issue the following z/OS console
command, where yyy is the new unit address:
VARY yyy,ONLINE
- To check disk status, issue the following command:
D U,DASD,ONLINE
The following console message is displayed:
UNIT TYPE STATUS VOLSER VOLSTATE
7D4  3390 O      XTRA02 PRIV/RSDNT
- Delete all table spaces (VSAM linear
data sets) from the ICF catalog by issuing the following access method
services command for each one of them:
DELETE catnam.DSNDBC.dbname.tsname.y0001.Annn CLUSTER NOSCRATCH
In this command, y is either I or J, and nnn is the data set or partition number, left padded by 0 (zero).
- For user-managed table spaces, define the VSAM cluster and data
components for the new volume by issuing the access method services DEFINE
CLUSTER command with the same data set name as in the previous step, in the following
format:
catnam.DSNDBx.dbname.tsname.y0001.znnn
The y is I or J, the x is C (for VSAM clusters) or D (for VSAM data components), and znnn is the data set or partition number, left padded by 0 (zero). For more information, see Data set naming conventions.
- For a user-defined table space, define the new data set before you attempt to recover it. You can recover table spaces that are defined in storage groups without defining them first.
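The data set names in the DELETE and DEFINE CLUSTER steps follow one pattern, so they can be generated rather than typed by hand. The following Python sketch is illustrative only and is not part of Db2 or access method services. The function names and the sample values DSNCAT, MYDB, and MYTS are hypothetical, and the sketch assumes the A prefix that is shown in the DELETE command and data set or partition numbers 1 through 999.

```python
def db2_dataset_name(catnam, dbname, tsname, component, instance="I", part=1):
    """Build a name of the form catnam.DSNDBx.dbname.tsname.y0001.Annn.

    component is x: "C" for the VSAM cluster, "D" for the data component.
    instance is y: "I" or "J".
    part is nnn: the data set or partition number, left padded with zeros.
    Assumption: the final qualifier always starts with "A".
    """
    if component not in ("C", "D"):
        raise ValueError("component must be 'C' or 'D'")
    if instance not in ("I", "J"):
        raise ValueError("instance must be 'I' or 'J'")
    if not 1 <= part <= 999:
        raise ValueError("part must be between 1 and 999")
    return (f"{catnam}.DSNDB{component}.{dbname}.{tsname}."
            f"{instance}0001.A{part:03d}")


def delete_statement(catnam, dbname, tsname, instance="I", part=1):
    """Return the DELETE command from the procedure for one data set."""
    cluster = db2_dataset_name(catnam, dbname, tsname, "C", instance, part)
    return f"DELETE {cluster} CLUSTER NOSCRATCH"


def define_names(catnam, dbname, tsname, instance="I", part=1):
    """Return the (cluster, data) names to use with DEFINE CLUSTER."""
    cluster = db2_dataset_name(catnam, dbname, tsname, "C", instance, part)
    data = db2_dataset_name(catnam, dbname, tsname, "D", instance, part)
    return cluster, data


if __name__ == "__main__":
    # Hypothetical catalog, database, and table space names.
    for part in (1, 2):
        print(delete_statement("DSNCAT", "MYDB", "MYTS", part=part))
        print(define_names("DSNCAT", "MYDB", "MYTS", part=part))
```

Review the generated names against the ICF catalog before you use them in an access method services job.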
- Issue the following Db2 command
to start all the appropriate databases and table spaces that were
previously stopped:
-START DATABASE(database-name) SPACENAM(space-name)
- Recover the table spaces by using the Db2 RECOVER utility.
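Because the -STOP DATABASE and -START DATABASE commands and the RECOVER utility are run once for each affected table space, the command text can be generated from a list of database and table space names. The following Python sketch is a hypothetical helper, not an IBM-supplied tool. It assumes the simplest form of the utility control statement, RECOVER TABLESPACE database-name.space-name, with no recovery options, and the names PAYROLL, TSEMP, and TSDEPT are invented for illustration.

```python
def stop_command(database, space):
    """Db2 command that stops one table space (from the procedure)."""
    return f"-STOP DATABASE({database}) SPACENAM({space})"


def start_command(database, space):
    """Db2 command that restarts one table space (from the procedure)."""
    return f"-START DATABASE({database}) SPACENAM({space})"


def recover_statement(database, space):
    """RECOVER utility control statement.

    Assumption: the basic form with no options such as a point in time.
    """
    return f"RECOVER TABLESPACE {database}.{space}"


def db2_recovery_commands(spaces):
    """Group the command text by phase for a list of (database, space) pairs.

    The stop commands are issued before the volume is replaced; the start
    commands and RECOVER statements are used after the new volume is online.
    """
    return {
        "stop": [stop_command(db, ts) for db, ts in spaces],
        "start": [start_command(db, ts) for db, ts in spaces],
        "recover": [recover_statement(db, ts) for db, ts in spaces],
    }


if __name__ == "__main__":
    # Hypothetical table spaces that reside on the failed volume.
    affected = [("PAYROLL", "TSEMP"), ("PAYROLL", "TSDEPT")]
    commands = db2_recovery_commands(affected)
    for phase in ("stop", "start", "recover"):
        print(phase.upper())
        for text in commands[phase]:
            print("  " + text)
```

The sketch only produces text. The commands must still be issued by an authorized operator, and the RECOVER statements must be run through a utility job.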