Disk media failure
Procedures for recovering lost data after a disk media failure.
Regardless of whether you have chosen additional hardware or replication
to protect your data against media failures, you first need to determine
whether the disk itself has completely failed. If the disk has completely
failed, and it is not merely the path to the disk that has failed, follow
the procedures defined by your disk vendor. Otherwise:
- Check the states of the disks for the file system:
mmlsdisk fs1 -e
GPFS marks a disk down if there have been problems accessing it.
- To prevent any I/O from going to the down disk, issue these commands immediately:
mmchdisk fs1 suspend -d gpfs1nsd
mmchdisk fs1 stop -d gpfs1nsd
Note: If there are any GPFS file systems with pending I/O to the down disk, the I/O will time out if the system administrator does not stop it. To see if there are any threads that have been waiting a long time for I/O to complete, issue this command on all nodes:
mmfsadm dump waiters 10 | grep "I/O completion"
- The next step is irreversible! Do not run this command unless data and metadata have been replicated. This command scans file system metadata for disk addresses belonging to the disk in question, then replaces them with a special "broken disk address" value, which might take a while.
CAUTION: Be extremely careful with using the -p option of mmdeldisk, because by design it destroys references to data blocks, making affected blocks unavailable. This is a last-resort tool, to be used when data loss might have already occurred, to salvage the remaining data, which means it cannot take any precautions. If you are not absolutely certain about the state of the file system and the impact of running this command, do not attempt to run it without first contacting the IBM® Support Center.
mmdeldisk fs1 gpfs1n12 -p
- Invoke the mmfileid command with the :BROKEN operand:
mmfileid fs1 -d :BROKEN
For more information, see The mmfileid command.
- After the disk is properly repaired and available for use, you can add it back to the file system.
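The first two steps above (checking disk states, then suspending and stopping the down disk) could be wrapped in a small shell helper. The following is a minimal sketch, not an IBM-supplied tool: the function names run and quiesce_disk and the DRY_RUN variable are hypothetical, and only the mmlsdisk and mmchdisk invocations are taken from the steps above. By default it only prints the commands so that they can be reviewed first.

```shell
#!/bin/sh
# Sketch only. run, quiesce_disk and DRY_RUN are illustrative names,
# not part of GPFS. The mm* commands are the ones shown in the steps above.

# Print the command when DRY_RUN is 1 (the default); otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Usage: quiesce_disk <file-system> <disk-name>
quiesce_disk() {
    if [ "$#" -ne 2 ]; then
        echo "usage: quiesce_disk <file-system> <disk-name>" >&2
        return 2
    fi
    fs=$1
    disk=$2

    # Check the states of the disks for the file system.
    run mmlsdisk "$fs" -e || return 1

    # Prevent any I/O from going to the down disk: suspend, then stop.
    run mmchdisk "$fs" suspend -d "$disk" || return 1
    run mmchdisk "$fs" stop -d "$disk" || return 1
}
```

Calling quiesce_disk fs1 gpfs1nsd prints the three commands in order; setting DRY_RUN=0 before the call would execute them instead. The sketch deliberately stops short of the mmdeldisk -p step, which is irreversible and should be run by hand only after the cautions above have been considered.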