Data integrity
GPFS takes extraordinary care to maintain the integrity of customer data. However, certain hardware failures or, in unusual circumstances, a programming error can cause the loss of data in a file system.
GPFS performs extensive checking to validate metadata and ceases using the file system if metadata becomes inconsistent. This can appear in two ways:
- The file system is unmounted and applications begin seeing ESTALE return codes to file operations.
- Error log entries indicating MMFS_SYSTEM_UNMOUNT and a corruption error are generated.
If actual disk data corruption occurs, this error appears on each node in succession. Before proceeding with the following steps, follow the procedures in Information to be collected before contacting the IBM Support Center, and then contact the IBM Support Center.
- Examine the error logs on the NSD servers for any indication of a reported disk error.
- Take appropriate disk problem determination and repair actions before continuing.
- After completing any required disk repair actions, run the offline version of the mmfsck command on the file system.
- If your error log or disk analysis tool indicates that specific disk blocks are in error, use the mmfileid command to determine which files are on damaged areas of the disk, and then restore these files. For more information, see The mmfileid command.
If data corruption errors occur in only one node, it is probable that memory structures within the node are corrupted. In this case, the file system is probably good but a program error exists in GPFS or another authorized program with access to GPFS data structures.
Follow the directions in Data integrity and then restart the node. This should clear the problem. If the problem repeats on one node without affecting other nodes, check the programming specifications and code levels to confirm that they are current and compatible, and that no hardware errors were reported. Refer to the IBM Storage Scale: Concepts, Planning, and Installation Guide for correct software levels.
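The distinction drawn in this topic, between corruption reported by a single node and corruption reported by several nodes, can be sketched as a small triage helper. The following Python is a minimal illustration only, not an IBM-supplied tool. The input format (pairs of node name and log text), the function names, and the result labels are all hypothetical. The helper does not run any commands; it only builds the argument lists for unmounting the file system on all nodes (mmumount with -a) and running a report-only offline check (mmfsck with -n), so that an administrator can review them first. Confirm the options against the command reference for your release before running them.

```python
# Assumption: log entries have already been collected from each node and
# reduced to (node_name, text) pairs. This is a simplified, hypothetical
# input format, not the real GPFS error log layout.

UNMOUNT_MARKER = "MMFS_SYSTEM_UNMOUNT"  # error log label named in this topic


def nodes_reporting_unmount(log_entries):
    """Return a sorted list of nodes with at least one entry containing the marker."""
    return sorted({node for node, text in log_entries if UNMOUNT_MARKER in text})


def triage(log_entries):
    """Classify the situation by how many nodes report the unmount error.

    The labels are hypothetical. Because the error appears on each node in
    succession when disk data is corrupted, a single reporting node is only
    a hint of node-local memory corruption and should be rechecked later.
    """
    nodes = nodes_reporting_unmount(log_entries)
    if not nodes:
        return "no-unmount-seen", nodes
    if len(nodes) == 1:
        return "suspect-node-memory", nodes
    return "suspect-disk-corruption", nodes


def offline_check_commands(device):
    """Build (but do not run) the commands for a report-only offline check."""
    return [
        ["mmumount", device, "-a"],  # offline mmfsck needs the file system unmounted on all nodes
        ["mmfsck", device, "-n"],    # -n reports inconsistencies without repairing them
    ]


if __name__ == "__main__":
    # Hypothetical sample data; "gpfs1" is a placeholder device name.
    sample = [
        ("nsdserver1", "MMFS_SYSTEM_UNMOUNT: file system gpfs1 unmounted"),
        ("client7", "MMFS_SYSTEM_UNMOUNT: file system gpfs1 unmounted"),
        ("client9", "routine entry"),
    ]
    verdict, nodes = triage(sample)
    print(verdict, nodes)
    for command in offline_check_commands("gpfs1"):
        print(" ".join(command))
```

The commands are returned as lists rather than executed so that the disk problem determination and repair actions described above can be completed, and the IBM Support Center contacted, before anything is run against the file system.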