Flashes (Alerts)
Abstract
IBM Spectrum Scale (GPFS): mismatched replicas with possible undetected data corruption following restripe operations (restripefs)
Content
Problem Summary
In a file system where data or metadata replication is used and "rapid repair" is enabled, mismatched replicas may be created when "update in place" activity occurs after one or more disks go down, the mmrestripefs command is then run with option -r, -R, or -m, and the down disks are subsequently started. Replicas with stale data can result in metadata corruption in the file system, or in data loss.
With mismatched data replicas, applications might read outdated (previously overwritten) data. With mismatched metadata replicas, operations on metadata such as directories, extended attributes, or ACLs might retrieve outdated entries, with unexpected outcomes such as deleted directory entries remaining present. The problem can also cause data or metadata blocks to be lost or overwritten with out-of-date data as disks are stopped and restarted. Some of these are forms of undetected data corruption.
Users affected
All of the following conditions must apply for a customer to be affected:
1. The system is running any of the Spectrum Scale (GPFS) service levels V4.1.1.1 through V4.1.1.19, V4.2.0.0 through V4.2.3.8, or V5.0.0.0 through V5.0.1.0.
2. Any file system has the "rapid repair" option enabled (which is the default).
3. While disks are down in the file system, "update in place" (data overwrite) activities take place in that file system.
4. The mmrestripefs command is issued with option -r, -m, or -R after (3).
5. A down disk is started by invoking the mmchdisk command.
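The conditions above can be sketched as a small check. This is only an illustrative helper, not an IBM tool: the function names and the boolean parameters are hypothetical, and the caller is assumed to supply the service level and the answers to conditions 2 through 5 from their own records.

```python
# Affected service-level ranges copied from condition 1; both bounds inclusive.
AFFECTED_RANGES = [
    ((4, 1, 1, 1), (4, 1, 1, 19)),
    ((4, 2, 0, 0), (4, 2, 3, 8)),
    ((5, 0, 0, 0), (5, 0, 1, 0)),
]


def parse_level(level):
    """Turn 'V4.2.3.8' or '4.2.3.8' into the tuple (4, 2, 3, 8)."""
    parts = level.strip().lstrip("Vv").split(".")
    if len(parts) != 4:
        raise ValueError("expected a four-part service level, got %r" % level)
    return tuple(int(p) for p in parts)


def is_affected_level(level):
    """True if the service level falls inside one of the affected ranges."""
    v = parse_level(level)
    return any(lo <= v <= hi for lo, hi in AFFECTED_RANGES)


def may_be_affected(level, rapid_repair_enabled, overwrites_while_disks_down,
                    restripe_run_afterwards, down_disk_started):
    """All five conditions must hold (hypothetical parameter names)."""
    return all([
        is_affected_level(level),
        rapid_repair_enabled,
        overwrites_while_disks_down,
        restripe_run_afterwards,
        down_disk_started,
    ])
```

Because every condition must apply, a system at an affected level that never ran a restripe while disks were down would evaluate to False here.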
To determine whether a file system has "rapid repair" enabled, issue the command:
mmlsfs <FS> --rapid-repair
To evaluate whether your file system has been affected:
Use the mmrestripefs -c -u command to check if there are mismatched data or metadata replicas. See an example below with mismatched replicas in the 'barfs' file system.
mmrestripefs barfs -c -u
Scanning user file metadata ...
Inode 399360 [fileset 0, snapshot 0 ] has mismatch in replicated disk address 2:10362880 1:10362880 at block 0
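When the command output is long, the mismatch lines can be picked out programmatically. The following is a minimal sketch that assumes mismatch lines always look like the single example shown in this flash; the regular expression, the ReplicaMismatch type, and the function name are illustrative and are not part of Spectrum Scale.

```python
import re
from typing import List, NamedTuple


class ReplicaMismatch(NamedTuple):
    inode: int
    fileset: int
    snapshot: int
    block: int
    addresses: List[str]  # address strings kept verbatim, e.g. "2:10362880"


# Pattern modelled only on the example line in this flash (an assumption).
MISMATCH_RE = re.compile(
    r"Inode\s+(?P<inode>\d+)\s+"
    r"\[fileset\s+(?P<fileset>\d+),\s*snapshot\s+(?P<snapshot>\d+)\s*\]\s+"
    r"has mismatch in replicated disk address\s+"
    r"(?P<addrs>(?:\d+:\d+\s+)+)"
    r"at block\s+(?P<block>\d+)"
)

SAMPLE = (
    "Scanning user file metadata ...\n"
    "Inode 399360 [fileset 0, snapshot 0 ] has mismatch in replicated "
    "disk address 2:10362880 1:10362880 at block 0\n"
)


def find_mismatches(output: str) -> List[ReplicaMismatch]:
    """Return one record per mismatch line found in captured command output."""
    return [
        ReplicaMismatch(
            inode=int(m.group("inode")),
            fileset=int(m.group("fileset")),
            snapshot=int(m.group("snapshot")),
            block=int(m.group("block")),
            addresses=m.group("addrs").split(),
        )
        for m in MISMATCH_RE.finditer(output)
    ]
```

An empty result from this sketch only means no line matched the assumed pattern; the command output itself remains the authority.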
Alternatively, use off-line mmfsck. See below for an example with mismatched replicas.
mmfsck barfs -c -n
Error in inode 399360 snap 0: Record block 0 has mismatched replicas [2:10362880], 1:10362880
Repair replicas? No
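The mmfsck report can be summarized in a similar way, for example to list which inodes are involved before contacting IBM Service. This sketch assumes the error line has the same shape as the one example above; the pattern and the function name are hypothetical.

```python
import re

# Pattern modelled only on the example error line in this flash (an assumption).
FSCK_MISMATCH_RE = re.compile(
    r"Error in inode\s+(?P<inode>\d+)\s+snap\s+(?P<snap>\d+):\s+"
    r"(?P<kind>\w+)\s+block\s+(?P<block>\d+)\s+has mismatched replicas\s+"
    r"(?P<addrs>[\[\]\d:,\s]+)"
)

FSCK_SAMPLE = (
    "Error in inode 399360 snap 0: Record block 0 has mismatched replicas "
    "[2:10362880], 1:10362880\n"
    "Repair replicas? No\n"
)


def fsck_mismatches(output):
    """Return a list of dicts, one per 'mismatched replicas' error found."""
    results = []
    for m in FSCK_MISMATCH_RE.finditer(output):
        results.append({
            "inode": int(m.group("inode")),
            "snap": int(m.group("snap")),
            "block_kind": m.group("kind"),
            "block": int(m.group("block")),
            # Pull out each address token, ignoring brackets and commas.
            "addresses": re.findall(r"\d+:\d+", m.group("addrs")),
        })
    return results


def affected_inodes(output):
    """Sorted, de-duplicated (inode, snap) pairs that have mismatches."""
    return sorted({(r["inode"], r["snap"]) for r in fsck_mismatches(output)})
```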
Recommendations
1. Users running IBM Spectrum Scale V5.0.0.0 through V5.0.1.0 should apply IBM Spectrum Scale V5.0.1.1 or later, available from Fix Central at: https://www.ibm.com/support/fixcentral/swg/selectFixes?parent=Software%20defined%20storage&product=ibm/StorageSoftware/IBM+Spectrum+Scale&release=5.0.1&platform=All&function=all
2. Users running IBM Spectrum Scale V4.2.0.0 through V4.2.3.8 should apply IBM Spectrum Scale V4.2.3.9 or later, available from Fix Central at: https://www.ibm.com/support/fixcentral/swg/selectFixes?parent=Software%20defined%20storage&product=ibm/StorageSoftware/IBM+Spectrum+Scale&release=4.2.3&platform=All&function=all
3. Users running IBM Spectrum Scale V4.1.1.1 through V4.1.1.19 should apply IBM Spectrum Scale V4.1.1.20 or later, available from Fix Central at: https://www.ibm.com/support/fixcentral/swg/selectFixes?parent=Software%2Bdefined%2Bstorage&product=ibm/StorageSoftware/IBM+Spectrum+Scale&release=4.1.1&platform=All&function=all
4. If you cannot apply the above PTF levels, contact IBM Service for an efix:
For IBM Spectrum Scale V5.0.0.0 through V5.0.1.0, reference APAR IJ04123
For IBM Spectrum Scale V4.2.0.0 through V4.2.3.8, reference APAR IJ04658
For IBM Spectrum Scale V4.1.1.1 through V4.1.1.19, reference APAR IJ05613
To contact IBM Service, see http://www.ibm.com/planetwide/
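The recommendations above amount to a lookup from the installed service level to a minimum fixed level and an efix APAR. A minimal sketch of that lookup follows; the table values are taken from this flash, while the function name and return shape are illustrative only.

```python
# (first affected, last affected, minimum fixed PTF level, efix APAR),
# values copied from the recommendations in this flash.
FIX_TABLE = [
    ((5, 0, 0, 0), (5, 0, 1, 0), "5.0.1.1", "IJ04123"),
    ((4, 2, 0, 0), (4, 2, 3, 8), "4.2.3.9", "IJ04658"),
    ((4, 1, 1, 1), (4, 1, 1, 19), "4.1.1.20", "IJ05613"),
]


def recommended_fix(level):
    """Return (fixed_level, apar) for an affected level, or None otherwise.

    `level` is a four-part string such as 'V4.2.3.8' or '4.2.3.8'.
    """
    parts = level.strip().lstrip("Vv").split(".")
    if len(parts) != 4:
        raise ValueError("expected a four-part service level, got %r" % level)
    v = tuple(int(p) for p in parts)
    for first, last, fixed_level, apar in FIX_TABLE:
        if first <= v <= last:
            return fixed_level, apar
    return None  # level is outside every affected range listed here
```

For example, `recommended_fix("V4.1.1.5")` returns the V4.1.1.20 level together with APAR IJ05613, and a level outside the listed ranges returns None.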
Customers who have seen evidence of mismatched replicas, or mount error(s) due to failed log recovery, should also contact IBM Service to run either off-line mmfsck or mmrestripefs to identify and repair possible damage.
Document Information
Modified date: 26 September 2022
UID: ibm10718849