IBM Support

IBM Spectrum Scale (GPFS): mismatched replicas with possible undetected data corruption following restripe operations (restripefs)

Flashes (Alerts)


Content

Problem Summary

In a file system that uses data or metadata replication and has "rapid repair" enabled, mismatched replicas may be created when "update in place" activity occurs after one or more disks go down, the mmrestripefs command is then run with the -r, -R, or -m option, and the down disks are subsequently started. Replicas that hold stale data can result in metadata corruption in the file system, or in data loss.

With mismatched data replicas, applications might read outdated (previously overwritten) data. With mismatched metadata replicas, operations on metadata such as directories, extended attributes, or ACLs might retrieve outdated entries and produce unexpected results, for example deleted directory entries that still appear to be present. The problem can also cause data or metadata blocks to be lost or overwritten with out-of-date data as disks are stopped and restarted. Some of these outcomes are forms of undetected data corruption.

Users affected

All of the following conditions must apply for a customer to be affected:
1. The user is running any of the Spectrum Scale (GPFS) service levels V4.1.1.1 through V4.1.1.19, V4.2.0.0 through V4.2.3.8, or V5.0.0.0 through V5.0.1.0.
2. Any file system has the "rapid repair" option enabled (which is the default).
3. While disks are down in the file system, "update in place" (data overwrite) activities take place in that file system.
4. The mmrestripefs command is issued with option -r, -m, or -R after (3).
5. A down disk is started by invoking the mmchdisk command.
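
Condition 1 above can be checked mechanically. The following is a minimal Python sketch, not an IBM tool: the function names parse_level and is_affected_level are hypothetical, and the ranges are copied from the list above. It covers only the service-level condition; conditions 2 through 5 still have to be assessed separately.

```python
# Affected service-level ranges as listed in this flash (inclusive on both ends).
AFFECTED_RANGES = [
    ((4, 1, 1, 1), (4, 1, 1, 19)),
    ((4, 2, 0, 0), (4, 2, 3, 8)),
    ((5, 0, 0, 0), (5, 0, 1, 0)),
]


def parse_level(text):
    """Turn '4.2.3.8' or 'V4.2.3.8' into a tuple of four integers."""
    parts = text.strip().lstrip("Vv").split(".")
    if len(parts) != 4:
        raise ValueError(f"expected a four-part service level, got {text!r}")
    return tuple(int(p) for p in parts)


def is_affected_level(text):
    """Return True if the service level falls inside an affected range."""
    level = parse_level(text)
    return any(low <= level <= high for low, high in AFFECTED_RANGES)


if __name__ == "__main__":
    # Illustrative levels only; substitute the level installed on your nodes.
    for level in ("4.1.1.20", "4.2.3.8", "5.0.1.0", "5.0.1.1"):
        status = "affected" if is_affected_level(level) else "not affected"
        print(level, status)
```

Comparing the levels as integer tuples avoids the ordering mistakes that plain string comparison would make, for example treating 4.1.1.9 as later than 4.1.1.19.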

One can determine whether a file system has "rapid repair" enabled by issuing the

     mmlsfs <FS> --rapid-repair

command.

To evaluate whether your file system has been affected:

Use the mmrestripefs -c -u command to check for mismatched data or metadata replicas. The example below shows mismatched replicas in the 'barfs' file system.

mmrestripefs barfs -c -u
Scanning user file metadata ...
Inode 399360 [fileset 0, snapshot 0 ] has mismatch in replicated disk address 2:10362880 1:10362880 at block 0
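
When the scan produces many lines, the affected inodes can be collected from saved output. The following Python sketch is only an illustration: the regular expression is derived solely from the single sample line shown above, so it is an assumption about the message format and may need adjusting for other releases. The names MISMATCH_RE and find_mismatches are hypothetical.

```python
import re

# Pattern modeled on the one sample line in this flash; the real message
# format may differ between releases (assumption).
MISMATCH_RE = re.compile(
    r"Inode\s+(?P<inode>\d+)\s+"
    r"\[fileset\s+(?P<fileset>\d+),\s*snapshot\s+(?P<snapshot>\d+)\s*\]\s+"
    r"has mismatch in replicated disk address\s+(?P<addrs>.+?)\s+"
    r"at block\s+(?P<block>\d+)"
)


def find_mismatches(output):
    """Return one dict per mismatch line found in saved command output."""
    results = []
    for line in output.splitlines():
        match = MISMATCH_RE.search(line)
        if match is None:
            continue  # progress messages and other lines are skipped
        results.append({
            "inode": int(match.group("inode")),
            "fileset": int(match.group("fileset")),
            "snapshot": int(match.group("snapshot")),
            "addresses": match.group("addrs").split(),
            "block": int(match.group("block")),
        })
    return results


if __name__ == "__main__":
    sample = (
        "Scanning user file metadata ...\n"
        "Inode 399360 [fileset 0, snapshot 0 ] has mismatch in replicated "
        "disk address 2:10362880 1:10362880 at block 0\n"
    )
    for entry in find_mismatches(sample):
        print(entry)
```

An empty result from this sketch does not prove that the file system is clean, because lines that do not match the assumed pattern are skipped silently.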

Alternatively, use off-line mmfsck. The example below shows mismatched replicas.

mmfsck barfs -c -n
Error in inode 399360 snap 0: Record block 0 has mismatched replicas
 [2:10362880], 1:10362880
 Repair replicas? No

Recommendations

1. Users running IBM Spectrum Scale V5.0.0.0 through V5.0.1.0 should apply IBM Spectrum Scale V5.0.1.1 or later, available from Fix Central at: https://www.ibm.com/support/fixcentral/swg/selectFixes?parent=Software%20defined%20storage&product=ibm/StorageSoftware/IBM+Spectrum+Scale&release=5.0.1&platform=All&function=all

2. Users running IBM Spectrum Scale V4.2.0.0 through V4.2.3.8 should apply IBM Spectrum Scale V4.2.3.9 or later, available from Fix Central at: https://www.ibm.com/support/fixcentral/swg/selectFixes?parent=Software%20defined%20storage&product=ibm/StorageSoftware/IBM+Spectrum+Scale&release=4.2.3&platform=All&function=all

3. Users running IBM Spectrum Scale V4.1.1.1 through V4.1.1.19 should apply IBM Spectrum Scale V4.1.1.20 or later, available from Fix Central at: https://www.ibm.com/support/fixcentral/swg/selectFixes?parent=Software%2Bdefined%2Bstorage&product=ibm/StorageSoftware/IBM+Spectrum+Scale&release=4.1.1&platform=All&function=all

4. If you cannot apply the above PTF levels, contact IBM Service for an efix:

For IBM Spectrum Scale V5.0.0.0 through V5.0.1.0, reference APAR IJ04123
For IBM Spectrum Scale V4.2.0.0 through V4.2.3.8, reference APAR IJ04658
For IBM Spectrum Scale V4.1.1.1 through V4.1.1.19, reference APAR IJ05613

To contact IBM Service, see http://www.ibm.com/planetwide/

Customers who have seen evidence of mismatched replicas, or mount error(s) due to failed log recovery, should also contact IBM Service to run either off-line mmfsck or mmrestripefs to identify and repair possible damage.

 


Document Information

Modified date:
26 September 2022

UID

ibm10718849