IBM Support

IBM Spectrum Scale (GPFS) Alert for V4.2 and V5.0 levels: possible silent data corruption on snapshot files

Flashes (Alerts)


Abstract

IBM has identified an issue in IBM Spectrum Scale (GPFS) V4.2.0.0 through V4.2.3.19 (ESS 4.0 through ESS 5.2.8) and V5.0.0.0 through V5.0.4.2 (ESS 5.3 through ESS 5.3.5.1) in which undetected data loss or corruption may result from incorrect data being read from snapshot files, either after a snapshot deletion or while data copy-on-write operations to the latest snapshot files are in progress.

Content

Issue:
Deleting the current snapshot can cause undetected data loss or corruption for a snapshot file if that file's inode number is small enough for its inode to lie in the same inode block as the fileset metadata inode, and a previous snapshot has not captured any modification of the file because the current snapshot was created before any modification took place.
In addition, if a data copy-on-write to the latest snapshot file is in progress while a read of the file from an older snapshot, which should be satisfied from the latest snapshot file, occurs concurrently, the read may return unexpected data from the active file system. This happens because a race window between the copy-on-write process and the snapshot read causes the read to be redirected incorrectly to the active file system.
Users Affected:
This issue may affect customers running any level of IBM Spectrum Scale (GPFS) V4.2.0.0 through V4.2.3.19 (ESS 4.0 through ESS 5.2.8) or V5.0.0.0 through V5.0.4.2 (ESS 5.3 through ESS 5.3.5.1) who use the snapshot functionality.
Users may be affected when all of the following conditions are met:
  1. The Spectrum Scale (GPFS) snapshot functionality is being used.
  2. More than one snapshot has been created for the same file system or fileset.
  3. Either of the following scenarios applies:
  • Scenario 1: A snapshot file has a small inode number and contains allocated snapshot data blocks; its inode lies in the same inode block as the fileset metadata inode; that inode block is not block 0 and needs to be moved into the previous snapshot; and the snapshot is then deleted (see the determination method in the Problem Determination section below); or
  • Scenario 2: An update to files in the active file system triggers a data copy-on-write to the latest snapshot while a read from an older snapshot is proceeding concurrently. If the data block being read has not been copied to any newer snapshot, the snapshot read may be redirected unexpectedly to the active file system instead of the latest snapshot, so incorrect data is returned.
Problem Determination:
There are no external indicators that this problem may be occurring, because Spectrum Scale (GPFS) produces no error message. The only way to assess exposure is to determine whether the conditions of either scenario listed above are met.
For Scenario 1, the following command can be used to determine the fileset metadata inode number and whether the inode block containing it is block 0:
mmlsfs gpfs1 -B -i -n
flag                value            description
------------------- ---------------- --------------------------
-B                 4194304           Block size
-i                 4096              Inode size in bytes
-n                 32                Estimated number of nodes that will mount file system
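For anyone scripting this check, the output above can be turned into numbers with a small parser. This is a minimal Python sketch that assumes the three-column layout shown in the sample (a header row, a dashed separator, then one flag/value/description row per attribute); the function name `parse_mmlsfs` is hypothetical and is not part of Spectrum Scale. Rows whose value is not an integer are skipped.

```python
from typing import Dict

# Sample output copied from the mmlsfs example above.
SAMPLE = """\
flag                value            description
------------------- ---------------- --------------------------
-B                 4194304           Block size
-i                 4096              Inode size in bytes
-n                 32                Estimated number of nodes that will mount file system
"""


def parse_mmlsfs(output: str) -> Dict[str, int]:
    """Hypothetical helper: map each mmlsfs flag (e.g. "-B") to its integer value.

    Assumes the layout of the sample above; rows that do not start with a
    flag, or whose value is not an integer, are ignored.
    """
    values: Dict[str, int] = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        flag, value = fields[0], fields[1]
        # Data rows start with "-X"; the separator row consists only of dashes.
        if not flag.startswith("-") or set(flag) == {"-"}:
            continue
        try:
            values[flag] = int(value)
        except ValueError:
            continue  # non-numeric value, not needed for this check
    return values


print(parse_mmlsfs(SAMPLE))  # {'-B': 4194304, '-i': 4096, '-n': 32}
```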
In general, the fileset metadata inode number can be computed from the value reported for the "-n" option of the mmlsfs command (referred to below as valueFor-nOption):
         Fileset inode number = valueFor-nOption + 6
The inode block number of the fileset metadata inode can be computed from its inode number as follows (using integer division, discarding any remainder):
        Fileset inode block number = Fileset inode number / inodes per block = (valueFor-nOption + 6) / inodes per block
The "inodes per block" value can be computed from the values reported for the "-B" and "-i" options of the mmlsfs command (referred to below as valueFor-BOption and valueFor-iOption):
        inodes per block = valueFor-BOption / valueFor-iOption
An inode is in the same inode block as the fileset metadata inode if its inode number falls in the following range (the upper bound itself maps to the next inode block under the formula above):
       From Fileset inode block number * (valueFor-BOption / valueFor-iOption) to (Fileset inode block number + 1) * (valueFor-BOption / valueFor-iOption)
Substituting the formula for the fileset inode block number, the inode number range of the fileset metadata inode block can be written as:
     From ((valueFor-nOption + 6) / (valueFor-BOption / valueFor-iOption)) * (valueFor-BOption / valueFor-iOption)
     to ((valueFor-nOption + 6) / (valueFor-BOption / valueFor-iOption) + 1) * (valueFor-BOption / valueFor-iOption)
Because inode block 0 is always copied to a snapshot when the snapshot is created, the small-inode-number issue does not arise if the fileset metadata inode is in inode block 0. The problem for snapshot files with small inode numbers can occur only when (valueFor-nOption + 6) >= (valueFor-BOption / valueFor-iOption), that is, when the fileset metadata inode is not in block 0, and only for inode numbers within the range computed above.
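The Scenario 1 arithmetic above can be expressed as a short Python sketch. The function and type names are hypothetical; the inputs are the -n, -B and -i values reported by mmlsfs. It uses integer (floor) division, which the range formula requires, and treats the upper bound of the range as exclusive, which is our reading of the formulas above. It checks only the inode-placement condition of Scenario 1, not the other conditions (allocated snapshot data blocks, more than one snapshot, a subsequent snapshot deletion).

```python
from typing import NamedTuple


class FilesetInodeBlock(NamedTuple):
    inodes_per_block: int
    fileset_inode: int
    block: int
    first_inode: int  # first inode number in the block
    end_inode: int    # one past the last inode number in the block (assumed exclusive)


def fileset_inode_block(n_value: int, b_value: int, i_value: int) -> FilesetInodeBlock:
    """Apply the formulas above to the -n, -B and -i values reported by mmlsfs."""
    inodes_per_block = b_value // i_value      # valueFor-BOption / valueFor-iOption
    fileset_inode = n_value + 6                # valueFor-nOption + 6
    block = fileset_inode // inodes_per_block  # integer division
    return FilesetInodeBlock(
        inodes_per_block,
        fileset_inode,
        block,
        block * inodes_per_block,
        (block + 1) * inodes_per_block,
    )


def may_hit_scenario1(inode: int, n_value: int, b_value: int, i_value: int) -> bool:
    """True if `inode` shares a non-zero inode block with the fileset metadata inode."""
    info = fileset_inode_block(n_value, b_value, i_value)
    if info.block == 0:
        # Inode block 0 is always copied when a snapshot is created: no exposure.
        return False
    return info.first_inode <= inode < info.end_inode


print(fileset_inode_block(32, 4194304, 4096))
# FilesetInodeBlock(inodes_per_block=1024, fileset_inode=38, block=0, first_inode=0, end_inode=1024)
```

With the sample values (-n 32, -B 4194304, -i 4096) there are 1024 inodes per block and the fileset metadata inode is 38, which lies in inode block 0, so `may_hit_scenario1` returns False for every inode. With a hypothetical -n value of 2000 and the same block and inode sizes, the fileset metadata inode is 2006 in inode block 1, and inode numbers 1024 through 2047 fall in the exposed range.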
For Scenario 2, there is no definitive way to ascertain whether the issue has occurred. Users may wish to examine information returned by applications that consume Spectrum Scale data to determine its accuracy; however, users are strongly advised to apply the fix described below.
Customers should always use a separate backup mechanism, such as IBM Spectrum Protect or a tape storage device, to protect their data from data loss or subsequent data corruption. Users may wish to restore data from such backups taken before the file system corruption occurred. Users should not restore data from snapshots, because the snapshots may contain lost or corrupted data.
Live (on disk) data is not affected as long as it is not restored from snapshots. 
Recommendation:
Because there are no external indicators that this problem may be occurring, users should use extra caution with data from Spectrum Scale snapshots created with the affected releases, promptly determine whether the conditions listed above are met, and, if so, apply the applicable fix without delay.
1. Users running IBM Spectrum Scale V5.0.0.0 through V5.0.4.2 (ESS 5.3 through ESS 5.3.5.1) levels should apply IBM Spectrum Scale V5.0.4.3 or later (ESS 5.3.5.2 or later).
2. Users running IBM Spectrum Scale V4.2.0.0 through V4.2.3.19 (ESS 4.0 through ESS 5.2.8) levels should apply IBM Spectrum Scale V4.2.3.20 or later (ESS 5.2.9 or later).
If you cannot apply the above PTF levels, contact IBM service to obtain and apply the efix for your code level(s):
  • For IBM Spectrum Scale V5.0.0.0 through V5.0.4.2, reference IJ24942 and IJ24943
  • For IBM Spectrum Scale V4.2.0.0 through V4.2.3.19, reference IJ22690 and IJ22033
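The fix levels listed above can also be checked programmatically, for example across an inventory of clusters. The following Python sketch (hypothetical helper names) compares a dotted Spectrum Scale version string against the two affected ranges named in this alert and returns the minimum PTF level to apply. It assumes plain numeric versions such as 5.0.4.2 and does not handle ESS release numbers; a result of None only means the level is outside the ranges listed in this alert.

```python
from typing import Optional, Tuple

# (first affected level, last affected level, minimum fixed PTF level), as listed above.
AFFECTED_RANGES = [
    ((4, 2, 0, 0), (4, 2, 3, 19), "4.2.3.20"),
    ((5, 0, 0, 0), (5, 0, 4, 2), "5.0.4.3"),
]


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn "V5.0.4.2" or "5.0.4.2" into (5, 0, 4, 2), padding missing parts with 0."""
    parts = [int(p) for p in version.strip().lstrip("Vv").split(".")]
    parts += [0] * (4 - len(parts))
    return tuple(parts)


def minimum_fix_level(version: str) -> Optional[str]:
    """Return the PTF level to apply if `version` is in an affected range, else None."""
    v = parse_version(version)
    for first, last, fix in AFFECTED_RANGES:
        # Tuples compare element by element, so this is a numeric version comparison.
        if first <= v <= last:
            return fix
    return None


for level in ("V4.2.3.19", "4.2.3.20", "5.0.2.1", "5.0.4.3"):
    print(level, "->", minimum_fix_level(level))
```

Comparing integer tuples rather than raw strings avoids the usual pitfall where "4.2.3.9" would sort after "4.2.3.19" as text.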


Document Information

Modified date:
08 June 2020

UID

ibm16213729