Flashes (Alerts)
Abstract
IBM has identified a problem in IBM Spectrum Scale (GPFS) V4.2 and V5.0.0 levels, in which combined usage of compression and Local Read Only Cache (LROC) may result in undetected data corruption in regular files.
Content
Problem Summary:
When a compressed file is decompressed or truncated, some of its data blocks may be de-allocated. A problem has been identified in which, if the data for these de-allocated blocks was stored in LROC devices before de-allocation, that stale data may later be recalled from the LROC devices. As a result, data in memory may become corrupted, and there is potential for the corrupted data to also be written to disk. Note that many types of file modification (e.g., write, punch hole) on a compressed file can trigger an on-the-fly transparent decompression, as can GPFS command-line or policy interfaces (e.g., mmrestripefs -z, mmchattr --compression no).
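As an illustration of the failure mode, the following is a toy model (not GPFS or LROC code) of a block-level read cache that is not invalidated when the file system de-allocates a block: a later reuse of the same block address can then return stale cached contents instead of the expected data.

```python
# Toy model of the stale-cache failure mode described above.
# Illustrative only; this is not GPFS/LROC code.

class BlockCache:
    """Minimal read cache keyed by block address."""
    def __init__(self):
        self._entries = {}

    def store(self, block_addr, data):
        self._entries[block_addr] = data

    def lookup(self, block_addr):
        return self._entries.get(block_addr)

    def invalidate(self, block_addr):
        self._entries.pop(block_addr, None)


disk = {7: b"compressed-payload"}   # block 7 holds old file data
cache = BlockCache()
cache.store(7, disk[7])             # block cached before de-allocation

# Decompression/truncation de-allocates block 7 ...
del disk[7]
# ... but the buggy path forgets to invalidate the cache entry.

# If block 7 is later reused for a region that should read as zeros,
# a cache hit returns the stale payload: data corruption.
stale = cache.lookup(7)
print(stale)  # b'compressed-payload' instead of zeros

# The fix: invalidate the cache entry at de-allocation time.
cache.invalidate(7)
assert cache.lookup(7) is None
```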
Users Affected:
This issue affects customers running any level of IBM Spectrum Scale (GPFS) V4.2 or V5.0.0, when the following conditions are met:
1) LROC is being used (Note that LROC is supported for Linux only, so other OS environments are not affected).
2) Spectrum Scale File Compression is being used for files stored in LROC.
3) The file is a sparse file (it contains holes) before it is compressed.
4) Actions are performed on the file that force it to be partially or completely uncompressed, such as
- Write to the file
- Migrate data to off-line storage
- Restore data from snapshots or from offline storage
- fallocate(2) system call is executed with FALLOC_FL_PUNCH_HOLE flag on Linux
- Certain operations in AFM
- File truncation
- Issuing mmrestripefs -z
- Issuing mmchattr --compression no
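All of the triggers above reduce to operations that de-allocate blocks of a compressed file. The invariant such operations must preserve, and which this defect violates, is that sparse regions (holes) of a file read back as zero bytes. A minimal sketch of that invariant on Linux (assuming a local file system with ordinary POSIX semantics):

```python
# Sketch: holes of a sparse file must read back as zeros.
import os
import tempfile

fd, path = tempfile.mkstemp()
try:
    os.write(fd, b"A" * 4096)          # one block of real data
    os.ftruncate(fd, 1024 * 1024)      # extend: bytes 4096..1 MiB are a hole
    os.lseek(fd, 4096, os.SEEK_SET)
    hole = os.read(fd, 4096)           # read from inside the hole
    assert hole == b"\x00" * 4096      # the hole must read as zeros
finally:
    os.close(fd)
    os.unlink(path)
```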
Problem Determination:
User applications might see non-zero data when reading sparse regions (holes) of files. Note that:
1) The files could be either completely uncompressed or partially uncompressed.
2) The data corruption is initially visible only from the node where the decompression was performed, provided applications have not yet written to the sparse regions of the file from other nodes. If the next write to a sparse region is issued from a different node after the corruption is observed, the temporary in-memory corruption is overwritten and is no longer visible from any node. If, however, the first write to a sparse region is issued from the node where the decompression was performed, the corrupted in-memory data can be written to disk, after which the corruption is seen consistently from all nodes.
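One way to spot-check a suspect file is to enumerate its holes with lseek(2)'s SEEK_HOLE/SEEK_DATA and verify that every hole reads back as zeros; non-zero bytes inside a hole match the corruption signature described above. The helper below is a hypothetical sketch, not an IBM-supplied tool:

```python
import os

def holes_read_as_zeros(path, chunk=1 << 20):
    """Return True if every sparse region (hole) of `path` reads as zeros.

    Uses lseek(SEEK_HOLE)/lseek(SEEK_DATA); on file systems without
    sparse-file support the file is reported as one data extent, so
    the check trivially passes.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            try:
                hole = os.lseek(fd, offset, os.SEEK_HOLE)
            except OSError:                 # SEEK_HOLE unsupported
                return True
            if hole >= size:                # no further holes
                break
            try:
                data = os.lseek(fd, hole, os.SEEK_DATA)
            except OSError:                 # no data after it: hole runs to EOF
                data = size
            os.lseek(fd, hole, os.SEEK_SET)
            remaining = data - hole
            while remaining > 0:
                buf = os.read(fd, min(chunk, remaining))
                if not buf:
                    break
                if buf.strip(b"\x00"):      # non-zero bytes inside a hole
                    return False
                remaining -= len(buf)
            offset = data
        return True
    finally:
        os.close(fd)
```

On a healthy file system this check passes; a False result on a file that was decompressed while LROC was active is consistent with the defect described in this flash.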
Recommendations:
1. Users running IBM Spectrum Scale V5.0.0.0 through V5.0.0.2 on any Linux servers should apply IBM Spectrum Scale V5.0.1 or later, available from Fix Central at:
https://www-945.ibm.com/support/fixcentral/swg/selectFixes?parent=Softw…
2. Users running IBM Spectrum Scale V4.2.0.0 through V4.2.3.8 on any Linux servers should apply IBM Spectrum Scale V4.2.3.9 or later, available from Fix Central at: https://www-945.ibm.com/support/fixcentral/swg/selectFixes?parent=Softw…
3. If you cannot apply the above PTF levels, contact IBM Service for an efix:
For IBM Spectrum Scale V5.0.0.0 through V5.0.0.2, reference APAR IJ06259
For IBM Spectrum Scale V4.2.0.0 through V4.2.3.8, reference APAR IJ06252
To contact IBM Service, see http://www.ibm.com/planetwide/
4. Until the fix is applied, users should temporarily disable the LROC function if compression is also in use, to avoid the possibility of data corruption. Alternatively, users can decompress all compressed files from one GPFS cluster node while performing no other file operations from that node, routing all file operations through the other nodes instead. The compression and LROC features should not be used at the same time in a file system until the fix is applied.
Document Information
Modified date:
26 September 2022
UID
ibm10713659