IBM Support

IBM Spectrum Scale (GPFS) 5.0 levels: reading files compressed with lz4 may result in daemon or kernel crashes, or possible undetected data corruption

Flashes (Alerts)


Abstract

IBM has identified an issue in IBM Spectrum Scale (GPFS) V5.0.0 through 5.0.2.3 levels in which reading files that were compressed with lz4 on a node running V5.0.0 through 5.0.1.2 may result in daemon or kernel crashes. Attempts to uncompress such files may also result in errors or crashes (on V5.0.0 through 5.0.1.2) or file system structure errors (on 5.0.2.0 through 5.0.2.3). Reading such compressed files might also result in undetected data corruption in any of the file systems.

Content

Problem Summary:
To handle reads from compressed files, the mmfs kernel module or the mmfsd daemon allocates temporary memory to hold the compressed data. An issue introduced in 5.0.0 may cause a compression-ratio threshold to be missed while compressing with the lz4 library, so that a compressed partition can end up larger than the uncompressed partition. The logic that reads compressed data, in both the daemon and the kernel, did not account for compressed partitions exceeding 32 KB (the size of an uncompressed partition), which may cause memory in the shared segment or in the mmfsd daemon heap to be overwritten. The outcome of the overwrite is unpredictable; kernel and daemon crashes are among the consequences that have been observed.
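The size arithmetic behind this failure can be illustrated with a minimal Python sketch. It uses the worst-case output size documented for the public lz4 library (the LZ4_COMPRESSBOUND macro in lz4.h, n + n/255 + 16), which shows that an incompressible 32 KB partition may need up to 32,912 bytes after compression; whether the lz4 usage inside Spectrum Scale has exactly this worst case is an assumption. The helper fits_in_read_buffer is hypothetical and only models the assumption the affected read path made implicitly; it is not Spectrum Scale code. Because lz4 is not in the Python standard library, zlib is used as a stand-in to show real expansion of incompressible data.

```python
import random
import zlib

PARTITION_SIZE = 32 * 1024  # uncompressed partition size given in this flash


def lz4_compress_bound(n: int) -> int:
    """Worst-case lz4 block output for n input bytes (LZ4_COMPRESSBOUND in lz4.h).

    The real macro also returns 0 for inputs above LZ4_MAX_INPUT_SIZE; that
    limit is irrelevant for 32 KB partitions and is omitted here.
    """
    return n + n // 255 + 16


def fits_in_read_buffer(compressed_len: int, buffer_len: int = PARTITION_SIZE) -> bool:
    """Hypothetical check: does a compressed partition fit in a temporary
    buffer sized for one uncompressed partition?"""
    return compressed_len <= buffer_len


def incompressible_partition(seed: int = 0) -> bytes:
    """Deterministic pseudo-random bytes, which general-purpose compressors
    cannot shrink."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(PARTITION_SIZE))


if __name__ == "__main__":
    bound = lz4_compress_bound(PARTITION_SIZE)
    print(bound)                       # 32912
    print(fits_in_read_buffer(bound))  # False

    # zlib stand-in: the "compressed" partition is larger than the original.
    packed = zlib.compress(incompressible_partition())
    print(len(packed) > PARTITION_SIZE, fits_in_read_buffer(len(packed)))  # True False
```

Sizing a temporary buffer from a worst-case bound rather than from the uncompressed partition size is one general way to avoid this class of overflow; this flash does not describe how the fix itself is implemented.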
The nature of the issue is such that the outcome differs for nodes running V5.0.0 through 5.0.1.2, nodes running 5.0.2.0 through 5.0.2.3, and nodes that include the fix listed below.
Attempts to uncompress the files (using the mmchattr, mmrestripefs, or mmapplypolicy commands) may also result in errors or crashes on nodes running V5.0.0 through 5.0.1.2. On nodes running 5.0.2.0 through 5.0.2.3, attempts to uncompress such files may result in file system structure errors.
Reading or uncompressing such compressed files might result in undetected data corruption in any of the file systems, because the overwritten memory might cause incorrect data to be written to disk.
A note on lz4 compression and use of disk space:
Compressed partitions that grow during compression do not cause any compressed file to increase in size, so no storage is wasted because of this issue. A compression-ratio check is enforced at the compression group level (10 file system data blocks), and compression groups that do not meet the compression-ratio threshold (more than 10% space saving) are kept in uncompressed form. This compression-ratio check is not affected by this issue.
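The group-level rule can be modeled in a few lines of Python. This is a hypothetical sketch of the decision as stated in this flash (keep a compression group compressed only if it saves more than 10%), not the Spectrum Scale implementation. The partition sizes in the example are invented, and a real compression group spans 10 file system data blocks, so it normally contains many more than ten 32 KB partitions.

```python
from typing import Sequence

PARTITION_SIZE = 32 * 1024


def keep_group_compressed(uncompressed_sizes: Sequence[int],
                          compressed_sizes: Sequence[int]) -> bool:
    """Hypothetical model of the compression-group ratio check.

    Returns True only if the group as a whole saves more than 10% of its
    uncompressed size. Integer arithmetic keeps the 10% boundary exact.
    """
    total_in = sum(uncompressed_sizes)
    total_out = sum(compressed_sizes)
    return (total_in - total_out) * 10 > total_in


if __name__ == "__main__":
    raw = [PARTITION_SIZE] * 10

    # Toy group: 8 partitions compress well, 2 grew past 32 KB.
    mixed = [4096] * 8 + [32912] * 2
    print(keep_group_compressed(raw, mixed))          # True: stored compressed

    # Every partition grew: the group fails the check and stays uncompressed.
    print(keep_group_compressed(raw, [32912] * 10))   # False
```

The first case matches the "mixed sequence of compressible and uncompressible partitions" pattern described under Users Affected: the group passes the check, so the file does not grow on disk, yet it still contains partitions larger than 32 KB that the affected read path does not expect.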

Users Affected:

This issue affects customers running IBM Spectrum Scale (GPFS) V5.0.0 through 5.0.2.3 when all of the following conditions for the applicable code level are met:
For V5.0.0 through 5.0.1.2:
1. Files are compressed with the lz4 library. Whether a given file is affected depends on its pattern of compressible and incompressible 32 KB partitions.
2. The node performing compression is running Spectrum Scale V5.0.0 through 5.0.1.2.
3. A read is performed from such compressed files, or an uncompression operation is attempted on them.
Kernel or daemon crashes are possible when either reading or uncompressing such files.

For V5.0.2 through 5.0.2.3:
Files compressed with V5.0.2 are not affected by this issue. However, V5.0.2 levels that do not contain the fix may still hit the problem when reading (or uncompressing) files that were compressed by nodes running V5.0.0 through 5.0.1.2.
1. Files are compressed with the lz4 library, and the files or portions of them have a mixed sequence of compressible and incompressible 32 KB partitions.
2. The node performing compression is running Spectrum Scale V5.0.0 through 5.0.1.2.
3. A read is performed from such compressed files, or an uncompression operation is attempted on them, on nodes running 5.0.2 through 5.0.2.3. For uncompression, refer to the details below.

If conditions 1 and 2 above are met and an attempt is then made, on a node running V5.0.2 through 5.0.2.3, to uncompress the data, for example using:
     mmchattr --compression no
then no daemon or kernel crash is expected. Instead, the uncompression operation fails with "I/O error", and file system structure error entries may appear in the system log. The following is an example of such a file system structure error written to /var/log/messages:
Feb 22 10:51:24 c80f3m5n11 mmfs: Error=MMFS_FSSTRUCT, ID=0x94B1F045,
Tag=-9430404: Invalid disk data structure. Error code 1133. Volume fs0
A command such as the following can be used to retrieve more details on the file system structure error:
# /usr/lpp/mmfs/samples/debugtools/fsstructlx.awk /var/log/messages
02/22@10:51:24 c80f3m5n11 FSSTRUCT fs0 133 FSErrBadCompressBlock inodeNum=0000000000007E22 snapId=00000000 blockNum=0000000000000008 winattr=0003 illCompressed=0000 errCode=000008AE
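When many entries need to be triaged, the fsstructlx.awk output can be split into fields programmatically. The following Python sketch assumes the field layout of the single example line above (timestamp, node, the literal FSSTRUCT, file system, error number, error name, then key=value pairs) and assumes the key=value fields are hexadecimal, as the inodeNum value suggests. Both are assumptions, and parse_fsstruct_line is a hypothetical helper, not a Spectrum Scale tool.

```python
import re
from typing import Dict, Union

SAMPLE = ("02/22@10:51:24 c80f3m5n11 FSSTRUCT fs0 133 FSErrBadCompressBlock "
          "inodeNum=0000000000007E22 snapId=00000000 blockNum=0000000000000008 "
          "winattr=0003 illCompressed=0000 errCode=000008AE")

# key=value pairs whose values are hex digits (assumed to be hexadecimal)
_KV = re.compile(r"(\w+)=([0-9A-Fa-f]+)\b")


def parse_fsstruct_line(line: str) -> Dict[str, Union[str, int]]:
    """Split one fsstructlx.awk output line into named fields."""
    parts = line.split()
    if len(parts) < 6 or parts[2] != "FSSTRUCT":
        raise ValueError("not an FSSTRUCT line in the assumed layout")
    record: Dict[str, Union[str, int]] = {
        "timestamp": parts[0],
        "node": parts[1],
        "filesystem": parts[3],
        "error": parts[5],
    }
    for key, value in _KV.findall(line):
        record[key] = int(value, 16)  # hex string -> integer
    return record


if __name__ == "__main__":
    rec = parse_fsstruct_line(SAMPLE)
    print(rec["error"], rec["filesystem"], rec["inodeNum"], rec["blockNum"])
    # FSErrBadCompressBlock fs0 32290 8
```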
If you are running Spectrum Scale V5.0.2.0 through V5.0.2.3 and no files have ever been compressed with lz4 while nodes were running V5.0.0 through V5.0.1.2, then compressing, reading, and uncompressing files with lz4 is safe with respect to this issue.
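The conditions in this section can be summarized as a small decision function. The Python sketch below is a hypothetical aid that encodes only what this flash states: function names and outcome strings are illustrative, the APAR IJ14104 efix is treated as a simple flag, files compressed outside V5.0.0 through 5.0.1.2 are treated as not affected, and levels the flash does not mention (for example, between 5.0.2.3 and 5.0.3) are reported as not covered rather than guessed.

```python
from typing import Tuple

Level = Tuple[int, ...]


def parse_level(level: str) -> Level:
    """'V5.0.1.2' -> (5, 0, 1, 2); '5.0.2' -> (5, 0, 2, 0)."""
    parts = [int(p) for p in level.lstrip("Vv").split(".")]
    return tuple(parts + [0] * (4 - len(parts)))


def in_range(level: Level, low: str, high: str) -> bool:
    return parse_level(low) <= level <= parse_level(high)


def exposure(compressed_on: str, accessed_on: str, operation: str,
             has_efix: bool = False) -> str:
    """Hypothetical summary of 'Users Affected' for one lz4-compressed file.

    compressed_on: code level of the node that compressed the file with lz4
    accessed_on:   code level of the node that reads or uncompresses it
    operation:     "read" or "uncompress"
    has_efix:      True if the accessing node has the efix for APAR IJ14104
    """
    if operation not in ("read", "uncompress"):
        raise ValueError("operation must be 'read' or 'uncompress'")
    comp, acc = parse_level(compressed_on), parse_level(accessed_on)
    if not in_range(comp, "5.0.0.0", "5.0.1.2"):
        return "not affected"                  # compressed outside 5.0.0-5.0.1.2
    if has_efix or acc >= parse_level("5.0.3.0"):
        return "not affected"                  # fix present where the file is accessed
    if in_range(acc, "5.0.0.0", "5.0.1.2"):
        return "crash or corruption possible"  # read or uncompress
    if in_range(acc, "5.0.2.0", "5.0.2.3"):
        if operation == "uncompress":
            return "I/O error and FSSTRUCT entries expected"
        return "crash or corruption possible"
    return "not covered by this flash"


if __name__ == "__main__":
    print(exposure("5.0.1.2", "5.0.2.3", "uncompress"))
    # I/O error and FSSTRUCT entries expected
```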

Problem Determination:

One possible symptom of the issue is a kernel or a daemon assert with the following signatures:

Assertion failed: thisHeaderP->isInUse(), file ../../../../../../../src/avs/fs/mmfs/ts/pagemgr/kMalloc.C
Assertion failed: thisHeaderP->getPoolId() == poolId, file ../../../../../../../src/avs/fs/mmfs/ts/pagemgr/kMalloc.C

Another possibility is a daemon assert similar to the following:

/usr/lpp/mmfs/bin/mmfsd': malloc(): memory corruption: 0x000003fe700f5d70 ***.
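Checking logs for these signatures can be scripted. The Python sketch below is a hypothetical helper that matches only the message fragments quoted in this section: the assertion text is matched without the long source path, and the malloc message is counted only when the same line mentions mmfsd. Which log files to scan on each node is left as an assumption, and a match is a reason to investigate, not proof of this issue.

```python
from typing import Iterable, List, Tuple

# Each signature is a tuple of substrings that must all appear in one line.
SIGNATURES: Tuple[Tuple[str, ...], ...] = (
    ("Assertion failed: thisHeaderP->isInUse()",),
    ("Assertion failed: thisHeaderP->getPoolId() == poolId",),
    ("mmfsd", "malloc(): memory corruption"),
)


def find_symptoms(lines: Iterable[str]) -> List[str]:
    """Return the lines that match any known signature, in input order."""
    return [
        line for line in lines
        if any(all(part in line for part in sig) for sig in SIGNATURES)
    ]


if __name__ == "__main__":
    sample = [
        "Assertion failed: thisHeaderP->isInUse(), file ../kMalloc.C\n",
        "/usr/lpp/mmfs/bin/mmfsd': malloc(): memory corruption: 0x000003fe700f5d70 ***.\n",
        "someotherd: malloc(): memory corruption\n",
    ]
    print(len(find_symptoms(sample)))  # 2
```

An open text file can be passed directly to find_symptoms, since iterating over it yields one line at a time.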

Recommendations:
1. Users running IBM Spectrum Scale V5.0.0.0 through V5.0.2.3 should apply IBM Spectrum Scale V5.0.3 or later, available from Fix Central.
2. If you cannot apply the above PTF level, contact IBM service to obtain and apply an efix for your code level(s):
IBM Spectrum Scale V5.0.0.0 through V5.0.2.3: reference APAR IJ14104.
3. Until the fix is applied, users should stop using files that were compressed with lz4 on nodes running V5.0.0 through 5.0.1.2. If any nodes ran code levels 5.0.0.0 through 5.0.1.2 while files were compressed with lz4, avoid accessing any files until the recommended fix level has been applied.

While files compressed with V5.0.2.0 through V5.0.2.3 should not be affected, if the file system has files compressed with both (1) V5.0.0 through 5.0.1.2 and (2) 5.0.2.0 through V5.0.2.3, then daemon or kernel crashes are still possible when reading the files that were compressed with V5.0.0 through 5.0.1.2.

Users who intend to compress files with the lz4 library should apply V5.0.2.0 or a later code level. In addition, users who are affected by the problem (conditions 1 and 2 under 'Users Affected:') should install the fix above.

Product: IBM Spectrum Scale; Version: 5.0.0, 5.0.1, 5.0.2; Platform: AIX, Linux, Windows
Product: IBM Elastic Storage Server; Version: 5.3.2; Platform: Linux

Document Information

Modified date:
26 September 2022

UID

ibm10875682