IBM Support

IBM Spectrum Scale Alert: All supported versions may be affected by an issue with online mmfsck that may result in file system corruption.

Flashes (Alerts)


Abstract

IBM has identified a suspected issue in all supported versions of IBM Spectrum Scale in which running online mmfsck with the -y option may result in file system corruption, specifically duplicate allocation of data blocks. File system corruption of this type may result in undetected file data corruption. Note that this problem may also exist in older, unsupported versions of Spectrum Scale, but this has not been confirmed.

Content

Problem Summary:
Duplicate allocation of a data block results in file data corruption because the same data block has been assigned to more than one file. Any change to the data in that block therefore affects every file that references it, producing unexpected results. The problem has been seen only when online mmfsck (the -o option) is run together with the -y option. A possible contributing factor is inode file expansion occurring while online mmfsck is running; note that inode file expansion happens automatically as new files are created in the file system. Offline mmfsck can safely detect and fix duplicate block references by deleting those references from the affected files.
Note: Continued use of an affected file system with duplicate references may lead to further propagation of the duplicate reference corruption.
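The core problem, one data block claimed by more than one file, can be illustrated with a small model. The sketch below is a hypothetical Python example, not Spectrum Scale code: it assumes a simple mapping from file paths to block addresses (the paths and addresses are invented for illustration) and reports every block address with more than one reference, which is the condition that offline mmfsck detects and repairs.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical model: each file path maps to the disk block addresses it
# references. This is NOT how Spectrum Scale stores metadata; it only
# illustrates what a "duplicate reference" means.
BlockMap = Dict[str, List[int]]


def find_duplicate_blocks(block_map: BlockMap) -> Dict[int, List[str]]:
    """Map each block address referenced more than once to the files that reference it."""
    owners: Dict[int, List[str]] = defaultdict(list)
    for path, blocks in block_map.items():
        for addr in blocks:
            owners[addr].append(path)
    # Keep only addresses with two or more references.
    return {addr: paths for addr, paths in owners.items() if len(paths) > 1}


files = {
    "/gpfs/fs1/a.dat": [100, 101, 102],
    "/gpfs/fs1/b.dat": [200, 101],  # block 101 is also assigned to a.dat
}
print(find_duplicate_blocks(files))  # {101: ['/gpfs/fs1/a.dat', '/gpfs/fs1/b.dat']}
```

In this model, a write through either file to block 101 silently changes the other file's contents, which is why the corruption can go undetected.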
Users Affected:
This issue may affect customers running any supported version of IBM Spectrum Scale (5.0.x and 5.1.x) who run mmfsck with the -o and -y options. Offline mmfsck, that is, mmfsck run with the file system unmounted, is not affected by this problem.
The duplicate reference corruption could result in undetected data loss or data corruption in user files, or in FSErrDeallocBlock file system structure errors, which indicate a double deallocation of a disk address. Here is an example of the structure error message that is written to the system log:
May 5 14:50:19 mmfs: Error=MMFS_FSSTRUCT, ID=0x94B1F045, Tag=9391882: Invalid disk data structure. Error code 1107. Volume <file_system_name>
Problem Determination:
The problem may be present if any of the following indicators exist:
  • Administrators see MMFS_FSSTRUCT errors in the system log.
  • The Spectrum Scale health monitoring feature reports fsstruct errors for any file system. Here is an example message generated by Spectrum Scale health monitoring:
    2021-07-06_19:23:51.425-0400: [I] Calling user exit script mmFsstructErr: event fsstruct, Async command /usr/lpp/mmfs/lib/mmsysmon/sendRasEventToMonitor.
  • Users report that some data in their files appears to be incorrect.
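The first two indicators can be checked by scanning log text for the two message types quoted above. The following is a minimal Python sketch, not an IBM tool: the regular expressions are assumptions built only from the two example messages in this alert, it assumes each message occupies a single log line, and the file system name fs1 in the sample is hypothetical. Real log formats may differ by release and syslog configuration.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Pattern for the MMFS_FSSTRUCT syslog example quoted in this alert.
# Assumption: each message sits on one log line; real formats may vary.
FSSTRUCT_RE = re.compile(
    r"Error=MMFS_FSSTRUCT.*?Error code (?P<code>\d+)\.\s*Volume\s*(?P<volume>\S+)"
)
# Pattern for the health-monitoring user-exit message quoted in this alert.
HEALTH_RE = re.compile(r"mmFsstructErr: event fsstruct")


class Finding(NamedTuple):
    source: str            # "syslog" or "health"
    code: Optional[int]    # error code, if the line carries one
    volume: Optional[str]  # file system name, if the line carries one
    line: str


def scan(lines: Iterable[str]) -> List[Finding]:
    """Return every line that matches one of the two fsstruct indicators."""
    findings: List[Finding] = []
    for raw in lines:
        line = raw.rstrip("\n")
        m = FSSTRUCT_RE.search(line)
        if m:
            findings.append(Finding("syslog", int(m["code"]), m["volume"], line))
        elif HEALTH_RE.search(line):
            findings.append(Finding("health", None, None, line))
    return findings


sample = [
    "May 5 14:50:19 mmfs: Error=MMFS_FSSTRUCT, ID=0x94B1F045, Tag=9391882: "
    "Invalid disk data structure. Error code 1107. Volume fs1",
    "2021-07-06_19:23:51.425-0400: [I] Calling user exit script mmFsstructErr: "
    "event fsstruct, Async command /usr/lpp/mmfs/lib/mmsysmon/sendRasEventToMonitor.",
    "May 5 14:51:00 mmfs: an unrelated message",
]
for f in scan(sample):
    print(f.source, f.code, f.volume)
```

A match only means an indicator is present; it does not by itself confirm duplicate block references, so any finding should still be raised with IBM Support.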
Recommendations:
Customers should not run mmfsck with the -o and -y options until further notice. Contact IBM Support for instructions on how to remedy the situation.
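Sites that invoke mmfsck from automation scripts may want a preflight check that enforces this recommendation. The helper below, violates_alert, is a hypothetical Python sketch rather than an IBM tool; it assumes each option is passed as its own token (for example "mmfsck fs1 -o -y") and simply flags any mmfsck command line that contains both -o and -y.

```python
import os
import shlex


def violates_alert(command: str) -> bool:
    """Return True if `command` runs mmfsck with both -o (online) and -y (repair).

    Assumes each option is its own token, e.g. "mmfsck fs1 -o -y";
    combined or wrapped forms (such as a leading "sudo") are not recognized.
    """
    argv = shlex.split(command)
    if not argv or os.path.basename(argv[0]) != "mmfsck":
        return False
    opts = set(argv[1:])
    return "-o" in opts and "-y" in opts


for cmd in ("mmfsck fs1 -o -y", "/usr/lpp/mmfs/bin/mmfsck fs1 -y -o", "mmfsck fs1 -y"):
    print(cmd, "->", "blocked" if violates_alert(cmd) else "allowed")
```

The check is deliberately narrow: it blocks only the combination named in this alert and leaves offline runs untouched.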

Product: IBM Spectrum Scale
Category: File System
Platforms: AIX, Linux, Windows
Versions: 5.0.0, 5.1.0

Product: IBM Elastic Storage Server
Category: File system corruption
Platform: Linux
Versions: 5.3.0, 6.0.0, 6.1.0

Document Information

Modified date:
30 July 2021

UID

ibm16474135