Flashes (Alerts)
Abstract
IBM has identified a problem with the IBM Spectrum Scale parallel log recovery function in V5.0.4.0 through V5.0.4.1 (ESS 5.3.5 and ESS 6.0.0), which may result in metadata corruption or undetected data corruption during file system recovery.
Content
Problem Summary:
As a result of incorrect logic during log recovery processing, the same buffer could be unintentionally shared by different threads. This usually results in a GPFS daemon assert during log recovery. In rare cases, though, it may result in metadata corruption or undetected data corruption. Log recovery happens after a node failure (daemon assert, expel, quorum loss, kernel panic, or node reboot).
Users Affected:
This issue affects customers running IBM Spectrum Scale V5.0.4.0 through V5.0.4.1 (ESS 5.3.5 and ESS 6.0.0) in the following known scenario:
Log recovery is triggered as a result of node failure. When log recovery has to recover many updates to different disk addresses, logic is invoked to process the log file in multiple chunks. Under such circumstances, a logic issue may result in a GPFS daemon assert, which will keep occurring until the log file is deleted using offline mmfsck. In rare cases, it can also lead to metadata corruption or undetected data corruption. Both of the following conditions must be present for the problem to occur:
- Log recovery is initiated by Spectrum Scale as a result of a node failure (daemon assert, expel, quorum loss, kernel panic, or node reboot) and
- Log recovery finds many recovery records for different disk addresses as a result of heavy file system activity just before the node failure.
One possible symptom of the issue is a daemon assert with the following signature:
logAssertFailed: curSectorHdrP->sectorLsn == nextSectorLsn
Metadata corruptions may be reported as MMFS_FSSTRUCT errors in the system log (dmesg or errpt).
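The two symptoms above can be searched for in collected log text. The following is a minimal Python sketch, not an IBM tool: the function name `scan_log_lines` and the sample lines are hypothetical, and it assumes the relevant log output has already been gathered into a list of strings. A match only suggests further investigation, since MMFS_FSSTRUCT errors can have other causes.

```python
# Strings taken from this flash; everything else here is illustrative.
ASSERT_SIGNATURE = "logAssertFailed: curSectorHdrP->sectorLsn == nextSectorLsn"
FSSTRUCT_MARKER = "MMFS_FSSTRUCT"


def scan_log_lines(lines):
    """Return the 1-based line numbers on which each symptom string appears."""
    hits = {"assert": [], "fsstruct": []}
    for number, line in enumerate(lines, start=1):
        if ASSERT_SIGNATURE in line:
            hits["assert"].append(number)
        if FSSTRUCT_MARKER in line:
            hits["fsstruct"].append(number)
    return hits


# Hypothetical sample input; real log lines will look different.
sample = [
    "mmfsd: logAssertFailed: curSectorHdrP->sectorLsn == nextSectorLsn",
    "unrelated message",
    "kernel: MMFS_FSSTRUCT: example structure error text",
]
result = scan_log_lines(sample)
```

In practice the input lines would come from the system log (dmesg or errpt output) saved to a file and read line by line.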
Recommendations:
Any customer running Spectrum Scale V5.0.4.0 or V5.0.4.1 (ESS 5.3.5 and ESS 6.0.0) should upgrade to Spectrum Scale V5.0.4.2 or later (ESS 5.3.5.1 and ESS 6.0.0.1 or later), available from Fix Central.
If you cannot apply the latest level of service, contact IBM Service for an efix: APAR IJ22010
To contact IBM Service, see http://www.ibm.com/planetwide/
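To decide whether a given Spectrum Scale level falls in the affected range stated in this flash, a simple comparison is enough. The following is a minimal Python sketch under stated assumptions: the helper names `parse_level` and `is_affected` are hypothetical, the level is assumed to be supplied as a four-part string such as `5.0.4.1`, and ESS level numbers are not handled.

```python
import re

# Affected range as stated in this flash: V5.0.4.0 through V5.0.4.1.
AFFECTED_MIN = (5, 0, 4, 0)
AFFECTED_MAX = (5, 0, 4, 1)


def parse_level(text):
    """Parse a level such as 'V5.0.4.1' or '5.0.4.1' into a tuple of four ints."""
    match = re.fullmatch(r"[Vv]?(\d+)\.(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized release level: {text!r}")
    return tuple(int(part) for part in match.groups())


def is_affected(text):
    """True if the level lies within the affected range (inclusive)."""
    return AFFECTED_MIN <= parse_level(text) <= AFFECTED_MAX
```

For example, `is_affected("5.0.4.1")` is true and `is_affected("5.0.4.2")` is false. A system that already has the efix for APAR IJ22010 installed would still report an affected level, so this check alone does not show whether the fix is present.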
Document Information
Modified date:
26 March 2020
UID
ibm11274428