IBM Support

IBM Spectrum Scale (GPFS) 5.0.4 levels Alert: possible metadata or data corruption during file system log recovery

Flashes (Alerts)


Abstract

IBM has identified a problem with the IBM Spectrum Scale parallel log recovery function at V5.0.4.0 - V5.0.4.1 (ESS 5.3.5 and ESS 6.0.0), which may result in metadata corruption or undetected data corruption during the course of a file system recovery.

Content

Problem Summary:

As a result of incorrect logic during log recovery processing, the same buffer could be unintentionally shared by different threads. This usually results in a GPFS daemon assert during log recovery. In rare cases, though, it may result in metadata corruption or undetected data corruption. Log recovery happens after a node failure (daemon assert, expel, quorum loss, kernel panic, or node reboot).

Users Affected:

This issue affects customers running IBM Spectrum Scale V5.0.4.0 through V5.0.4.1 (ESS 5.3.5 and ESS 6.0.0) under the following known scenario:

Log recovery is triggered as a result of node failure. When log recovery has to recover many updates to different disk addresses, logic is invoked to process the log file in multiple chunks. Under such circumstances, a logic issue may result in a GPFS daemon assert, which will keep occurring until the log file is deleted using offline mmfsck. In rare cases, it can also lead to metadata corruption or undetected data corruption.  Both of the following conditions must be present for the problem to occur: 

- Log recovery is initiated by Spectrum Scale as a result of a node failure (daemon assert, expel, quorum loss, kernel panic, or node reboot) and  

- Log recovery finds many recovery records for different disk addresses as a result of heavy file system activity just before the node failure.

One possible symptom of the issue is a daemon assert with the following signature:  

logAssertFailed: curSectorHdrP->sectorLsn == nextSectorLsn

Metadata corruptions may be reported as MMFS_FSSTRUCT errors in the system log (dmesg or errpt). 
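The two symptoms above can be searched for in captured log text. The following is a minimal sketch, not an IBM-provided tool: the function name scan_log_lines is hypothetical, and it assumes the log lines (for example, saved dmesg or errpt output) have already been read into a list of strings. It does plain substring matching on the assert signature and the MMFS_FSSTRUCT marker quoted in this flash.

```python
# Strings quoted in this flash. Matching them as plain substrings is an
# assumption of this sketch, not a documented IBM procedure.
ASSERT_SIGNATURE = "logAssertFailed: curSectorHdrP->sectorLsn == nextSectorLsn"
FSSTRUCT_MARKER = "MMFS_FSSTRUCT"


def scan_log_lines(lines):
    """Return (assert_hits, fsstruct_hits), each a list of (line_number, text).

    `lines` is any iterable of log lines; line numbers start at 1.
    """
    assert_hits = []
    fsstruct_hits = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if ASSERT_SIGNATURE in text:
            assert_hits.append((number, text))
        if FSSTRUCT_MARKER in text:
            fsstruct_hits.append((number, text))
    return assert_hits, fsstruct_hits
```

A caller might pass the lines of a saved log file and treat any hit as a prompt for further investigation. An empty result does not rule out the problem, since the data corruption described here may be undetected.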

Recommendations: 

Any customer running Spectrum Scale V5.0.4.0 or V5.0.4.1 (ESS 5.3.5 and ESS 6.0.0) should upgrade to Spectrum Scale V5.0.4.2 or later (ESS 5.3.5.1 or ESS 6.0.0.1 or later), available from Fix Central at:

https://www.ibm.com/support/fixcentral/swg/selectFixes?parent=Software%20defined%20storage&product=ibm/StorageSoftware/IBM+Spectrum+Scale&release=5.0.4&platform=All&function=all

https://www.ibm.com/support/fixcentral/swg/selectFixes?parent=Software%20defined%20storage&product=ibm/StorageSoftware/IBM+Elastic+Storage+Server+(ESS)&release=5.3.0&platform=All&function=all

https://www.ibm.com/support/fixcentral/swg/selectFixes?parent=Software%20defined%20storage&product=ibm/StorageSoftware/IBM+Elastic+Storage+Server+(ESS)&release=6.0.0&platform=All&function=all

If you cannot apply the latest level of service, contact IBM Service for an efix and reference APAR IJ22010.

To contact IBM Service, see http://www.ibm.com/planetwide/  
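The affected range named in this flash can be expressed as a simple version comparison. The following is a minimal sketch under stated assumptions: the helper names parse_version and is_affected are hypothetical, the version string is assumed to be a four-part level such as 5.0.4.1 (optionally prefixed with V), and only Spectrum Scale levels are handled, not ESS levels. A system with the efix applied may still report a level in this range, so the result is only a first indication.

```python
# Affected Spectrum Scale levels as stated in this flash (inclusive range).
AFFECTED_FIRST = (5, 0, 4, 0)
AFFECTED_LAST = (5, 0, 4, 1)


def parse_version(text):
    """Parse a level such as 'V5.0.4.1' or '5.0.4.1' into a tuple of four ints."""
    parts = text.strip().lstrip("Vv").split(".")
    if len(parts) != 4 or not all(part.isdigit() for part in parts):
        raise ValueError("expected a four-part version, got: %r" % text)
    return tuple(int(part) for part in parts)


def is_affected(version_text):
    """Return True if the level falls within V5.0.4.0 through V5.0.4.1."""
    return AFFECTED_FIRST <= parse_version(version_text) <= AFFECTED_LAST
```

For example, is_affected("5.0.4.1") evaluates to True, while is_affected("5.0.4.2") evaluates to False because V5.0.4.2 is the first level that contains the fix.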


Document Information

Modified date:
26 March 2020

UID

ibm11274428