IBM Support

IBM Storage Scale V5.1.0.0 to V5.1.9.2: Undetected data and directory corruption: stale data may be read during "mmchdisk start" or left on disk

News


Abstract

IBM has identified potential file system data integrity issues in file systems with data or metadata replication, including undetected data and directory corruption, with IBM Storage Scale V5.1.0.0 to V5.1.9.2 (IBM Storage Scale System V6.1.0.0 to V6.1.9.2). Two different issues have been identified:

1.) Under some conditions, stale data can be read while the command "mmchdisk start" is run on file systems with replication.

2.) Stale data replicas may not get repaired by the "mmchdisk start" command.

Content

It is possible that a disk in use by a file system may be marked as “down”, either as a result of a storage problem, or when the mmchdisk stop command is run. When data is written to a file block, if one of the replicas residing on a down disk cannot be written, the data replica on the down disk becomes stale. The command mmchdisk start can be run to recover a down disk and return it to the up (online) state. In a file system with replication, the command fixes stale data replicas, if any, on the down disks. The down disks transition to the unrecovered state after the mmchdisk start command starts to run, and the disk state changes to the up state after mmchdisk start completes successfully. The mmlsdisk command shows the disk state for a file system. Any unrecovered disks will be listed in the command output.
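The disk states described above can be sketched as a small model. This is only an illustration of the transitions named in this flash, not how IBM Storage Scale implements them; the names Availability, TRANSITIONS, and disks_not_up are hypothetical, and the disk states are assumed to have been collected already (for example, by reading mmlsdisk output manually).

```python
from enum import Enum
from typing import Dict, List


class Availability(Enum):
    """Disk states mentioned in this flash (illustrative model only)."""
    DOWN = "down"
    UNRECOVERED = "unrecovered"
    UP = "up"


# Transitions named in this flash, keyed by (from_state, to_state).
# The values are the triggers as the flash describes them.
TRANSITIONS = {
    (Availability.UP, Availability.DOWN): "storage problem or mmchdisk stop",
    (Availability.DOWN, Availability.UNRECOVERED): "mmchdisk start begins",
    (Availability.UNRECOVERED, Availability.UP): "mmchdisk start completes successfully",
    (Availability.UNRECOVERED, Availability.DOWN): "mmchdisk stop",
}


def disks_not_up(availability_by_disk: Dict[str, Availability]) -> List[str]:
    """Return the names of disks that are not in the up state, sorted by name."""
    return sorted(
        name
        for name, state in availability_by_disk.items()
        if state is not Availability.UP
    )
```

A non-empty result from disks_not_up would indicate that at least one disk is still down or unrecovered.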
Every file or directory that is being accessed in IBM Storage Scale is assigned a metanode. The metanode is an IBM Storage Scale role required to coordinate updates to a file or directory. This role is dynamically assigned to a node in the IBM Storage Scale cluster, and it can move from one node to another for various reasons. All other nodes that access the file or directory are defined as non-metanodes. On the non-metanodes, under certain conditions, stale file system data may not be identified accurately. This leads to the following two issues, which exist in IBM Storage Scale from V5.1.0.0 to V5.1.9.2 (IBM Storage Scale System versions from 6.1.0.0 to 6.1.9.2).
Issue 1:
If the system is in use while some disks are in the unrecovered state (that is, while the mmchdisk start command is running), non-metanodes might read stale data from the unrecovered disks.
Issue 2:
When the command mmchdisk start is run to recover a down disk, and this disk contains stale data, the possibility exists that the disk recovery will complete without repairing all stale data. Two scenarios increase the likelihood that this issue can occur.
    Scenario 1) I/O workload continues to run while one or more disks are down.
    Scenario 2) Nodes fail under the conditions described in scenario 1 above.
These issues can only occur if either metadata or data is replicated in a file system. For metadata replication the issues only impact directory entries, and no other file system metadata. If data replication is enabled then the issues can impact user data.
The first issue is more likely to be encountered if the IBM Storage Scale "rapid repair" feature is disabled on a file system, while the second issue can only occur if the "rapid repair" feature is enabled on a file system. As “rapid repair” has been enabled by default since before IBM Storage Scale V5.1.0.0, it should be rare for a file system to have the feature disabled.
Problem Determination
Although these issues may trigger a "file system struct error", in most cases the stale data is read silently, with no other obvious symptoms.
The /var/log/messages file (or the output of the errpt command on AIX) might contain an entry similar to the following:
Error=MMFS_FSSTRUCT, ID=0x94B1F045, Tag=12662454:   Invalid disk data structure.  Error code 1108.
Again, this MMFS_FSSTRUCT entry is generic, and may be insufficient to identify this specific issue.
If the syslog (/var/log/messages) or errpt output is processed through the fsstructlx.awk/fsstruct.awk scripts (located in /usr/lpp/mmfs/samples/debugtools/), the presence of one or more of the errors below is a possible indication that the problems described in this flash have been encountered:
FSErrValidate
FSErrBadCompressBlock
FSErrInodeCorrupted
FSErrBadInodeOrGen
FSErrInodeNumMismatch
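As a minimal sketch, the error names listed above can be searched for in text that has already been decoded by the fsstructlx.awk/fsstruct.awk scripts. The function name find_flash_errors is hypothetical, and the exact layout of the decoded output is not assumed; the code only looks for the names as substrings of each line. A match is a possible indication only, as noted above.

```python
from typing import Dict

# Error names taken from the list in this flash.
FLASH_ERROR_NAMES = (
    "FSErrValidate",
    "FSErrBadCompressBlock",
    "FSErrInodeCorrupted",
    "FSErrBadInodeOrGen",
    "FSErrInodeNumMismatch",
)


def find_flash_errors(decoded_log_text: str) -> Dict[str, int]:
    """Count the lines that mention each error name.

    Assumption: decoded_log_text is the text produced by running the
    syslog or errpt output through the fsstruct awk scripts. Only
    substring matching is used, so no particular line format is required.
    """
    counts = {name: 0 for name in FLASH_ERROR_NAMES}
    for line in decoded_log_text.splitlines():
        for name in FLASH_ERROR_NAMES:
            if name in line:
                counts[name] += 1
    # Report only the names that actually appeared.
    return {name: n for name, n in counts.items() if n > 0}
```

An empty result does not rule out the problem, because stale data is usually read silently.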
If stale data remains on disk, the replicas are inconsistent. Run the replica compare commands to detect this, and ensure that all disks in the file system are up when a command is run. For example, to check the entire file system:
mmrestripefs <file system> --check-conflicting-replicas
To identify mismatched replicas, if any, in a single file <Filename> that is known or suspected to have been affected by the problem:
mmrestripefile -c --read-only <Filename>
Users Affected
These issues affect customers for whom both of the conditions below are true:
  • File system replication (-r 2 or -r 3, or -m 2 or -m 3 in the mmlsfs output) is used
  • IBM Storage Scale V5.1.0.0 to V5.1.9.2 (IBM Storage Scale System versions from 6.1.0.0 to 6.1.9.2) is being used
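The two conditions above can be expressed as a short check. This is a sketch under stated assumptions: the function names parse_version and is_affected are hypothetical, the version is assumed to be a dotted IBM Storage Scale version string such as 5.1.9.2 (IBM Storage Scale System 6.1.x versions are not handled), and the replication values are assumed to have been read from the mmlsfs output beforehand.

```python
from typing import Tuple

# Affected IBM Storage Scale range stated in this flash (inclusive).
AFFECTED_MIN = (5, 1, 0, 0)
AFFECTED_MAX = (5, 1, 9, 2)


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn 'V5.1.9.2' or '5.1.9.2' into (5, 1, 9, 2)."""
    return tuple(int(part) for part in text.strip().lstrip("Vv").split("."))


def is_affected(scale_version: str, data_replicas: int, metadata_replicas: int) -> bool:
    """Return True only when both conditions from the flash hold.

    data_replicas and metadata_replicas correspond to the -r and -m
    values shown by mmlsfs (assumed to be supplied by the caller).
    """
    in_range = AFFECTED_MIN <= parse_version(scale_version) <= AFFECTED_MAX
    replicated = data_replicas >= 2 or metadata_replicas >= 2
    return in_range and replicated
```

For example, is_affected("5.1.9.2", 2, 1) is true, while the same file system on 5.1.9.3 is not in the affected range.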
Recommendations:
Storage Scale customers that are affected (see the Users Affected section above) should upgrade to IBM Storage Scale V5.1.9.3 or later:

https://www.ibm.com/support/fixcentral/swg/selectFixes?parent=Software%20defined%20storage&product=ibm/StorageSoftware/IBM+Storage+Scale&release=5.1.9&platform=All&function=all

Storage Scale System customers that are affected (section above) should upgrade to IBM Storage Scale System V6.1.9.3 or later: 
For IBM Storage Scale Container Native, both the remotely mounted storage cluster as well as the container native client cluster should be upgraded to IBM Storage Scale V5.1.9.3 or later.
    - Follow the general upgrade steps in the documentation for IBM Storage Scale Container Native V5.1.9.3 or later. Pay special attention to the table of supported upgrade paths to understand any intermediate releases that are required before arriving at the final level.
    - For the remotely mounted storage cluster, obtain the code via the above FixCentral link and follow the IBM Storage Scale upgrade documentation.
If an upgrade is not possible, customers should contact IBM support and request an efix for this problem. Refer to APARs IJ50463, IJ50563, and IJ50708.
    - IJ50463 is applicable to 5.1.1.0 up to 5.1.9.2
    - IJ50563 and IJ50708 are applicable to 5.1.0.0 up to 5.1.9.2.
It is recommended that customers running versions lower than 5.1.2.15 upgrade to 5.1.2.15 before applying the efixes. Customers running versions between 5.1.3.0 and 5.1.9.2 (inclusive) are encouraged to upgrade to 5.1.9.3 or later, where no efixes are required.
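The APAR applicability ranges listed above can be written as a lookup. This is an illustrative sketch only: the names EFIX_RANGES and applicable_apars are hypothetical, the ranges are copied from this flash and treated as inclusive, and the result says nothing about whether an efix is actually available for a given level, which IBM support determines.

```python
from typing import Dict, List, Tuple

Version = Tuple[int, ...]

# Inclusive applicability ranges as stated in this flash.
EFIX_RANGES: Dict[str, Tuple[Version, Version]] = {
    "IJ50463": ((5, 1, 1, 0), (5, 1, 9, 2)),
    "IJ50563": ((5, 1, 0, 0), (5, 1, 9, 2)),
    "IJ50708": ((5, 1, 0, 0), (5, 1, 9, 2)),
}


def applicable_apars(scale_version: str) -> List[str]:
    """Return the APARs whose stated range covers the given version."""
    version = tuple(int(p) for p in scale_version.strip().lstrip("Vv").split("."))
    return sorted(
        apar
        for apar, (low, high) in EFIX_RANGES.items()
        if low <= version <= high
    )
```

For a 5.1.0.x level the lookup returns only IJ50563 and IJ50708, and for 5.1.9.3 or later it returns an empty list, which matches the statement that no efixes are required there.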
For products that include IBM Storage Scale, contact the product vendor to determine how to update your environment with the fixes described in this flash. If your environment includes other software that relies on IBM Storage Scale, work with the providers of that software to assess whether it also needs to be upgraded along with IBM Storage Scale.
Until the fix can be applied
Affected customers should upgrade their systems as described above at the earliest opportunity. Until then, follow the guidance below to reduce the chance of data loss.
The problems described above are more likely to be encountered if production workload continues to run (on a system without the fix) while one or more disks are not in the “up” state.
If the mmchdisk start command fails (for reasons unrelated to the problems described here), the "down" disks being started go into the unrecovered state. Bring such disks down again with the mmchdisk stop command to avoid the chance of reading stale data from those disks.
To eliminate the chance of encountering Issue 1 (but not Issue 2), avoid accessing file system data while running the mmchdisk start command. Information for detecting and correcting mismatched replicas can be found at https://www.ibm.com/docs/en/storage-scale/5.1.9?topic=failure-replica-mismatches.

Note: For internal reference D.325578 and D.327676.


Document Information

Modified date:
21 June 2024

UID

ibm17150282