IBM Support

IBM Spectrum Scale Active File Management (AFM) issues which may result in undetected data corruption.

Flashes (Alerts)


Abstract

IBM has identified issues affecting Active File Management (AFM) in IBM Spectrum Scale V5.0.0.0 through V5.0.4.1, which may result in undetected data corruption.

Content

IBM has identified issues affecting Active File Management (AFM) in IBM Spectrum Scale V5.0.0.0 through V5.0.4.1, which can result in undetected data corruption.

1. AFM may intermittently read files from the home cluster incorrectly if the file is sparse at the home cluster, which may result in undetected data corruption. 

Problem Summary:
When the cache file system block size is larger than the home file system data block size, AFM incorrectly calculates the number of data blocks allocated to a file and caches the file without reading the whole file from the home cluster. Applications that read the file may then receive unexpected or corrupted data (possibly all zeros), and the system call used to read the data returns no error. Applications that expect valid data may fail.
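To judge whether a given file at the home cluster might be sparse, one common POSIX heuristic is to compare the space allocated to the file with its logical size. The following is a minimal sketch, not an IBM-provided tool. The function names are hypothetical, and the 512-byte unit for st_blocks is an assumption that holds on Linux but is not guaranteed on every platform. Compression or other space-saving features can also make a non-sparse file look sparse by this test.

```python
import os


def looks_sparse(size_bytes, allocated_blocks, block_unit=512):
    """Return True when the allocated space is smaller than the logical size.

    block_unit is the size of one st_blocks unit (assumed 512 bytes, as on Linux).
    """
    return allocated_blocks * block_unit < size_bytes


def file_looks_sparse(path):
    """Apply the heuristic to a real file using os.stat (POSIX platforms only)."""
    st = os.stat(path)
    return looks_sparse(st.st_size, st.st_blocks)
```

A file reported as sparse by this check is only a candidate for closer inspection; the check says nothing about whether the cached copy is actually corrupted.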

Users affected:
1. AFM caching is running on IBM Spectrum Scale V5.0.0.0 through V5.0.4.1, and
2. The cache file system data block size is larger than the home file system block size, and
3. The home cluster is enabled for AFM (the mmafmconfig command was executed), and
4. The afmReadSparseThreshold configuration parameter is enabled at the cache cluster, and the file size exceeds the value of the afmReadSparseThreshold configuration parameter.
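All four conditions must hold at once for a file to be exposed to this issue. The check can be sketched as follows. This is an illustration only: the CacheSetup fields are hypothetical names to be filled in by hand from your own configuration, and representing a disabled afmReadSparseThreshold as None is an assumption of this sketch, not a description of how the product stores the setting.

```python
from dataclasses import dataclass
from typing import Optional

FIRST_AFFECTED = (5, 0, 0, 0)
LAST_AFFECTED = (5, 0, 4, 1)


@dataclass
class CacheSetup:
    # All field names are hypothetical; values come from your own inventory.
    scale_version: str                 # e.g. "5.0.3.2"
    cache_block_size: int              # cache file system data block size, bytes
    home_block_size: int               # home file system block size, bytes
    home_afm_enabled: bool             # mmafmconfig was executed at home
    sparse_threshold: Optional[int]    # bytes; None means not enabled (assumption)


def version_affected(version):
    """True if the version lies in V5.0.0.0 through V5.0.4.1 inclusive."""
    parts = tuple(int(p) for p in version.split("."))
    parts = parts + (0,) * (4 - len(parts))  # pad "5.0.4" to (5, 0, 4, 0)
    return FIRST_AFFECTED <= parts <= LAST_AFFECTED


def issue1_applies(setup, file_size):
    """True only when every condition in the list above is met for this file."""
    return (
        version_affected(setup.scale_version)
        and setup.cache_block_size > setup.home_block_size
        and setup.home_afm_enabled
        and setup.sparse_threshold is not None
        and file_size > setup.sparse_threshold
    )
```

For example, a cache at V5.0.4.1 with a 4 MiB block size against a home with a 1 MiB block size meets the version and block size conditions, while the same setup at V5.0.4.2 does not.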

Recommendations:
- Any user meeting all of the conditions above should either upgrade to a level of code containing the fix, or obtain and apply an efix for their level of code by contacting IBM Service.

Users running IBM Spectrum Scale V5.0.0.0 through V5.0.4.1 should upgrade to IBM Spectrum Scale V5.0.4.2 or later, available from Fix Central at:

https://www.ibm.com/support/fixcentral/swg/selectFixes?parent=Software%20defined%20storage&product=ibm/StorageSoftware/IBM+Spectrum+Scale&release=5.0.4&platform=All&function=all

If you cannot apply the latest level of service, contact IBM Service to obtain and apply an efix, referencing APAR IJ21975.

- If you believe that your GPFS file system may be affected by this issue, please contact IBM Service as soon as possible for further guidance and assistance. 

2. AFM may intermittently read files from the home cluster incorrectly if an uncached file is updated at the cache cluster at the end of the file without the O_APPEND flag, which may result in undetected data corruption.

Problem Summary:
Data may be incompletely read in an AFM independent-writer mode fileset if a file is modified before it is cached. AFM reads the full file from the home cluster before allowing a write operation, but a data append is allowed without AFM reading the full file. If data is written at the end of the file at the cache cluster without append mode (that is, without the O_APPEND flag), the last block of the file might be incompletely read from the home cluster. During a migration, the cache cluster is the new system and the home cluster is the old system.
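The difference between the two ways of adding data at the end of a file can be illustrated with plain POSIX calls. This is a generic sketch of the two access patterns, written with hypothetical function names; it does not reproduce the AFM defect itself. Both functions produce the same file contents on a local file system, and only the second matches the pattern described above (a write at the end of the file without O_APPEND).

```python
import os


def append_with_o_append(path, data):
    """Open in append mode: the kernel positions every write at end of file."""
    fd = os.open(path, os.O_WRONLY | os.O_APPEND)
    try:
        os.write(fd, data)
    finally:
        os.close(fd)


def write_at_end_without_o_append(path, data):
    """Open without O_APPEND, seek to the end explicitly, then write.

    This is the access pattern named in the problem summary.
    """
    fd = os.open(path, os.O_WRONLY)
    try:
        os.lseek(fd, 0, os.SEEK_END)
        os.write(fd, data)
    finally:
        os.close(fd)
```

Applications that extend files with an explicit seek to the end, or with a positioned write at the current file size, fall into the second category.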

Users affected:
1. AFM caching is running on IBM Spectrum Scale V5.0.0.0 through V5.0.4.1, and
2. Users are migrating data using independent-writer mode, with the cache cluster being the new system and the home cluster being the old system, or are running independent-writer mode caching.

Recommendations:
- Any user meeting all of the conditions above should either upgrade to a level of code containing the fix, or obtain and apply an efix for their level of code by contacting IBM Service.

Users running IBM Spectrum Scale V5.0.0.0 through V5.0.4.1 should upgrade to IBM Spectrum Scale V5.0.4.2 or later, available from Fix Central at:

https://www.ibm.com/support/fixcentral/swg/selectFixes?parent=Software%20defined%20storage&product=ibm/StorageSoftware/IBM+Spectrum+Scale&release=5.0.4&platform=All&function=all

If you cannot apply the latest level of service, contact IBM Service to obtain and apply an efix, referencing APAR IJ20948.

- Prefetch the file again from the home cluster after the application detects the loss of data. 

- After the problem is detected, run a policy to find all the uncached files and prefetch them first. If all the files are already cached, compare the file checksums between the cache and home clusters and prefetch the mismatched files again. The AFM migration procedure can be found at:
https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.4/com.ibm.spectrum.scale.v5r04.doc/b1lins_uc_migrafromlegacyhardware.htm

- If you believe that your GPFS file system may be affected by this issue, please contact IBM Service as soon as possible for further guidance and assistance. 
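The checksum comparison mentioned in the recommendations above could be sketched as follows. This is a minimal illustration, not an IBM-provided procedure. It assumes that both the cache fileset and the home export are visible as directory trees on the same node; the function names and the choice of SHA-256 are assumptions of this sketch. In independent-writer mode, a file that was legitimately modified at the cache and not yet synchronized will also show a mismatch, so the result is a list of candidates to review, not proof of corruption.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatched(cache_root, home_root):
    """Return relative paths whose contents differ between cache and home.

    Files that exist only at the cache are skipped because there is
    nothing at home to compare them with.
    """
    cache_root = Path(cache_root)
    home_root = Path(home_root)
    mismatched = []
    for cache_file in sorted(cache_root.rglob("*")):
        if not cache_file.is_file():
            continue
        rel = cache_file.relative_to(cache_root)
        home_file = home_root / rel
        if not home_file.is_file():
            continue
        if sha256_of(cache_file) != sha256_of(home_file):
            mismatched.append(rel.as_posix())
    return mismatched
```

The returned list could be written to a file and used as input when prefetching the mismatched files again from the home cluster.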

[{"Business Unit":{"code":"BU058","label":"IBM Infrastructure w\/TPS"},"Product":{"code":"STXKQY","label":"IBM Spectrum Scale"},"Component":"","Platform":[{"code":"PF002","label":"AIX"},{"code":"PF016","label":"Linux"}],"Version":"5.0.0 - 5.0.4.1","Edition":"","Line of Business":{"code":"LOB26","label":"Storage"}}]

Document Information

Modified date:
31 January 2020

UID

ibm11172272