Flashes (Alerts)
Abstract
IBM has identified certain issues affecting Active File Management (AFM) and AFM Asynchronous Disaster Recovery (ADR) in IBM Spectrum Scale, which might result in undetected data loss.
Content
AFM might incorrectly delete files at the home or secondary cluster during an AFM recovery, leaving those files missing at the home or secondary cluster. AFM recovery is triggered when the in-memory queue is lost, for example after a gateway node restart. During recovery, AFM performs a readdir at the home to detect deleted and renamed files. If the home readdir fails after reading some entries, it is retried three times without checking whether some entries were already read. As a result, duplicate entries might be logged in the home list. AFM incorrectly treats these duplicates as hard link remove operations and removes the affected files at the home. A readdir failure during AFM recovery can happen if there is a network issue between the cache and the home system. Data might be permanently lost if auto eviction is enabled at the cache and the data for files that AFM recovery already removed at the home has been evicted from the cache. Auto eviction is enabled by default and is controlled by the fileset-level configuration option afmEnableAutoEviction.
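The retry behavior described above can be illustrated with a small toy model. This is only a sketch of the failure pattern, not AFM code: the function names `build_home_list` and `duplicate_entries`, the `fail_points` parameter, and the file names are all hypothetical.

```python
from collections import Counter


def build_home_list(entries, fail_points, reset_on_retry):
    """Toy model of a directory listing that fails partway and is retried.

    entries        -- names present in the home directory
    fail_points    -- for each failed attempt, how many entries were read
                      before the simulated error (each must be < len(entries))
    reset_on_retry -- discard the partial listing before each retry
    """
    logged = []
    # One attempt per simulated failure, plus a final attempt that succeeds.
    for fail_at in list(fail_points) + [None]:
        if reset_on_retry:
            logged = []  # start the home list again from scratch
        for index, name in enumerate(entries):
            if fail_at is not None and index == fail_at:
                break  # simulated network error partway through the listing
            logged.append(name)
    return logged


def duplicate_entries(logged):
    """Names that appear more than once in the logged home list."""
    return sorted(name for name, count in Counter(logged).items() if count > 1)


home = ["file1", "file2", "file3"]
# Retry without discarding the entries that were already read.
print(duplicate_entries(build_home_list(home, [2], reset_on_retry=False)))
# Retry after discarding the partial listing.
print(duplicate_entries(build_home_list(home, [2], reset_on_retry=True)))
```

In this model, the entries read before the failure appear twice when partial results are kept across a retry, which corresponds to the duplicate home list entries that the flash describes.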
2021-05-03_15:35:58.135+0530: [I] AFM: /usr/lpp/mmfs/bin/tspcachescan gpfs1 sw1 6 0 1319687074 sw1.afm.27129 3 0 C0A87A0A602CD23B 2 0
2021-05-03_15:35:58.147+0530: [I] AFM: Detecting operations to be recovered...
2021-05-03_15:35:58.158+0530: [I] AFM: Found 1 remove operations...
2021-05-03_15:35:58.158+0530: [I] AFM: Found 2 hard link remove operations...
2021-05-03_15:35:58.158+0530: [I] AFM: Found 1 create operations...
2021-05-03_15:35:58.159+0530: [I] AFM: Found 1 update operation...
2021-05-03_15:35:58.159+0530: [I] AFM: Found 1 local cleanup operation...
2021-05-03_15:35:58.167+0530: [I] AFM: Starting 'queue' operation for fileset 'sw1' in file system 'gpfs1'.
2021-05-03_15:35:58.167+0530: [I] Command: tspcache gpfs1 1 sw1 0 3 1319687074 49 0 133 0
2021-05-03_15:35:58.226+0530: [I] Command: successful tspcache gpfs1 1 sw1 0 3 1319687074 49 0 133 0
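To look for this message across a large log, a short script can help. The following is a minimal sketch that assumes log lines have the same layout as the excerpt above; the function name `find_hard_link_removes` is hypothetical, and the location of the mmfs.log file on your gateway nodes is left for you to supply. A match only flags a recovery for review, because hard link remove operations can also be legitimate.

```python
import re

# Matches lines such as:
# 2021-05-03_15:35:58.158+0530: [I] AFM: Found 2 hard link remove operations...
HARD_LINK_RE = re.compile(
    r"^(?P<ts>\S+): \[I\] AFM: Found (?P<count>\d+) hard link remove operations?\b"
)


def find_hard_link_removes(lines):
    """Return (timestamp, count) for each 'hard link remove operations' line."""
    hits = []
    for line in lines:
        match = HARD_LINK_RE.match(line.strip())
        if match:
            hits.append((match.group("ts"), int(match.group("count"))))
    return hits


sample = [
    "2021-05-03_15:35:58.158+0530: [I] AFM: Found 1 remove operations...",
    "2021-05-03_15:35:58.158+0530: [I] AFM: Found 2 hard link remove operations...",
    "2021-05-03_15:35:58.158+0530: [I] AFM: Found 1 create operations...",
]
for timestamp, count in find_hard_link_removes(sample):
    print(timestamp, count)
```

To scan a real log, pass the open file object instead of `sample`, for example `find_hard_link_removes(open(path_to_mmfs_log))`, where `path_to_mmfs_log` is a placeholder for the log path on the gateway node.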
mmapplypolicy <path> -P <policy file path> -f <output file path> -L 1 -N mount -I defer
Users who see the incorrect "hard link remove operations" message in the mmfs.log file should apply the fix on the AFM gateway nodes at the cache or primary cluster.
1. Users running IBM Spectrum Scale V5.0.0.0 through V5.0.5.7 should apply IBM Spectrum Scale V5.0.5.8 or later, available from Fix Central at:
2. Users running IBM Spectrum Scale V5.1.0.0 through V5.1.1.0 should apply IBM Spectrum Scale V5.1.1.1 or later, available from Fix Central at:
3. If you cannot apply one of the above PTF levels, contact IBM Service to obtain and apply an efix for your level of code:
- For IBM Spectrum Scale V5.0.0.0 through V5.0.5.7, reference APAR IJ32504
- For IBM Spectrum Scale V5.1.0.0 through V5.1.1.0, reference APAR IJ32481
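The version ranges above can be expressed as a small lookup, which may be convenient when checking many clusters. This is a sketch only: the function names `parse_version` and `recommended_fix` are hypothetical, the version string is assumed to be a four-part level such as 5.0.4.3, and obtaining the installed level from a node is not shown.

```python
def parse_version(text):
    """Turn 'V5.0.4.3' or '5.0.4.3' into a comparable tuple (5, 0, 4, 3)."""
    return tuple(int(part) for part in text.strip().lstrip("Vv").split("."))


def recommended_fix(version):
    """Return (minimum PTF level, APAR) for an affected level, else None."""
    level = parse_version(version)
    if (5, 0, 0, 0) <= level <= (5, 0, 5, 7):
        return ("5.0.5.8", "IJ32504")
    if (5, 1, 0, 0) <= level <= (5, 1, 1, 0):
        return ("5.1.1.1", "IJ32481")
    return None  # not in a range listed in this flash


for installed in ("5.0.4.3", "V5.1.1.0", "5.1.1.1"):
    print(installed, recommended_fix(installed))
```

A result of `None` means only that the level is outside the two ranges listed in this flash.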
mmafmctl device resync -j fileset
mmafmctl device changeSecondary -j fileset --new-target existingAfmTarget --inband
<Fileset Junction Path>/.ptrash directory. You can copy data from the .ptrash directory to the fileset. This synchronizes data between cache (or primary) and home (or secondary).
Document Information
Modified date:
19 July 2021
UID
ibm16452089