Flashes (Alerts)
Abstract
IBM has identified an issue in IBM Storage Scale ECE V5.1.9.0 through V5.1.9.2 and ESS 6.1.8.0 through 6.1.9.1. The issue is due to a race condition involving a workload of large read and small write operations in the GNR layer when the configuration parameter nsdRAIDClientOnlyChecksum is enabled.
Content
The race condition can cause a GPFS daemon assert, preceded by log messages indicating that certain data sectors are corrupted. In this case, the data corruption messages are false positives: they result from memory corruption detected by the checksum mechanism, and the assert ensures that the GPFS process stops.
Users Affected:
This issue may affect clients running ECE or ESS with the following levels:
- IBM Storage Scale ECE V5.1.9.0 through V5.1.9.2
- ESS 6.1.8.0 through 6.1.9.1
Systems at these levels are affected only if the configuration parameter nsdRAIDClientOnlyChecksum is enabled.
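The affected-level criteria above can be expressed as a small check. This is a minimal sketch, not an IBM tool: the function and variable names are hypothetical, it assumes four-part version strings such as 5.1.9.2, and the caller must supply the product level and the state of nsdRAIDClientOnlyChecksum.

```python
# Hypothetical helper for illustration only; names are not part of any IBM tooling.

def parse_version(text):
    """Turn a dotted version such as '5.1.9.2' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


# Affected ranges as stated in this flash (inclusive on both ends).
AFFECTED_RANGES = {
    "ECE": (parse_version("5.1.9.0"), parse_version("5.1.9.2")),
    "ESS": (parse_version("6.1.8.0"), parse_version("6.1.9.1")),
}


def is_affected(product, version, client_only_checksum_enabled):
    """Return True when the level is in range AND the parameter is enabled."""
    low, high = AFFECTED_RANGES[product]
    in_range = low <= parse_version(version) <= high
    return in_range and client_only_checksum_enabled
```

For example, is_affected("ECE", "5.1.9.2", True) evaluates to True, while the same call with "5.1.9.3", or with the parameter disabled, evaluates to False.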
Problem determination:
When this problem occurs, the GPFS daemon asserts. The /var/adm/ras/mmfs.log.latest log typically shows messages similar to the following, including the lines immediately before the assert:
2024-03-14_07:05: [E] Error validating trailer version in vdisk RG001LG003VS001 vtrack 32312903 data segment 124 pdisk e1s034 psector 10924902225 vsector 412624114684.
2024-03-14_07:05: [E] Error validating trailer version in vdisk RG001LG003VS001 vtrack 32312903 data segment 125 pdisk e1s034 psector 10924902288 vsector 412624114748.
2024-03-14_07:05: [E] Error validating trailer version in vdisk RG001LG003VS001 vtrack 32312903 data segment 126 pdisk e1s034 psector 10924902352 vsector 412624114812.
2024-03-14_07:05: [E] Error validating trailer version in vdisk RG001LG003VS001 vtrack 32312903 data segment 127 pdisk e1s034 psector 10924902416 vsector 412624114876.
2024-03-14_07:05: [X] logAssertFailed: vtBufNotValidated.getBit(index) != 0
2024-03-14_07:05: [X] return code 0, reason code 0, log record tag 0
2024-03-14_07:05: [X] *** Assert exp(vtBufNotValidated.getBit(index) != 0) in line 2700 of file /project/sprelgpfs518/build/rgpfs518ptf2efix5/src/avs/fs/mmfs/ts/vdisk/vtrackBuf.C
2024-03-14_07:05: [E] *** Traceback:
2024-03-14_07:05: [E] 2:0x55D6A31676AA logAssertFailed + 0x3AA at ??:0
2024-03-14_07:05: [E] 3:0x55D6A360B9C8 VTrackDesc::deleteNotValidatedDiskBuffer(int, VIORequest*) + 0x378 at ??:0
2024-03-14_07:05: [E] 4:0x55D6A35FEF3D VTrackDesc::vtEnforceTrailerValidation(VIORequest*, BufBitmap*) + 0x1DD at ??:0
2024-03-14_07:05: [E] 5:0x55D6A36583A2 VIORequest::vioPerformRequest() + 0x192 at ??:0
2024-03-14_07:05: [E] 6:0x55D6A36597BD VdiskWriteFromBuffer(VIORequest*, NsdCksumTypes, OtherEndUInt64 const*, int) + 0x3CD at ??:0
2024-03-14_07:05: [E] 7:0x55D6A3C81DE7 NsdRequest::processMsgWrite(NsdServerDisk*, NsdBuffer*, int, unsigned int*) + 0xC47 at ??:0
2024-03-14_07:05: [E] 8:0x55D6A3C8F626 NsdRequest::processRequest(NsdBuffer*, NsdQueue*) + 0xA76 at ??:0
2024-03-14_07:05: [E] 9:0x55D6A3C8FB1A nsdWorkerThread(void*) + 0x3AA at ??:0
2024-03-14_07:05: [E] 10:0x55D6A2C26D32 Thread::callBody(Thread*) + 0x42 at ??:0
2024-03-14_07:05: [E] 11:0x55D6A2C13D50 Thread::callBodyWrapper(Thread*) + 0xA0 at ??:0
2024-03-14_07:05: [E] 12:0x7F8D9A3421CF start_thread + 0xEF at ??:0
2024-03-14_07:05: [E] 13:0x7F8D99058DD3 __GI___clone + 0x43 at ??:0
mmfsd: /project/sprelgpfs518/build/rgpfs518ptf2efix5/src/avs/fs/mmfs/ts/vdisk/vtrackBuf.C:2700: void logAssertFailed(UInt32, const char*, UInt32, Int32, Int32, UInt32, const char*, const char*): Assertion `vtBufNotValidated.getBit(index) != 0' failed.
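To look for this signature in a log, a script can search for the trailer validation errors followed by the assert. The following is a coarse sketch under stated assumptions: the function name and regular expressions are hypothetical and derived only from the sample messages above, so the wording in other code levels may differ, and a match is a hint for further diagnosis rather than a confirmation.

```python
import re

# Patterns derived from the sample log excerpt in this flash (an assumption:
# the exact message text may vary between code levels).
TRAILER_ERROR = re.compile(r"\[E\] Error validating trailer version in vdisk (\S+)")
ASSERT_LINE = re.compile(r"logAssertFailed: vtBufNotValidated\.getBit\(index\) != 0")


def find_signature(lines):
    """Return (matched, vdisks).

    matched is True when the assert line appears after at least one
    trailer validation error; vdisks lists the vdisk names seen so far.
    """
    vdisks = []
    for line in lines:
        error = TRAILER_ERROR.search(line)
        if error:
            if error.group(1) not in vdisks:
                vdisks.append(error.group(1))
            continue
        if ASSERT_LINE.search(line) and vdisks:
            return True, vdisks
    return False, vdisks


if __name__ == "__main__":
    # Log path as given in this flash.
    with open("/var/adm/ras/mmfs.log.latest", errors="replace") as log:
        matched, vdisks = find_signature(log)
    print("signature found:", matched, "vdisks:", vdisks)
```

The check deliberately requires both parts of the signature, because either message on its own can have other causes.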
Code Levels Impacted:
IBM Storage Scale ECE V5.1.9.0 through V5.1.9.2
or
ESS 6.1.8.0 through ESS 6.1.9.1.
Recommendations:
IBM Storage Scale ECE customers who are affected should upgrade to IBM Storage Scale 5.1.9.3 or later.
IBM Storage Scale System (ESS) customers who are affected should upgrade to ESS 6.1.9.2:
https://www.ibm.com/support/fixcentral/swg/selectFixes?parent=Software%20defined%20storage&product=ibm/StorageSoftware/IBM+Storage+Scale+System&release=6.1.9&platform=All&function=all
If an upgrade is not possible, customers should contact IBM Support and request an efix for this problem.
- IBM Storage Scale V5.1.9.0 - V5.1.9.2, APAR: IJ50519
- IBM ESS V6.1.8.0 - V6.1.9.1, APAR: IJ50519
If an efix cannot be applied immediately, the assert can be avoided by disabling the configuration parameter nsdRAIDClientOnlyChecksum, at the cost of a potential performance loss:
mmchconfig nsdRAIDClientOnlyChecksum=no -i -N <server nodes or nodeclass>
Note: Internal reference D.326165.
Document Information
Modified date:
30 April 2024
UID
ibm17144216