APAR status
Closed as program error.
Error description
In a Spectrum Scale environment using snapshots, nodes might run into a logAssert.
Reported In: Spectrum Scale 5.1.0.2
Error message logged in /var/adm/ras/mmfs.log.latest:
[X] *** Assert exp((verify == 0) || (ofP == __null) || (ofP->sgP == __null) || ofP->isRoSnap() || (ofP->metadata.getInodeStatus() != 1) || (!ofP->sgP->isFileIncludedInSnapshot(ofP->getInodeNum(), ofP->getSnapId(), getInodeStatus())) || (ofP->assertInodeWasCopiedToPrevSnapshot()) || (ofP->isBeingRestriped() || ofP->beenRestriped)) in line 1158 of file /project/sprelmax510/build/rmax510s002a/src/avs/fs/mmfs/ts/fs/metadata-vfs.C
[E] *** Traceback:
[E] 2:0x564B3DE940C3 logAssertFailed + 0x3E3 at ??:0
[E] 3:0x564B3DBF9805 FileMetadata::setInodeDirtyAndVerify(unsigned int) + 0x265 at ??:0
[E] 4:0x564B3DBFC6BD FileMetadata::setTimes(int, HiResTime const*, KernelOperation*, dmEventList*) + 0x71D at ??:0
[E] 5:0x564B3EB3974E ClientToken::ctRevokeFromClients(CacheObj*, char, CopysetRevoke*, int, int*) + 0x95E at ??:0
[E] 6:0x564B3EB3A81E ClientToken::ctAcquire(CacheObj*, char, char*, int) + 0xABE at ??:0
[E] 7:0x564B3D972C7F LkObj::change_token(CacheObj*, LkObj*, LkObj::LockModeEnum, int, LkObj::LockModeEnum*) + 0x1AF at ??:0
[E] 8:0x564B3DC8227E InodeLkObj::change_token(CacheObj*, LkObj*, LkObj::LockModeEnum, int, LkObj::LockModeEnum*) + 0x5E at ??:0
[E] 9:0x564B3D97890F LkObj::change_lock_shark_m(CacheObj*, LkObj::LockModeEnum, LkObj::LockModeEnum, LkObj::LockModeEnum*, int, int) + 0x5DF at ??:0
[E] 10:0x564B3D979467 LkObj::lock_shark_m(CacheObj*, LkObj::LockModeEnum, LkObj::LockModeEnum*, int, int) + 0x17 at ??:0
[E] 11:0x564B3DC26C75 FileHashTab::fetch(CacheObj*, unsigned short, LkObj::LockModeEnum, int, void*, int) + 0x315 at ??:0
[E] 12:0x564B3D9821E8 HandleMBHashFetch(MBHashFetchParms*) + 0x148 at ??:0
[E] 13:0x564B3D972483 Mailbox::msgHandlerBody(void*) + 0x313 at ??:0
[E] 14:0x564B3D954058 Thread::callBody(Thread*) + 0x118 at ??:0
[E] 15:0x564B3D941D80 Thread::callBodyWrapper(Thread*) + 0xC0 at ??:0
[E] 16:0x7F124B1D6EA5 start_thread + 0xC5 at ??:0
[E] 17:0x7F124A0C48CD __clone + 0x6D at ??:0
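To check whether a daemon crash on a node matches this APAR, the log can be searched for the assert and the top traceback frame quoted above. The following is a minimal sketch, not an IBM-provided tool. The function name and the choice of the two marker strings are assumptions made for illustration. Both markers are copied from the error message in this record.

```python
# Marker strings copied from the assert expression and traceback in this APAR.
# Choosing these two as the "signature" is an assumption for illustration.
ASSERT_MARKER = "assertInodeWasCopiedToPrevSnapshot"
FRAME_MARKER = "FileMetadata::setInodeDirtyAndVerify"


def matches_apar_signature(log_text: str) -> bool:
    """Return True if the log text contains both the assert expression
    fragment and the traceback frame quoted in this APAR."""
    has_assert = "*** Assert exp(" in log_text and ASSERT_MARKER in log_text
    has_frame = FRAME_MARKER in log_text
    return has_assert and has_frame


def log_file_matches(path: str) -> bool:
    """Read a log file (for example /var/adm/ras/mmfs.log.latest) and
    test it against the signature. Undecodable bytes are replaced."""
    with open(path, errors="replace") as handle:
        return matches_apar_signature(handle.read())
```

A match only indicates that the logged assert resembles the one in this record. It does not confirm the root cause.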
Local fix
Problem summary
When a node gets the stats of a file while other nodes are writing to the same file, users could run into the assert: "Assert exp((verify == 0) || (ofP == __null) || (ofP->sgP == __null) || ofP->isRoSnap() || (ofP->metadata.getInodeStatus() != 1) || (!ofP->sgP->isFileIncludedInSnapshot(ofP->getInodeNum(), ofP->getSnapId(), getInodeStatus())) || (ofP->assertInodeWasCopiedToPrevSnapshot()) || (ofP->isBeingRestriped() || ofP->beenRestriped))".
Problem conclusion
Benefits of the solution: Avoids a daemon crash.
Work around: Run the mmchconfig command to set "statliteMaxAttrAge=0". This disables statlite and avoids the problem, but it may also reduce write performance on the other nodes.
Problem trigger: Getting the lite stat of a file while writes are in progress from other nodes.
Symptom: Daemon crash
Platforms affected: All Operating Systems
Functional Area affected: Users of the gpfs_statlite API or other utilities that call this API.
Customer Impact: High Importance
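The workaround can be scripted as shown in the sketch below. It builds the mmchconfig invocation named in this record and, by default, returns it without running it. The function names and the dry-run behavior are assumptions for illustration. The sketch also assumes that mmchconfig is on the PATH and that the caller has the required administrative privileges. This record does not state whether the change takes effect immediately or needs a daemon restart, so no such option is passed.

```python
import subprocess
from typing import List


def build_workaround_command(max_attr_age: int = 0) -> List[str]:
    """Build the mmchconfig argument list for the workaround.
    A value of 0 disables statlite, as described in this APAR."""
    return ["mmchconfig", f"statliteMaxAttrAge={max_attr_age}"]


def apply_workaround(dry_run: bool = True) -> List[str]:
    """Return the command; execute it only when dry_run is False.
    Assumes mmchconfig is on PATH and the caller is authorized."""
    cmd = build_workaround_command()
    if not dry_run:
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(cmd, check=True)
    return cmd
```

Calling apply_workaround() with the default dry_run=True lets an administrator review the command first. Because the setting may reduce write performance on other nodes, it is worth reverting once the fix is installed.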
Temporary fix
Comments
APAR Information
APAR number
IJ31841
Reported component name
SPEC SCALE DME
Reported component ID
5737F34AP
Reported release
510
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2021-03-29
Closed date
2021-04-21
Last modified date
2021-04-21
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
SPEC SCALE DME
Fixed component ID
5737F34AP
Applicable component levels
Business Unit: IBM Infrastructure w/TPS (BU058)
Product: IBM Spectrum Scale (STXKQY)
Platform: Platform Independent (PF025)
Version: 510
Line of Business: Storage (LOB26)
Document Information
Modified date:
05 May 2021