IBM Support

IJ31841: LOGASSERT WHILE RUNNING SNAPSHOTS ON CLUSTER

APAR status

  • Closed as program error.

Error description

  • In a Spectrum Scale environment using snapshots, nodes
    might run into a logAssert.
    
    Reported In: Spectrum Scale 5.1.0.2
    
    Error message logged in /var/adm/ras/mmfs.log.latest:
    
     [X] *** Assert exp((verify == 0) || (ofP == __null) || (ofP->sgP == __null) || ofP->isRoSnap() || (ofP->metadata.getInodeStatus() != 1) || (!ofP->sgP->isFileIncludedInSnapshot(ofP->getInodeNum(), ofP->getSnapId(), getInodeStatus())) || (ofP->assertInodeWasCopiedToPrevSnapshot()) || (ofP->isBeingRestriped() || ofP->beenRestriped)) in line 1158 of file /project/sprelmax510/build/rmax510s002a/src/avs/fs/mmfs/ts/fs/metadata-vfs.C
     [E] *** Traceback:
     [E]         2:0x564B3DE940C3 logAssertFailed + 0x3E3 at ??:0
     [E]         3:0x564B3DBF9805 FileMetadata::setInodeDirtyAndVerify(unsigned int) + 0x265 at ??:0
     [E]         4:0x564B3DBFC6BD FileMetadata::setTimes(int, HiResTime const*, KernelOperation*, dmEventList*) + 0x71D at ??:0
     [E]         5:0x564B3EB3974E ClientToken::ctRevokeFromClients(CacheObj*, char, CopysetRevoke*, int, int*) + 0x95E at ??:0
     [E]         6:0x564B3EB3A81E ClientToken::ctAcquire(CacheObj*, char, char*, int) + 0xABE at ??:0
     [E]         7:0x564B3D972C7F LkObj::change_token(CacheObj*, LkObj*, LkObj::LockModeEnum, int, LkObj::LockModeEnum*) + 0x1AF at ??:0
     [E]         8:0x564B3DC8227E InodeLkObj::change_token(CacheObj*, LkObj*, LkObj::LockModeEnum, int, LkObj::LockModeEnum*) + 0x5E at ??:0
     [E]         9:0x564B3D97890F LkObj::change_lock_shark_m(CacheObj*, LkObj::LockModeEnum, LkObj::LockModeEnum, LkObj::LockModeEnum*, int, int) + 0x5DF at ??:0
     [E]         10:0x564B3D979467 LkObj::lock_shark_m(CacheObj*, LkObj::LockModeEnum, LkObj::LockModeEnum*, int, int) + 0x17 at ??:0
     [E]         11:0x564B3DC26C75 FileHashTab::fetch(CacheObj*, unsigned short, LkObj::LockModeEnum, int, void*, int) + 0x315 at ??:0
     [E]         12:0x564B3D9821E8 HandleMBHashFetch(MBHashFetchParms*) + 0x148 at ??:0
     [E]         13:0x564B3D972483 Mailbox::msgHandlerBody(void*) + 0x313 at ??:0
     [E]         14:0x564B3D954058 Thread::callBody(Thread*) + 0x118 at ??:0
     [E]         15:0x564B3D941D80 Thread::callBodyWrapper(Thread*) + 0xC0 at ??:0
     [E]         16:0x7F124B1D6EA5 start_thread + 0xC5 at ??:0
     [E]         17:0x7F124A0C48CD __clone + 0x6D at ??:0
    
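To check whether a node has hit this assert, one option is to scan its GPFS log for the assert and traceback shown above. The following Python sketch looks for `*** Assert` records in a log such as /var/adm/ras/mmfs.log.latest that mention isFileIncludedInSnapshot, metadata-vfs.C, and FileMetadata::setInodeDirtyAndVerify. The function name find_ij31841_asserts, the choice of signature strings, and the 40-line look-ahead window are assumptions for illustration, not an IBM-provided diagnostic; a match only suggests that this APAR may apply.

```python
from pathlib import Path
from typing import List

# Strings copied from the assert message and traceback quoted in this APAR.
# Treating them as a "signature" is a heuristic (assumption), not an IBM rule.
SIGNATURE = (
    "isFileIncludedInSnapshot",
    "metadata-vfs.C",
    "FileMetadata::setInodeDirtyAndVerify",
)
DEFAULT_LOG = Path("/var/adm/ras/mmfs.log.latest")


def find_ij31841_asserts(log_text: str, window: int = 40) -> List[int]:
    """Return 1-based line numbers of '*** Assert' records matching SIGNATURE.

    Hypothetical helper. The assert and its traceback may be wrapped across
    several lines, so each assert line is joined with up to `window` following
    lines (stopping early at the next assert) before the strings are checked.
    """
    lines = log_text.splitlines()
    hits = []
    for i, line in enumerate(lines):
        if "*** Assert" not in line:
            continue
        # Extend the chunk until the window ends or another assert starts.
        end = i + 1
        while end < len(lines) and end < i + window and "*** Assert" not in lines[end]:
            end += 1
        chunk = " ".join(lines[i:end])
        if all(sig in chunk for sig in SIGNATURE):
            hits.append(i + 1)
    return hits


if __name__ == "__main__":
    import sys

    log_path = Path(sys.argv[1]) if len(sys.argv) > 1 else DEFAULT_LOG
    if log_path.is_file():
        text = log_path.read_text(errors="replace")
        for lineno in find_ij31841_asserts(text):
            print(f"{log_path}:{lineno}: possible IJ31841 logAssert")
    else:
        print(f"{log_path}: no such file")
```

Requiring all three strings keeps unrelated asserts in the same log from being reported, at the cost of missing a variant of this assert whose traceback was truncated.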

Local fix

Problem summary

  • When getting the stats of a file while other nodes are writing
    to the same file, users could run into the following assert:
     "Assert exp((verify == 0) || (ofP == __null) || (ofP->sgP == __null) || ofP->isRoSnap() || (ofP->metadata.getInodeStatus() != 1) || (!ofP->sgP->isFileIncludedInSnapshot(ofP->getInodeNum(), ofP->getSnapId(), getInodeStatus())) || (ofP->assertInodeWasCopiedToPrevSnapshot()) || (ofP->isBeingRestriped() || ofP->beenRestriped))"
    

Problem conclusion

  • Benefits of the solution:
    Avoids the daemon crash.
    
    Workaround:
    Use the mmchconfig command to set statliteMaxAttrAge=0. This
    disables statlite and avoids the problem, but it may also
    reduce write performance on the other nodes.
    
    Problem trigger:
    Getting the lite stat of a file while other nodes are writing
    to it.
    
    Symptom: Daemon crash
    
    Platforms affected: All Operating Systems
    
    Functional Area affected:
    Users of the gpfs_statlite API or of other utilities that call this API.
    
    Customer Impact: High Importance
    
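The workaround above can be scripted. This is a minimal Python sketch, assuming mmchconfig is on the PATH of a node with administrative access to the cluster. It builds the mmchconfig command for statliteMaxAttrAge=0 and, with revert=True, for statliteMaxAttrAge=DEFAULT, which relies on the mmchconfig convention of DEFAULT restoring an attribute's default value (confirm this against the mmchconfig documentation for your release). The helper names workaround_command and run are hypothetical. By default the sketch only returns the command text; whether the change takes effect without restarting GPFS is not stated in this APAR.

```python
import shlex
import shutil
import subprocess
from typing import List

# Attribute and value come from the APAR workaround; "DEFAULT" is assumed to
# restore the attribute's default value, per the mmchconfig convention.
ATTR = "statliteMaxAttrAge"


def workaround_command(revert: bool = False) -> List[str]:
    """Return the mmchconfig argv that applies (or reverts) the workaround."""
    value = "DEFAULT" if revert else "0"
    return ["mmchconfig", f"{ATTR}={value}"]


def run(revert: bool = False, dry_run: bool = True) -> str:
    """Return the command text; only execute it when dry_run is False."""
    cmd = workaround_command(revert)
    text = shlex.join(cmd)
    if dry_run:
        return text
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    # check=True raises CalledProcessError if mmchconfig reports a failure.
    subprocess.run(cmd, check=True)
    return text
```

Keeping dry_run=True as the default means the command can be reviewed before it changes cluster configuration; reverting after the fix is installed restores statlite and the write performance it provides.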

Temporary fix

Comments

APAR Information

  • APAR number

    IJ31841

  • Reported component name

    SPEC SCALE DME

  • Reported component ID

    5737F34AP

  • Reported release

    510

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2021-03-29

  • Closed date

    2021-04-21

  • Last modified date

    2021-04-21

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

    IJ32501

Fix information

  • Fixed component name

    SPEC SCALE DME

  • Fixed component ID

    5737F34AP

Applicable component levels

  • Product: IBM Spectrum Scale (STXKQY)
  • Version: 510
  • Platform: Platform Independent
  • Business unit: IBM Infrastructure w/TPS
  • Line of business: Storage

Document Information

Modified date:
05 May 2021