IBM Support

IJ54593: FS MGR NODE STUCK IN TM TOKEN RECOVERY DUE TO UNRESPONSIVE SGMMS

APAR status

  • Closed as program error.

Error description

  • When a client node handles an sgmMsgExeTMPhase request, the
    InodeDeleteThread can call into the kernel, resulting in a
    loop that causes a deadlock.
    
    Reported in: 5.1.9
    
    Local Fix: Reboot client node
    

Local fix

Problem summary

  • During token minimization, a deadlock can occur on a client
    node. With token minimization, a client node is first asked to
    give up any tokens that are held only for cached files. Without
    the fix, running this code path for files that have been deleted
    can result in a deadlock.
    

Problem conclusion

  • This problem is fixed in 5.1.9.10
    To see all Spectrum Scale APARs and their respective
    Fix solutions refer to page: 
    https://public.dhe.ibm.com/storage/spectrumscale/spectrum_scale_apars.html
    
    Benefits of the solution:
    Avoid the deadlock.
    
    Work Around:
    Disable token minimization to avoid the problem:
    mmchconfig tokenXferMinimization=no
    Or restart GPFS on the client node to get out of the deadlock.
    
    Problem trigger:
    Have many files cached on a client node. Delete files. Trigger a
    token server change, which then uses token minimization.
    
    Symptom:
    Hang/Deadlock/Unresponsiveness/Long Waiters
    
    Platforms affected:
    ALL Linux OS environments
    
    Functional Area affected:
    All Scale Users
    
    Customer Impact:
    High Importance
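
    The workaround above can be sketched as a small shell script. This is
    a minimal sketch, not an IBM-provided procedure. Only the
    mmchconfig tokenXferMinimization=no setting comes from this APAR.
    The node name client01, the dry-run wrapper, the mmdiag --waiters
    check, and the use of mmshutdown/mmstartup to restart GPFS on one
    node are assumptions added for illustration. By default the script
    only prints the commands it would run.

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
# Set APPLY=1 on a real cluster node to actually run them.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# Hypothetical name of the affected client node; replace with your own.
NODE="${NODE:-client01}"

# Assumption (not in the APAR): inspect long waiters on the local node
# to confirm the hang symptom before acting.
run mmdiag --waiters

# Option 1 (from the APAR): disable token minimization to avoid the problem.
run mmchconfig tokenXferMinimization=no

# Option 2 (from the APAR): restart GPFS on the deadlocked client node.
# mmshutdown/mmstartup with -N is one assumed way to do this.
run mmshutdown -N "$NODE"
run mmstartup -N "$NODE"
```

    Either option is sufficient on its own: option 1 avoids the
    deadlock, option 2 clears one that has already occurred. Upgrading
    to 5.1.9.10 removes the need for both.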
    

Temporary fix

Comments

APAR Information

  • APAR number

    IJ54593

  • Reported component name

    SPEC SCALE STD

  • Reported component ID

    5737F33AP

  • Reported release

    519

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2025-05-01

  • Closed date

    2025-06-10

  • Last modified date

    2025-06-10

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    SPEC SCALE STD

  • Fixed component ID

    5737F33AP


Document Information

Modified date:
10 June 2025