IBM Support

IJ36358: APPLICATION TRIGGERS HANG IN SPECTRUM SCALE CLIENT

APAR status

  • Closed as program error.

Error description

  • The Matlab application triggers a hang in the Spectrum
    Scale client. The hang occurs when Matlab starts worker
    processes using its "batch()" function. Each worker
    process spawns a large number of threads at startup:
    around 3900 threads are quickly spawned on each node
    when the test runs.
    
     kernel: MATLAB          D ffff8e75e5f626e0     0 87792  87779 0x00000000
     kernel: Call Trace:
     kernel: [<ffffffff9bd89179>] schedule+0x29/0x70
     kernel: [<ffffffffc0857cb1>] cxiWaitEventWait+0x1d1/0x2f0 [mmfslinux]
     kernel: [<ffffffff9b6dadf0>] ? wake_up_state+0x20/0x20
     kernel: [<ffffffffc0d6e202>] _Z20DeclareResourceUsagejii+0x572/0x690 [mmfs26]
     kernel: [<ffffffffc083d3e7>] ? cxiBlockingMutexRelease+0x87/0xf0 [mmfslinux]
     kernel: [<ffffffffc0d6e43a>] _Z18HoldDaemonSegAndSGP12SegMapStatusPiP13gpfsVfsData_tPP11StripeGroupS1_Pji16VfsOperationType+0x11a/0x290 [mmfs26]
     kernel: [<ffffffff9b6e4766>] ? update_curr+0x86/0x1e0
     kernel: [<ffffffffc0eb0371>] _ZN15KernelOperation15kBeginVnopRdPudEP13gpfsVfsData_t+0x81/0x90 [mmfs26]
     kernel: [<ffffffffc0d9058c>] _Z8gpfsReadP13gpfsVfsData_tP15KernelOperationP9cxiNode_tiP8cxiUio_tP9MMFSVInfoP10cxiVattr_tSA_P10ext_cred_tP14cxiPageLists_ti+0x1c1c/0x3470 [mmfs26]
     kernel: [<ffffffff9b62b621>] ? __switch_to+0x151/0x580
     kernel: [<ffffffff9bd89179>] ? schedule+0x29/0x70
     kernel: [<ffffffff9bd86e41>] ? schedule_timeout+0x221/0x2d0
     kernel: [<ffffffffc083d3e7>] ? cxiBlockingMutexRelease+0x87/0xf0 [mmfslinux]
     kernel: [<ffffffff9b6cbf62>] ? up+0x32/0x50
     kernel: [<ffffffff9b635c19>] ? sched_clock+0x9/0x10
     kernel: [<ffffffff9b6de305>] ? sched_clock_cpu+0x85/0xc0
     kernel: [<ffffffff9b6dab32>] ? try_to_wake_up+0x192/0x390
     kernel: [<ffffffff9b6dad45>] ? wake_up_process+0x15/0x20
     kernel: [<ffffffff9bd88835>] ? __up.isra.0+0x1f/0x2a
     kernel: [<ffffffff9b6cbf62>] ? up+0x32/0x50
     kernel: [<ffffffff9b6cbdee>] ? down+0x2e/0x50
     kernel: [<ffffffffc0857688>] ? cxiBlockingMutexAcquire+0x208/0x260 [mmfslinux]
     kernel: [<ffffffffc0d6db47>] ? _Z20ReleaseResourceUsageji+0x3e7/0x470 [mmfs26]
     kernel: [<ffffffff9b6cbf62>] ? up+0x32/0x50
     kernel: [<ffffffffc0864dda>] rdwrInternal+0x45a/0x6c0 [mmfslinux]
     kernel: [<ffffffff9b84b11a>] ? __check_object_size+0x1ca/0x250
     kernel: [<ffffffffc08650b7>] gpfs_f_read+0x77/0xc0 [mmfslinux]
     kernel: [<ffffffff9b900000>] ? lookup_user_key+0x290/0x500
     kernel: [<ffffffff9b84e3df>] vfs_read+0x9f/0x170
     kernel: [<ffffffff9bd95f92>] system_call_fastpath+0x25/0x2a
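
When scanning kernel logs from many nodes, it can help to check mechanically whether a blocked task's call trace resembles the one above. The following is a minimal sketch, not an IBM-provided diagnostic. It assumes each frame has been rejoined onto a single line, and it treats the combination of a reliable cxiWaitEventWait frame and a reliable DeclareResourceUsage frame as the signature of interest; that choice of signature and the function names reliable_symbols and matches_hang_signature are illustrative assumptions.

```python
import re

# One frame per line: "[<address>] [? ]symbol+0xoffset/0xsize [module]".
# A leading "?" marks a frame the kernel unwinder considers unreliable.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def reliable_symbols(trace_text):
    """Return the symbols of all frames that are not marked with '?'."""
    symbols = []
    for line in trace_text.splitlines():
        match = FRAME_RE.search(line)
        if match and match.group(1) is None:
            symbols.append(match.group(2))
    return symbols


def matches_hang_signature(trace_text):
    """Assumed signature: a task waiting in cxiWaitEventWait under DeclareResourceUsage."""
    symbols = reliable_symbols(trace_text)
    waiting = "cxiWaitEventWait" in symbols
    # The C++ symbol is mangled, so match on the embedded function name.
    declaring = any("DeclareResourceUsage" in s for s in symbols)
    return waiting and declaring


SAMPLE_TRACE = """\
kernel: [<ffffffff9bd89179>] schedule+0x29/0x70
kernel: [<ffffffffc0857cb1>] cxiWaitEventWait+0x1d1/0x2f0 [mmfslinux]
kernel: [<ffffffff9b6dadf0>] ? wake_up_state+0x20/0x20
kernel: [<ffffffffc0d6e202>] _Z20DeclareResourceUsagejii+0x572/0x690 [mmfs26]
"""
```

Calling matches_hang_signature(SAMPLE_TRACE) on the four sample frames, which are taken from the trace above, returns True. A match only indicates that a trace is worth comparing against this APAR; it does not confirm the defect.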
    

Local fix

  • Contact IBM Support for a workaround
    

Problem summary

  • Concurrent mmap reads from a large number of threads
    may cause a deadlock in DeclareResourceUsage.
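
To make the access pattern concrete, the following minimal sketch starts many threads that each map the same file read-only and touch one byte per page. It is an illustration of "mmap reads from lots of threads" only, not a reproducer supplied by IBM: the function names, the thread count, and the temporary local file are assumptions, and because CPython serializes most of this work under its global interpreter lock, the page faults here are far less concurrent than those generated by Matlab's native threads.

```python
import mmap
import os
import tempfile
import threading


def mmap_read_worker(path, results, index):
    """Map the file read-only and read the first byte of every page."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            total = 0
            for offset in range(0, len(m), mmap.PAGESIZE):
                total += m[offset]  # the first access may fault the page in
            results[index] = total


def run_mmap_readers(path, num_threads):
    """Start num_threads readers on the same file and wait for all of them."""
    results = [None] * num_threads
    threads = [
        threading.Thread(target=mmap_read_worker, args=(path, results, i))
        for i in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def demo(num_threads=64, num_pages=16):
    """Run the readers against a temporary local file filled with 0x01 bytes."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"\x01" * (mmap.PAGESIZE * num_pages))
        return run_mmap_readers(path, num_threads)
    finally:
        os.remove(path)
```

Each entry in the list returned by demo() equals num_pages, since every page starts with a 0x01 byte. Pointing run_mmap_readers at a file on an affected Spectrum Scale file system would exercise the same pattern there, but that should be done only on a test system.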
    

Problem conclusion

  • This problem is fixed in 5.0.5 PTF 13.
    To see all Spectrum Scale APARs and
    their respective fix solutions, refer to the page
    https://public.dhe.ibm.com/storage/spectrumscale/spectrum_scal
    apars.html
    
    Benefits of the solution:
    No more deadlock
    
    Work Around:
    Disable mmap pagepool resource usage declaration
    with the "mmchconfig mmapDeclarePageUsage=false"
    command.
    
    Problem trigger:
    mmap reads from lots of threads
    
    Symptom:
    Hang/Deadlock/Unresponsiveness/Long Waiters
    
    Platforms affected:
    ALL Operating System environments
    
    Functional Area affected:
    All Scale Users
    
    Customer Impact:
    High Importance
    

Temporary fix

Comments

APAR Information

  • APAR number

    IJ36358

  • Reported component name

    SPEC SCALE DME

  • Reported component ID

    5737F34AP

  • Reported release

    505

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2021-12-02

  • Closed date

    2022-02-24

  • Last modified date

    2022-02-24

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    SPEC SCALE DME

  • Fixed component ID

    5737F34AP

Applicable component levels

Document Information

Modified date:
25 February 2022