APAR status
Closed as program error.
Error description
The Matlab application triggers a hang in the Spectrum Scale client when it is instructed to start worker processes using Matlab's "batch()" function. Matlab's worker processes spawn a large number of threads at startup: around 3900 threads are spawned on each node shortly after the test starts.
kernel: MATLAB          D ffff8e75e5f626e0     0 87792  87779 0x00000000
kernel: Call Trace:
kernel: [<ffffffff9bd89179>] schedule+0x29/0x70
kernel: [<ffffffffc0857cb1>] cxiWaitEventWait+0x1d1/0x2f0 [mmfslinux]
kernel: [<ffffffff9b6dadf0>] ? wake_up_state+0x20/0x20
kernel: [<ffffffffc0d6e202>] _Z20DeclareResourceUsagejii+0x572/0x690 [mmfs26]
kernel: [<ffffffffc083d3e7>] ? cxiBlockingMutexRelease+0x87/0xf0 [mmfslinux]
kernel: [<ffffffffc0d6e43a>] _Z18HoldDaemonSegAndSGP12SegMapStatusPiP13gpfsVfsData_tPP11StripeGroupS1_Pji16VfsOperationType+0x11a/0x290 [mmfs26]
kernel: [<ffffffff9b6e4766>] ? update_curr+0x86/0x1e0
kernel: [<ffffffffc0eb0371>] _ZN15KernelOperation15kBeginVnopRdPudEP13gpfsVfsData_t+0x81/0x90 [mmfs26]
kernel: [<ffffffffc0d9058c>] _Z8gpfsReadP13gpfsVfsData_tP15KernelOperationP9cxiNode_tiP8cxiUio_tP9MMFSVInfoP10cxiVattr_tSA_P10ext_cred_tP14cxiPageLists_ti+0x1c1c/0x3470 [mmfs26]
kernel: [<ffffffff9b62b621>] ? __switch_to+0x151/0x580
kernel: [<ffffffff9bd89179>] ? schedule+0x29/0x70
kernel: [<ffffffff9bd86e41>] ? schedule_timeout+0x221/0x2d0
kernel: [<ffffffffc083d3e7>] ? cxiBlockingMutexRelease+0x87/0xf0 [mmfslinux]
kernel: [<ffffffff9b6cbf62>] ? up+0x32/0x50
kernel: [<ffffffff9b635c19>] ? sched_clock+0x9/0x10
kernel: [<ffffffff9b6de305>] ? sched_clock_cpu+0x85/0xc0
kernel: [<ffffffff9b6dab32>] ? try_to_wake_up+0x192/0x390
kernel: [<ffffffff9b6dad45>] ? wake_up_process+0x15/0x20
kernel: [<ffffffff9bd88835>] ? __up.isra.0+0x1f/0x2a
kernel: [<ffffffff9b6cbf62>] ? up+0x32/0x50
kernel: [<ffffffff9b6cbdee>] ? down+0x2e/0x50
kernel: [<ffffffffc0857688>] ? cxiBlockingMutexAcquire+0x208/0x260 [mmfslinux]
kernel: [<ffffffffc0d6db47>] ? _Z20ReleaseResourceUsageji+0x3e7/0x470 [mmfs26]
kernel: [<ffffffff9b6cbf62>] ? up+0x32/0x50
kernel: [<ffffffffc0864dda>] rdwrInternal+0x45a/0x6c0 [mmfslinux]
kernel: [<ffffffff9b84b11a>] ? __check_object_size+0x1ca/0x250
kernel: [<ffffffffc08650b7>] gpfs_f_read+0x77/0xc0 [mmfslinux]
kernel: [<ffffffff9b900000>] ? lookup_user_key+0x290/0x500
kernel: [<ffffffff9b84e3df>] vfs_read+0x9f/0x170
kernel: [<ffffffff9b84f25f>] SyS_read+0x7f/0xf0
kernel: [<ffffffff9bd95f92>] system_call_fastpath+0x25/0x2a
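The trace above reports the MATLAB task in state D (uninterruptible sleep). As a rough aid for spotting the same symptom on a live Linux node, the following sketch counts how many threads of a process are currently in that state by reading the per-thread status files under /proc. The function names and the proc_root parameter are hypothetical and exist only for illustration; this is not an IBM-provided diagnostic, and a nonzero count alone does not prove that this particular deadlock has occurred.

```python
from pathlib import Path


def parse_state(status_text):
    """Return the one-letter state code from the text of a /proc status file."""
    for line in status_text.splitlines():
        if line.startswith("State:"):
            fields = line.split()
            return fields[1] if len(fields) > 1 else None
    return None


def count_blocked_threads(pid, proc_root="/proc"):
    """Count threads of `pid` that are in uninterruptible sleep (state D).

    proc_root is a hypothetical parameter so the function can be pointed
    at a test directory instead of the real /proc.
    """
    blocked = 0
    task_dir = Path(proc_root) / str(pid) / "task"
    for tid_dir in task_dir.iterdir():
        try:
            text = (tid_dir / "status").read_text()
        except OSError:
            continue  # the thread exited while we were scanning
        if parse_state(text) == "D":
            blocked += 1
    return blocked
```

For example, count_blocked_threads(87792) would inspect the process ID shown in the trace header. A count that stays high across repeated checks is consistent with the hang described here, but should be confirmed with IBM Support.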
Local fix
Contact IBM Support for a workaround
Problem summary
Concurrent mmap reads from a large number of threads may cause a deadlock in DeclareResourceUsage.
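To make the access pattern concrete, the following is a minimal sketch of many threads reading through a single read-only memory mapping of a file. It is not a confirmed reproducer for this APAR: the function name, the thread count, and the page-striding scheme are assumptions chosen for illustration, and in CPython the global interpreter lock limits how much of the copying runs in parallel, so the sketch shows the shape of the workload rather than its intensity.

```python
import mmap
import threading


def read_pages_concurrently(path, num_threads=64, page_size=mmap.PAGESIZE):
    """Map `path` read-only and read its pages from several threads.

    Returns a list with the number of bytes each thread read.
    The file must be non-empty, because an empty file cannot be mapped.
    """
    results = [0] * num_threads
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            size = len(mm)

            def worker(idx):
                total = 0
                # Thread idx reads pages idx, idx + num_threads, and so on.
                for off in range(idx * page_size, size, num_threads * page_size):
                    total += len(mm[off:off + page_size])
                results[idx] = total

            threads = [
                threading.Thread(target=worker, args=(i,))
                for i in range(num_threads)
            ]
            for t in threads:
                t.start()
            for t in threads:
                t.join()
    return results
```

Each slice of the mapping copies one page into a bytes object, which is what forces the file system to supply the page. The sum of the returned list equals the file size when every page has been read exactly once.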
Problem conclusion
This problem is fixed in 5.0.5 PTF 13.
To see all Spectrum Scale APARs and their respective fix solutions, refer to page https://public.dhe.ibm.com/storage/spectrumscale/spectrum_scal apars.html
Benefits of the solution: No more deadlock
Work Around: Disable mmap pagepool resource usage declaration with the "mmchconfig mmapDeclarePageUsage=false" command
Problem trigger: mmap reads from lots of threads
Symptom: Hang/Deadlock/Unresponsiveness/Long Waiters
Platforms affected: ALL Operating System environments
Functional Area affected: All Scale Users
Customer Impact: High Importance
Temporary fix
Comments
APAR Information
APAR number
IJ36358
Reported component name
SPEC SCALE DME
Reported component ID
5737F34AP
Reported release
505
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2021-12-02
Closed date
2022-02-24
Last modified date
2022-02-24
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
SPEC SCALE DME
Fixed component ID
5737F34AP
Applicable component levels
Document Information
Modified date:
25 February 2022