APAR status
Closed as program error.
Error description
Two client nodes are deallocating blocks in the same two regions. Each client node owns one of the two regions and is flushing the region it owns. Meanwhile, the DeallocHelperThread on each client node requests ownership of the region owned by the other client node. The two revoke ownership requests then block each other, because both regions are in the flushing state while each node is still waiting on its ownership request to the other node. This forms a deadlock.
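The circular wait described above can be illustrated with a small wait-for graph. This is a minimal sketch only, not GPFS code: the node and region names (nodeA, nodeB, region1, region2) and the helper functions build_wait_graph and find_wait_cycle are hypothetical, and the model assumes a requester is blocked on whichever node currently owns the region it asked for.

```python
from typing import Dict, List, Optional


def build_wait_graph(owners: Dict[str, str], requests: Dict[str, str]) -> Dict[str, str]:
    """Map each requesting node to the node that owns the region it wants.

    owners:   region -> node that currently owns (and is flushing) it
    requests: node   -> region whose ownership that node is requesting
    """
    return {node: owners[region] for node, region in requests.items()}


def find_wait_cycle(waits_for: Dict[str, str]) -> Optional[List[str]]:
    """Return one cycle in the wait-for graph, or None if there is none."""
    for start in waits_for:
        path: List[str] = []
        seen = set()
        node = start
        # Follow the chain of "is blocked on" edges until it ends or repeats.
        while node in waits_for and node not in seen:
            seen.add(node)
            path.append(node)
            node = waits_for[node]
        if node in seen:
            # The chain came back to a node already visited: a deadlock cycle.
            return path[path.index(node):]
    return None


# Hypothetical scenario mirroring the description: each node owns one
# region and requests ownership of the region owned by the other node.
owners = {"region1": "nodeA", "region2": "nodeB"}
requests = {"nodeA": "region2", "nodeB": "region1"}
cycle = find_wait_cycle(build_wait_graph(owners, requests))
```

In this model, cycle contains both nodes, which corresponds to the mutual blocking in the error description. If only one node had an outstanding request, no cycle would be found.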
Local fix
Restart GPFS on the client node that shows a long waiter on the allocMsgTypeRequestOwnership RPC message from DeallocHelperThread.
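To find the node to restart, captured waiter output can be filtered for the two names given in this APAR. The following is a rough sketch under assumptions: it presumes the waiter text was already collected on each client node (for example with mmdiag --waiters), it does not rely on any particular line layout, and the function name suspect_waiters is hypothetical. It only keeps lines that mention both DeallocHelperThread and allocMsgTypeRequestOwnership.

```python
from typing import Iterable, List

# The two identifiers named in this APAR.
THREAD_NAME = "DeallocHelperThread"
RPC_NAME = "allocMsgTypeRequestOwnership"


def suspect_waiters(lines: Iterable[str]) -> List[str]:
    """Return the waiter lines that mention both the thread and the RPC.

    The exact format of waiter lines is not assumed here; this is a plain
    substring match on the two names.
    """
    return [
        line.strip()
        for line in lines
        if THREAD_NAME in line and RPC_NAME in line
    ]
```

For example, passing the lines of a saved waiter listing to suspect_waiters returns only the candidate lines. How long a waiter must be outstanding to count as "long" is a judgment left to the administrator.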
Problem summary
Two client nodes are deallocating blocks in the same two regions. Each client node owns one of the two regions and is flushing the region it owns. Meanwhile, the DeallocHelperThread on each client node requests ownership of the region owned by the other client node. The two revoke ownership requests then block each other, because both regions are in the flushing state while each node is still waiting on its ownership request to the other node. This forms a deadlock.
Problem conclusion
This problem is fixed in 5.1.2.12.
To see all Spectrum Scale APARs and their respective fix solutions, refer to: https://public.dhe.ibm.com/storage/spectrumscale/spectrum_scale_apars.html
Benefits of the solution: Avoids deadlock on block deallocations and flushing.
Work around: Restart GPFS on the client node that shows a long waiter on the allocMsgTypeRequestOwnership RPC message from DeallocHelperThread.
Problem trigger: User file data block deallocations from at least two different client nodes.
Symptom: Deadlock
Platforms affected: All Operating Systems
Functional Area affected: All Scale Users
Customer Impact: High Importance
Temporary fix
Comments
APAR Information
APAR number
IJ47004
Reported component name
SPEC SCALE STD
Reported component ID
5737F33AP
Reported release
512
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2023-05-25
Closed date
2023-07-19
Last modified date
2023-07-19
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
SPEC SCALE STD
Fixed component ID
5737F33AP
Applicable component levels
Product code STXKQY, version 512, Platform Independent (Business Unit: IBM Infrastructure w/TPS; Line of Business: Storage)
Document Information
Modified date:
20 July 2023