APAR status
Closed as program error.
Error description
ABSTRACT: AFM GW node crashes.
Error Description:
At the moment of panic, the inode_hash_lock was acquired by CPU 4 via the function inode_insert5():
~~~
crash> bt
PID: 145932  TASK: ffff99ad56732100  CPU: 4   COMMAND: "tspcachescan"
 #0 [ffff99b8bf488e48] crash_nmi_callback at ffffffff96c58567
 #1 [ffff99b8bf488e58] nmi_handle at ffffffff9738d93c
 #2 [ffff99b8bf488eb0] do_nmi at ffffffff9738db5d
 #3 [ffff99b8bf488ef0] end_repeat_nmi at ffffffff9738cd9c
    [exception RIP: native_queued_spin_lock_slowpath+290]
    RIP: ffffffff96d17a82  RSP: ffff99ad34f1f0b8  RFLAGS: 00000246
    RAX: 0000000000000000  RBX: ffffbb6616ee0d08  RCX: 0000000000210000
    RDX: ffffffff979888e0  RSI: 000000006c6c6c6c  RDI: ffff99b4150ba728
    RBP: ffff99ad34f1f0b8   R8: ffff99b8bf49b8c0   R9: 0000000000000000
    R10: 6c6c6c6c6c6c6c6c  R11: 0000000000000001  R12: 0000000000000000
    R13: ffff99b4150ba6a0  R14: ffff99b4150ba728  R15: ffffffffc115df40
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #4 [ffff99ad34f1f0b8] native_queued_spin_lock_slowpath at ffffffff96d17a82
 #5 [ffff99ad34f1f0c0] queued_spin_lock_slowpath at ffffffff9737dd13
 #6 [ffff99ad34f1f0d0] _raw_spin_lock at ffffffff9738bac0
 #7 [ffff99ad34f1f0e0] inode_insert5 at ffffffff96e6ceed
 #8 [ffff99ad34f1f128] iget5_locked at ffffffff96e6d310
 #9 [ffff99ad34f1f168] nfs_fhget at ffffffffc115f922 [nfs]
#10 [ffff99ad34f1f1c0] pcache_nfs_iget at ffffffffc0e07e26 [mmfslinux]
#11 [ffff99ad34f1f310] pcache_dget_name.constprop.116 at ffffffffc0e08b9d [mmfslinux]
#12 [ffff99ad34f1f3d0] cxiCacheOpen at ffffffffc0e2474d [mmfslinux]
#13 [ffff99ad34f1f588] cxiCacheOpenByAttr at ffffffffc0e24a3d [mmfslinux]
#14 [ffff99ad34f1f620] _Z20kxPcacheOpenByFileidP13gpfsVfsData_tP15KernelOperationPv7FileUIDPciPis at ffffffffc100f166 [mmfs26]
#15 [ffff99ad34f1f700] _Z9gpfsFattrP13gpfsVfsData_tP9cxiNode_tP9MMFSVInfoiiPvS5_P10ext_cred_t at ffffffffc0f447df [mmfs26]
#16 [ffff99ad34f1fb90] tsattr at ffffffffc0e00ef5 [mmfslinux]
#17 [ffff99ad34f1fd40] _Z8ss_ioctljm at ffffffffc1014ef5 [mmfs26]
#18 [ffff99ad34f1fdf0] ss_fs_unlocked_ioctl at ffffffffc0dfeaf0 [mmfslinux]
#19 [ffff99ad34f1fe80] do_vfs_ioctl at ffffffff96e63a80
#20 [ffff99ad34f1ff00] sys_ioctl at ffffffff96e63d31
#21 [ffff99ad34f1ff50] system_call_fastpath at ffffffff97395f92
    RIP: 00007f9152938397  RSP: 00007ffd3223cb20  RFLAGS: 00000283
    RAX: 0000000000000010  RBX: 000000000000000a  RCX: 0000000000000000
    RDX: 00007ffd3223ae30  RSI: 0000000000000037  RDI: 000000000000000a
    RBP: 000000000000002b   R8: 0000560a07c63ea0   R9: 00007ffd3223d058
    R10: 0000560a07c6f660  R11: 0000000000000246  R12: 0000560a07e79d20
    R13: 00007ffd3223aea0  R14: 0000000000000000  R15: 0000000000000003
    ORIG_RAX: 0000000000000010  CS: 0033  SS: 002b
~~~
Reported in: Spectrum Scale release v5.1.1.3 and operating system RHEL7.9.
Known Impact: GPFS down, Node down.
Verification steps: After the fix is applied, the node should not crash.
Recovery action: The node reboots by itself.
Local Fix: NO
Local fix
No. The node reboots by itself.
Problem summary
The AFM gateway node crashes during fileset recovery because invalid file handles are used to look up inodes in the kernel.
Problem conclusion
This is fixed in 5.1.2 PTF 1. To see all Spectrum Scale APARs and their respective fix solutions, refer to https://public.dhe.ibm.com/storage/spectrumscale/spectrum_scale_apars.html
Benefits of the solution: No more AFM gateway node crashes during fileset recovery.
Workaround: None
Problem trigger: AFM fileset recovery
Symptom: Crash
Platforms affected: All Linux OS environments
Functional Area affected: AFM
Customer Impact: Critical
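As an illustration only, the sketch below checks whether a dotted Spectrum Scale level string (such as the reported 5.1.1.3) is at or above the fix level. It assumes that "5.1.2 PTF 1" corresponds to the level string 5.1.2.1 and that levels are plain dot-separated integers. The function names `parse_level` and `includes_ij35751_fix` are hypothetical. The check considers only the release stream named in this APAR; it does not account for efixes or for fixes that may be shipped in other streams.

```python
def parse_level(level: str) -> tuple[int, ...]:
    """Split a dotted level such as '5.1.1.3' into a tuple of integers."""
    return tuple(int(part) for part in level.strip().split("."))


# Assumption: "5.1.2 PTF 1" is written as the level string 5.1.2.1.
FIX_LEVEL = parse_level("5.1.2.1")


def includes_ij35751_fix(level: str) -> bool:
    """Hypothetical helper: True if `level` is at or above FIX_LEVEL.

    Python compares tuples element by element, so (5, 1, 1, 3) < (5, 1, 2, 1).
    Efixes and fixes in other release streams are NOT considered.
    """
    return parse_level(level) >= FIX_LEVEL


if __name__ == "__main__":
    for level in ("5.1.1.3", "5.1.2.0", "5.1.2.1", "5.1.3.0"):
        print(level, includes_ij35751_fix(level))
```

Tuple comparison keeps the check simple. For example, 5.1.1.3 (the reported level) compares below 5.1.2.1 and is flagged as not including the fix.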
Temporary fix
Comments
APAR Information
APAR number
IJ35751
Reported component name
SPEC SCALE STD
Reported component ID
5737F33AP
Reported release
511
Status
CLOSED PER
PE
NoPE
HIPER
NoHIPER
Special Attention
NoSpecatt / Xsystem
Submitted date
2021-10-25
Closed date
2021-11-05
Last modified date
2021-11-05
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
Fix information
Fixed component name
SPEC SCALE STD
Fixed component ID
5737F33AP
Applicable component levels
Business Unit: IBM Infrastructure w/TPS (BU058); Product: STXKQY; Platform: Platform Independent (PF025); Version: 511; Line of Business: Storage (LOB26)
Document Information
Modified date:
06 November 2021