IBM Support

IJ35751: AFM GATEWAY NODES CRASHED

APAR status

  • Closed as program error.

Error description

  • ABSTRACT:
    
    AFM GW node crashes.
    
    Error Description:
    
    At the moment of panic, the inode_hash_lock was acquired by CPU 4 via the function inode_insert5():
    crash> bt
    PID: 145932 TASK: ffff99ad56732100 CPU: 4 COMMAND: "tspcachescan"
    #0 [ffff99b8bf488e48] crash_nmi_callback at ffffffff96c58567
    #1 [ffff99b8bf488e58] nmi_handle at ffffffff9738d93c
    #2 [ffff99b8bf488eb0] do_nmi at ffffffff9738db5d
    #3 [ffff99b8bf488ef0] end_repeat_nmi at ffffffff9738cd9c
    [exception RIP: native_queued_spin_lock_slowpath+290]
    RIP: ffffffff96d17a82 RSP: ffff99ad34f1f0b8 RFLAGS: 00000246
    RAX: 0000000000000000 RBX: ffffbb6616ee0d08 RCX: 0000000000210000
    RDX: ffffffff979888e0 RSI: 000000006c6c6c6c RDI: ffff99b4150ba728
    RBP: ffff99ad34f1f0b8 R8: ffff99b8bf49b8c0 R9: 0000000000000000
    R10: 6c6c6c6c6c6c6c6c R11: 0000000000000001 R12: 0000000000000000
    R13: ffff99b4150ba6a0 R14: ffff99b4150ba728 R15: ffffffffc115df40
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
    --- <NMI exception stack> ---
    #4 [ffff99ad34f1f0b8] native_queued_spin_lock_slowpath at ffffffff96d17a82
    #5 [ffff99ad34f1f0c0] queued_spin_lock_slowpath at ffffffff9737dd13
    #6 [ffff99ad34f1f0d0] _raw_spin_lock at ffffffff9738bac0
    #7 [ffff99ad34f1f0e0] inode_insert5 at ffffffff96e6ceed
    #8 [ffff99ad34f1f128] iget5_locked at ffffffff96e6d310
    #9 [ffff99ad34f1f168] nfs_fhget at ffffffffc115f922 [nfs]
    #10 [ffff99ad34f1f1c0] pcache_nfs_iget at ffffffffc0e07e26 [mmfslinux]
    #11 [ffff99ad34f1f310] pcache_dget_name.constprop.116 at ffffffffc0e08b9d [mmfslinux]
    #12 [ffff99ad34f1f3d0] cxiCacheOpen at ffffffffc0e2474d [mmfslinux]
    #13 [ffff99ad34f1f588] cxiCacheOpenByAttr at ffffffffc0e24a3d [mmfslinux]
    #14 [ffff99ad34f1f620] _Z20kxPcacheOpenByFileidP13gpfsVfsData_tP15KernelOperationPv7FileUIDPciPis at ffffffffc100f166 [mmfs26]
    #15 [ffff99ad34f1f700] _Z9gpfsFattrP13gpfsVfsData_tP9cxiNode_tP9MMFSVInfoiiPvS5_P10ext_cred_t at ffffffffc0f447df [mmfs26]
    #16 [ffff99ad34f1fb90] tsattr at ffffffffc0e00ef5 [mmfslinux]
    #17 [ffff99ad34f1fd40] _Z8ss_ioctljm at ffffffffc1014ef5 [mmfs26]
    #18 [ffff99ad34f1fdf0] ss_fs_unlocked_ioctl at ffffffffc0dfeaf0 [mmfslinux]
    #19 [ffff99ad34f1fe80] do_vfs_ioctl at ffffffff96e63a80
    #20 [ffff99ad34f1ff00] sys_ioctl at ffffffff96e63d31
    #21 [ffff99ad34f1ff50] system_call_fastpath at ffffffff97395f92
    RIP: 00007f9152938397 RSP: 00007ffd3223cb20 RFLAGS: 00000283
    RAX: 0000000000000010 RBX: 000000000000000a RCX: 0000000000000000
    RDX: 00007ffd3223ae30 RSI: 0000000000000037 RDI: 000000000000000a
    RBP: 000000000000002b R8: 0000560a07c63ea0 R9: 00007ffd3223d058
    R10: 0000560a07c6f660 R11: 0000000000000246 R12: 0000560a07e79d20
    R13: 00007ffd3223aea0 R14: 0000000000000000 R15: 0000000000000003
    ORIG_RAX: 0000000000000010 CS: 0033 SS: 002b
    
    Reported in:
    
    Spectrum Scale release v5.1.1.3 and operating system
    RHEL7.9.
    
    Known Impact:
    
    GPFS down, Node down.
    
    Verification steps:
    
    After the fix is applied, the node should no longer crash.
    
    Recovery action:
    
    The node will reboot by itself.
    
    Local Fix:
    
    NO
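
    As a reading aid for the backtrace in the error description, the following is a minimal Python sketch that splits crash `bt` frames into frame number, symbol, and owning module, which makes it easier to see where the call path crosses from mmfs26 and mmfslinux into nfs and then into the core kernel. The helper name `parse_frames` and the label "kernel" for frames that carry no bracketed module tag are assumptions made for illustration; they are not part of the crash utility or of Spectrum Scale.

```python
import re

# Layout of one frame as printed by the crash utility's `bt` command, e.g.
#   #9 [ffff99ad34f1f168] nfs_fhget at ffffffffc115f922 [nfs]
# The trailing [module] tag is optional; \s+ also tolerates wrapped lines.
FRAME_RE = re.compile(
    r"#(?P<index>\d+)\s+\[(?P<sp>[0-9a-f]+)\]\s+(?P<symbol>\S+)\s+at\s+"
    r"(?P<addr>[0-9a-f]+)(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frames(bt_text):
    """Return (frame number, symbol, module) tuples found in bt_text.

    Frames without a module tag are labelled "kernel" (an assumption of
    this sketch: untagged frames belong to the core kernel image).
    """
    frames = []
    for match in FRAME_RE.finditer(bt_text):
        frames.append(
            (
                int(match.group("index")),
                match.group("symbol"),
                match.group("module") or "kernel",
            )
        )
    return frames


# A few frames copied from the backtrace in this APAR.
SAMPLE = """\
#7 [ffff99ad34f1f0e0] inode_insert5 at ffffffff96e6ceed
#8 [ffff99ad34f1f128] iget5_locked at ffffffff96e6d310
#9 [ffff99ad34f1f168] nfs_fhget at ffffffffc115f922 [nfs]
#10 [ffff99ad34f1f1c0] pcache_nfs_iget at ffffffffc0e07e26 [mmfslinux]
"""

if __name__ == "__main__":
    for index, symbol, module in parse_frames(SAMPLE):
        print(f"#{index:<3} {module:<10} {symbol}")
```

    Run against the full trace, the sketch shows pcache_nfs_iget (mmfslinux) calling nfs_fhget (nfs), which in turn reaches iget5_locked and inode_insert5 in the kernel.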
    

Local fix

  • no, the node will reboot.
    

Problem summary

  • The AFM gateway node crashes during fileset recovery because
    invalid file handles are used to get inodes in the kernel.
    

Problem conclusion

  • This is fixed in 5.1.2 PTF 1.
    To see all Spectrum Scale APARs and their respective fix
    solutions, refer to the page
    https://public.dhe.ibm.com/storage/spectrumscale/spectrum_scale_apars.html
    
    Benefits of the solution:
    No more AFM gateway node crashes during the fileset recovery
    
    Work around: None
    
    Problem trigger: AFM fileset recovery
    
    Symptom:  Crash
    
    Platforms affected: All Linux OS environments
    
    Functional Area affected:  AFM
    
    Customer Impact: Critical
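
    To check whether a given node level already contains the fix, a comparison along the following lines may help. This is a minimal Python sketch under stated assumptions: that "5.1.2 PTF 1" corresponds to the dotted level 5.1.2.1, and that every later level carries the fix forward. Neither assumption is confirmed by this APAR, and the helper names `parse_level` and `includes_fix` are hypothetical.

```python
# Assumption: "5.1.2 PTF 1" maps to the four-part level 5.1.2.1.
FIRST_FIXED_LEVEL = (5, 1, 2, 1)


def parse_level(level):
    """Convert a dotted level such as '5.1.1.3' to a 4-tuple of ints.

    Missing trailing parts are treated as 0, so '5.1.2' becomes (5, 1, 2, 0).
    """
    parts = [int(part) for part in level.strip().split(".")]
    if len(parts) > 4:
        raise ValueError(f"unexpected level format: {level!r}")
    return tuple(parts + [0] * (4 - len(parts)))


def includes_fix(level):
    """True if the level is at or above the first fixed level.

    Assumption: the fix is present in every level from 5.1.2.1 onward and
    has not been backported to an earlier stream.
    """
    return parse_level(level) >= FIRST_FIXED_LEVEL


if __name__ == "__main__":
    # 5.1.1.3 is the level this APAR was reported against.
    for level in ("5.1.1.3", "5.1.2", "5.1.2.1"):
        print(level, includes_fix(level))
```

    The level string itself has to come from the node, for example from the installed package level; how it is obtained is outside the scope of this sketch.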
    

Temporary fix

Comments

APAR Information

  • APAR number

    IJ35751

  • Reported component name

    SPEC SCALE STD

  • Reported component ID

    5737F33AP

  • Reported release

    511

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2021-10-25

  • Closed date

    2021-11-05

  • Last modified date

    2021-11-05

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    SPEC SCALE STD

  • Fixed component ID

    5737F33AP

Applicable component levels

Document Information

Modified date:
06 November 2021