IBM Support

IJ47409: RESOLVING NFS-GANESHA CRASH AT NFS_CLIENT_ID_EXPIRE AND NLM_GRANTED_CALLBACK


APAR status

  • Closed as program error.

Error description

  • This APAR addresses two issues related to NFS-Ganesha that can
    cause crashes.
    
    Issue 1:
    
    NFS-Ganesha may crash with the following stack trace:
    
    (gdb) bt
    #0  0x00003fffa73e52e8 in raise () from /lib64/libpthread.so.0
    #1  0x00003fffa7954628 in crash_handler (signo=6, info=0x3ffefac4a468, ctx=0x3ffefac496f0)
        at /usr/src/debug/nfs-ganesha-3.5-ibm071.21.308708/MainNFSD/nfs_init.c:247
    #2  <signal handler called>
    #3  0x00003fffa717fcb0 in raise () from /lib64/libc.so.6
    #4  0x00003fffa718200c in abort () from /lib64/libc.so.6
    #5  0x00003fffa79b9fd4 in free_client_record (record=0x3fff200ed130)
        at /usr/src/debug/nfs-ganesha-3.5-ibm071.21.308708/SAL/nfs4_clientid.c:1381
    #6  0x00003fffa79ba3d8 in dec_client_record_ref (record=0x3fff200ed130)
        at /usr/src/debug/nfs-ganesha-3.5-ibm071.21.308708/SAL/nfs4_clientid.c:1461
    #7  0x00003fffa79b825c in nfs_client_id_expire (clientid=0x3fff200edbd0, make_stale=false)
        at /usr/src/debug/nfs-ganesha-3.5-ibm071.21.308708/SAL/nfs4_clientid.c:914
    #8  0x00003fffa79c7820 in reserve_lease_or_expire (clientid=0x3fff200edbd0, update=true)
        at /usr/src/debug/nfs-ganesha-3.5-ibm071.21.308708/SAL/nfs4_lease.c:181
    #9  0x00003fffa7a59db4 in nfs4_op_renew (op=0x3fff029152d0, data=0x3fff0320d9c0, resp=0x3ffee960cab0)
        at /usr/src/debug/nfs-ganesha-3.5-ibm071.21.308708/Protocols/NFS/nfs4_op_renew.c:91
    #10 0x00003fffa7a2ed80 in process_one_op (data=0x3fff0320d9c0, status=0x3ffefac4cfd0)
        at /usr/src/debug/nfs-ganesha-3.5-ibm071.21.308708/Protocols/NFS/nfs4_Compound.c:920
    #11 0x00003fffa7a30010 in nfs4_Compound (arg=0x3ffeeabd84a0, req=0x3ffeeabd7c90, res=0x3ffee9854f60)
        at /usr/src/debug/nfs-ganesha-3.5-ibm071.21.308708/Protocols/NFS/nfs4_Compound.c:1327
    #12 0x00003fffa794dae4 in nfs_rpc_process_request (reqdata=0x3ffeeabd7c90)
    
    Issue 2:
    
    NFS-Ganesha may crash with the following stack trace:
    
    #0  0x00007f27f0a984fb in raise () from /lib64/libpthread.so.0
    #1  0x00007f27f2775d7b in crash_handler (signo=11, info=0x7f20e337e930, ctx=0x7f20e337e800)
        at /usr/src/debug/nfs-ganesha-3.5-ibm071.21/MainNFSD/nfs_init.c:247
    #2  <signal handler called>
    #3  0x00007f27f28a3cf5 in nlm_granted_callback (obj=0x7f2430001378, lock_entry=0x7f2204302c20)
        at /usr/src/debug/nfs-ganesha-3.5-ibm071.21/Protocols/NLM/nlm_util.c:609
    #4  0x00007f27f27b133b in try_to_grant_lock (lock_entry=0x7f2204302c20)
        at /usr/src/debug/nfs-ganesha-3.5-ibm071.21/SAL/state_lock.c:1732
    #5  0x00007f27f27b177b in process_blocked_lock_upcall (block_data=0x7f2204305510)
        at /usr/src/debug/nfs-ganesha-3.5-ibm071.21/SAL/state_lock.c:1780
    #6  0x00007f27f27ac19c in state_blocked_lock_caller (ctx=0x7f21c8408650)
        at /usr/src/debug/nfs-ganesha-3.5-ibm071.21/SAL/state_async.c:81
    #7  0x00007f27f27f62bd in fridgethr_start_routine (arg=0x7f21c8408650)
        at /usr/src/debug/nfs-ganesha-3.5-ibm071.21/support/fridgethr.c:556
    #8  0x00007f27f0a90ea5 in start_thread () from /lib64/libpthread.so.0
    #9  0x00007f27f018fb0d in clone () from /lib64/libc.so.6
    

Local fix

Problem summary

  • This APAR addresses two issues related to NFS-Ganesha that can
    cause crashes. The stack traces for both issues are the same as
    those listed under Error description: Issue 1 aborts (signal 6)
    in free_client_record, reached from nfs_client_id_expire while
    processing an NFSv4 RENEW, and Issue 2 fails with a segmentation
    fault (signal 11) in nlm_granted_callback while a blocked lock is
    being granted.

Problem conclusion

  • This problem is fixed in 5.1.2.12.
    To see all Spectrum Scale APARs and their respective fix
    solutions, refer to:
    https://public.dhe.ibm.com/storage/spectrumscale/spectrum_scale_apars.html
    
    Benefits of the solution:
    The code has been modified to address the crashes.
    
    Workaround:
    None
    
    Problem Trigger:
    For Issue 1, the crash is related to the NFSv4 lease period. It
    can occur because of timing issues, such as delayed lease renewal
    or a heavily loaded server handling many client requests.
    
    For Issue 2, the crash is related to blocking lock requests and
    lock upgrades issued by multiple threads on the same file, which
    can lead to timing issues.
    
    Platforms Affected:
    Linux Only
    
    Functional Area Affected:
    NFS-Ganesha crash followed by CES-IP failover.
    
    Customer Impact:
    Medium Importance
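
The Issue 1 trigger describes a race around lease expiry. The backtrace shows a RENEW request (nfs4_op_renew) expiring the client through reserve_lease_or_expire and nfs_client_id_expire, then aborting in free_client_record, called from dec_client_record_ref. A minimal, hypothetical C sketch of one common way to keep such an expiry path safe follows. It does not reproduce NFS-Ganesha's SAL code: struct client_rec, client_new, client_get, client_put, client_expire, and client_refs are invented for illustration, and the APAR does not say which technique the actual fix uses. The idea is that expiry happens at most once, so two threads that both decide a lease has lapsed cannot each drop the lease's reference and free the record while it is still in use.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified client record; not NFS-Ganesha's actual type. */
struct client_rec {
	atomic_int refcount;  /* one reference is owned by the active lease */
	atomic_bool expired;  /* set exactly once when the lease is torn down */
};

struct client_rec *client_new(void)
{
	struct client_rec *c = malloc(sizeof(*c));

	if (c == NULL)
		return NULL;
	atomic_init(&c->refcount, 1); /* the lease's reference */
	atomic_init(&c->expired, false);
	return c;
}

/* Take an extra reference, e.g. while a request is using the record. */
void client_get(struct client_rec *c)
{
	atomic_fetch_add(&c->refcount, 1);
}

/* Drop a reference; the thread that drops the last one frees the record. */
void client_put(struct client_rec *c)
{
	/* atomic_fetch_sub returns the value held before the decrement. */
	if (atomic_fetch_sub(&c->refcount, 1) == 1)
		free(c);
}

/*
 * Expire the client at most once. If two threads race here (for example
 * a RENEW that finds the lease lapsed and a background reaper), only the
 * thread that flips 'expired' from false to true drops the lease's
 * reference; the other returns false without touching the count.
 * The caller must hold its own reference across this call.
 */
bool client_expire(struct client_rec *c)
{
	bool expected = false;

	if (!atomic_compare_exchange_strong(&c->expired, &expected, true))
		return false;
	client_put(c);
	return true;
}

/* Current reference count (for tests and diagnostics only). */
int client_refs(struct client_rec *c)
{
	return atomic_load(&c->refcount);
}
```

In this sketch a request handler calls client_get before inspecting the lease and client_put when it is done, so the record cannot be freed underneath it even if another thread expires the client at the same moment.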
    

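Issue 2 follows a similar pattern: a blocked-lock upcall tries to grant a pending lock (process_blocked_lock_upcall, then try_to_grant_lock, then nlm_granted_callback in the backtrace) while other threads issue blocking requests or lock upgrades on the same file. The C sketch below is hypothetical and does not reproduce NFS-Ganesha's code; enum blk_state, struct blocked_lock, blocked_lock_init, blocked_lock_begin_grant, and blocked_lock_cancel are invented names, and the APAR does not describe the actual fix. It shows one common guard: a single atomic compare-and-swap decides which path owns a pending request, so the grant path never sends its callback for an entry that a competing path has already claimed and may be releasing.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lifecycle states for a pending (blocked) lock request. */
enum blk_state { BLK_WAITING, BLK_GRANTING, BLK_CANCELED };

/* Hypothetical blocked-lock entry; not NFS-Ganesha's lock entry type. */
struct blocked_lock {
	atomic_int state; /* holds an enum blk_state value */
};

void blocked_lock_init(struct blocked_lock *b)
{
	atomic_init(&b->state, BLK_WAITING);
}

/* Move from WAITING to 'next'; only one competing thread can succeed. */
static bool claim(struct blocked_lock *b, int next)
{
	int expected = BLK_WAITING;

	return atomic_compare_exchange_strong(&b->state, &expected, next);
}

/* Grant path: send the GRANTED callback only if this returns true. */
bool blocked_lock_begin_grant(struct blocked_lock *b)
{
	return claim(b, BLK_GRANTING);
}

/* Competing path (hypothetical cancel or replacement of the pending
 * request): the entry may be released only if this returns true. */
bool blocked_lock_cancel(struct blocked_lock *b)
{
	return claim(b, BLK_CANCELED);
}
```

Whichever path wins the compare-and-swap owns the entry; the loser backs off instead of dereferencing state that the winner may already be tearing down.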
Temporary fix

Comments

APAR Information

  • APAR number

    IJ47409

  • Reported component name

    SPEC SCALE STD

  • Reported component ID

    5737F33AP

  • Reported release

    512

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2023-06-29

  • Closed date

    2023-07-19

  • Last modified date

    2023-07-19

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

Fix information

  • Fixed component name

    SPEC SCALE STD

  • Fixed component ID

    5737F33AP

Applicable component levels

  • Business Unit

    IBM Infrastructure w/TPS (BU058)

  • Product

    STXKQY

  • Platform

    Platform Independent (PF025)

  • Version

    512

  • Line of Business

    Storage (LOB26)

Document Information

Modified date:
20 July 2023