IBM Support

IJ18591: [NFS] GANESHA_STATS FOR SH CHECK SHOWS NO IMPROVEMENT AND FREQU

APAR status

  • Closed as program error.

Error description

  • The NFS monitor periodically checks the health state of a
    running NFS instance.
    Sometimes the NFS service does not respond to some "alive" check
    commands, and this is interpreted as a potential "hung" state.
    Depending on the configuration in the mmsysmonitor.conf file,
    either a failover or just a warning is then triggered.
    

Local fix

  • The behavior for a detected potential "hung" state can be
    customized with the flag 'failoverunresponsivenfs' in the
    mmsysmonitor.conf file, section [nfs].
    

Problem summary

  • Problem description:
    The NFS monitor periodically checks the health state of a
    running NFS instance.
    Sometimes the NFS service does not respond to some "alive" check
    commands, and this is interpreted as a potential "hung" state.
    Depending on the configuration in the mmsysmonitor.conf file,
    either a failover or just a warning is then triggered.
    

Problem conclusion

  • Benefits of the solution:
    The fix extends the period over which the internal checks are
    repeated to up to a minute before a decision about a detected
    "hung" state is made.
    This is much more reliable than the previous approach, which
    decided after around 10-20 seconds.
    
    Work Around:
    The behavior for a detected potential "hung" state can be
    customized with the flag 'failoverunresponsivenfs' in the
    mmsysmonitor.conf file, section [nfs].
    
    The meaning of the flag value is:
    "true" = set an ERROR event (nfs_not_active) if NFS does not
    respond to NULL requests and has no measurable NFS operation
    activity
    
    "false" = set a DEGRADED event (nfs_unresponsive) if NFS does
    not respond to NULL requests and has no measurable NFS operation
    activity
    
    The monitor needs to be restarted after a change
    (mmsysmoncontrol restart).
    The change must be done on all nodes in the same way.
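
The flag semantics above can be sketched in Python. This is only an illustration of the mapping described in this APAR, not the monitor's actual code. The `key = value` line in the sample excerpt is an assumed syntax, and the function name is hypothetical; only the section name [nfs], the flag name, and the two event names come from the text above.

```python
import configparser

# Hypothetical excerpt of mmsysmonitor.conf. The "key = value" syntax is an
# assumption; the APAR only names the section [nfs] and the flag itself.
SAMPLE_CONF = """
[nfs]
failoverunresponsivenfs = false
"""


def event_for_unresponsive_nfs(conf_text: str) -> str:
    """Return the event name this APAR associates with the flag value.

    Applies to the case where NFS does not respond to NULL requests and
    shows no measurable NFS operation activity.
    """
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    # Raises NoSectionError/NoOptionError if the flag is missing, because
    # the APAR does not state a default value.
    failover = parser.getboolean("nfs", "failoverunresponsivenfs")
    # "true"  -> ERROR event (nfs_not_active)
    # "false" -> DEGRADED event (nfs_unresponsive)
    return "nfs_not_active" if failover else "nfs_unresponsive"


print(event_for_unresponsive_nfs(SAMPLE_CONF))  # prints nfs_unresponsive
```

After editing the real file, the monitor still has to be restarted with mmsysmoncontrol restart on every node, as stated above.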
    
    Problem trigger:
    In some cases, high I/O load caused NFS v3 and/or v4 NULL
    requests to fail, and a subsequent internal statistics check
    reported no activity with respect to the number of internal NFS
    operations. These checks are done within a timespan of several
    seconds to a minute. The system might in fact still be
    functional, and the internally detected "unresponsive" state
    might be only temporary, so a failover would not be advisable
    in this case.
    The monitor interprets the unresponsiveness as a potential
    "hung" state and triggers either a failover or a warning,
    depending on the configuration settings.
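
To illustrate why a longer observation period reduces false "hung" detections, the following Python sketch models the decision described above. It is not the monitor's implementation: the `Check` record, the `looks_hung` function, and the 60-second default are hypothetical stand-ins for "NULL request failed", "no change in the NFS operation count", and "up to a minute".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Check:
    """One hypothetical health probe result."""
    t: float        # time of the check, in seconds
    null_ok: bool   # did the NFS NULL request get a reply?
    op_count: int   # cumulative NFS operation counter from the statistics


def looks_hung(checks: List[Check], window: float = 60.0) -> bool:
    """Decide "hung" only after a full window of consistently bad checks.

    `checks` is assumed to be ordered by time and to start at the first
    failed NULL request.
    """
    if len(checks) < 2:
        return False
    # Too early to decide: the observation period is shorter than the window.
    if checks[-1].t - checks[0].t < window:
        return False
    no_null_reply = all(not c.null_ok for c in checks)
    # A cumulative counter that did not move means no measurable activity.
    no_activity = checks[-1].op_count == checks[0].op_count
    return no_null_reply and no_activity


# A load spike of about 15 seconds is not enough to declare a hang.
short_spike = [Check(0, False, 100), Check(15, False, 100)]
# A full minute without NULL replies and without activity is.
full_minute = [Check(t, False, 100) for t in (0, 20, 40, 60)]

print(looks_hung(short_spike))  # prints False
print(looks_hung(full_minute))  # prints True
```

With a shorter window of 10-20 seconds, the first example would already count as "hung", which is the kind of temporary unresponsiveness under high I/O load described above.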
    
    Symptom:
    Performance Impact/Degradation
    
    Platforms affected:
    ALL Linux OS environments (CES nodes)
    
    Functional Area affected:
    Systemhealth
    
    Customer Impact:
    High Importance
    

Temporary fix

Comments

APAR Information

  • APAR number

    IJ18591

  • Reported component name

    SPEC SCALE STD

  • Reported component ID

    5737F33AP

  • Reported release

    503

  • Status

    CLOSED PER

  • PE

    NoPE

  • HIPER

    NoHIPER

  • Special Attention

    NoSpecatt / Xsystem

  • Submitted date

    2019-08-28

  • Closed date

    2019-08-28

  • Last modified date

    2019-08-28

  • APAR is sysrouted FROM one or more of the following:

  • APAR is sysrouted TO one or more of the following:

    IJ18744

Fix information

  • Fixed component name

    SPEC SCALE STD

  • Fixed component ID

    5737F33AP

Document Information

Modified date:
28 August 2019