Customizing failover behavior for unresponsive NFS
The NFS monitor raises an nfs_not_active error event if it detects a potentially hung NFS service.
This event triggers a failover, which impacts the overall performance of the system. However, NFS is
not necessarily hung in such a situation. NFS can be fully working even though it did not respond to
the monitor checks at that specific time. In these cases, a warning is preferable to a failover. The
NFS monitor can be configured to send an nfs_unresponsive warning event instead of the
nfs_not_active error event when it detects a potential hang.
The nfs_unresponsive event is configured in the monitor configuration file,
/var/mmfs/mmsysmon/mmsysmonitor.conf. In the mmsysmonitor.conf file, the nfs section contains the
new flag failoverunresponsivenfs.
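
As a rough illustration of where the flag lives, the following sketch reads failoverunresponsivenfs from the nfs section of configuration text. It assumes that mmsysmonitor.conf uses INI-style syntax. The sample text, the helper name read_failover_flag, and the fallback value of True are illustrative assumptions, not taken from the product.

```python
import configparser

# Hypothetical excerpt. The real mmsysmonitor.conf holds many more settings,
# and its syntax is assumed here to be INI-style.
SAMPLE_CONF = """
[nfs]
failoverunresponsivenfs = false
"""


def read_failover_flag(conf_text, default=True):
    """Return the failoverunresponsivenfs value found in INI-style text.

    The default used when the flag is absent is an assumption made for this
    sketch, not a documented product default.
    """
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    # getboolean accepts true/false, yes/no, on/off, and 1/0.
    return parser.getboolean("nfs", "failoverunresponsivenfs", fallback=default)


print(read_failover_flag(SAMPLE_CONF))
```

Parsing a string instead of the live file keeps the sketch side-effect free. To inspect a real node, the contents of /var/mmfs/mmsysmon/mmsysmonitor.conf could be passed in instead, provided the file parses as INI.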
Setting the failoverunresponsivenfs flag to false triggers the WARNING event nfs_unresponsive if NFS
does not respond to NULL requests or has no measurable NFS operation activity. Raising a warning
event instead of an error event ensures that the NFS service is not interrupted. This avoids an
unnecessary failover if later monitor cycles detect that NFS is healthy again. However, if NFS is
really hung, there is no automatic recovery, no matter how long it remains hung. It is the user's
responsibility to check the system and to restart NFS manually, if needed.
Setting the failoverunresponsivenfs flag to true triggers the ERROR event nfs_not_active if NFS does
not respond to NULL requests or has no measurable NFS operation activity. Raising an error event
instead of a warning event ensures that a failover is triggered when the system detects a
potentially hung NFS service. However, if the flag is set to true, a failover might be triggered
even if the NFS server is not hung, but just overloaded.
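
The effect of the two flag values can be summarized in a small decision function. This is a minimal sketch that follows the wording above, not the monitor's actual implementation. The function name select_nfs_event and its parameters are hypothetical.

```python
def select_nfs_event(responds_to_null, has_nfs_activity, failover_unresponsive_nfs):
    """Return (severity, event_name) for a monitor cycle, or None if healthy.

    responds_to_null: True if NFS answered the NULL request.
    has_nfs_activity: True if NFS operation activity was measurable.
    failover_unresponsive_nfs: value of the failoverunresponsivenfs flag.
    """
    # Per the description above, either missing signal counts as a
    # potentially hung NFS service.
    potentially_hung = (not responds_to_null) or (not has_nfs_activity)
    if not potentially_hung:
        return None
    if failover_unresponsive_nfs:
        # Error event: a failover is triggered.
        return ("ERROR", "nfs_not_active")
    # Warning event: service continues, manual recovery may be needed.
    return ("WARNING", "nfs_unresponsive")


print(select_nfs_event(False, True, False))
print(select_nfs_event(False, True, True))
```

With the flag set to false, an unanswered NULL request yields the nfs_unresponsive warning. With the flag set to true, the same observation yields the nfs_not_active error.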