Customizing failover behavior for unresponsive NFS
The NFS monitor raises an nfs_not_active error event if it detects a
potentially hung NFS server. This triggers a failover, which impacts the system's overall
performance. However, NFS might not actually be hung in such a situation. It might be
fully functional even though it does not respond to the monitor checks at that specific time. In these cases,
it is better to raise a warning instead of triggering a failover. The NFS monitor can be configured to
send an nfs_unresponsive warning event instead of the
nfs_not_active event if it detects a potential hang.
The nfs_unresponsive event is configured in the monitor
configuration file /var/mmfs/mmsysmon/mmsysmonitor.conf. In the
mmsysmonitor.conf file, the nfs section contains the new
failoverunresponsivenfs flag.
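As a minimal sketch of how the current setting could be checked, the following Python snippet reads the flag from configuration text. It assumes that mmsysmonitor.conf uses INI-style syntax that Python's configparser can read; the helper name read_failover_flag and the sample text are hypothetical and are not part of the product.

```python
import configparser


def read_failover_flag(conf_text):
    """Return failoverunresponsivenfs as a bool, or None if the flag is not set.

    Assumption: the configuration text is INI-style with an [nfs] section.
    """
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(conf_text)
    if not parser.has_option("nfs", "failoverunresponsivenfs"):
        return None
    return parser.getboolean("nfs", "failoverunresponsivenfs")


# Hypothetical sample content; a real file contains many more settings.
sample = """
[nfs]
failoverunresponsivenfs = false
"""

print(read_failover_flag(sample))
```

To check a real system, the text could be read from /var/mmfs/mmsysmon/mmsysmonitor.conf and passed to the same helper, provided the INI assumption holds for that file.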
Setting the failoverunresponsivenfs flag to false triggers the
WARNING event, nfs_unresponsive, if the NFS does not respond to
NULL requests or has no measurable NFS operation activity. Raising a warning event instead of an
error event ensures that the NFS service is not interrupted. This allows the system to avoid an
unnecessary failover if later monitor cycles detect a healthy NFS state again.
However, if the NFS is hung, there is no automatic recovery, no matter how long the NFS remains
hung. It is the user's responsibility to check the system and to restart NFS manually, if
needed.
Setting the failoverunresponsivenfs flag to true triggers the
ERROR event, nfs_not_active, if the NFS does not respond to NULL
requests or has no measurable NFS operation activity. Raising an error event instead of a warning
event ensures that a failover is triggered when the system detects a potential NFS hang.
However, if the flag is set to true, a failover might be triggered even if the NFS server is not
hung, but just overloaded.
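The two settings described above can be summarized in a short Python sketch. This only illustrates the documented behavior and is not the monitor's actual implementation; the function name and the returned tuple layout are hypothetical.

```python
def event_for_unresponsive_nfs(failover_unresponsive_nfs):
    """Map the flag value to the event raised when NFS appears unresponsive.

    Returns (event_name, severity, triggers_failover). Illustrative only.
    """
    if failover_unresponsive_nfs:
        # true: ERROR event, a failover is triggered
        return ("nfs_not_active", "ERROR", True)
    # false: WARNING event, NFS service is not interrupted,
    # but a truly hung NFS must be restarted manually
    return ("nfs_unresponsive", "WARNING", False)


for flag in (False, True):
    print(flag, event_for_unresponsive_nfs(flag))
```

The trade-off is the one stated in the text: false avoids unnecessary failovers but leaves recovery of a truly hung NFS to the user, while true guarantees a failover at the cost of possibly failing over a server that is only overloaded.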