Additional checks on file system availability for CES exported data
A CES cluster exports file systems to its clients by using NFS or SMB. These exports might be located fully or partially on the CES cluster itself, or might be remote-mounted from other storage systems. If such a mount is not available when the NFS or SMB services start up, or at run time, the system reports an error. Events set the NFS or SMB state to DEGRADED or FAILED if not all the necessary file systems are available.
The NFS and SMB monitoring checks that all the file systems required by the declared exports are available. If one or more of these file systems is unavailable, they are marked as FAILED in the mmhealth node show filesystem -v command output, and the corresponding NFS or SMB components are set to a DEGRADED state. For NFS, the nfs_exports_down event is created initially. For SMB, the smb_exports_down event is created initially.
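For example, the mount and health states can be inspected on a CES node with commands such as the following; the component names are standard, but the output details vary by release, so treat this as an illustrative sketch:

    # Show which file systems are mounted on which nodes (local and remote):
    mmlsmount all -L

    # Show the health state of the file systems on this node:
    mmhealth node show filesystem -v

    # Show the health state of the protocol components on this node:
    mmhealth node show NFS
    mmhealth node show SMB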
Optionally, the CES nodes can be set to a FAILED state instead of a DEGRADED state if the required remote or local file systems are not mounted. This change in state can be made only by the Cluster State Manager (CSM). If the CSM detects that some of the CES nodes are in a DEGRADED state, it can overrule the DEGRADED state with a FAILED state to trigger a failover of the CES-IPs to healthy nodes. This handling applies only to the nfs_exports_down and smb_exports_down events; other events that cause a DEGRADED state are not handled by this procedure. For NFS, the nfs_exports_down warning event is countered by an nfs_exported_fs_down error event from the CSM to mark it as FAILED. Similarly, for SMB, the smb_exports_down warning event is countered by an smb_exported_fs_down error event to mark it as FAILED.
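To observe this event exchange, the node event history and the cluster-wide states can be queried as follows; this is a sketch, and the exact events listed depend on the cluster state:

    # List recent health events on a CES node; after a CSM override, the
    # nfs_exported_fs_down or smb_exported_fs_down error event appears here:
    mmhealth node eventlog

    # Show the aggregated state of all nodes to see which CES nodes
    # the CSM set to FAILED:
    mmhealth cluster show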
After the CSM detects that all the CES nodes report an nfs_exports_down or smb_exports_down status, it clears the nfs_exported_fs_down or smb_exported_fs_down events to allow each node to rediscover its own state again. This prevents a cluster outage if only one protocol is affected while others are still active. However, such a state might not be stable for a while and must be fixed as soon as possible. If the file systems are mounted again, the SMB or NFS service monitors detect this and refresh their health state information.
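For example, after a missing file system is mounted again, the monitors pick up the change on their next check cycle; fs1 is a hypothetical file system name here:

    # Mount the previously missing file system on the CES nodes again
    # (fs1 is a hypothetical device name; cesNodes is the built-in node class):
    mmmount fs1 -N cesNodes

    # Verify that the NFS and SMB states recover from DEGRADED:
    mmhealth node show NFS
    mmhealth node show SMB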
To enable or disable the CSM override, set the csmsetmissingexportsfailed option in the mmsysmonitor.conf file:
- Make a backup copy of the current /var/mmfs/mmsysmon/mmsysmonitor.conf file.
- Open the file with a text editor, and search for the [clusterstate] section to set the value of csmsetmissingexportsfailed to true or false:
    [clusterstate]
    ...
    # true = allow CSM to override NFS/SMB missing export events on the CES nodes (set to FAILED)
    # false = CSM does not override NFS/SMB missing export events on the CES nodes
    csmsetmissingexportsfailed = true
- Close the editor and restart the system health monitor by using the following command:
    mmsysmoncontrol restart
- Run this procedure on all the nodes, or copy the modified file to all nodes and restart the system health monitor on all of them. A possible command sequence for one node is sketched after this list.
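The following is a minimal sketch of that sequence, assuming the csmsetmissingexportsfailed line already exists uncommented in the [clusterstate] section; otherwise, edit the file manually:

    # Back up the monitor configuration before changing it:
    cp -p /var/mmfs/mmsysmon/mmsysmonitor.conf /var/mmfs/mmsysmon/mmsysmonitor.conf.bak

    # Enable the CSM override for missing exports (assumes the key is present):
    sed -i 's/^csmsetmissingexportsfailed *=.*/csmsetmissingexportsfailed = true/' /var/mmfs/mmsysmon/mmsysmonitor.conf

    # Restart the system health monitor so that the change takes effect:
    mmsysmoncontrol restart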
In addition, the NFS service can be prevented from starting if not all the file systems that are required for the declared NFS exports are available at start time. In that case, the NFS service remains in a STOPPED state even if all relevant file systems are available at a later point in time. To control this behavior, set the preventnfsstartuponmissingfs option:
- Make a backup copy of the current mmsysmonitor.conf file.
- Open the file with a text editor, and search for the [nfs] section to set the value of preventnfsstartuponmissingfs to true or false:
    # NFS settings
    #
    [nfs]
    ...
    # prevent NFS startup after reboot/mmstartup if not all required filesystems for exports are available
    # true = prevent startup / false = allow startup
    preventnfsstartuponmissingfs = true
- Close the editor and restart the system health monitor by using the following command:
    mmsysmoncontrol restart
- Run this procedure on all the nodes, or copy the modified file to all nodes and restart the system health monitor on all of them. A verification sketch follows this list.
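To verify the effect after a reboot with a missing file system, checks such as the following can be used; NFS stays STOPPED until it is started manually:

    # Check whether the NFS service was kept down on this node:
    mmces service list

    # After the required file systems are mounted again, start NFS manually:
    mmces service start NFS

    # Confirm that the NFS component reports a healthy state again:
    mmhealth node show NFS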