Additional checks on file system availability for CES exported data
A CES cluster exports file systems to its clients by using NFS, S3, or SMB. These exports might be fully or partially located on the CES cluster directly, or might be remote-mounted from other storage systems. If such a mount is not available when the NFS, S3, or SMB service starts up or at run time, the system throws an error. There are events that set the NFS, S3, or SMB state to a DEGRADED or FAILED state if not all the necessary file systems are available.
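For example, on a CES node you can first verify that the local and remote file systems that back the exports are actually mounted. The following standard administration commands list the mount state of all file systems and the configured remote file systems; whether remote file systems are involved depends on your configuration:
mmlsmount all -L
mmremotefs show all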
The NFS, S3, and SMB monitoring checks that the file systems required by the declared exports are all available. If one or more of these file systems is unavailable, then they are marked as FAILED in the mmhealth node show filesystem -v command output. The corresponding NFS, S3, or SMB components are set to a DEGRADED state. For NFS, the nfs_exports_down event is created initially. For SMB, the smb_exports_down event is created initially.
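For example, to see the affected file systems and the resulting protocol component states on a CES node, commands such as the following can be used (the component names are written in uppercase here, which is how they usually appear in mmhealth output):
mmhealth node show FILESYSTEM -v
mmhealth node show NFS -v
mmhealth node show SMB -v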
The NFS or SMB service can also be set to a FAILED state instead of a DEGRADED state if the required remote or local file systems are not mounted. The change in state can be done only by the Cluster State Manager (CSM). If the CSM detects that some of the CES nodes are in a DEGRADED state, then it can overrule the DEGRADED state with a FAILED state to trigger a failover of the CES IPs to a healthy node. Only the nfs_exports_down and smb_exports_down events are handled in this way; other events that cause a DEGRADED state are not handled by this procedure. For NFS, the nfs_exports_down warning event is countered by an nfs_exported_fs_down error event from the CSM to mark it as FAILED. Similarly, for SMB, the smb_exports_down warning event is countered by an smb_exported_fs_down error event to mark it as FAILED.
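For example, to check whether the CSM raised one of these error events and to see the resulting cluster-wide protocol state, commands such as the following can be used (the grep filter is only an illustration):
mmhealth node eventlog | grep exported_fs_down
mmhealth cluster show NFS
mmhealth cluster show SMB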
After the CSM detects that all the CES nodes report an nfs_exports_down or smb_exports_down status, it clears the nfs_exported_fs_down or smb_exported_fs_down events to allow each node to rediscover its own state again. This prevents a cluster outage if only one protocol is affected while the others are still active. However, such a state might not be stable for a while and must be fixed as soon as possible. If the file systems are mounted again, then the SMB, S3, or NFS service monitors detect this and refresh their health state information.
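For example, if fs1 is the missing file system (fs1 is a placeholder for your file system name), mounting it again on all nodes and rechecking the protocol state might look like this:
mmmount fs1 -a
mmhealth node show NFS
mmhealth node show SMB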
To enable or disable the CSM override of missing-export events, complete the following steps:
- Make a backup copy of the current /var/mmfs/mmsysmon/mmsysmonitor.conf file.
- Open the file with a text editor, and search for the [clusterstate] section to set the value of csmsetmissingexportsfailed to true or false:
  [clusterstate]
  ...
  # true = allow CSM to override NFS/SMB missing export events on the CES nodes (set to FAILED)
  # false = CSM does not override NFS/SMB missing export events on the CES nodes
  csmsetmissingexportsfailed = true
- Close the editor and restart the system health monitor using the following
command:
mmsysmoncontrol restart
- Run this procedure on all the nodes, or copy the modified file to all nodes and restart the system health monitor on each of them, as shown in the example that follows this list.
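The following is one way to distribute the modified file and restart the monitor from a single node. It is only a sketch: it assumes passwordless SSH between the nodes, that ces1, ces2, and ces3 are the CES node names in your cluster, and that /usr/lpp/mmfs/bin is the usual command path:
for node in ces1 ces2 ces3; do
    # copy the modified configuration file to the CES node
    scp /var/mmfs/mmsysmon/mmsysmonitor.conf ${node}:/var/mmfs/mmsysmon/mmsysmonitor.conf
    # restart the system health monitor on that node
    ssh ${node} /usr/lpp/mmfs/bin/mmsysmoncontrol restart
done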


The startup of NFS after a reboot or mmstartup can also be prevented if not all the file systems that are required for the exports are available. In this case, the NFS service remains in a STOPPED state even if all relevant file systems are available at a later point in time. To enable or disable this behavior, complete the following steps:
- Make a backup copy of the current mmsysmonitor.conf file.
- Open the file with a text editor, and search for the [nfs] section to set the value of preventnfsstartuponmissingfs to true or false:
  # NFS settings
  #
  [nfs]
  ...
  # prevent NFS startup after reboot/mmstartup if not all required filesystems for exports are available
  # true = prevent startup / false = allow startup
  preventnfsstartuponmissingfs = true
- Close the editor and restart the system health monitor using the following
command:
mmsysmoncontrol restart
- Run this procedure on all the nodes or copy the modified files to all nodes and restart the system health monitor on all nodes.
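With this option enabled, NFS does not start automatically even after the missing file systems become available again. For example, after mounting them you can start the service manually and verify its state (assuming the standard mmces and mmhealth commands are available on the node):
mmces service start NFS
mmhealth node show NFS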