NFS error events
This topic provides information on how to verify and resolve NFS errors.
The following is a list of events that might cause a node to go into a failed state, with possible solutions for each issue. To determine what state a component is in, run the mmces events active nfs command.
NFS is hung (nfs_hung)
Cause
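A statistics query indicates that CES NFS is not responding.
Determination
Query the NFS statistics twice, 5 seconds apart, and compare the operation counters. If the command does not return, or if the counters do not increase while NFS clients are active, CES NFS might be hung. In the following sample output, the NFSv4.0 counter grows from 86449 to 105271, which shows that the server is still processing requests.
/usr/bin/ganesha_stats ; sleep 5 ; /usr/bin/ganesha_stats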
Timestamp: Wed Apr 27 19:27:22 2016 34711407 nsecs
Total NFSv3 ops: 0
Total NFSv4.0 ops: 86449
Total NFSv4.1 ops: 0
Total NFSv4.2 ops: 0
Timestamp: Wed Apr 27 19:27:27 2016 87146242 nsecs
Total NFSv3 ops: 0
Total NFSv4.0 ops: 105271
Total NFSv4.1 ops: 0
Total NFSv4.2 ops: 0
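Comparing the two snapshots by eye is easy to get wrong when several protocol versions are active. The following POSIX shell sketch saves that step: it sums every "Total NFSv<version> ops: <count>" line in a snapshot and compares two snapshots. The function names total_ops and nfs_progress are hypothetical helpers, not part of IBM Spectrum Scale, and the parsing assumes the line format shown in the sample output above. A pair of snapshots with no increase can also mean that no client was active during the interval, so treat "no progress" as a reason to investigate rather than as proof of a hang.

```shell
# Hypothetical helpers for comparing two ganesha_stats snapshots.

# total_ops [FILE]: print the sum of every "Total NFSv<version> ops: N"
# counter in one snapshot; reads standard input when FILE is omitted.
total_ops() {
  awk '/^Total NFSv[0-9.]+ ops:/ { sum += $NF } END { print sum + 0 }' "$@"
}

# nfs_progress BEFORE AFTER: compare two snapshot files.
# Prints how many operations were handled in between and returns 0,
# or prints "no progress" and returns 1 if the total did not grow.
nfs_progress() {
  before=$(total_ops "$1")
  after=$(total_ops "$2")
  if [ "$after" -gt "$before" ]; then
    echo "progress: $((after - before)) ops"
  else
    echo "no progress"
    return 1
  fi
}
```

For example, save the snapshots with /usr/bin/ganesha_stats > /tmp/stats.1 ; sleep 5 ; /usr/bin/ganesha_stats > /tmp/stats.2 and then run nfs_progress /tmp/stats.1 /tmp/stats.2. With the sample values above, it prints "progress: 18822 ops" (105271 minus 86449).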
Solution
The nfs_hung event restarts NFS. Check the service state for any restart-related errors, for example by running the mmces events active nfs command.
CES NFSD process not running (nfsd_down)
Cause
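The CES NFS server process is no longer running.
Determination
- Check whether the CES NFS daemon is running:
ps -C gpfs.ganesha.nfsd
- Check whether D-Bus is alive by running the following command. If the daemon cannot be reached over D-Bus, the command fails with an error similar to the one that follows:
/usr/bin/ganesha_stats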
ERROR: Can't talk to ganesha service on d-bus. Looks like Ganesh is down.
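The two checks above can be wrapped in small shell functions so that they return a status that scripts can test. In this sketch, nfsd_running and ganesha_dbus_ok are hypothetical helper names. nfsd_running relies on ps exiting with a nonzero status when no matching process is found, which is how procps ps behaves on Linux. ganesha_dbus_ok treats either a nonzero exit status or the "Can't talk to ganesha" text as a failure; GANESHA_STATS is an assumed override variable that defaults to /usr/bin/ganesha_stats.

```shell
# nfsd_running (hypothetical helper): succeed if a gpfs.ganesha.nfsd
# process exists; ps -C exits nonzero when no process matches.
nfsd_running() {
  ps -C gpfs.ganesha.nfsd > /dev/null
}

# ganesha_dbus_ok (hypothetical helper): succeed if ganesha_stats can
# reach the daemon over D-Bus. GANESHA_STATS may name another command.
ganesha_dbus_ok() {
  out=$("${GANESHA_STATS:-/usr/bin/ganesha_stats}" 2>&1) || return 1
  # Also catch the D-Bus error text in case the exit status is 0.
  case $out in
    *"Can't talk to ganesha"*) return 1 ;;
  esac
  return 0
}
```

A combined check could then read: if nfsd_running && ganesha_dbus_ok; then echo "CES NFS is up"; else echo "CES NFS is down"; fi.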
Solution
Restart CES NFS on the local CES node by using the mmces service stop nfs and mmces service start nfs commands.
RPC statd process is not running (statd_down)
This event applies only if NFS version 3 is enabled in the CES NFS configuration.
Cause
The rpc.statd process is no longer running.
Determination
ps -C rpc.statd
Solution
Restart CES NFS on the local CES node by using the mmces service stop nfs and mmces service start nfs commands.
Portmapper port 111 is not active (portmapper_down)
Cause
An RPC call to port 111 failed or timed out.
Determination
rpcinfo -n 111 -t localhost portmap
rpcinfo -t localhost nfs 3
rpcinfo -t localhost nfs 4
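The three rpcinfo probes above can be run in one pass with a per-probe result. This is a minimal sketch; check_rpc_services is a hypothetical helper, and RPCINFO is an assumed override variable that defaults to rpcinfo. Because the NFSv3 probe fails when NFS version 3 is not enabled in the CES NFS configuration, a FAIL line for version 3 alone is expected on such systems.

```shell
# check_rpc_services (hypothetical helper): run the rpcinfo probes for
# the portmapper and for NFS versions 3 and 4 against localhost.
# Prints one OK/FAIL line per probe; returns 1 if any probe failed.
check_rpc_services() {
  rpc=${RPCINFO:-rpcinfo}
  failed=0
  if "$rpc" -n 111 -t localhost portmap > /dev/null 2>&1; then
    echo "OK: portmapper on port 111"
  else
    echo "FAIL: portmapper on port 111"
    failed=1
  fi
  for version in 3 4; do
    if "$rpc" -t localhost nfs "$version" > /dev/null 2>&1; then
      echo "OK: NFS version $version"
    else
      echo "FAIL: NFS version $version"
      failed=1
    fi
  done
  return "$failed"
}
```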
Solution
Check whether the portmapper (rpcbind) is running and whether it is configured to start automatically at system startup.
NFS client cannot mount NFS exports from all protocol nodes
Cause
The NFS client can mount NFS exports from some, but not all, protocol nodes because the exports are not visible when showmount is run against the affected protocol nodes.
Determination
The error itself occurs on the NFS server side and is related to a Red Hat® problem with netgroup caching, which makes caching unreliable.
Solution
Disable netgroup caching in nscd for AD values. For more information about how to disable nscd caching, see the nscd.conf man page at https://linux.die.net/man/5/nscd.conf.
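nscd.conf controls caching per service with lines of the form "enable-cache <service> yes|no", and netgroup is one of the services that nscd can cache. The following sketch sets that option to "no" for netgroup in a given file. disable_nscd_netgroup_cache is a hypothetical helper; it assumes GNU grep and GNU sed (for -i and -E), and it rewrites only uncommented "enable-cache netgroup" lines, appending one if none exists. Keep a backup of the file before you run it on a production node.

```shell
# disable_nscd_netgroup_cache FILE (hypothetical helper)
# Ensures FILE contains "enable-cache netgroup no": an existing,
# uncommented "enable-cache netgroup ..." line is rewritten in place;
# otherwise the line is appended. Commented lines are left untouched.
disable_nscd_netgroup_cache() {
  conf=$1
  pattern='^[[:space:]]*enable-cache[[:space:]]+netgroup[[:space:]]'
  if grep -Eq "$pattern" "$conf"; then
    sed -i -E "s/${pattern}.*/enable-cache netgroup no/" "$conf"
  else
    printf 'enable-cache netgroup no\n' >> "$conf"
  fi
}
```

On a typical Red Hat system the file is /etc/nscd.conf, so usage would look like disable_nscd_netgroup_cache /etc/nscd.conf && systemctl restart nscd, where the restart makes nscd reread its configuration.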
The rpc.statd service fails with a "Permission denied" error
Cause
The rpc.statd service fails with a "Permission denied" error when it attempts to create a file in a subdirectory of the CNFS shared directory. The rpc.statd service runs as the rpcuser user, which does not have read permission on the CNFS shared directory. Therefore, rpcuser cannot create a file in a subdirectory under the CNFS shared directory.
Determination
The system log contains messages similar to the following:
rpc.statd[<pid>]: Failed to insert: creating /var/lib/nfs/statd/sm/<client node>:Permission denied
rpc.statd[<pid>]: STAT_FAIL to <server node> for SM_MON of <client node>
kernel:lockd: cannot monitor <client node>
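To find these messages quickly, the log can be filtered for the three patterns shown above. In this sketch, find_statd_denials is a hypothetical helper, and the log location is an assumption: /var/log/messages is the usual system log on Red Hat systems, but it depends on the syslog configuration.

```shell
# find_statd_denials LOGFILE (hypothetical helper)
# Prints the rpc.statd and lockd messages that indicate this problem.
# Returns 0 if at least one line matched and 1 otherwise (grep's status).
find_statd_denials() {
  grep -E 'rpc\.statd\[[0-9]+\]: (Failed to insert: creating .*Permission denied|STAT_FAIL to .* for SM_MON of )|lockd: cannot monitor ' "$1"
}
```

For example, find_statd_denials /var/log/messages prints any matching lines; no output means that these particular messages were not logged there.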
Solution
Grant read access to the CNFS shared directory, and then restart the NFS service:
chmod 755 <CNFS shared directory>
systemctl restart nfs
If the problem persists after the NFS service restarts, you might need to reboot the node.
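To confirm that the chmod took effect, you can check whether the "other" permission bits of the CNFS shared directory include read and execute, which is what mode 755 grants to a non-owner such as rpcuser. others_can_traverse is a hypothetical helper; it assumes GNU stat (stat -c %a prints the octal mode, for example 755) and it checks only the "other" bits, so it does not account for ACLs, group membership, or the permissions of parent directories.

```shell
# others_can_traverse DIR (hypothetical helper)
# Succeeds if the "other" permission bits of DIR include read (4) and
# execute (1), as mode 755 does; fails if stat cannot read DIR.
others_can_traverse() {
  mode=$(stat -c %a "$1") || return 1
  # Keep only the last octal digit, which holds the "other" bits.
  others=${mode#"${mode%?}"}
  [ $((others & 5)) -eq 5 ]
}
```

For example, others_can_traverse <CNFS shared directory> && echo "mode allows rpcuser access" succeeds after chmod 755.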
For more information about NFS events, see Events.