NFS error events

This topic provides information on how to verify and resolve NFS errors.

The following events might cause a node to go into a failed state. A possible solution is given for each issue. To determine what state a component is in, run the mmces events active nfs command.

NFS is hung (nfs_hung)

Cause

Statistics query indicates that CES NFS is not responding.

Determination

Run the CES NFS statistics command twice with a short delay in between. Compare the NFS server timestamps, and then determine whether the NFS operation counts are increasing. Run this command:
/usr/bin/ganesha_stats ; sleep 5 ; /usr/bin/ganesha_stats
Timestamp: Wed Apr 27 19:27:22 201634711407 nsecs
Total NFSv3 ops: 0
Total NFSv4.0 ops: 86449
Total NFSv4.1 ops: 0
Total NFSv4.2 ops: 0
Timestamp: Wed Apr 27 19:27:27 201687146242 nsecs
Total NFSv3 ops: 0
Total NFSv4.0 ops: 105271
Total NFSv4.1 ops: 0
Total NFSv4.2 ops: 0
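
The comparison of the two samples can also be scripted. The following is a minimal sketch that assumes the output format shown above. The function names sum_nfs_ops and nfs_ops_increased are hypothetical helpers and are not part of CES NFS.

```shell
# Sum the "Total NFSv* ops" counters from ganesha_stats output read on stdin.
sum_nfs_ops() {
    awk -F': *' '/^Total NFSv[0-9.]+ ops/ { total += $2 } END { print total + 0 }'
}

# Succeed (exit status 0) if the second sample shows more operations
# than the first sample.
#   $1 = first ganesha_stats output, $2 = second ganesha_stats output
nfs_ops_increased() {
    before=$(printf '%s\n' "$1" | sum_nfs_ops)
    after=$(printf '%s\n' "$2" | sum_nfs_ops)
    [ "$after" -gt "$before" ]
}
```

For example, capture two samples with first=$(/usr/bin/ganesha_stats), sleep 5, and second=$(/usr/bin/ganesha_stats), and then call nfs_ops_increased "$first" "$second". Counts that do not change on an idle server are not by themselves a sign of a hang.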

Solution

The nfs_hung event restarts NFS. Check the service state for any restart-related errors.

CES NFSD process not running (nfsd_down)

Cause

CES NFS server protocol is no longer running.

Determination
  1. Check to see whether the CES NFS daemon is running:
    ps -C gpfs.ganesha.nfsd
  2. Check whether d-bus is alive. Run:
    /usr/bin/ganesha_stats
If either CES NFS or d-bus is down, you will receive an error:
ERROR: Can't talk to ganesha service on d-bus. Looks like Ganesha is down.
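
Both checks can be combined in a small script. The following is a sketch only. The function names dbus_error_reported and check_ces_nfs are hypothetical, and the match on the error text assumes that the message contains the wording shown above.

```shell
# Succeed if the given ganesha_stats output contains the d-bus error message.
#   $1 = captured output of /usr/bin/ganesha_stats
dbus_error_reported() {
    printf '%s\n' "$1" | grep -q "Can't talk to ganesha service on d-bus"
}

# Run both determination steps and print a line for each problem that is found.
check_ces_nfs() {
    ps -C gpfs.ganesha.nfsd >/dev/null 2>&1 ||
        echo "CES NFS daemon is not running"
    if dbus_error_reported "$(/usr/bin/ganesha_stats 2>&1)"; then
        echo "ganesha_stats cannot reach CES NFS over d-bus"
    fi
}
```

Calling check_ces_nfs on a CES node prints nothing when neither problem is detected.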

Solution

Restart CES NFS on the local CES node by running mmces service stop nfs and then mmces service start nfs.
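
The restart can be wrapped in a short function that also lists the active NFS events afterward. This is a sketch, not a required procedure. The function name restart_ces_nfs and the MMCES variable are hypothetical; the variable exists only so that the command can be replaced for a dry run.

```shell
# Command used to manage CES services; override MMCES for a dry run.
MMCES="${MMCES:-mmces}"

# Stop and start CES NFS on the local node, then list active NFS events.
# Each step runs only if the previous step succeeded.
restart_ces_nfs() {
    "$MMCES" service stop nfs &&
    "$MMCES" service start nfs &&
    "$MMCES" events active nfs
}
```

Setting MMCES=echo before calling restart_ces_nfs prints the arguments of the three commands instead of running them.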

RPC statd process is not running (statd_down)

This applies only if NFS version 3 is enabled in the CES NFS configuration.

Cause

The rpc.statd process is no longer running.

Determination

Check rpc.statd by running:
ps -C rpc.statd

Solution

Restart CES NFS on the local CES node by running mmces service stop nfs and then mmces service start nfs.

Portmapper port 111 is not active (portmapper_down)

Cause

RPC call to port 111 failed or timed out.

Determination

Check portmapper output by running:
rpcinfo -n 111 -t localhost portmap
rpcinfo -t localhost nfs 3
rpcinfo -t localhost nfs 4
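
The three calls can be run together so that each failure is reported separately. The following is a minimal sketch. The function name check_rpc_services and the RPCINFO variable are hypothetical; the rpcinfo arguments are the ones shown above.

```shell
# Command used for the RPC checks; override RPCINFO to test the function.
RPCINFO="${RPCINFO:-rpcinfo}"

# Run the portmapper and NFS checks, print one line per failed check,
# and return a nonzero status if any check failed.
check_rpc_services() {
    failed=0
    "$RPCINFO" -n 111 -t localhost portmap >/dev/null 2>&1 ||
        { echo "portmap on port 111: FAILED"; failed=1; }
    for vers in 3 4; do
        "$RPCINFO" -t localhost nfs "$vers" >/dev/null 2>&1 ||
            { echo "nfs version $vers: FAILED"; failed=1; }
    done
    return "$failed"
}
```

If NFS version 3 is not enabled in the CES NFS configuration, a failure of the version 3 check is presumably expected.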

Solution

Check whether portmapper (rpcbind) is running and whether it is configured to start automatically on system startup.

NFS client cannot mount NFS exports from all protocol nodes

Cause

The NFS client can mount NFS exports from some protocol nodes but not from all of them. On the affected protocol nodes, the exports are not listed when you run showmount against them.

Determination

The error itself occurs on the NFS server side and is related to a Red Hat® problem with netgroup caching, which makes caching unreliable.

Solution

Disable netgroup caching in nscd for AD values. For more information about how to disable nscd caching, see the nscd.conf man page at https://linux.die.net/man/5/nscd.conf.
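
In nscd.conf, caching for a service is switched off with an enable-cache line, so netgroup caching is disabled by a line that reads enable-cache netgroup no. The following sketch checks a configuration for that line. The function name netgroup_cache_disabled is hypothetical, and the path /etc/nscd.conf in the usage note is an assumption about the default location.

```shell
# Succeed if the nscd configuration read on stdin contains an active
# (uncommented) "enable-cache netgroup no" line.
netgroup_cache_disabled() {
    grep -Eq '^[[:space:]]*enable-cache[[:space:]]+netgroup[[:space:]]+no([[:space:]]|$)'
}
```

For example, run netgroup_cache_disabled < /etc/nscd.conf on each protocol node. After changing the file, nscd typically needs to be restarted for the setting to take effect.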

The rpc.statd service fails with a "Permission denied" error

Cause

The rpc.statd service fails with a "Permission denied" error when you attempt to create a directory under the CNFS shared directory. The rpc.statd service runs as the rpcuser user, which does not have read permission on the CNFS shared directory. Therefore, the user cannot create a file in a subdirectory under the CNFS shared directory.

Determination

Check for messages similar to the following in the syslog file on server nodes where CNFS or KNFS is enabled:
rpc.statd[<pid>]: Failed to insert: creating /var/lib/nfs/statd/sm/<client node>:Permission denied
rpc.statd[<pid>]: STAT_FAIL to <server node> for SM_MON of <client node>
kernel:lockd: cannot monitor <client node>
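
A filter such as the following can pull these messages out of a log. It is a sketch that assumes the message wording shown above. The function name find_statd_errors is hypothetical, and the syslog path in the usage note is an assumption that varies by distribution.

```shell
# Print the lines read on stdin that match the rpc.statd and lockd
# messages associated with this problem.
find_statd_errors() {
    grep -E 'rpc\.statd\[[0-9]+\]: (Failed to insert|STAT_FAIL)|lockd: cannot monitor'
}
```

For example, on a system that writes syslog to /var/log/messages, run find_statd_errors < /var/log/messages.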

Solution

On the CNFS shared directory, change the permission to 755 so that the shared directory is readable to all users.
chmod 755 <CNFS shared directory>
systemctl restart nfs

If the problem persists after the NFS service restart, you might need to reboot the node.

For more information about NFS events, see Events.