The backend protocol - NFS versus NSD

The NSD protocol is stateful. The NFSv3 protocol is stateless, which makes it very resilient to low-bandwidth and lossy networks.

The current recommended transport protocol for AFM data transfers is NFS, because of its tolerance of unstable network connections. It is recommended that you try NFS first, and switch to the NSD protocol only if NFS does not meet your performance requirements even with multiple primary gateway nodes and parallel data transfers. The implications of using the NSD protocol between the cache and home clusters are:

  1. Fluctuations in network availability and general network instability can affect the NSD protocol connection from the primary gateway nodes of the cache cluster to the home cluster. This can lead to frequent interruptions of data access from the home cluster, and can even cause the connection to the home cluster to stop responding. In these cases, it might be necessary to restart the GPFS™ daemon on the primary gateway node, and possibly even the primary gateway server itself (see the sketch after this list).
  2. IBM Spectrum Scale™ instability on the home cluster can affect the cache cluster. The instability can also cause the AFM fileset in the cache cluster to stop responding, and might require a restart of the IBM Spectrum Scale service on both the home and cache clusters.
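For the first case, restarting the daemon on the affected gateway node is often enough. The following is a minimal sketch; the node name gw1 is a hypothetical placeholder:

    # Restart the GPFS daemon on the affected primary gateway node only
    mmshutdown -N gw1
    mmstartup -N gw1

    # Verify that the node has rejoined the cluster
    mmgetstate -N gw1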

For more information on setting up primary gateway nodes that communicate with multiple NFS servers at home, see Parallel data transfers.
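As an illustration of such a setup, the commands below define a mapping between home NFS servers and cache gateway nodes and then use it as an AFM target. This is a sketch only; the host names, the mapping name map1, the file system and fileset names, and the export path are hypothetical, and the authoritative pairing and target syntax is documented under Parallel data transfers:

    # Map each home NFS server to a cache primary gateway node
    mmafmconfig add map1 --export-map nfsserver1/gateway1,nfsserver2/gateway2

    # Create a single-writer AFM fileset whose target uses the mapping
    mmcrfileset fs1 cache1 --inode-space new \
        -p afmMode=sw -p afmTarget=nfs://map1/gpfs/homefs/export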

The following table summarizes the differences between the NSD and NFS protocols:

Table 1. Comparison between the NSD and NFS protocols

  Ease of use
    NSD: Customers are familiar with its use in multi-cluster environments. Configuration does not require NFS knowledge or tuning, but does require NSD tuning.
    NFS: Configuration requires NFS knowledge and performance tuning of both NFS and TCP over the WAN.
  Performance
    NSD: By default, uses all primary gateway nodes for parallel data transfers. Large-file data transfer performance is better than NFS from a single primary gateway node, because it can use the inherent parallelism of striping to multiple NSDs.
    NFS: Parallel data transfers can be achieved by creating a mapping between primary gateway nodes and NFS servers at home. In summary, while both NFS and NSD support similar forms of parallelism, NSD generally achieves higher performance.
  Security
    NSD: Encryption is built in and can optionally be turned on.
    NFS: Supports Kerberos-enabled exported paths from home to cache. afmEnableNFSSec must be set to yes at the cache (see the example after this table).
  Firewall
    NSD: Special ports might not be required.
    NFS: The firewall must be configured for the traffic to pass through.
  Stability
    NSD: Performs well if the network is stable and has low latency.
    NFS: More resilient to failures within the network, such as packet drops, which readily happen over a WAN; this protects the cache cluster from being affected by home cluster issues.
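For the Kerberos case in the Security row, the cache-side setting is applied with mmchconfig. A minimal sketch:

    # Enable Kerberos-secured NFS exports for AFM at the cache cluster;
    # -i makes the change take effect immediately and persist
    mmchconfig afmEnableNFSSec=yes -i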

Considerations when using the NSD protocol for AFM data transfers

The NSD protocol is more sensitive to packet drops and network latency than NFS. If the network does not respond, or if packets are dropped, the NSD mount on the cache cluster stops responding, which can cause the cache cluster to stop responding as well. Other causes of issues when using the NSD protocol are:
  1. Deadlock in the home cluster - This might cause the NSD mounts on the cache cluster to stop responding for some time. Because of a non-responsive NSD mount, AFM filesets at the cache that use these NSD mounts as targets might be in the 'Unmounted' state. After the home cluster becomes responsive, the queued operations are tried again.
  2. Cluster reconfiguration or higher resource consumption at the home cluster - This might cause a temporary loss of communication between the home and cache clusters. If the home cluster does not respond within the AFM wait timeout intervals, AFM filesets at the cache that use these NSD mounts as targets might be in the 'Unmounted' state. After the home cluster becomes responsive, the queued operations are tried again.
  3. When a new primary gateway node joins the cluster, the old primary gateway node transfers the fileset to the new primary gateway node. If the remote file system is not mounted on the new primary gateway node, the fileset remains in the 'Unmounted' state. After the remote file system is mounted at the gateway node, the fileset automatically moves to the 'Active' state.
  4. The remote file system cannot be unmounted unless replication is stopped or the primary gateway node is restarted. AFM puts a hold on the remote mount, which prevents the file system from being unmounted (see the sketch after this list).
  5. Creating an AFM association to the same local file system by using the GPFS protocol is not supported.
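For items 3 and 4, the fileset state can be inspected with mmafmctl, and replication can be stopped before unmounting the remote file system. The following is a sketch; the file system fs1, fileset cache1, and remote file system remotefs are hypothetical names, and the mmafmctl stop and start options are available only in recent IBM Spectrum Scale releases:

    # Check whether the AFM fileset is in the Unmounted state
    mmafmctl fs1 getstate -j cache1

    # Stop replication so that AFM releases its hold on the remote mount
    mmafmctl fs1 stop -j cache1

    # Unmount the remote file system on all nodes
    mmumount remotefs -a

    # When ready, remount the remote file system and restart replication
    mmmount remotefs -a
    mmafmctl fs1 start -j cache1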
Note: If the NSD mount on the gateway node is unresponsive, AFM does not synchronize data with the home or secondary. The file system might be unmounted at the gateway node, and a message similar to the following is written to mmfs.log:

  AFM: Remote filesystem remotefs is panicked due to unresponsive messages on fileset <fileset_name>, re-mount the filesystem after it becomes responsive. mmcommon preunmount invoked. File system: fs1 Reason: SGPanic

After the home or secondary becomes responsive, you must restore the NSD mount on the gateway node.
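Restoring the mount amounts to remounting the remote file system on the gateway node and confirming the fileset state. A sketch with hypothetical names (remotefs, gw1, fs1, cache1):

    # Remount the remote file system on the affected gateway node
    mmmount remotefs -N gw1

    # Confirm that the AFM fileset has returned to the Active state
    mmafmctl fs1 getstate -j cache1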