The backend protocol - NFS versus NSD

The NSD protocol is stateful. The NFSv3 protocol is stateless, which makes it very resilient to low-bandwidth and lossy networks.

The currently recommended transport protocol for AFM data transfers is NFS, because NFS tolerates unstable network connections. Try NFS first, and shift to the NSD protocol only if NFS does not meet your performance requirements even with multiple primary gateways and parallel data transfers. Using the NSD protocol between the cache and home clusters has the following implications:

  1. Network availability fluctuations and instability can affect the NSD protocol connection from the cache cluster primary gateways to the home cluster. This can lead to frequent interruptions of data access from the home cluster, and can even cause the connection to the home cluster to stop responding. In these cases, it might be necessary to restart the GPFS™ daemon on the primary gateway, and possibly even to restart the primary gateway server.
  2. IBM Spectrum Scale™ instability on the home cluster can affect the cache cluster. Such instability can also cause the AFM fileset in the cache cluster to stop responding, and might require a restart of the IBM Spectrum Scale service on both the home and cache clusters.

For more information on setting up primary gateway nodes that communicate with multiple NFS servers at home, see Parallel data transfers.
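
As a concrete illustration, parallel data transfers over NFS can be enabled by mapping home NFS servers to cache gateway nodes. The following sketch uses hypothetical node, file system, and export names (gw1, gw2, nfs1, nfs2, fs1, /gpfs/homefs/export); verify the exact option syntax against the mmafmconfig and mmcrfileset documentation for your release:

    # Map each home NFS server to a cache gateway node
    # (hypothetical host names; one NFS server per gateway).
    mmafmconfig add homeMap --export-map nfs1/gw1,nfs2/gw2

    # Create a single-writer cache fileset whose target uses the mapping,
    # so AFM can spread transfers across both gateway/NFS server pairs.
    mmcrfileset fs1 swCache -p afmmode=sw,afmtarget=homeMap:/gpfs/homefs/export --inode-space new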

The following table summarizes the differences between the NSD and NFS protocols.

Table 1. Comparison between the NSD and NFS protocols

  Ease of use
    NSD: Customers are familiar with its use in multi-cluster environments. Configuration does not require NFS knowledge or tuning, but it does require NSD tuning.
    NFS: Configuration requires NFS knowledge and performance tuning of both NFS and TCP over the WAN.
  Performance
    NSD: Uses all primary gateway nodes for parallel data transfers by default. Large-file transfer performance is better than NFS from a single primary gateway node, because NSD can use the inherent parallelism of striping to multiple NSDs.
    NFS: Parallel data transfers can be achieved by creating a mapping between primary gateway nodes and NFS servers at home. Although both protocols support similar forms of parallelism, NSD generally achieves higher performance.
  Security
    NSD: Encryption is built in and can optionally be enabled.
    NFS: -
  Firewall
    NSD: No special ports are required.
    NFS: Must be configured for the traffic to pass through.
  Stability
    NSD: Performs well if the network is stable and has low latency.
    NFS: More resilient to failures within the network, such as packet drops that readily happen over a WAN, and protects the cache cluster from being affected by home cluster issues.
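
To make the choice of backend concrete, the following sketch contrasts an NFS-backed and an NSD-backed (GPFS protocol) cache fileset. The file system, fileset, and host names (fs1, roCacheNfs, roCacheNsd, homeserver, remotefs) are hypothetical, and the options should be checked against the mmcrfileset documentation for your release:

    # NFS backend: the target is an NFS export on a home server.
    mmcrfileset fs1 roCacheNfs -p afmmode=ro,afmtarget=homeserver:/gpfs/homefs/export --inode-space new

    # NSD backend (GPFS protocol): the home file system must already be
    # available to the cache cluster as a multi-cluster remote mount
    # (mmremotecluster add / mmremotefs add / mmmount).
    mmcrfileset fs1 roCacheNsd -p afmmode=ro,afmtarget=gpfs:///gpfs/remotefs/export --inode-space new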

Considerations when using the NSD protocol for AFM data transfers

The NSD protocol is more sensitive to packet drops and network latency than NFS. If the network does not respond, or if packets are dropped, the NSD mount on the cache cluster stops responding, which causes the cache cluster to stop responding as well. Other causes of issues when using the NSD protocol are:
  1. Deadlock in the home cluster - This can cause the NSD mounts on the cache cluster to stop responding, and can result in the entire cache cluster becoming unresponsive.
  2. Cluster reconfiguration of the home cluster - This causes the cache cluster to stop responding temporarily. For example, if the home cluster takes 1 minute to reconfigure, AFM operations such as readdir do not respond for 1 minute on the cache cluster. Recovery is automatic after the cluster reconfiguration is complete.
  3. Increased resource consumption on the primary gateway node - Additional resources, such as mailboxes and extra threads, put more pressure on the primary gateway node.
  4. When a new primary gateway node joins the cluster, the old primary gateway node transfers the fileset to the new primary gateway node. If the remote file system is not mounted on the new primary gateway node, the fileset remains in an 'Unmounted' state. After the remote file system is mounted on the gateway node, the fileset automatically moves to the Active state (see the sketch after this list).
  5. The remote file system cannot be unmounted unless replication is stopped or the primary gateway node is restarted, because AFM holds the remote mount and does not allow the file system to be unmounted (see the sketch after this list).
  6. Creating an AFM association to the same local file system by using the GPFS protocol is not supported.
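
The following sketch illustrates items 4 and 5. The file system and fileset names (fs1, remotefs, swCache) are hypothetical, and command availability (in particular mmafmctl stop) varies by release, so treat this as an outline rather than an exact procedure:

    # Item 4: check the AFM fileset state on the cache cluster; an
    # 'Unmounted' state indicates that the remote file system is not
    # mounted on the new primary gateway node.
    mmafmctl fs1 getstate -j swCache

    # Mount the remote (home) file system on the gateway node; the
    # fileset then moves to the Active state automatically.
    mmmount remotefs

    # Item 5: stop replication for the fileset before unmounting the
    # remote file system, because AFM otherwise holds the remote mount.
    mmafmctl fs1 stop -j swCache
    mmumount remotefs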