The backend protocol - NFS versus NSD
The NSD protocol is stateful, whereas the NFSv3 protocol is stateless, which makes NFS resilient to low-bandwidth and lossy networks.
The currently recommended transport protocol for AFM data transfers is NFS because of its tolerance of unstable network connections. Use NFS first, and shift to the NSD protocol only if NFS does not meet the performance requirements even with multiple primary gateways and parallel data transfers. The implications of using the NSD protocol between the cache and home clusters are:
- Network availability fluctuations and instability can affect the NSD protocol connection from the cache cluster primary gateways to the home cluster. Such network issues can lead to frequent interruptions of data access from the home cluster and can even cause the connection to the home cluster to stop responding. In these cases, it might be necessary to restart the GPFS daemon on the primary gateway, and possibly even to restart the primary gateway server (see the sketch after this list).
- IBM Storage Scale instability on the home cluster can affect the cache cluster. The AFM fileset in the cache cluster stops responding because of the instability, and you might also need to restart the IBM Storage Scale service on both the home and cache clusters.
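Where such a restart is needed, a minimal sketch follows, assuming the affected primary gateway node is named gw1 (a hypothetical name):

```sh
# Restart the GPFS daemon on the affected primary gateway node.
mmshutdown -N gw1
mmstartup -N gw1

# Confirm that the daemon is active again on that node.
mmgetstate -N gw1
```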
For more information about setting up primary gateway nodes that communicate with multiple NFS servers at home, see Parallel data transfers.
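As a rough illustration of such a setup, the mmafmconfig command associates home NFS servers with cache gateway nodes; the server and node names below are hypothetical:

```sh
# Map two home NFS servers to two cache gateway nodes.
mmafmconfig add mapping1 --export-map nfsserver1/gw1,nfsserver2/gw2

# Display the defined mapping.
mmafmconfig show mapping1
```

The map name can then be used in place of the home NFS server name in the fileset's afmTarget, as described under Parallel data transfers.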
| | NSD | NFS |
|---|---|---|
| Usability | Customers are familiar with NSD use in multi-cluster environments. Configuration does not require NFS knowledge or tuning, but does require NSD tuning. | Configuration requires NFS knowledge and performance tuning for both NFS and TCP over the WAN. |
| Performance | By default, uses all primary gateway nodes for parallel data transfers. Large-file data transfer performance is better than NFS from a single primary gateway node because NSD can use the inherent parallelism of striping to multiple NSDs. | Parallel data transfers can be achieved by creating a mapping between primary gateway nodes and NFS servers on the home. While both NFS and NSD support similar forms of parallelism, NSD generally achieves higher performance. |
| Security | Encryption is built in and can be turned on optionally. | Supports Kerberos-enabled exported paths from home to cache. The afmEnableNFSSec parameter must be set to yes on the cache. |
| Firewall configuration | Special ports might not be required. | Must be configured to allow the traffic to pass through. |
| Stability | Performs well if the network is stable and has low latency. | More resilient to network failures, such as packet drops that readily occur over the WAN, and protects the cache cluster from being affected by home cluster issues. |
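To make the comparison concrete, the following sketch creates a cache fileset against each backend; the file system, fileset, host, and path names are hypothetical:

```sh
# NFS backend: the target is an NFS export on a home server.
mmcrfileset fs1 nfsCache --inode-space new \
    -p afmTarget=nfs://homeserver/gpfs/homefs/export1 \
    -p afmMode=single-writer

# NSD backend: the target is the remotely mounted home file system,
# which must be mounted on the gateway nodes.
mmcrfileset fs1 nsdCache --inode-space new \
    -p afmTarget=gpfs:///remotefs/export1 \
    -p afmMode=single-writer

# For Kerberos-enabled NFS exports, enable secure NFS transfers
# on the cache cluster.
mmchconfig afmEnableNFSSec=yes -i
```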
Considerations when you use the NSD protocol for AFM data transfers
- Deadlock in the home cluster - A deadlock might cause the NSD mounts on the cache cluster to stop responding for some time. Because of a non-responsive NSD mount, AFM filesets at the cache that use these NSD mounts as the target might be in the 'unmounted' state. After the home cluster becomes responsive, AFM tries the queued operations again.
- Cluster reconfiguration or higher resource consumption on the home cluster - Either condition might cause a temporary loss of communication between the home and cache clusters. If the home cluster does not respond within the AFM wait timeout intervals, AFM filesets at the cache that use these NSD mounts as the target might be in the 'unmounted' state. After the home cluster becomes responsive, AFM tries the queued operations again.
- When a new primary gateway node joins the cluster, the old primary gateway node transfers the fileset to the new primary gateway node. If the remote file system is not mounted on the new primary gateway node, the fileset remains in the 'unmounted' state. After the remote file system is mounted on the gateway node, the fileset automatically moves to the Active state. The fileset state can be checked as shown in the sketch after this list.
- The remote file system cannot be unmounted unless replication is stopped or the primary gateway node is restarted. AFM puts a hold on the remote mount and does not allow the file system to be unmounted.
- Creating an AFM association to the same local file system by using the GPFS protocol is not supported.
- If the NSD mount on the gateway node is unresponsive, AFM does not synchronize data with the home or secondary. The file system might be unmounted on the gateway node, and a message similar to the following is written to mmfs.log: `AFM: Remote filesystem remotefs is panicked due to unresponsive messages on fileset <fileset_name>, re-mount the filesystem after it becomes responsive. mmcommon preunmount invoked. File system: fs1 Reason: SGPanic`. After the home or secondary becomes responsive, you must restore the NSD mount on the gateway node.
- When the NSD backend is used, the afmDIO parameter is set to 0 by default. For the NFS backend, the afmDIO parameter value must be set to 2 (see the sketch after this list).
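As a hedged illustration of the preceding points, the following sketch checks the AFM state of a cache fileset and sets afmDIO for an NFS-backed fileset; the file system and fileset names are hypothetical, and afmDIO is assumed here to be settable per fileset:

```sh
# Check the AFM state of a cache fileset; an 'Unmounted' state
# indicates that the target, such as the remote NSD mount, is
# not accessible.
mmafmctl fs1 getstate -j nsdCache

# For an NFS backend, set afmDIO to 2 on the fileset.
mmchfileset fs1 nfsCache -p afmDIO=2
```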