Network Shared Disk (NSD)
If not all servers in a cluster can be attached directly to the SAN, Spectrum Scale provides a protocol that implements a block-level interface over the network, called Network Shared Disk (NSD).
For larger Spectrum Scale clusters, the SAN configuration can become increasingly complex. In addition, the cost and management effort of the SAN grow with the cluster size.
Such setups can benefit from NSD with its network block I/O feature. All cluster members must be able to communicate with each other over a network, which also makes it possible to add distant cluster members.
In an NSD configuration, typically only a few servers are attached to the SAN that holds the user data. These servers are called NSD servers and provide access to the user data for the servers that are not connected to the SAN. Those servers are called NSD clients and must have LAN access in order to perform network block I/O to the NSD servers. For the NSD clients, user data and Spectrum Scale control information flow over the TCP/IP network first. The physical reads and writes of user data on the SAN disks are done by the NSD servers, which trigger the SAN disk operations on behalf of the clients.
The following mmlscluster command output shows the layout of the NSD cluster for the SUT, with the node designations of the cluster members. At first sight, apart from the node naming scheme, there is no direct indication that this is an NSD cluster configuration.
# mmlscluster
GPFS cluster information
========================
GPFS cluster name: ECM_4_node
GPFS cluster id: 714383681xxxxxxxxx
GPFS UID domain: ECM_4node
Remote shell command: /usr/bin/ssh
Remote file copy command: /usr/bin/scp
Repository type: CCR
Node Daemon node name IP address Admin node name Designation
---------------------------------------------------------------------
1 ECMnod1 10.xxx.xx.xx ECMnod1
2 ECMnod2 10.xxx.xx.xx ECMnod2
3 ECMnod3 10.xxx.xx.xx ECMnod3
4 ECMnod4 10.xxx.xx.xx ECMnod4
5 NSDsrv1 10.xxx.xx.xx NSDsrv1 quorum-manager
6 NSDsrv2 10.xxx.xx.xx NSDsrv2 quorum
7 NSDsrv3 10.xxx.xx.xx NSDsrv3 quorum
The cluster consisted of seven nodes (ECMnod[1-4] and NSDsrv[1-3]). ECMnod[1-4] were the NSD clients, and the newly added NSDsrv[1-3] virtual machines were the NSD servers. NSDsrv1 had the node designation quorum-manager, acted as the filesystem manager for the cluster, and was part of the node pool from which quorum was derived. NSDsrv2 and NSDsrv3 completed the node quorum and were both in the pool of quorum nodes. ECMnod[1-4] had the node designation nonquorum-client, which is listed as an empty string in the Designation column of the mmlscluster output. In summary, ECMnod[1-4] had a client role and NSDsrv[1-3] had server roles.
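Node designations like these are normally assigned when the nodes are added to the cluster, and they can also be changed later with the mmchnode command. The following lines are a minimal sketch using the SUT node names; they are not necessarily the commands that were used for this cluster:
# mmchnode --quorum --manager -N NSDsrv1
# mmchnode --quorum -N NSDsrv2,NSDsrv3
# mmchnode --nonquorum --client -N ECMnod1,ECMnod2,ECMnod3,ECMnod4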
The mmlsnsd command displays the Network Shared Disk information for a Spectrum Scale cluster. The NSD servers column lists the NSD servers that provide SAN access for each disk. The following output shows the initial NSD configuration of the SUT with an unbalanced server list:
# mmlsnsd
File system Disk name NSD servers
---------------------------------------------------------------------------
FSData nsd14cb NSDsrv1,NSDsrv2,NSDsrv3
FSData nsd15cb NSDsrv1,NSDsrv2,NSDsrv3
FSData nsd14cc NSDsrv1,NSDsrv2,NSDsrv3
FSData nsd15cc NSDsrv1,NSDsrv2,NSDsrv3
FSData nsd14cd NSDsrv1,NSDsrv2,NSDsrv3
FSData nsd15cd NSDsrv1,NSDsrv2,NSDsrv3
In the above example, the NSD disks for the filesystem FSData (used as the
FileNet File Storage Area) were SAN-attached to the NSD servers
(NSDsrv[1-3]).
However, the NSD setup shown above has a disadvantage. All six NSD disks had NSDsrv1 as the first NSD server in the list. NSDsrv[2,3] were only used when the preceding server in the list was not available. So the example above showed a failover configuration with an unbalanced NSD server list.
Exploring some tuning options
Another way to define the NSD disks is to vary the order of the NSD servers in the list across the available NSD servers. This allows simple but effective I/O striping across the NSD servers when all of them are available at the same time (see the mmlsnsd output in the example that follows). This setup is called a balanced NSD server list.
# mmlsnsd
File system Disk name NSD servers
---------------------------------------------------------------------------
FSData nsd14cb NSDsrv1,NSDsrv2,NSDsrv3
FSData nsd15cb NSDsrv3,NSDsrv1,NSDsrv2
FSData nsd14cc NSDsrv2,NSDsrv3,NSDsrv1
FSData nsd15cc NSDsrv1,NSDsrv2,NSDsrv3
FSData nsd14cd NSDsrv3,NSDsrv1,NSDsrv2
FSData nsd15cd NSDsrv2,NSDsrv3,NSDsrv1
- The six NSD disks had different first NSD servers in the list. As a result, each NSD server now served two NSD disks out of six, and disk I/O was done in parallel.
- In the unbalanced example, a single NSD server did all the disk I/O to the six NSD disks.
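The balanced server order was defined in the NSD stanza file that was passed to the mmcrnsd command, as the following excerpt shows: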
# cat /data/FNP8_GPFS/nsd-disks.txt
%nsd:
device=/dev/mapper/scsi14cb
servers=NSDsrv1,NSDsrv2,NSDsrv3
nsd=nsd14cb
usage=dataAndMetadata
%nsd:
device=/dev/mapper/scsi15cb
servers=NSDsrv3,NSDsrv1,NSDsrv2
nsd=nsd15cb
usage=dataAndMetadata
…
# mmcrnsd -F /data/FNP8_GPFS/nsd-disks.txt
mmcrnsd: Processing disk mapper/scsi14cb
mmcrnsd: Processing disk mapper/scsi15cb
…
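After the NSDs have been created, a filesystem such as FSData is created on top of them with the mmcrfs command. The following line is only a sketch; the mount point is an assumption, and the mmcrfs options that were actually used for the SUT are not shown in this paper:
# mmcrfs FSData -F /data/FNP8_GPFS/nsd-disks.txt -T /gpfs/FSData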
In addition, Figure 2 shows that the disk and network I/O rates were almost equal on the NSD servers. This implies that a dedicated NSD server simply forwards data: data read from the SAN disks is sent to the NSD clients over the network, and data received from the NSD clients over the network is written to the SAN disks. The same behavior therefore applied to the inbound I/O traffic (disk writes and network receives).
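The mmlsmount command confirms that the filesystem FSData was mounted on all seven nodes of the NSD cluster, that is, on the NSD servers as well as on the NSD clients: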
# mmlsmount FSData -L
File system FSData is mounted on 7 nodes:
10.xxx.xx.xxx NSDsrv1
10.xxx.xx.xxx NSDsrv2
10.xxx.xx.xxx NSDsrv3
10.xxx.xx.xxx ECMnod1
10.xxx.xx.xxx ECMnod2
10.xxx.xx.xxx ECMnod3
10.xxx.xx.xxx ECMnod4
NSD configuration study of pagepools in an NSD cluster
Every node in a Spectrum Scale cluster has its own pagepool. The pagepool can vary in size for each node according to the intended role or task of the node.
The following short side study analyzes the pagepool efficiency in an NSD cluster. For the SD cluster configuration, the effect of the pagepool is easier to understand because the relationship between the directly attached SAN disks and the pagepool on the same node is more obvious.
In an NSD cluster, however, there are two node roles, NSD servers and NSD clients, and each node has its own pagepool. The following pagepool measurement series was done to investigate the pagepool effects on the server nodes and on the client nodes in a Spectrum Scale NSD cluster.
The NSD disk read rate on the NSD servers was used as the metric to assess the efficiency of the pagepools.
Figure 3 shows the total disk I/O read rates (summed over all three NSD servers) for different pagepool sizes on the NSD servers. The disk I/O read rates on the NSD servers remained at a constant level, independent of the pagepool size on the NSD servers.
Figure 4 shows the size scaling for the NSD client pagepools in a second measurement series. The NSD clients perform network block I/O and no disk I/O for the Spectrum Scale filesystem. Therefore, the metric to look at is still the disk I/O read rate on the NSD servers.
- The NSD server read rates started at 180 MiB/sec for a 256 MiB client pagepool and gradually decreased to 162 MiB/sec for a 2 GiB client pagepool.
Conclusion
Spectrum Scale cache usage is most effective on the nodes where the data processing occurs. Typically, these are the NSD clients when the NSD servers are used for I/O processing only. In that case, larger pagepools are less important for the NSD servers. The reason for this behavior is that the pagepools on the NSD servers are not used for application data caching but for I/O buffering. The larger the NSD client pagepool, the less disk I/O was done by the NSD servers.
For the SUT: The pagepool size was set to 2 GiB on all four ECM nodes (NSD clients). This was the same pagepool size as for the ECM nodes in the SD cluster configuration. The pagepool size was set to 1 GiB (default value) for the NSD servers. These pagepool sizes were used for the scale-out study for the NSD cluster configuration.
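Pagepool sizes like these can be set with the mmchconfig command. The following lines are a minimal sketch that would produce the configuration shown below; they are not necessarily the exact commands that were used for the SUT, and depending on the Spectrum Scale release the change only takes effect after the GPFS daemon is restarted on the affected nodes:
# mmchconfig pagepool=2G
# mmchconfig pagepool=1G -N NSDsrv1,NSDsrv2,NSDsrv3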
# mmlsconfig pagepool
pagepool 2G
pagepool 1G [NSDsrv1,NSDsrv2,NSDsrv3]