Network Shared Disk (NSD)

If not all servers in a cluster can be attached directly to the SAN, Spectrum Scale provides a protocol that implements a block-level interface over the network, called Network Shared Disk (NSD).

For larger Spectrum Scale clusters, the SAN configuration becomes increasingly complex. In addition, the cost and management effort of the SAN grow with the cluster size.

Such setups can benefit from NSD with its network block I/O. All cluster members must be able to communicate with each other over the network, which also makes it possible to add geographically distant cluster members.

In an NSD configuration, typically only a few servers are attached to the SAN that holds the user data. These servers are called NSD servers and provide access to the user data for the servers that are not connected to the SAN. The latter are called NSD clients and need LAN access to perform network block I/O to the NSD servers. For the NSD clients, user data and Spectrum Scale control information first flow over the TCP/IP network; the physical reads and writes of user data to the SAN disks are then performed by the NSD servers on behalf of the clients.
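
As a quick check of this split between NSD servers and NSD clients, mmlsnsd with the -m option maps each NSD to its local disk device on the nodes that see the disk directly; in an NSD configuration only the NSD servers report such a local device path (the output is not reproduced here because it is environment specific):
# mmlsnsd -m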

For the SUT: When the SUT (shown in Figure 1) was set up for NSD, the four ECM nodes played the role of NSD clients and had no access to the SAN disks. Three new nodes with SAN access were added to the SUT as NSD servers.
Figure 1. ECM cluster with Spectrum Scale NSD cluster configuration
This graphic provides an overview of an ECM cluster with Spectrum Scale NSD cluster configuration
Note: The network that is used to transfer data and control information in an NSD configuration does not need to be dedicated to Spectrum Scale. However, the network bandwidth must be sufficient to meet the goals of Spectrum Scale and of any other application sharing the same network.

The following mmlscluster command output shows the layout of the NSD cluster for the SUT with the node designations of the cluster members. At first sight, apart from the node naming scheme, the output does not directly reveal that this is an NSD cluster configuration.

mmlscluster command output for the SUT Spectrum Scale NSD cluster, showing the NSD clients (ECM nodes) and the NSD servers:
# mmlscluster

GPFS cluster information
========================
  GPFS cluster name:         ECM_4_node
  GPFS cluster id:           714383681xxxxxxxxx
  GPFS UID domain:           ECM_4node
  Remote shell command:      /usr/bin/ssh
  Remote file copy command:  /usr/bin/scp
  Repository type:           CCR

 Node  Daemon node name  IP address     Admin node name  Designation
---------------------------------------------------------------------
   1   ECMnod1           10.xxx.xx.xx  ECMnod1          
   2   ECMnod2           10.xxx.xx.xx  ECMnod2         
   3   ECMnod3           10.xxx.xx.xx  ECMnod3         
   4   ECMnod4           10.xxx.xx.xx  ECMnod4         
   5   NSDsrv1           10.xxx.xx.xx  NSDsrv1          quorum-manager
   6   NSDsrv2           10.xxx.xx.xx  NSDsrv2          quorum
   7   NSDsrv3           10.xxx.xx.xx  NSDsrv3          quorum
The Spectrum Scale cluster now had seven nodes in total (ECMnod[1-4] and NSDsrv[1-3]).
  • ECMnod[1-4] were the NSD clients and the newly added NSDsrv[1-3] virtual machines were the NSD servers.
  • NSDsrv1 had the node designation quorum-manager: it acted as the filesystem manager for the cluster and was also part of the node pool from which quorum was derived.
  • NSDsrv2 and NSDsrv3 complemented the node quorum and were both in the pool of quorum nodes.
  • ECMnod[1-4] had the node designation nonquorum-client, which appears as an empty string in the Designation column of the mmlscluster output.
Therefore, ECMnod[1-4] had the client role and NSDsrv[1-3] the server role.
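
For illustration only (the exact commands used to build the SUT cluster are not shown in this report), designations like these can be assigned when nodes are added to the cluster, for example:
# mmaddnode -N NSDsrv1:quorum-manager,NSDsrv2:quorum,NSDsrv3:quorum

Alternatively, the designations of existing nodes can be changed with mmchnode:
# mmchnode --quorum --manager -N NSDsrv1
# mmchnode --quorum -N NSDsrv2,NSDsrv3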

The mmlsnsd command displays Network Shared Disk information for a Spectrum Scale cluster. The NSD servers column lists the NSD servers that provide SAN access to each disk.

The following example shows the FileNet File Storage Area NSD disks with their NSD servers (an unbalanced server list):
# mmlsnsd

 File system   Disk name    NSD servers                                    
---------------------------------------------------------------------------
 FSData        nsd14cb      NSDsrv1,NSDsrv2,NSDsrv3 
 FSData        nsd15cb      NSDsrv1,NSDsrv2,NSDsrv3 
 FSData        nsd14cc      NSDsrv1,NSDsrv2,NSDsrv3 
 FSData        nsd15cc      NSDsrv1,NSDsrv2,NSDsrv3 
 FSData        nsd14cd      NSDsrv1,NSDsrv2,NSDsrv3 
 FSData        nsd15cd      NSDsrv1,NSDsrv2,NSDsrv3 

In the above example, the NSD disks for the filesystem FSData (used as the FileNet File Storage Area) were SAN-attached to the NSD servers (NSDsrv[1-3]).

However, the NSD setup shown above has a disadvantage: all six NSD disks had NSDsrv1 as the first NSD server in the list. NSDsrv[2,3] were only used when the preceding server in the list was unavailable. The example above therefore showed a failover configuration with an unbalanced NSD server list.

Exploring some tuning options

Another way to define the NSD disks is to vary the order of the NSD servers in the list across the available NSD servers. This allows simple but effective I/O striping across the NSD servers when all servers are available at the same time (see the output from mmlsnsd in the example that follows). This setup is called a balanced NSD server list.

This example shows the FileNet File Storage Area NSD disks with their NSD servers (a balanced NSD server list):
# mmlsnsd

File system   Disk name    NSD servers                                    
---------------------------------------------------------------------------
 FSData        nsd14cb      NSDsrv1,NSDsrv2,NSDsrv3 
 FSData        nsd15cb      NSDsrv3,NSDsrv1,NSDsrv2 
 FSData        nsd14cc      NSDsrv2,NSDsrv3,NSDsrv1 
 FSData        nsd15cc      NSDsrv1,NSDsrv2,NSDsrv3 
 FSData        nsd14cd      NSDsrv3,NSDsrv1,NSDsrv2 
 FSData        nsd15cd      NSDsrv2,NSDsrv3,NSDsrv1
In this example, the order of the servers in the NSD server list differed from that of the unbalanced NSD server list.
  • The six NSD disks had different first NSD servers in the list. As a result, each NSD server now served two NSD disks out of six, and disk I/O was done in parallel.
  • In the unbalanced example, a single NSD server did all the disk I/O to the six NSD disks.
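An existing unbalanced configuration can also be changed in place: the mmchnsd command accepts a stanza file that lists a new server order per NSD. The following is only a sketch with a made-up stanza file name; depending on the Spectrum Scale release, such a change may require the affected NSDs or filesystem to be taken out of use first, so the mmchnsd documentation should be checked:
# cat /data/FNP8_GPFS/nsd-balance.txt
%nsd:
  nsd=nsd15cb
  servers=NSDsrv3,NSDsrv1,NSDsrv2
%nsd:
  nsd=nsd14cc
  servers=NSDsrv2,NSDsrv3,NSDsrv1
…

# mmchnsd -F /data/FNP8_GPFS/nsd-balance.txt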
The following example shows the creation of NSD disks with balanced NSD servers via mmcrnsd using a stanza file:
# cat /data/FNP8_GPFS/nsd-disks.txt
%nsd:
  device=/dev/mapper/scsi14cb
  servers=NSDsrv1,NSDsrv2,NSDsrv3
  nsd=nsd14cb
  usage=dataAndMetadata
%nsd:
  device=/dev/mapper/scsi15cb
  servers=NSDsrv3,NSDsrv1,NSDsrv2
  nsd=nsd15cb
  usage=dataAndMetadata
…

# mmcrnsd -F /data/FNP8_GPFS/nsd-disks.txt
mmcrnsd: Processing disk mapper/scsi14cb
mmcrnsd: Processing disk mapper/scsi15cb
…
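
Once the NSDs exist, the same stanza file can typically be reused with mmcrfs to create the filesystem on top of them. The following is only a sketch; the mount point is made up and the remaining mmcrfs options used for the SUT are not shown in this report:
# mmcrfs FSData -F /data/FNP8_GPFS/nsd-disks.txt -T /fsdata
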
Figure 2 shows the outbound I/O traffic (disk read and network transmitted) for the NSD servers during the execution of the ECM workload. For the unbalanced NSD servers, a single NSD server performed all of the disk and network I/O. In contrast, in the case of the balanced NSD servers the disk and network I/O was equally distributed across the three servers.
Figure 2. Network and disk I/O for NSD servers (unbalanced vs. balanced server list)
This graphic provides an overview of network and disk I/O for NSD servers and shows the unbalanced versus balanced server list

In addition, Figure 2 shows that the disk and network I/O rates were almost equal on the NSD servers. This implies that a dedicated NSD server simply forwards the data: data read from the SAN is passed on to the NSD clients, and data originating from the NSD clients is written to the SAN. Therefore, the same behavior also applied to the inbound I/O traffic (disk write and network received).

The mmlsmount command lists the cluster members that have mounted a particular filesystem. In this example, the filesystem FSData was mounted on all seven cluster members (NSD servers and NSD clients). However, the filesystem does not necessarily have to be mounted on the NSD servers.
# mmlsmount FSData -L
                                            
File system FSData is mounted on 7 nodes:
  10.xxx.xx.xxx   NSDsrv1                  
  10.xxx.xx.xxx   NSDsrv2                  
  10.xxx.xx.xxx   NSDsrv3                   
  10.xxx.xx.xxx   ECMnod1                   
  10.xxx.xx.xxx   ECMnod2                  
  10.xxx.xx.xxx   ECMnod3                  
  10.xxx.xx.xxx   ECMnod4
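
If the filesystem is only needed on a subset of the cluster, for example not on the NSD servers, mmmount accepts an explicit node list. This is just an illustration using the SUT node names:
# mmmount FSData -N ECMnod1,ECMnod2,ECMnod3,ECMnod4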

NSD configuration study of pagepools in an NSD cluster

Every node in a Spectrum Scale cluster has its own pagepool. The pagepool size can vary per node according to the node's intended role or task.

The following short side study analyzes pagepool efficiency in an NSD cluster. For the SD cluster configuration, pagepool behavior is easier to understand because the relation between the directly attached SAN disks and the pagepool on the same node is more obvious.

However, in an NSD cluster there are two node roles, NSD servers and NSD clients, each with its own pagepool. The following pagepool measurement series was performed to investigate the pagepool effects on the server and client nodes in a Spectrum Scale NSD cluster.

The NSD disk read rate on the NSD servers was used as the metric to assess the efficiency of the pagepools.

Measurement series 1: The pagepools were scaled from 256 MiB to 2 GiB simultaneously on all three NSD servers, while the pagepools on the four ECM nodes (NSD clients) were kept at a fixed size of 2 GiB each.
Figure 3. Scaling pagepool sizes for Spectrum Scale NSD servers
This graphic provides an overview of how pagepool sizes were scaled for Spectrum Scale NSD servers

Figure 3 shows the total disk read I/O rates (summed over all three NSD servers) for different pagepool sizes on the NSD servers. The disk read rates on the NSD servers remained at a constant level, independent of the pagepool size on the NSD servers.

Measurement series 2: The pagepools were scaled from 256 MiB to 2 GiB on the four ECM nodes (NSD clients), while the pagepools on the NSD servers were kept at a fixed size of 2 GiB.
Figure 4. Scaling pagepool sizes for Spectrum Scale NSD clients
This graphic provides an overview of how pagepool sizes were scaled for Spectrum Scale NSD clients

Figure 4 shows the size scaling for the NSD client pagepools in a second measurement series. The NSD clients perform network block I/O and no disk I/O for the Spectrum Scale filesystem. Therefore, the metric to look at is still the disk read rate on the NSD servers.

In contrast to the NSD server pagepool size scaling, there was now a dependency between the NSD client pagepool sizes and the NSD server disk read rates:
  • The NSD server read rates started at 180 MiB/sec with a 256 MiB client pagepool and gradually decreased to 162 MiB/sec with a 2 GiB client pagepool.

Conclusion

Spectrum Scale cache usage is most effective on the nodes where the data processing occurs. Typically, these are the NSD clients, while the NSD servers are used for I/O processing only. In that case, larger pagepools are less important for the NSD servers, because their pagepools are not used for application data caching but for I/O buffering. The larger the NSD client pagepool, the less disk I/O was done by the NSD servers.

For the SUT: The pagepool size was set to 2 GiB on all four ECM nodes (NSD clients). This was the same pagepool size as for the ECM nodes in the SD cluster configuration. The pagepool size was set to 1 GiB (default value) for the NSD servers. These pagepool sizes were used for the scale-out study for the NSD cluster configuration.

The mmlsconfig command can be used to query the pagepool sizes in a Spectrum Scale cluster. Note that two different sizes were now reported, one for the NSD servers and one for the remaining nodes (the NSD clients):
# mmlsconfig pagepool
pagepool 2G  
pagepool 1G [NSDsrv1,NSDsrv2,NSDsrv3]
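
For illustration, per-node pagepool values like these can be set with mmchconfig, using a cluster-wide value plus a node-specific override. This is only a sketch; depending on the release and the options used, the new value may take effect only after the GPFS daemon is restarted on the affected nodes:
# mmchconfig pagepool=2G
# mmchconfig pagepool=1G -N NSDsrv1,NSDsrv2,NSDsrv3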