Parallel data transfers

Parallel data transfer improves AFM data transfer performance by splitting large transfers into chunks and moving those chunks through multiple gateway nodes at the same time.

To help the primary gateway exchange large files with the home cluster, a cache cluster can be configured to use all the gateway nodes that are defined in the cluster. When NFS is used for AFM data transfers, multiple NFS servers are required at the home cluster. All NFS servers on the home cluster must export the home path with the same parameters.
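
For example, each NFS server at home might carry an identical entry for the home path in /etc/exports. The following line is a minimal sketch; the export options and fsid value are illustrative assumptions, not required settings:

/gpfs/gpfs2/swhome *(rw,sync,no_root_squash,fsid=101)

The same entry must be present on every NFS server that is used in a mapping, so that all servers export the path with the same parameters.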

In a cache cluster that uses NFS for AFM data transfer, each gateway node can be mapped to a specific NFS server at home. A mapping replaces the NFS server name in the afmTarget parameter. An export server map can define more than one NFS server and map those NFS servers to specific AFM gateway nodes. A mapping can be changed without modifying the afmTarget parameter of a fileset, but the fileset must be relinked, or the file system remounted, for the change to take effect. Use the mmafmconfig command to define, display, delete, and update mappings.
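
For example, an existing mapping can be displayed, updated, or deleted without changing the afmTarget parameter of the filesets that use it. The server pairs shown here are illustrative:

# mmafmconfig show mapping1
# mmafmconfig update mapping1 --export-map js22n01/hs22n19,js22n02/hs22n18
# mmafmconfig delete mapping1

After an update, relink the affected filesets or remount the file system for the new mapping to take effect.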

To define multiple NFS servers for the afmTarget parameter and use parallel data transfers, complete the following steps:
  1. Define a mapping.
  2. Use the mapping as the afmTarget parameter for one or more filesets.
  3. Update the parallel read and write thresholds and chunk sizes as required.

The following example shows a mapping for an NFS target. It assumes four cache gateway nodes, hs22n18, hs22n19, hs22n20, and hs22n21, mapped to two home NFS servers, js22n01 and js22n02 (192.168.200.11 and 192.168.200.12), and then creates filesets that use these mappings.

Define the mapping:

# mmafmconfig add mapping1 --export-map js22n01/hs22n18,js22n02/hs22n19

mmafmconfig: Command successfully completed
mmafmconfig: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process.

The syntax is:

mmafmconfig {add | update} MapName --export-map ExportServerMap
# mmafmconfig add mapping2 --export-map js22n02/hs22n20,js22n01/hs22n21

mmafmconfig: Command successfully completed
mmafmconfig: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process.
# mmafmconfig show

Map name:             mapping1
Export server map:    192.168.200.12/hs22n19.gpfs.net,192.168.200.11/hs22n18.gpfs.net 

Map name:             mapping2
Export server map:    192.168.200.12/hs22n20.gpfs.net,192.168.200.11/hs22n21.gpfs.net
Create filesets by using these mappings:

mmcrfileset gpfs1 sw1 --inode-space new -p afmMode=sw,afmTarget=nfs://mapping1/gpfs/gpfs2/swhome
mmcrfileset gpfs1 ro1 --inode-space new -p afmMode=ro,afmTarget=nfs://mapping2/gpfs/gpfs2/swhome
The syntax is:

mmcrfileset <FS> <fset_name> -p afmMode=<AFM Mode>,
          afmTarget=<protocol>://<Mapping>/<remoteFS_Path>/<Target> --inode-space new

All gateway nodes, other than the primary gateway, that are defined in a mapping are called participating gateway nodes. The primary gateway of a cache fileset communicates with each of the participating gateway nodes, depending on their availability. When parallel data transfer is configured, a single data transfer request is split into multiple chunks. The chunks are sent to the participating gateway nodes in parallel, and the participating nodes transfer them to or from home by using their respective NFS servers. The primary gateway processes the replies from all the participating gateway nodes, handles all data transfer failures, and coordinates activities until the data transfer is complete. If a participating gateway node fails, the primary gateway attempts to retry the failed task on the next available gateway node and logs an error message in the IBM Spectrum Scale log.
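
As an illustration, assume that the parallel write threshold is 1024 MB and the write chunk size is 128 MB (illustrative values, set by the parameters that are described below): a 2 GB write exceeds the threshold, so it is split into sixteen 128 MB chunks that the primary gateway distributes among the participating gateway nodes, while a 512 MB write stays below the threshold and is sent by the primary gateway alone.
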
Note: Parallel data transfer does not work in failover cases. It works when the fileset state moves to the dirty state.

Parallel reads and writes are effective on files that are larger than the parallel thresholds, which are defined by the afmParallelWriteThreshold and afmParallelReadThreshold parameters. This applies to all types of files, except reads on sparse files and files with partial file caching enabled, which are served only by the primary gateway without splitting.

Use the afmParallelWriteChunkSize and afmParallelReadChunkSize parameters to configure the size of each chunk.
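
For example, the thresholds and chunk sizes can be tuned for an individual fileset with mmchfileset or cluster-wide with mmchconfig. The values below are illustrative only; consult the parameter documentation for units and valid ranges, and note that a fileset might need to be unlinked before some AFM parameters can be changed:

# mmchfileset gpfs1 sw1 -p afmParallelWriteThreshold=2048
# mmchconfig afmParallelReadChunkSize=268435456 -i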

Some additional behaviors follow:
  1. When the native NSD protocol is used, if a fileset is created without a mapping, all gateway nodes are used for parallel data transfer.
  2. When the NFS protocol is used, if more than one gateway node is mapped to the same NFS server, only one of those nodes performs a read task. However, a write task is split among all of the gateway nodes.
  3. A gateway node cannot be mapped to more than one NFS server.
  4. Changes in the active mapping take effect only after the fileset is relinked or the file system is remounted; see the example after this list.
  5. If a mapping is not specified, or if the mapping does not match, data is not transferred by using parallel data transfers and the normal data transfer function is used.
  6. The gateway designation can be removed from a node only if that node is not defined in any mapping.
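
For example, to make a changed mapping take effect for one fileset, relink that fileset. The commands below are a minimal sketch that reuses the file system and fileset names from the earlier example; the junction path is an illustrative assumption:

# mmunlinkfileset gpfs1 sw1
# mmlinkfileset gpfs1 sw1 -J /gpfs/gpfs1/sw1

Alternatively, remount the file system to activate the new mapping for all filesets.
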
Note: If an AFM home is a mix of architectures (x86 and ppc), parallel data transfer works only for the set of nodes that belong to any one architecture, depending on which architecture serves the data transfer first.

This feature can be combined with the Parallel data transfer using multiple remote mounts feature to obtain better data transfer performance between an AFM cache and an AFM home. Both features use the same AFM gateway node mapping that is defined by using the mmafmconfig command. The features are independent of each other, and you can enable whichever suits the workload better.