AFM to cloud object storage parallel read data transfer
Parallel read data transfer improves the read performance of an AFM to cloud object storage fileset by using multiple gateway nodes. When this feature is enabled, objects that are stored on a cloud object storage are fetched into a fileset of the IBM Storage Scale cluster by multiple gateway nodes.
To help the primary gateway read large files from the cloud object storage, an AFM cache cluster can be configured to use multiple gateways. On a cloud object storage, several endpoints can point to the same bucket, or the storage can be set up in distributed mode, that is, as multiple servers that use the same storage paths. Cloud object storage servers also provide multi-threaded, multipart downloads to achieve maximum read performance.
In an AFM cache, gateway nodes can be mapped to endpoints of cloud object storage services. To set up parallel reads, the name of an export map replaces the server name in the --endpoint option of the mmafmcosconfig command. An export map can be changed without modifying the --endpoint option of a fileset. However, the fileset must be relinked or the file system must be remounted for a changed mapping to take effect. For more information about how to define, display, delete, and update mappings, see the mmafmconfig command.
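The --export-map value that is used in the example in this section is a comma-separated list of pairs, each written as export server/gateway node. The following shell sketch only illustrates that structure by printing which gateway node is paired with which server. parse_export_map is a hypothetical helper, not part of IBM Storage Scale; AFM parses the map itself.

```shell
#!/bin/sh
# Hypothetical helper: split an --export-map value of the form
# "server/gateway,server/gateway" and print one "gateway -> server" line per pair.
parse_export_map() {
    # Turn the comma-separated list into one pair per line.
    printf '%s\n' "$1" | tr ',' '\n' | while IFS= read -r pair; do
        [ -n "$pair" ] || continue   # skip empty entries, e.g. from a trailing comma
        server=${pair%%/*}           # text before the first "/"
        gateway=${pair#*/}           # text after the first "/"
        printf '%s -> %s\n' "$gateway" "$server"
    done
}

parse_export_map "lb1.ait.cleversafelabs.com/c7f2n03,lb1.ait.cleversafelabs.com/c7f2n04"
```

For the example map, the sketch prints one line for c7f2n03 and one for c7f2n04, both paired with lb1.ait.cleversafelabs.com, which reflects that two gateway nodes share one load-balanced endpoint.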
To define and enable the parallel data read transfer for the AFM to cloud object storage, complete the following steps:
- Define the mapping by using the mmafmconfig command.
A gateway can be defined manually by setting the afmGateway parameter. For more information, see mmchfileset command.
- Set up the keys for the target buckets by using the mmafmcoskeys command.
- Use the mapping as the --endpoint value when you set up the AFM to cloud object storage relationship by using the mmafmcosconfig command.
- Define the parallel read parameters. For more information about the parameters, see Configuration parameters for AFM, AFM-DR, and AFM to cloud object storage.
Parallel reads take effect only for files or objects that are larger than the size specified by the parallel read threshold. The threshold is defined by using the afmParallelReadThreshold parameter and applies to all types of objects and files.
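As a rough illustration of this rule, the following shell sketch decides whether a file of a given size would take the parallel read path. It assumes that afmParallelReadThreshold is given in MB, which is consistent with the example in this section, where the value 1024 represents 1 GB. is_parallel_read_candidate is a hypothetical helper; AFM applies the rule internally.

```shell
#!/bin/sh
# Sketch of the afmParallelReadThreshold rule: a file qualifies for parallel
# reads only when its size is larger than the threshold.
# Assumption: the threshold is given in MB (1 MB = 1048576 bytes).
is_parallel_read_candidate() {
    size_bytes=$1
    threshold_mb=$2
    threshold_bytes=$((threshold_mb * 1048576))
    # Compare with shell arithmetic so that large sizes are handled as 64-bit values.
    [ $((size_bytes > threshold_bytes)) -eq 1 ]
}

# A 512 MB file and a 2 GB file against a 1024 MB (1 GB) threshold.
for size in 536870912 2147483648; do
    if is_parallel_read_candidate "$size" 1024; then
        echo "$size bytes: parallel read across gateways"
    else
        echo "$size bytes: regular read"
    fi
done
```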
Use the afmParallelReadChunkSize parameter to configure the size of each chunk. This parameter defines the minimum chunk size of the read that needs to be distributed among the gateway nodes during parallel reads. A zero value disables the parallel reads across multiple gateways.
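The chunk size can be pictured with simple arithmetic. The sketch below assumes that a read is cut into equal pieces of afmParallelReadChunkSize bytes, which is a simplification because the parameter is described as a minimum chunk size. chunk_count is a hypothetical helper and does not reproduce how AFM assigns chunks to gateway nodes.

```shell
#!/bin/sh
# Simplified arithmetic for afmParallelReadChunkSize: the number of chunks if a
# read of size_bytes were cut into equal pieces of chunk_bytes.
# A chunk size of 0 disables parallel reads across gateways.
chunk_count() {
    size_bytes=$1
    chunk_bytes=$2
    if [ "$chunk_bytes" -eq 0 ]; then
        echo disabled
        return 0
    fi
    # Ceiling division, so that a final partial chunk counts as one chunk.
    echo $(( (size_bytes + chunk_bytes - 1) / chunk_bytes ))
}

chunk_count 2147483648 536870912   # 2 GB with 512 MB chunks: prints 4
chunk_count 2147483648 0           # prints disabled
```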
Example
- Create an export map by using multiple gateway nodes.
# mmafmconfig add map1_cos --export-map lb1.ait.cleversafelabs.com/c7f2n03,lb1.ait.cleversafelabs.com/c7f2n04 --no-server-resolution
A sample output is as follows:
mmafmconfig: Command successfully completed
mmafmconfig: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process.
Note: When the --no-server-resolution option is specified, AFM skips the DNS resolution of the specified hostname and creates the mapping. The hostname is kept in the mapping as specified and is not replaced with an IP address. Ensure that the hostname is resolvable while the mapping is in operation. Skipping the resolution is important when an endpoint resolves to multiple addresses and does not bind to a single IP address.
# mmafmconfig show map1_cos
A sample output is as follows:
Map name:             map1_cos
Export server map:    lb1.ait.cleversafelabs.com/c7f2n03,lb1.ait.cleversafelabs.com/c7f2n04
- Set up an access key and a secret key for the bucket by using the export map.
# mmafmcoskeys cosbucket1:map1_cos set AccessKey SecretKey
- Create an AFM to cloud object storage relationship by using the mmafmcosconfig command.
# mmafmcosconfig fs1 cosbucket1 --endpoint http://map1_cos --object-fs --bucket cosbucket1 --debug --mode iw
A sample output is as follows:
afmobjfs=fs1
fileset=cosbucket1
bucket=cosbucket1
newbucket=
objectfs=yes
dir=
policy=
tmpdir=
tmpfile=
cleanup=yes
mode=iw
xattr=no
ssl=no
autoRemove=no
acls=no
gcs=no
vhb=
bucketName=cosbucket1
region=
serverName=map1_cos
cacheFsType=http
Linkpath=/gpfs/fs1/cosbucket1
target=http://map1_cos/cosbucket1
map=http://map1_cos
cacheHost=map1_cos
endpoint=lb1.ait.cleversafelabs.com
cachePort=
endpoint=lb1.ait.cleversafelabs.com
ENDPOINT=--endpoint http://map1_cos
XOPT= -p afmParallelWriteChunkSize=0 -p afmParallelReadChunkSize=0
Here,
- cosbucket1: An existing bucket, with objects present on the cloud object storage, to read from.
- --new-bucket: This option can be used to create a new bucket.
- Tune the parallel transfer thresholds for parallel reads. Set the afmParallelReadThreshold parameter to 1 GB (the value is specified in MB, so 1024) and the afmParallelReadChunkSize parameter to 512 MB (the value is specified in bytes, so 536870912).
# mmchfileset fs1 cosbucket1 -p afmParallelReadThreshold=1024
A sample output is as follows:
Fileset cosbucket1 changed.
# mmchfileset fs1 cosbucket1 -p afmParallelReadChunkSize=536870912
A sample output is as follows:
Fileset cosbucket1 changed.
Note: Set these parameters before you perform any operation on the fileset so that the values take effect. If operations were already performed on the fileset, stop the fileset by using the mmafmctl stop command, set these parameters, and then start the fileset.
- Check that the values are set on the fileset.
# mmlsfileset fs1 cosbucket1 --afm -L
A sample output is as follows:
Filesets in file system 'fs1':

Attributes for fileset cosbucket1:
===================================
Status                                  Linked
Path                                    /gpfs/fs1/cosbucket1
Id                                      5
Root inode                              1572867
Parent Id                               0
Created                                 Thu Aug 19 15:24:39 2021
Comment
Inode space                             3
Maximum number of inodes                100352
Allocated inodes                        100352
Permission change flag                  chmodAndSetacl
afm-associated                          Yes
Target                                  http://map1_cos:80/cosbucket1
Mode                                    independent-writer
File Lookup Refresh Interval            120
File Open Refresh Interval              120
Dir Lookup Refresh Interval             120
Dir Open Refresh Interval               120
Async Delay                             15 (default)
Last pSnapId                            0
Display Home Snapshots                  no
Parallel Read Chunk Size                536870912
Parallel Read Threshold                 1024
Number of Gateway Flush Threads         8
Prefetch Threshold                      0 (default)
Eviction Enabled                        yes (default)
Parallel Write Chunk Size               0
IO Flags                                0x0 (default)
- List the object in the cache. Before the object is read, its allocated size (the first column of the ls -s output) is 0.
# ls -lash /gpfs/fs1/cosbucket1/object1
0 -rwxrwx--- 1 root root 2.0G Aug 19 2021 object1
- Read the entire content of the object.
# cat /gpfs/fs1/cosbucket1/object1 > /dev/null
- List the object in the cache again. After the read, the allocated size is 2.0G.
# ls -lash /gpfs/fs1/cosbucket1/object1
2.0G -rwxrwx--- 1 root root 2.0G Aug 19 2021 object1
- Check the fileset status.
# mmafmctl fs1 getstate
A sample output is as follows:
Fileset Name  Fileset Target                 Cache State  Gateway Node  Queue Length  Queue numExec
------------  -----------------------------  -----------  ------------  ------------  -------------
cosbucket1    http://map1_cos:80/cosbucket1  Active       c7f2n03       0             10
- To view the queue on the primary gateway and the helper gateway, use the following command:
# mmfsadm saferdump afm cosbucket1
A sample output is as follows:
Output snip:
Normal Queue: (listed by execution order) (state: Active)
0 Read [1572868.1572868] inflight (4194304 @ 12582912) tid 3454523
2 ReadSplit [1572868.1572868] inflight (486539264 @ 1610612736) tid 3401451
Here,
- ReadSplit: Shows the split read operations that run on the gateways that are defined in the mapping.
For more information, see the mmfsadm command in the IBM Storage Scale: Problem Determination Guide.
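The tuning step in the example and the note that follows it can be combined into a dry-run shell sketch. It derives the example values (1 GB as 1024 MB and 512 MB as 536870912 bytes) and prints the stop, set, and start sequence for a fileset that is already in use. The mmafmctl stop and start syntax with -j is an assumption to verify against the mmafmctl command reference for your release. The run wrapper and the DRY_RUN variable are hypothetical; setting DRY_RUN=0 executes the commands instead of printing them.

```shell
#!/bin/sh
# Dry-run sketch: stop the fileset, set the parallel read parameters, start it.
# Assumption: "mmafmctl <fs> stop|start -j <fileset>" syntax; verify it first.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical wrapper: print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

FS=fs1                 # file system from the example
FILESET=cosbucket1     # fileset from the example

THRESHOLD_MB=1024                     # 1 GB, given in MB
CHUNK_BYTES=$((512 * 1024 * 1024))    # 512 MB, given in bytes: 536870912

run mmafmctl "$FS" stop -j "$FILESET"
run mmchfileset "$FS" "$FILESET" -p afmParallelReadThreshold="$THRESHOLD_MB"
run mmchfileset "$FS" "$FILESET" -p afmParallelReadChunkSize="$CHUNK_BYTES"
run mmafmctl "$FS" start -j "$FILESET"
```

After the fileset is started again, the mmlsfileset command with --afm -L shows whether the new values are in effect.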