Partial file or object caching for AFM to cloud object storage
When the partial file or object caching feature is enabled, an application fetches only the data that it requires from an object or a file on a cloud object storage, so network bandwidth and local disk space are used more efficiently. Enable this feature if applications do not need to read entire objects or files. Partial caching operates on block boundaries.
The partial caching of an object or a file is controlled by the afmPrefetchThreshold parameter. The default value of this parameter is 0. With this value, after the cache reads any three blocks of a file, all remaining blocks of the file are fetched, and the file is marked as cached. This value is useful for sequentially accessed files that are read completely, such as image files. To configure this parameter, see the mmchfileset command.
Apart from the default value of 0, valid values of the afmPrefetchThreshold parameter are in the range 1 – 100. The value specifies the percentage of the file size that must be cached before all the other data blocks are automatically fetched into the cache. A higher value is suitable for a file that is accessed partially. A value of 100 disables prefetching of the entire file, so only the data blocks that an application reads are cached. As a result, large random-access files that do not fit in the cache are not read completely. When all data blocks are available in the cache, the file is marked as cached.
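The threshold rules can be pictured with a small model. The following Python sketch is illustrative only and is not how AFM is implemented: the function name, the use of block counts to compute the cached percentage, and the `>=` comparison are all assumptions made for the example.

```python
def should_prefetch_whole_file(threshold: int, blocks_read: int, total_blocks: int) -> bool:
    """Toy model of the afmPrefetchThreshold rules described in the text.

    Assumptions (not taken from the product): the cached percentage is
    computed from block counts, and the threshold is met with ">=".
    """
    if not 0 <= threshold <= 100:
        raise ValueError("afmPrefetchThreshold must be between 0 and 100")
    if total_blocks <= 0:
        raise ValueError("total_blocks must be positive")

    if threshold == 0:
        # Default: the whole file is fetched after any three blocks are read.
        return blocks_read >= 3
    if threshold == 100:
        # Whole-file prefetch is disabled; only blocks that are read are cached.
        return False

    # 1 - 99: prefetch the rest once this percentage of the file is cached.
    cached_percent = 100 * blocks_read / total_blocks
    return cached_percent >= threshold


if __name__ == "__main__":
    # A file of 25 blocks with 10 blocks read is 40% cached.
    for threshold in (0, 40, 50, 100):
        print(threshold, should_prefetch_whole_file(threshold, 10, 25))
```

In this model, a file with 10 of 25 blocks read triggers a full fetch at thresholds 0 and 40, but not at 50 or 100.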
- The download or prefetch feature fetches full objects from a cloud object storage even if they were partially cached previously. That is, a prefetch of partially cached objects pulls the entire objects into the cache.
- Failover of partially cached objects pushes only the cached data blocks to the target. Uncached blocks are filled with null bytes.
- Any writes queued on a partially cached object fetch the entire file, even if a prefetch threshold limit is set on the object.
If a write operation is queued on a partially cached file, the entire file is cached first, and then the write operation is queued on the file. Appending to a partially cached file does not cache the entire file. Only in the local-update (LU) mode does a write insert or append on a partially cached file cache the entire file, even if the prefetch threshold is set on the fileset.
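The failover behavior noted above, where only cached blocks are pushed and uncached blocks are filled with null bytes, can be pictured as follows. This is a minimal sketch under stated assumptions: the function name, the block-index mapping, and the in-memory byte image are hypothetical and do not reflect AFM internals.

```python
from typing import Mapping


def failover_image(cached_blocks: Mapping[int, bytes], block_size: int, file_size: int) -> bytes:
    """Return the bytes that would reach the target in this toy model.

    cached_blocks maps a block index to the data held in the cache
    (a hypothetical structure used only for this illustration).
    """
    image = bytearray(file_size)  # bytearray(n) starts as n null bytes
    for index, data in cached_blocks.items():
        start = index * block_size
        if start >= file_size:
            raise ValueError(f"block {index} lies beyond the end of the file")
        # Trim the last block so that the image keeps its original size.
        chunk = data[: min(block_size, file_size - start)]
        image[start:start + len(chunk)] = chunk
    return bytes(image)


if __name__ == "__main__":
    # Blocks 0 and 2 are cached; block 1 was never read.
    print(failover_image({0: b"AAAA", 2: b"CCCC"}, block_size=4, file_size=10))
```

The uncached middle block comes out as four null bytes, and the final block is trimmed to the file size.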
Example
- The objectbucket bucket is present on the cloud object storage with two objects as follows:
Name      : object100M
Date      : 2021-05-21 08:26:11 EDT
Size      : 100 MiB
ETag      : 0073370fd78dd34e5043d13ba515b5a2
Type      : file
Metadata  :
  Content-Type: application/octet-stream

Name      : object1G
Date      : 2021-05-21 08:26:11 EDT
Size      : 1000 MiB
ETag      : b7faba1ddde52a27fb925858102db50b-8
Type      : file
Metadata  :
  Content-Type: application/octet-stream
- Set keys on an IBM Storage Scale AFM cache cluster.
# mmafmcoskeys objectbucket:192.168.118.121 set sdjnlsdlknsf3093mkey1 skdjfnrkfjnergergnwegwrgvlkjfv12
- Create an AFM to cloud object storage relationship.
# mmafmcosconfig fs1 objectfileset --endpoint http://192.168.118.121 --bucket objectbucket --mode iw --object-fs --debug
A sample output is as follows:
afmobjfs=fs1 fileset=objectfileset bucket=objectbucket newbucket= objectfs=yes dir= policy= tmpdir= tmpfile= cleanup=no mode=iw xattr=no ssl=no acls=no gcs=no vhb= bucketName=objectbucket region= serverName=192.168.118.121 cacheFsType=http Linkpath=/gpfs/fs1/objectfileset target=http://192.168.118.121/objectbucket map=http://192.168.118.121 cacheHost=192.168.118.121 endpoint=192.168.118.121 ENDPOINT=--endpoint http://192.168.118.121 XOPT= -p afmParallelWriteChunkSize=0 -p afmParallelReadChunkSize=0
- Verify the fileset information.
# mmlsfileset fs1 objectfileset --afm -L
A sample output is as follows:
Filesets in file system 'fs1':

Attributes for fileset objectfileset:
======================================
Status                                  Linked
Path                                    /gpfs/fs1/objectfileset
Id                                      7
Root inode                              2097155
Parent Id                               0
Created                                 Fri May 21 08:31:56 2021
Comment
Inode space                             4
Maximum number of inodes                100352
Allocated inodes                        100352
Permission change flag                  chmodAndSetacl
afm-associated                          Yes
Target                                  http://192.168.118.121:80/objectbucket
Mode                                    independent-writer
File Lookup Refresh Interval            120
File Open Refresh Interval              120
Dir Lookup Refresh Interval             120
Dir Open Refresh Interval               120
Async Delay                             15 (default)
Last pSnapId                            0
Display Home Snapshots                  no
Parallel Read Chunk Size                0
Number of Gateway Flush Threads         4
Prefetch Threshold                      0 (default)
Eviction Enabled                        yes (default)
Parallel Write Chunk Size               0
IO Flags                                0x0 (default)
Note: The prefetch threshold limit is set to 0 by default.
- Stop the fileset.
# mmafmctl fs1 stop -j objectfileset
- Change the prefetch threshold limit.
# mmchfileset fs1 objectfileset -p afmPrefetchThreshold=100
A sample output is as follows:
Fileset objectfileset changed.
- Start the fileset.
# mmafmctl fs1 start -j objectfileset
- Verify the prefetch threshold limit.
# mmlsfileset fs1 objectfileset --afm -L
A sample output is as follows:
Filesets in file system 'fs1':

Attributes for fileset objectfileset:
======================================
Status                                  Linked
Path                                    /gpfs/fs1/objectfileset
Id                                      7
Root inode                              2097155
Parent Id                               0
Created                                 Fri May 21 08:31:56 2021
Comment
Inode space                             4
Maximum number of inodes                100352
Allocated inodes                        100352
Permission change flag                  chmodAndSetacl
afm-associated                          Yes
Target                                  http://192.168.118.121:80/objectbucket
Mode                                    independent-writer
File Lookup Refresh Interval            120
File Open Refresh Interval              120
Dir Lookup Refresh Interval             120
Dir Open Refresh Interval               120
Async Delay                             15 (default)
Last pSnapId                            0
Display Home Snapshots                  no
Parallel Read Chunk Size                0
Number of Gateway Flush Threads         4
Prefetch Threshold                      100
Eviction Enabled                        yes (default)
Parallel Write Chunk Size               0
IO Flags                                0x0 (default)
Note: The prefetch threshold is set to 100, which means that only the blocks that are read are cached.
- To check the objects on a cloud object storage, issue the ls command.
# ls -lash /gpfs/fs1/objectfileset/
A sample output is as follows:
total 259K
 512 drwxrws--- 5 root root  4.0K May 21 08:34 .
256K drwxrwxrwx 9 root root  256K May 21 08:31 ..
   0 -rwxrwxrwx 1 root root  100M May 21  2021 object100M
   0 -rwxrwxrwx 1 root root 1000M May 21  2021 object1G
- Read the objects partially by using the dd command.
# dd if=/gpfs/fs1/objectfileset/object100M bs=4M count=10 > /dev/null
10+0 records in
10+0 records out
41943040 bytes (42 MB, 40 MiB) copied, 1.43027 s, 29.3 MB/s
# dd if=/gpfs/fs1/objectfileset/object1G bs=4M count=100 > /dev/null
100+0 records in
100+0 records out
419430400 bytes (419 MB, 400 MiB) copied, 14.129 s, 29.7 MB/s
- Check the disk usage of these objects. The usage matches the amount of data that was read by the application, in this case the dd command.
# du -h /gpfs/fs1/objectfileset/object1G
400M /gpfs/fs1/objectfileset/object1G
# du -h /gpfs/fs1/objectfileset/object100M
40M /gpfs/fs1/objectfileset/object100M
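The figures in this example can be checked with a little arithmetic. The following Python sketch recomputes the byte counts that dd reported and the share of each object that was read. The helper names are hypothetical. The `allocated_bytes` function assumes a POSIX system where `st_blocks` counts 512-byte units, which is roughly the figure that du reports.

```python
import os

MIB = 1024 * 1024


def dd_bytes(bs_mib: int, count: int) -> int:
    """Bytes transferred by dd for a block size given in MiB and a block count."""
    return bs_mib * MIB * count


def read_percent(read_bytes: int, object_bytes: int) -> float:
    """Share of the object that was read, as a percentage."""
    return 100 * read_bytes / object_bytes


def allocated_bytes(path: str) -> int:
    """Disk space allocated to a file (assumes 512-byte st_blocks units)."""
    return os.stat(path).st_blocks * 512


# Object sizes and dd parameters are taken from the example above.
reads = {
    "object100M": (dd_bytes(4, 10), 100 * MIB),
    "object1G": (dd_bytes(4, 100), 1000 * MIB),
}

for name, (read, size) in reads.items():
    print(f"{name}: read {read} bytes ({read // MIB} MiB), "
          f"{read_percent(read, size):.0f}% of the object")
```

Both reads cover 40 percent of their object. Because the prefetch threshold is 100, the remaining blocks are not fetched, which is consistent with the 40M and 400M values that du reports.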