Partial file or object caching for AFM to cloud object storage

When the partial file or object caching feature is enabled, an application fetches only the required data from an object or a file on a cloud object storage, so network bandwidth and local disk space are used more efficiently. Enable this feature if an application does not need to read an entire object or file. Partial caching operates on block boundaries.

The partial caching of an object or a file is controlled by the afmPrefetchThreshold parameter. The default value of this parameter is 0. With this value, after the cache reads any three blocks of a file, all remaining blocks of the file are fetched and the file is marked as cached. This value is useful for sequentially accessed files that are read completely, such as image files. To configure this parameter, see the mmchfileset command.
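As a minimal sketch of configuring the parameter, the following shell function wraps the stop, change, and start sequence that the example in this topic uses. The function name set_prefetch_threshold and the DRY_RUN variable are hypothetical and are not part of IBM Storage Scale. The command forms and the attribute spelling are copied from the example, and whether a stop and start are required in every environment is an assumption taken from that example.

```shell
#!/usr/bin/env bash
# Sketch only: set_prefetch_threshold is a hypothetical helper, not a product command.
# With DRY_RUN=1 (the default here) the commands are printed instead of executed.

set_prefetch_threshold() {
    local fs="$1" fileset="$2" threshold="$3"

    # Accept only whole numbers from 0 (the default) to 100.
    case "$threshold" in
        ''|*[!0-9]*|????*)
            echo "threshold must be an integer from 0 to 100" >&2
            return 1 ;;
    esac
    if [ "$threshold" -gt 100 ]; then
        echo "threshold must be an integer from 0 to 100" >&2
        return 1
    fi

    # In dry-run mode, prefix every command with echo.
    local run=""
    [ "${DRY_RUN:-1}" = "1" ] && run="echo"

    # Same order as the example: stop the fileset, change the value, start it again.
    # The attribute is spelled exactly as in the example command.
    $run mmafmctl "$fs" stop -j "$fileset" &&
    $run mmchfileset "$fs" "$fileset" -p "afmprefetchthreshold=$threshold" &&
    $run mmafmctl "$fs" start -j "$fileset"
}
```

Calling `set_prefetch_threshold fs1 objectfileset 100` with the default dry-run setting prints the three commands without changing anything. Set DRY_RUN=0 only on a cluster where these commands are intended to run.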

In addition to the default value of 0, the afmPrefetchThreshold parameter accepts values in the range 1 – 100. The value specifies the percentage of the file size that must be cached before all the other data blocks are automatically fetched into the cache. A higher value is suitable for a file that is accessed partially. Setting afmPrefetchThreshold to 100 disables prefetching of the entire file, so only the data blocks that an application reads are cached. As a result, large random-access files that do not fit in the cache are not read completely. When all data blocks are available in the cache, the file is marked as cached.
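The rules for the threshold values can be summarized as a small decision function. This is only a model of the behavior described in this topic, not AFM code. The function name full_prefetch_triggered is hypothetical. Treating "blocks read" as "blocks cached" and using a greater-than-or-equal comparison for the percentage are assumptions.

```shell
# Illustration only: a model of the documented afmPrefetchThreshold rules.
# Arguments: threshold (0-100), blocks read so far, total blocks in the file.
# Returns 0 (true) if, per the description, the remaining blocks would be fetched.
full_prefetch_triggered() {
    local threshold="$1" blocks_read="$2" total_blocks="$3"

    if [ "$threshold" -eq 0 ]; then
        # Default: the whole file is fetched after any three blocks are read.
        [ "$blocks_read" -ge 3 ]
    elif [ "$threshold" -eq 100 ]; then
        # 100 disables whole-file prefetch; only the blocks that are read are cached.
        return 1
    else
        # 1-99: the whole file is fetched once this percentage of it is cached.
        # Integer form of: blocks_read / total_blocks >= threshold / 100
        [ $(( blocks_read * 100 )) -ge $(( threshold * total_blocks )) ]
    fi
}
```

For a file of 25 blocks and a threshold of 50, the function returns true after 13 blocks are read and false after 12.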

Note:
  • The download or prefetch feature fetches full objects from a cloud object storage even if they were partially cached previously. That is, prefetching a partially cached object pulls the entire object into the cache.
  • Failover of partially cached objects pushes only the cached data blocks to the target. Uncached blocks are filled with null bytes.
  • Any writes queued on a partially cached object fetch the entire file, even if a prefetch threshold limit is set on the object.

If a write operation is queued on a file that is partially cached, the entire file is cached first, and then the write operation is queued on the file. Appending to a partially cached file does not cache the entire file. Only in the LU mode does a write insert or an append on a partially cached file cache the entire file, even if the prefetch threshold is set on the fileset.

Example

  1. The objectbucket bucket is present on the cloud object storage with two objects as follows:
    Name      : object100M
    Date      : 2021-05-21 08:26:11 EDT 
    Size      : 100 MiB 
    ETag      : 0073370fd78dd34e5043d13ba515b5a2 
    Type      : file 
    Metadata  :
      Content-Type: application/octet-stream 
    Name      : object1G
    Date      : 2021-05-21 08:26:11 EDT 
    Size      : 1000 MiB 
    ETag      : b7faba1ddde52a27fb925858102db50b-8 
    Type      : file 
    Metadata  :
      Content-Type: application/octet-stream 
  2. Set keys on an IBM Storage Scale AFM cache cluster.
    # mmafmcoskeys objectbucket:192.168.118.121 set sdjnlsdlknsf3093mkey1 skdjfnrkfjnergergnwegwrgvlkjfv12
  3. Create an AFM to cloud object storage relationship.
    # mmafmcosconfig fs1 objectfileset --endpoint http://192.168.118.121 --bucket objectbucket  --mode iw --object-fs  --debug
    afmobjfs=fs1 fileset=objectfileset 
    bucket=objectbucket newbucket= objectfs=yes dir= 
    policy= tmpdir= tmpfile= cleanup=no mode=iw 
    xattr=no ssl=no acls=no gcs=no vhb=
    bucketName=objectbucket region= serverName=192.168.118.121 cacheFsType=http
    Linkpath=/gpfs/fs1/objectfileset target=http://192.168.118.121/objectbucket
    map=http://192.168.118.121 cacheHost=192.168.118.121
    endpoint=192.168.118.121 ENDPOINT=--endpoint http://192.168.118.121
    XOPT= -p afmParallelWriteChunkSize=0 -p afmParallelReadChunkSize=0
  4. Verify the fileset information.
    # mmlsfileset fs1 objectfileset --afm -L
    A sample output is as follows:
    Filesets in file system 'fs1':
    
    Attributes for fileset objectfileset:
    ======================================
    Status                                  Linked
    Path                                    /gpfs/fs1/objectfileset
    Id                                      7
    Root inode                              2097155
    Parent Id                               0
    Created                                 Fri May 21 08:31:56 2021
    Comment                                 
    Inode space                             4      
    Maximum number of inodes                100352
    Allocated inodes                        100352
    Permission change flag                  chmodAndSetacl
    afm-associated                          Yes
    Target                                  http://192.168.118.121:80/objectbucket
    Mode                                    independent-writer
    File Lookup Refresh Interval            120
    File Open Refresh Interval              120
    Dir Lookup Refresh Interval             120
    Dir Open Refresh Interval               120
    Async Delay                             15 (default)
    Last pSnapId                            0
    Display Home Snapshots                  no
    Parallel Read Chunk Size                0
    Number of Gateway Flush Threads         4
    Prefetch Threshold                      0 (default)
    Eviction Enabled                        yes (default)
    Parallel Write Chunk Size               0
    IO Flags                                0x0 (default)
    Note: The prefetch threshold limit is set to 0 by default.
  5. Stop the fileset.
    # mmafmctl fs1 stop -j objectfileset
  6. Change the prefetch threshold limit.
    # mmchfileset fs1 objectfileset -p afmprefetchthreshold=100
    A sample output is as follows:
    Fileset objectfileset changed.
  7. Start the fileset.
    # mmafmctl fs1 start -j objectfileset
  8. Verify the prefetch threshold limit.
    # mmlsfileset fs1 objectfileset --afm -L
    A sample output is as follows:
    Filesets in file system 'fs1':
    
    Attributes for fileset objectfileset:
    ======================================
    Status                                  Linked
    Path                                    /gpfs/fs1/objectfileset
    Id                                      7
    Root inode                              2097155
    Parent Id                               0
    Created                                 Fri May 21 08:31:56 2021
    Comment                                 
    Inode space                             4      
    Maximum number of inodes                100352
    Allocated inodes                        100352
    Permission change flag                  chmodAndSetacl
    afm-associated                          Yes
    Target                                  http://192.168.118.121:80/objectbucket
    Mode                                    independent-writer
    File Lookup Refresh Interval            120
    File Open Refresh Interval              120
    Dir Lookup Refresh Interval             120
    Dir Open Refresh Interval               120
    Async Delay                             15 (default)
    Last pSnapId                            0
    Display Home Snapshots                  no
    Parallel Read Chunk Size                0
    Number of Gateway Flush Threads         4
    Prefetch Threshold                      100
    Eviction Enabled                        yes (default)
    Parallel Write Chunk Size               0
    IO Flags                                0x0 (default)
    Note: The prefetch threshold is set to 100, which means that only the blocks that are read are cached.
  9. To check the objects from the cloud object storage as they appear in the cache fileset, issue the ls command.
    # ls -lash /gpfs/fs1/objectfileset/
    A sample output is as follows:
    total 259K
     512 drwxrws---     5 root root  4.0K May 21 08:34 .
    256K drwxrwxrwx     9 root root  256K May 21 08:31 ..
       0 -rwxrwxrwx     1 root root  100M May 21  2021 object100M
       0 -rwxrwxrwx     1 root root 1000M May 21  2021 object1G
  10. Read the object partially by using the dd command.
    # dd if=/gpfs/fs1/objectfileset/object100M bs=4M count=10 > /dev/null
    10+0 records in
    10+0 records out
    41943040 bytes (42 MB, 40 MiB) copied, 1.43027 s, 29.3 MB/s
    # dd if=/gpfs/fs1/objectfileset/object1G bs=4M count=100 > /dev/null
    100+0 records in
    100+0 records out
    419430400 bytes (419 MB, 400 MiB) copied, 14.129 s, 29.7 MB/s
  11. Check the disk usage of these objects. The usage matches the amount of data that the application read, in this case through the dd command.
    # du -h /gpfs/fs1/objectfileset/object1G
    400M    /gpfs/fs1/objectfileset/object1G
    # du -h /gpfs/fs1/objectfileset/object100M
    40M     /gpfs/fs1/objectfileset/object100M
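
The sizes in steps 10 and 11 can be cross-checked with simple arithmetic. The following sketch computes the number of bytes that a partial dd read is expected to cache and the share of an object that is cached. The function names expected_cached_bytes and cached_percent are hypothetical helpers for illustration. Because du reports allocated disk space, its output on other systems might differ slightly from the computed value.

```shell
# Expected cached size for a partial read with dd, in bytes.
# Arguments: block size in MiB (the dd bs= value), number of blocks (the dd count= value).
expected_cached_bytes() {
    local bs_mib="$1" count="$2"
    echo $(( bs_mib * 1024 * 1024 * count ))
}

# Cached share of an object as a whole-number percentage, rounded down.
# Arguments: cached size in MiB (from du), object size in MiB.
cached_percent() {
    local cached_mib="$1" size_mib="$2"
    echo $(( cached_mib * 100 / size_mib ))
}
```

For the example, `expected_cached_bytes 4 10` gives 41943040 and `expected_cached_bytes 4 100` gives 419430400, which match the byte counts that dd reports. `cached_percent 40 100` and `cached_percent 400 1000` both give 40, so 40 percent of each object is cached while the rest remains on the cloud object storage.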