Data eviction from an AFM to cloud object storage fileset after uploading

An AFM to cloud object storage fileset employs high-performance caching, with its extension bucket located on the cloud object storage. A caching fileset is typically provisioned on fast, performance-optimized storage, and the data that is created in the cache can be sent to the respective cloud object storage backend.

With increasingly demanding workloads and performance-centric cache applications, storage needs to be optimized at the fileset level. This feature optimizes storage at the AFM to cloud object storage fileset level by evicting the data of files or objects after they are successfully uploaded to the cloud object storage buckets. Evicting the data of uploaded files and objects frees space in the fileset. Applications can use the freed space to create and work on other files and objects without hampering performance.

By using the mmafmcosctl upload command with the --evict-after-upload option, all files, or only the files that are specified by --object-list, can be uploaded and then evicted from the cache to free space.
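As an illustration of the --object-list variant, the following sketch builds a list file and shows how it might be passed to the upload command. The helper name build_object_list, the list location /tmp/upload.list, the 'file*' pattern, and the list format (one absolute path per line) are assumptions made for this example; check the mmafmcosctl command description for the exact list format. The mmafmcosctl call is left as a comment because it needs a configured cluster.

```shell
#!/bin/sh
# Hypothetical helper: write the regular files under a directory that match
# a name pattern to a list file, one path per line (assumed list format).
build_object_list() {
    dir=$1
    pattern=$2
    out=$3
    find "$dir" -type f -name "$pattern" | sort > "$out"
}

# Possible use on a cluster node (not run here):
#   build_object_list /gpfs/fs1/fileset1 'file*' /tmp/upload.list
#   mmafmcosctl fs1 fileset1 /gpfs/fs1/fileset1 upload \
#       --object-list /tmp/upload.list --evict-after-upload
```

Keeping the list file outside the fileset avoids uploading the list itself.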

This feature supports the manual updates (MU) mode through the reconcile feature. By using reconcile, a policy can be defined to reconcile the files in an AFM to cloud object storage fileset with its respective cloud bucket. The files that are selected by a user-defined policy or the default policy are uploaded to the bucket and then evicted from the cache. See the mmafmcosctl command for more information.

When an application or an administrator requires the data of an evicted file, the data can be fetched on demand or downloaded by using the mmafmcosctl download command.
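A minimal sketch of the explicit download path follows. The wrapper name download_all is hypothetical, and the use of --all with download is assumed to mirror the upload syntax that is used in this topic; confirm the options against the mmafmcosctl command description. The wrapper returns early when mmafmcosctl is not installed, so it is safe to source on a machine outside the cluster.

```shell
#!/bin/sh
# Hypothetical wrapper: bring the data of all evicted files back into the cache.
# Arguments: device, fileset name, fileset path (same order as for upload).
download_all() {
    device=$1
    fileset=$2
    path=$3
    if ! command -v mmafmcosctl >/dev/null 2>&1; then
        echo "mmafmcosctl not found; run this on a cluster node" >&2
        return 127
    fi
    # Assumed syntax, mirroring "upload --all" from this topic.
    mmafmcosctl "$device" "$fileset" "$path" download --all
}

# Possible use on a cluster node (not run here):
#   download_all fs1 fileset1 /gpfs/fs1/fileset1
```

For a single file, reading the file is usually enough, because the data is fetched on demand.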

Advantages of the data eviction after uploading

Optimized storage utilization
By offloading data to the typically lower-cost cloud object storage backend, AFM maximizes the efficiency of the local cache. This prevents unnecessary storage costs and ensures optimal use of high-performance storage.
Enhanced performance
A well-managed cache is crucial for maintaining high application performance. Evicting data that is already uploaded to cloud object storage frees cache space for new, frequently accessed data, which helps prevent performance degradation due to cache misses.
Cost reduction
Depending on storage pricing models, storing data in cloud object storage might be more cost-effective than retaining it in the local cache. Eviction can help reduce overall storage expenses.

Use cases

Quick Data workload: For temporary files, intermediate results, or data with a short access window, eviction after upload is particularly beneficial. It prevents the cache from filling up and ensures optimal performance for frequently accessed data.

Long-term Data Retention: Data that needs to be retained for extended periods can be migrated to the more durable and cost-effective cloud object storage. This approach extends data retention while optimizing cache usage.

Workload Optimization: By selectively evicting data based on access patterns or data age, organizations can optimize cache utilization and workload distribution.

Note: A renamed file is not evicted immediately; it is evicted only after its data is modified.

Example

  1. Create two filesets, one in the single writer (SW) mode and one in the manual updates (MU) mode.
    # mmafmcosconfig fs1 fileset1 --endpoint http://s3.us-east-1.amazonaws.com --object-fs --xattr --new-bucket fileset1bucket --mode sw --acls --directory-object
    # mmafmcosconfig fs1 fileset2 --endpoint http://s3.us-east-1.amazonaws.com --object-fs --xattr --bucket fileset2bucket --mode mu --acls --directory-object
    where:
    fileset1
    Is created in the SW mode and connects to the fileset1bucket bucket on the cloud object storage.
    fileset2
    Is created in the MU mode and connects to the fileset2bucket bucket on the cloud object storage.
  2. Create data.
    for a in `seq 3`; do dd if=/dev/urandom of=/gpfs/fs1/fileset1/file$a count=40 bs=256k ; done
    for a in `seq 3`; do dd if=/dev/urandom of=/gpfs/fs1/fileset2/file$a count=40 bs=256k ; done
  3. List data with space usage.
    ls -sh /gpfs/fs1/fileset1
    A sample output is as follows:
    total 30M
    10M file1 10M file2 10M file3
    
    Node1] ls -sh /gpfs/fs1/fileset2
    total 30M
    10M file1 10M file2 10M file3
  4. Upload the data from the SW mode fileset and evict it after the upload.
    mmafmcosctl fs1 fileset1 /gpfs/fs1/fileset1 upload --all --evict-after-upload
    A sample output is as follows:
    Queued    Failed    TotalData
                        (approx in Bytes)
    3         0         31457280
    Object upload successfully queued at the gateway.
  5. Reconcile the MU mode fileset with its bucket and evict the data after the upload.
    mmafmcosctl fs1 fileset2 /gpfs/fs1/fileset2/ reconcile --evict-after-upload
    A sample output is as follows:
    Dirty file list : /var/mmfs/afm/fs1-7/recovery/policylist.data.list.dirtyFiles
    Threads in use: 1
    Queued (Total)    Failed    TotalData
                                (approx in Bytes)
    3 (3)             0         31457280
    Object Upload successfully queued at the gateway.
  6. Verify that objects are stored on the cloud object storage.
    amazonconsole] /root/mc ls aws/fileset1bucket
    A sample output is as follows:
    [2024-07-26 05:49:38 EDT] 10MiB file1
    [2024-07-26 05:49:41 EDT] 10MiB file2
    [2024-07-26 05:49:43 EDT] 10MiB file3
    amazonconsole] /root/mc ls aws/fileset2bucket
    [2024-07-26 05:54:41 EDT] 10MiB file1
    [2024-07-26 05:54:42 EDT] 10MiB file2
    [2024-07-26 05:54:42 EDT] 10MiB file3
  7. Check data usage after the data eviction.
    ls -sh /gpfs/fs1/fileset1
    A sample output is as follows:
    total 0
    0 file1 0 file2 0 file3
    Node1] ls -sh /gpfs/fs1/fileset2
    total 0
    0 file1 0 file2 0 file3
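
The check in step 7 can also be scripted. The following sketch classifies a file from its apparent size and its allocated block count without reading the file, because reading an evicted file would fetch its data again on demand. The function names classify and eviction_state are hypothetical, GNU stat is assumed (%s for size in bytes, %b for allocated blocks), and the rule "nonzero size with zero allocated blocks means evicted" is an assumption that is based on the ls -sh output shown above. In this example each file is 40 x 256 KiB = 10485760 bytes, which matches the TotalData value of 31457280 bytes for three files.

```shell
#!/bin/sh
# Classify a file from its apparent size (bytes) and allocated block count.
classify() {
    size=$1
    blocks=$2
    if [ "$size" -eq 0 ]; then
        echo empty
    elif [ "$blocks" -eq 0 ]; then
        echo evicted
    else
        echo resident
    fi
}

# Report the state of one file by using metadata only (no data read).
# Assumes GNU stat is available on the node.
eviction_state() {
    file=$1
    size=$(stat -c %s "$file") || return 1
    blocks=$(stat -c %b "$file") || return 1
    printf '%s: %s\n' "$file" "$(classify "$size" "$blocks")"
}

# Possible use on a cluster node (not run here):
#   for f in /gpfs/fs1/fileset1/*; do eviction_state "$f"; done
```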