Data eviction from an AFM to cloud object storage fileset after uploading
An AFM to cloud object storage fileset employs high-performance caching, and its extension is a bucket located on cloud object storage. A caching fileset is planned on fast storage, and the data created in the cache can be sent to the respective cloud object storage backends.
With ever more demanding workloads and performance-centric cache applications, storage needs to be optimized at the fileset level. This feature optimizes storage at the AFM to cloud object storage fileset level by evicting the data of files or objects after they are successfully uploaded to the cloud object storage buckets. Evicting the data of uploaded files and objects frees space in the fileset. Applications can then use this space to create and work on new files and objects without hampering performance.
By using the mmafmcosctl upload command with the --evict-after-upload option, you can upload all files, or only the files that are specified by --object-list, and then evict their data from the cache to manage space.
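The following is a minimal Bash sketch of the --object-list form. It assumes that the object list is a plain-text file with one absolute path per line; confirm the expected format in the mmafmcosctl reference. The helper names build_object_list and upload_and_evict, and selecting files by a name pattern such as *.tmp, are hypothetical. The command line mirrors the upload example later in this section, with --object-list in place of --all. When mmafmcosctl is not on the PATH, the script only prints the command that it would run.

```shell
#!/usr/bin/env bash
# Sketch: upload and then evict only the files that match a name pattern.
# Assumptions (not from the product documentation):
#   - the object list is a text file with one absolute path per line
#   - build_object_list and upload_and_evict are hypothetical helper names

# Write the paths of regular files under DIR whose names match PATTERN
# (a find -name pattern) to LIST_FILE, sorted for stable output.
# Usage: build_object_list DIR PATTERN LIST_FILE
build_object_list() {
    local dir=$1 pattern=$2 list=$3
    find "$dir" -type f -name "$pattern" | sort > "$list"
}

# Upload the listed files and evict their data from the cache afterwards.
# Usage: upload_and_evict DEVICE FILESET PATH LIST_FILE
upload_and_evict() {
    local device=$1 fileset=$2 path=$3 list=$4
    if [ ! -s "$list" ]; then
        echo "nothing to upload" >&2
        return 0
    fi
    if command -v mmafmcosctl > /dev/null 2>&1; then
        mmafmcosctl "$device" "$fileset" "$path" upload \
            --object-list "$list" --evict-after-upload
    else
        # Dry run: the IBM Storage Scale commands are not installed here.
        echo "would run: mmafmcosctl $device $fileset $path upload --object-list $list --evict-after-upload"
    fi
}
```

For example, running build_object_list /gpfs/fs1/fileset1 '*.tmp' /tmp/evict.list followed by upload_and_evict fs1 fileset1 /gpfs/fs1/fileset1 /tmp/evict.list would upload and evict only the temporary files in fileset1, leaving the data of other files in the cache.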
This feature also supports the manual update (MU) mode through the reconcile feature. With reconcile, you can define a policy to reconcile files between an AFM to cloud object storage fileset and its cloud bucket. The files selected by a user-defined policy or by the default policy are uploaded to the bucket and then evicted from the cache. For more information, see the mmafmcosctl command.
When an application or an administrator needs the data of a file that was evicted from the cache, the data can be fetched on demand or downloaded by using the mmafmcosctl download command.
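A minimal Bash sketch of both options follows. The first function relies on on-demand fetching: it reads each file, which brings the evicted data back into the cache. The second passes a list file to mmafmcosctl download; it assumes that download accepts --object-list with the same one-path-per-line list as upload, which you should verify in the mmafmcosctl reference. The function names prefetch_by_read and download_list are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: bring the data of evicted files back into the cache.
# prefetch_by_read and download_list are hypothetical helper names.

# Read every file given as an argument. On an AFM to cloud object storage
# fileset, reading an evicted file fetches its data on demand.
# Prints the number of files that were read successfully.
prefetch_by_read() {
    local count=0 f
    for f in "$@"; do
        if cat -- "$f" > /dev/null; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

# Download the files named in LIST_FILE (assumed: one path per line).
# Usage: download_list DEVICE FILESET PATH LIST_FILE
download_list() {
    local device=$1 fileset=$2 path=$3 list=$4
    if command -v mmafmcosctl > /dev/null 2>&1; then
        mmafmcosctl "$device" "$fileset" "$path" download --object-list "$list"
    else
        # Dry run: the IBM Storage Scale commands are not installed here.
        echo "would run: mmafmcosctl $device $fileset $path download --object-list $list"
    fi
}
```

Reading files is convenient for a few files that an application is about to open; a list-driven download is likely the better fit when many evicted files need to be brought back in one operation.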
Advantages of the data eviction after uploading
- Optimized storage utilization
  By offloading data to the typically lower-cost cloud object storage backend, AFM maximizes the efficiency of the local cache. This avoids unnecessary storage costs and ensures optimal use of high-performance storage.
- Enhanced performance
  A well-managed cache is crucial for maintaining high application performance. Evicting data that is already stored on cloud object storage frees cache space for new, frequently accessed data, which prevents performance degradation due to cache misses.
- Storage cost reduction
  Depending on storage pricing models, storing data in cloud object storage might be more cost-effective than retaining it in the local cache. Eviction can help reduce overall storage expenses.
Use cases
Short-lived data workloads: For temporary files, intermediate results, or data with a short access window, eviction after upload is particularly beneficial. It prevents the cache from filling up and ensures optimal performance for frequently accessed data.
Long-term data retention: Data that needs to be retained for extended periods can be migrated to the more durable and cost-effective cloud object storage. This approach extends data retention while optimizing cache usage.
Example
- Create two filesets, one in the single writer (SW) mode and one in the manual update (MU) mode.
# mmafmcosconfig fs1 fileset1 --endpoint http://s3.us-east-1.amazonaws.com --object-fs --xattr --new-bucket fileset1bucket --mode sw --acls --directory-object
# mmafmcosconfig fs1 fileset2 --endpoint http://s3.us-east-1.amazonaws.com --object-fs --xattr --bucket fileset2bucket --mode mu --acls --directory-object
where:
  - fileset1
    Is created in the SW mode and connects to the fileset1bucket bucket on the cloud object storage.
  - fileset2
    Is created in the MU mode and connects to the fileset2bucket bucket on the cloud object storage.
- Create data. Each command writes three files of 10 MiB (40 blocks of 256 KiB) to a fileset.
for a in `seq 3`; do dd if=/dev/urandom of=/gpfs/fs1/fileset1/file$a count=40 bs=256k; done
for a in `seq 3`; do dd if=/dev/urandom of=/gpfs/fs1/fileset2/file$a count=40 bs=256k; done
- List data with space usage.
A sample output is as follows:
Node1] ls -sh /gpfs/fs1/fileset1
total 30M
10M file1
10M file2
10M file3
Node1] ls -sh /gpfs/fs1/fileset2
total 30M
10M file1
10M file2
10M file3
- Upload the data from fileset1 and evict it from the cache after the upload.
A sample output is as follows:
mmafmcosctl fs1 fileset1 /gpfs/fs1/fileset1 upload --all --evict-after-upload
Queued   Failed   TotalData (approx in Bytes)
3        0        31457280
Object upload successfully queued at the gateway.
- Reconcile fileset2 so that its data is uploaded to the bucket and then evicted from the cache.
A sample output is as follows:
mmafmcosctl fs1 fileset2 /gpfs/fs1/fileset2/ reconcile --evict-after-upload
Dirty file list : /var/mmfs/afm/fs1-7/recovery/policylist.data.list.dirtyFiles
Threads in use: 1
Queued (Total)   Failed   TotalData (approx in Bytes)
3 (3)            0        31457280
Object Upload successfully queued at the gateway.
- Verify that objects are stored on the cloud object storage.
A sample output is as follows:
amazonconsole] /root/mc ls aws/fileset1bucket
[2024-07-26 05:49:38 EDT] 10MiB file1
[2024-07-26 05:49:41 EDT] 10MiB file2
[2024-07-26 05:49:43 EDT] 10MiB file3
amazonconsole] ls aws/fileset2bucket
[2024-07-26 05:54:41 EDT] 10MiB file1
[2024-07-26 05:54:42 EDT] 10MiB file2
[2024-07-26 05:54:42 EDT] 10MiB file3
- Check data usage after the data eviction.
A sample output is as follows:
Node1] ls -sh /gpfs/fs1/fileset1
total 0
0 file1
0 file2
0 file3
Node1] ls -sh /gpfs/fs1/fileset2
total 0
0 file1
0 file2
0 file3
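As a consistency check on the example, the TotalData value of 31457280 bytes reported by the upload and reconcile steps equals the data that the dd loops in the data-creation step wrote to each fileset: three files, each of 40 blocks of 256 KiB (dd interprets the k suffix as 1024 bytes).

```latex
3 \times 40 \times 256\,\mathrm{KiB} = 3 \times 40 \times 262144\,\mathrm{B} = 31457280\,\mathrm{B} = 30\,\mathrm{MiB}
```

This is also the 30M total that ls -sh reports for each fileset before the eviction.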
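The step that verifies the objects on the cloud object storage can also be scripted. The following Bash sketch compares the file names in a fileset directory with the object names in a saved mc ls listing and prints every file that has no matching object. It assumes a flat directory and that the object name is the last whitespace-separated field of each listing line, as in the sample output of that step; names that contain spaces would break this. The helper name missing_objects is hypothetical, and the listing is read from a file so that the sketch does not depend on how mc is configured.

```shell
#!/usr/bin/env bash
# Sketch: report local files that have no matching object in a bucket listing.
# Assumptions: the listing was saved from "mc ls" output, the object name is
# the last field on each line, and the fileset directory is flat.
# missing_objects is a hypothetical helper name.

# Usage: missing_objects FILESET_DIR LISTING_FILE
# Prints the names that exist locally but not in the listing, one per line.
missing_objects() {
    local dir=$1 listing=$2
    # Both inputs are sorted in the C locale, and comm uses the same locale,
    # so comm -23 keeps only the names unique to the local directory.
    LC_ALL=C comm -23 \
        <(find "$dir" -maxdepth 1 -type f -printf '%f\n' | LC_ALL=C sort) \
        <(awk '{ print $NF }' "$listing" | LC_ALL=C sort)
}
```

For example, save the listing with /root/mc ls aws/fileset1bucket > /tmp/fileset1.objects, make the file available on a node that mounts fs1, and run missing_objects /gpfs/fs1/fileset1 /tmp/fileset1.objects; empty output means that every file has a matching object.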
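In the step that checks data usage after the eviction, ls -s reports allocated space, not file size. Assuming that an evicted file keeps its apparent size while its data blocks are released, the following Bash sketch labels each file as evicted (nonzero size, no blocks allocated) or cached. It uses GNU stat format sequences. The helper name report_evicted is hypothetical, and because an ordinary sparse file also has no blocks allocated, treat the result as a hint rather than proof.

```shell
#!/usr/bin/env bash
# Sketch: classify files as "evicted" (size > 0 but no blocks allocated)
# or "cached". Uses GNU coreutils stat; report_evicted is a hypothetical name.
# An ordinary sparse file with no data blocks is also reported as "evicted".

# Usage: report_evicted FILE...
# Prints "<state> <apparent size in bytes> <name>" for each file.
report_evicted() {
    local f stats size blocks state
    for f in "$@"; do
        if ! stats=$(stat -c '%s %b' -- "$f"); then
            continue  # stat has already printed an error message
        fi
        read -r size blocks <<< "$stats"
        if [ "$size" -gt 0 ] && [ "$blocks" -eq 0 ]; then
            state=evicted
        else
            state=cached
        fi
        printf '%s %s %s\n' "$state" "$size" "$(basename -- "$f")"
    done
}
```

Under that assumption, running report_evicted /gpfs/fs1/fileset1/* after the upload step would be expected to report file1, file2, and file3 as evicted with an apparent size of 10485760 bytes each.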