How Transparent cloud tiering works

This topic describes how you can use the Transparent cloud tiering feature, which tiers files to object storage, in an IBM Spectrum Scale cluster.

Transparent cloud tiering is an IBM Spectrum Scale feature that allows the IBM Spectrum Scale cluster to tier files to object storage. The Transparent cloud tiering service stores a file as two objects in the cloud, in two separate containers. These containers are also called vaults or buckets, depending on which cloud storage vendor you use. One object holds the file's data and the other object holds the file's metadata. The metadata contains items such as ACLs, extended attributes, and other information that allows Transparent cloud tiering to do a full restore of the file from the cloud tier.

When a file is migrated to the cloud, the metadata of the file is retained in the IBM Spectrum Scale file system. The only reason for copying the metadata to the cloud is to manage scenarios where a file is destroyed on the cluster for some reason (such as accidental deletion) and the file needs to be restored from the cloud.

For the usual case, only the file's data is recalled from the cloud. You can tier the files by migrating the data to the cloud. This means that the data is no longer on the IBM Spectrum Scale cluster and is stored only on the cloud. Such files are considered to be in the non-resident state. Alternatively, you can pre-migrate the data to the object storage so that the data is retained in the IBM Spectrum Scale cluster, but is also copied out to object storage so that the data is on both tiers. Such files are considered to be in the co-resident state.
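The two states described above can be pictured with a small model. This is only an illustrative sketch, not product code: the names FileState, migrate, pre_migrate, and data_locations are hypothetical, and the "resident" starting state (data only in the file system, never tiered) is an assumption added so that the model has a starting point.

```python
from enum import Enum


class FileState(Enum):
    RESIDENT = "resident"          # assumed starting state: data only in the file system
    CO_RESIDENT = "co-resident"    # data in the file system and in object storage
    NON_RESIDENT = "non-resident"  # data only in object storage


def data_locations(state):
    """Return the set of tiers that hold the file's data in the given state."""
    if state is FileState.RESIDENT:
        return {"file system"}
    if state is FileState.CO_RESIDENT:
        return {"file system", "cloud"}
    return {"cloud"}


def migrate(state):
    """Migrating moves the data to the cloud; the file becomes non-resident."""
    return FileState.NON_RESIDENT


def pre_migrate(state):
    """Pre-migrating copies the data to the cloud and keeps the local copy."""
    if state is FileState.NON_RESIDENT:
        # Nothing local to copy; this sketch leaves the state unchanged.
        return state
    return FileState.CO_RESIDENT
```

In this model, data_locations(pre_migrate(FileState.RESIDENT)) contains both tiers, while data_locations(migrate(FileState.RESIDENT)) contains only the cloud tier.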

Transparent cloud tiering stores information about files that are migrated to the cloud tier in a database called the cloud directory. A separate cloud directory is kept for every Transparent cloud tiering container pair (the pair of containers that hold the file data and metadata). This database contains a list of all files that are migrated to the cloud, together with their versions.

Because some applications access the front end of a file frequently, you can specify the "thumbnail-size" option to choose how much of the front end of a file is kept in the file system when the file is migrated to object storage. Some applications, such as Windows Explorer and Linux® GNOME, access only a very small amount of data at the front end of a file as part of directory services, whereas other applications, such as media streamers, might want to cache hundreds of megabytes. By specifying the size, you can accommodate both kinds of applications efficiently.
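The effect of the thumbnail size can be sketched as simple arithmetic. The sketch below is hypothetical: the function names are invented for illustration, sizes are expressed in bytes for simplicity (the unit of the real option is not stated here), and it assumes that a read that falls entirely inside the retained front part of a migrated file can be served without recalling data from the cloud.

```python
def retained_bytes(file_size, thumbnail_size):
    """Bytes of the front of the file kept in the file system after migration."""
    return min(file_size, thumbnail_size)


def read_needs_recall(offset, length, file_size, thumbnail_size):
    """Return True if a read of a migrated file reaches past the retained part."""
    if length <= 0 or offset >= file_size:
        return False  # nothing is actually read
    end = min(offset + length, file_size)  # reads stop at end of file
    return end > retained_bytes(file_size, thumbnail_size)


# A file browser that reads the first 4 KiB of a 10 MB file stays within
# a 64 KiB retained part; a read that crosses the 64 KiB boundary does not.
small_read = read_needs_recall(0, 4096, 10_000_000, 65536)
crossing_read = read_needs_recall(60_000, 10_000, 10_000_000, 65536)
```

Under these assumptions, small_read is False and crossing_read is True, which is why a small value suits directory browsing and a large value suits streaming applications.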

Transparent cloud tiering has maintenance activities that remove data from the cloud for files that are deleted or reversioned. Backup and reconciliation services are performed to deal with disaster recovery and other unusual error clean-up cases. You can change the times and frequencies of these activities.

Additionally, if a file is migrated and only the metadata is changed later, a subsequent migration copies only the metadata to the cloud. This metadata references the original data object.
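The metadata-only case can be illustrated with a toy model of a container pair and its cloud directory. Everything in this sketch is hypothetical: the class names, the object key format, and the use of a byte comparison to detect unchanged data are assumptions made for illustration and do not describe how the service is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class ContainerPair:
    data: dict = field(default_factory=dict)      # data container: key -> bytes
    metadata: dict = field(default_factory=dict)  # metadata container: key -> dict


@dataclass
class CloudDirectory:
    # path -> list of (data_key, meta_key), one entry per migrated version
    versions: dict = field(default_factory=dict)


def migrate(path, content, attrs, pair, directory):
    """Store one version of a file; reuse the data object if the data is unchanged."""
    history = directory.versions.setdefault(path, [])
    version = len(history) + 1
    if history and pair.data[history[-1][0]] == content:
        data_key = history[-1][0]  # metadata-only change: no new data object
    else:
        data_key = f"{path}#data-v{version}"
        pair.data[data_key] = content
    meta_key = f"{path}#meta-v{version}"
    # The metadata object records which data object it belongs to.
    pair.metadata[meta_key] = {"attrs": dict(attrs), "data_object": data_key}
    history.append((data_key, meta_key))
    return data_key, meta_key


pair, directory = ContainerPair(), CloudDirectory()
first = migrate("/fs/report.txt", b"contents", {"acl": "rw"}, pair, directory)
second = migrate("/fs/report.txt", b"contents", {"acl": "r"}, pair, directory)
```

After the second call, the data container still holds a single object, the metadata container holds two, and the newer metadata object references the original data object.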

Note: Transparent cloud tiering migrates data to the cloud in a form that is not designed for direct use in the cloud. If the data must be consumed in the cloud, consider using the Cloud data sharing service.

Data migration and configuration activities happen through the defined Cloud services nodes. You can allow client nodes to handle transparent recalls directly, which is more efficient and performs better, by setting up this option when you install the client.