How Cloud data sharing works

This topic describes how the Cloud data sharing feature works in the IBM Spectrum Scale™ cluster.

Cloud data sharing allows you to import data from object storage into your IBM Spectrum Scale file system by using the import command. You can also export data to the cloud, for use in the cloud, by using the export command. Optionally, a manifest can be built that lets those sharing the data quickly determine which files are being shared. Cloud data sharing differs from Transparent cloud tiering in that it is meant as a means of sharing and distributing data. With Transparent cloud tiering, the data that is migrated to object storage is obfuscated such that it can be consumed only by recalling the data back into the IBM Spectrum Scale system.

Note: The export or import of a file establishes no implicit consistency link. The import or export is just a transfer of a file. Once the transfer is complete, changing the original file has no effect on its copies, and changing or deleting a copy has no effect on the original.

How exporting file system data to cloud storage works

Figure 1. Exporting file system data to a cloud storage tier

The Cloud data sharing service allows you to export data to the cloud. Data can be exported manually, but typically the export is driven by the ILM Policy Manager that comes with IBM Spectrum Scale. When the data is sent to object storage, it is distributed evenly among all the nodes in the cloud services node group that provides the Cloud data sharing service. Optionally, as part of the export request, a list of what was sent is recorded in a manifest file, which can be used to track what was exported. Export is flexible: each export command can specify the target container (any container accessible by the given cloud account may be used) and the target manifest file. The manifest file is not automatically stored in cloud storage; it remains stored within the Cloud data sharing service and is exported on demand to the desired target location.

How importing object storage data into the Spectrum Scale file system works

Figure 2. Importing object storage data into the file system

Data can be imported directly by providing a list of objects to be imported in the command itself, or by providing a pointer to a manifest that contains the list of objects to be imported. The files are imported and placed in a target directory in the file system. Because the files are not yet present in the file system, the policy engine cannot be used to import them initially. It is possible, however, to import only the stub of each file and then use the policy engine, working from the file stub information, to import the subset of data that is of interest (and delete the stubs that are not of interest).
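The stub-first approach can be pictured with a small Python model. This is only an illustrative sketch, not the import command or the policy engine: the Stub class, the import_selected function, and the fetch_data callback are hypothetical names introduced here, and the selection rule stands in for whatever policy would actually be applied to the stub information.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional


@dataclass
class Stub:
    """Hypothetical stand-in for an imported file stub: metadata only, no data yet."""
    name: str
    size_bytes: int
    data: Optional[bytes] = None  # stays None until the data is imported


def import_selected(
    stubs: Iterable[Stub],
    wanted: Callable[[Stub], bool],
    fetch_data: Callable[[str], bytes],
) -> Dict[str, Stub]:
    """Import data only for stubs that match the rule; drop all other stubs."""
    kept: Dict[str, Stub] = {}
    for stub in stubs:
        if wanted(stub):
            # Stub is of interest: bring in the object's data.
            stub.data = fetch_data(stub.name)
            kept[stub.name] = stub
        # Stubs that do not match are simply not kept (the "delete" step).
    return kept


if __name__ == "__main__":
    # An in-memory dictionary plays the role of the object store.
    store = {"a.csv": b"1,2", "b.bin": b"\x00" * 4}
    stubs = [Stub("a.csv", 3), Stub("b.bin", 4)]
    kept = import_selected(stubs, lambda s: s.name.endswith(".csv"), store.__getitem__)
    print(sorted(kept))
```

The point of the model is the ordering: the selection rule sees only stub metadata, and object data is fetched only for the stubs that pass it.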

How to be aware of and exploit what data has been shared

A manifest can be kept that lists information about the objects in cloud storage that are available for sharing. When the Cloud data sharing service exports files to object storage, it can be directed to add entries for what it exported to a manifest. Other applications that generate cloud data can also generate a manifest (more on that in the next section). Where the manifest is stored, and when it is exported so that it can be examined, is decided by the application; it is not automatic. A typical time to export the manifest to object storage is in the same cron job as the policy invocation, immediately after the policy run completes, so that the manifest represents all the files exported by that policy run.

A manifest utility is available that can read a manifest and produce a comma-separated values (CSV) output stream of all the files in the manifest, or of a subset of those files selected by simple filtering. The manifest utility is written in Python and can run separately from the Cloud data sharing service, almost anywhere Python can run. The utility is also built into Cloud data sharing and can be used to get input for import operations.
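The kind of filtering described above can be sketched in a few lines of Python. This is not the manifest utility itself and does not reproduce its options. The three-column layout (container, object name, size in bytes) and the function names filter_manifest and to_csv are assumptions made purely for illustration; the real manifest format may differ.

```python
import csv
import io

# ASSUMED layout for illustration only: one row per object, no header line.
FIELDS = ["container", "object_name", "size_bytes"]


def filter_manifest(manifest_text, container=None, suffix=None):
    """Return manifest rows as dicts, keeping only rows that match the filters."""
    reader = csv.DictReader(io.StringIO(manifest_text), fieldnames=FIELDS)
    rows = []
    for row in reader:
        if container is not None and row["container"] != container:
            continue
        if suffix is not None and not row["object_name"].endswith(suffix):
            continue
        rows.append(row)
    return rows


def to_csv(rows):
    """Serialize the selected rows back into a CSV stream."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    sample = (
        "shared,reports/q1.csv,1024\n"
        "shared,images/a.png,2048\n"
        "scratch,tmp.csv,10\n"
    )
    selected = filter_manifest(sample, container="shared", suffix=".csv")
    print(to_csv(selected), end="")
```

Because the output is plain CSV, the selected rows could be passed on as the list of objects for a later import.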

Currently, there is no way to receive asynchronous notification of updates to what has been shared; the containers where the manifests reside must be polled.
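A polling consumer might look like the following minimal sketch. It assumes the application supplies its own fetch function that returns the current manifest content as bytes. How the manifest is read from the container is outside this example, and poll_manifest, fetch, and on_change are hypothetical names, not part of the product.

```python
import hashlib
import time


def poll_manifest(fetch, on_change, interval_seconds=60.0, max_polls=None, sleep=time.sleep):
    """Poll fetch() and call on_change(content) whenever the content changes.

    fetch must return the manifest as bytes. The first poll always reports
    a change, because nothing has been seen before it.
    """
    last_digest = None
    polls = 0
    while max_polls is None or polls < max_polls:
        content = fetch()
        digest = hashlib.sha256(content).hexdigest()
        if digest != last_digest:
            on_change(content)
            last_digest = digest
        polls += 1
        # Wait between polls, but not after the final one.
        if max_polls is None or polls < max_polls:
            sleep(interval_seconds)


if __name__ == "__main__":
    # Simulated manifest versions stand in for reads from a container.
    versions = iter([b"a\n", b"a\n", b"a\nb\n"])
    poll_manifest(
        lambda: next(versions),
        lambda content: print("manifest changed:", content),
        interval_seconds=0,
        max_polls=3,
    )
```

Comparing a digest of the content avoids reprocessing an unchanged manifest. The polling interval is a trade-off between freshness and the number of requests made to the object store.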

How a native cloud storage application can prepare a set of objects for sharing

There is a simple way for native cloud storage applications or other generators of cloud object data to generate a manifest. The manifest utility can build the manifest from a list of objects that is passed to it.
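As a rough illustration of the idea, the following Python sketch turns a list of objects into manifest text. It does not call the manifest utility. The build_manifest function and the three-column layout (container, object name, size in bytes) are assumptions made for this example only, and the actual manifest format may differ.

```python
import csv
import io


def build_manifest(objects):
    """Build CSV manifest text from (container, object_name, size_bytes) tuples.

    ASSUMED layout for illustration only: one row per object, no header line.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for container, object_name, size_bytes in objects:
        writer.writerow([container, object_name, size_bytes])
    return out.getvalue()


if __name__ == "__main__":
    objects = [
        ("shared", "a.txt", 3),
        ("shared", "dir/b, c.txt", 7),  # a comma in the name gets quoted by the csv module
    ]
    print(build_manifest(objects), end="")
```

Using the csv module rather than joining strings by hand keeps object names that contain commas or quotes intact.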