How Cloud data sharing works

This topic describes how the Cloud data sharing feature works in the IBM Storage Scale cluster.

Cloud data sharing allows you to import data from object storage into your IBM Storage Scale file system by using the import command. You can also export data to the cloud for use there by using the export command. Optionally, a manifest can be built that allows those sharing the data to quickly ascertain which files are being shared.

Cloud data sharing differs from Transparent cloud tiering in that it is meant as a means of sharing and distributing data. With Transparent cloud tiering, by contrast, the data that is migrated to object storage is concealed and can be consumed only by recalling it back into the IBM Storage Scale system.
Note: No implicit consistency link is established by the export or import of a file; the import or export is just a transfer of the file. Once the transfer is complete, changes to the original file have no bearing on its copies, and changes to or deletion of a copy have no bearing on the original.
How exporting file system data to cloud storage works
Figure 1. Exporting file system data to a cloud storage tier
You can use the Cloud data sharing service to export data to the cloud. Data can be exported manually, but typically this is done by leveraging the ILM Policy Manager that comes with IBM Storage Scale. When data transfer requests are generated, they are distributed evenly across all nodes in the Cloud services node group that provides the Cloud data sharing service.

Optionally, as part of the export request, a list of what was sent can be recorded in a manifest file that tracks what was exported. Export is flexible: the target container can be specified for each command, and any container that is accessible by the given cloud account can be used. You can also specify the target manifest file for each export command. The manifest file is not automatically stored in cloud storage; rather, it remains stored within the Cloud data sharing service and is exported on demand to the desired target location.
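As an illustration, an export invocation might look like the following sketch. The container name, file paths, and option names shown here are illustrative assumptions; check the mmcloudgateway files export command reference for your release for the exact syntax.

```shell
# Illustrative sketch only: export two files to a container and record
# them in a manifest file. Option names may differ by release.
mmcloudgateway files export \
    --container shared-data \
    --manifest-file /gpfs/fs1/manifests/export.manifest \
    /gpfs/fs1/data/report1.csv /gpfs/fs1/data/report2.csv
```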

How importing object storage data into the IBM Storage Scale file system works
Figure 2. Importing object storage data into the file system
You can import data directly, either by providing a list of objects to be imported in the command itself or by providing a pointer to a manifest that contains the list of objects. The files are imported and placed in a target directory in the file system. Because the files are not yet present in the file system, the policy engine cannot be used to drive the initial import. Alternatively, you can import only the file stubs and then use the policy engine to examine the stub information and import the desired subset of the data. Stubs that you are not interested in can be deleted.
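As a sketch, a manifest-driven import into a target directory might look like this. The option names, container, and paths are illustrative assumptions; consult the mmcloudgateway files import command reference for the exact syntax.

```shell
# Illustrative sketch only: import the objects listed in a manifest into
# a target directory, fetching only the stubs so the policy engine can
# later select which data to pull in full. Option names may differ.
mmcloudgateway files import \
    --container shared-data \
    --manifest-file /gpfs/fs1/manifests/export.manifest \
    --directory /gpfs/fs1/incoming \
    --import-only-stub
```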
Note: Importing only the stub works only if the data on the object storage was originally created by the IBM Storage Scale export service, so that the stub has the appropriate format. (Native object storage stubs are not yet mapped or normalized.)

How to be aware of and exploit what data is shared

A manifest can be kept that lists information about the objects in cloud storage that are available for sharing. When the Cloud data sharing service exports files to the object storage, it can be directed to add entries for what it exported to a manifest. Other applications that generate cloud data can also generate a manifest. Where the manifest is stored, and when it is exported for inspection, is the decision of the application; it is not automatic. A typical time to export the manifest to object storage is in the same cron job as the policy invocation, immediately after the policy execution completes, so that the manifest represents all the files that were exported by that policy run.
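Such a cron-driven flow might be sketched as follows. The file system name, policy file, container, and manifest path are illustrative assumptions; mmapplypolicy is the standard policy engine command, but the exact export invocation for a manifest file varies by release.

```shell
# Illustrative sketch of a cron job body:
# 1. Run the ILM policy that exports candidate files to the cloud.
mmapplypolicy /gpfs/fs1 -P /var/mmfs/policies/export-to-cloud.pol

# 2. Export the manifest produced by that run so that it reflects all
#    files exported by this policy invocation (details may differ).
mmcloudgateway files export \
    --container shared-data \
    /gpfs/fs1/manifests/export.manifest
```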

The mmcloudmanifest utility program can read a manifest and provide a comma-separated value (CSV) output stream of all the files in the manifest, or of a subset of those files selected by simple filtering. The utility is written in Python and can run separately from the Cloud data sharing service, almost anywhere that Python can run. It is also built into Cloud data sharing and can be used to generate input for import operations.
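Because the output stream is plain CSV, it is easy to post-process with standard tools. The following runnable sketch filters a sample manifest-style CSV with awk; the column layout used here (container, object name, size, timestamp) is a hypothetical stand-in for the real manifest format.

```shell
# Create a sample manifest-style CSV. The real manifest layout may
# differ; columns here are: container,object,size,timestamp.
cat > /tmp/sample-manifest.csv <<'EOF'
shared-data,report1.csv,10240,2024-01-15T10:00:00
shared-data,report2.csv,20480,2024-01-15T10:00:05
archive,old-report.csv,512,2023-06-01T09:00:00
EOF

# Select only the objects in the shared-data container and print their
# object names, one per line.
awk -F, '$1 == "shared-data" { print $2 }' /tmp/sample-manifest.csv
```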

Currently, there is no built-in way to do asynchronous notification or to maintain a database that tracks all updates at a scale larger than is reasonable with manifest files, but it is possible to add notification in the script where the export policy is invoked. Also, because the data structure is very simple, manifests can be readily consumed by a database (for example, the Elasticsearch, Logstash, and Kibana (ELK) stack). On larger deployments, such an approach, with asynchronous notification built into the scripts and backed by a database, is worth consideration.
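For example, a manifest-style CSV can be converted to newline-delimited JSON, a shape that ingestion pipelines such as Logstash commonly accept. The column layout in this runnable sketch is again a hypothetical stand-in for the real manifest format.

```shell
# Sample manifest-style CSV (hypothetical columns: container,object,size).
cat > /tmp/manifest.csv <<'EOF'
shared-data,report1.csv,10240
shared-data,report2.csv,20480
EOF

# Emit one JSON document per line, suitable for bulk ingestion.
awk -F, '{ printf "{\"container\":\"%s\",\"object\":\"%s\",\"size\":%s}\n", $1, $2, $3 }' /tmp/manifest.csv
```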

How a native cloud storage application can prepare a set of objects for sharing

There is a simple way for native cloud storage applications, or other generators of cloud object data, to create a manifest: the manifest utility can build one from a list of objects that is passed to it.
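Conceptually, building a manifest from an object list is just turning each object name into a manifest row. This runnable sketch illustrates the idea with a hypothetical CSV row format; the actual manifest format produced by the mmcloudmanifest utility may differ.

```shell
# A plain list of object names, as a native cloud application might
# produce it (names are illustrative).
cat > /tmp/objects.txt <<'EOF'
results/run1.dat
results/run2.dat
EOF

# Turn each object name into a hypothetical manifest row of the form
# container,object so the list can be shared and later imported.
awk '{ printf "shared-data,%s\n", $0 }' /tmp/objects.txt > /tmp/built-manifest.csv
cat /tmp/built-manifest.csv
```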