Listing files exported to the cloud
This topic describes how to parse a manifest file and how to list files from the cloud.
Although files are exported to the cloud from an IBM Spectrum Scale™ environment, the files can be imported by a non-IBM Spectrum Scale application. When you export files to the cloud, a manifest file is built. The manifest file includes a list of the exported files and the metadata associated with native object storage.
When data is exported to the cloud, the manifest file is not automatically pushed to the cloud. You must decide when and where to export the manifest file.
When to transfer: If you are using a policy to export data, a good time to export the manifest is immediately after the policy has run successfully. Waiting too long can result in a manifest that is too big and that does not give timely guidance to applications looking for notifications about new data on the cloud. Pushing out new manifests constantly creates a different problem: applications have to deal with many small manifests and work out which ones to use.
Where to transfer: Unlike transparent cloud tiering, cloud data sharing allows data to be transferred to any container at any time. This freedom can be very useful, especially when setting up multiple tenants. A centralized manifest is useful in a single-tenant environment, but when there are multiple tenants with different access privileges to different files, it may be better to split up your manifest destinations accordingly. Export all data targeted to a particular tenant and then send the manifest. Then export data for the next tenant, and so forth.
Each manifest entry records the following information:
<File/Object Name> <CloudContainerName> <TagID> <TimeStamp><Newline>
Typically, this file is not accessed directly but rather is accessed using the manifest utility.
The manifest file uses the following comma-separated format:
<TagID>,<CloudContainerName>,<TimeStamp>,<File/Object Name><newline>
where,
- TagID is an optional identifier that the object is associated with.
- CloudContainerName is the name of the container that the object was exported into.
- TimeStamp follows the format "DD MON YYYY HH:MM:SS GMT".
- File/Object Name can contain commas, but not newline characters.
An example entry:
0, imagecontainer, 6 Sep 2016 20:31:45 GMT, images/a/cat.scan
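Because the file or object name is the last field and can itself contain commas, an application that reads the manifest directly should split each entry on the first three commas only. The following Python sketch illustrates that idea. The names ManifestEntry and parse_manifest_line are hypothetical, and the sketch assumes that container names and tag IDs contain no commas, that month names are English abbreviations, and that the whitespace after each comma (as in the example entry) is not significant.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class ManifestEntry(NamedTuple):
    """One manifest entry (hypothetical helper type, not part of the product)."""
    tag_id: str
    container: str
    timestamp: datetime
    name: str


def parse_manifest_line(line: str) -> ManifestEntry:
    """Split one manifest entry into its four fields."""
    # Split on the first three commas only, so that commas inside the
    # file/object name (the last field) are preserved.
    parts = line.rstrip("\n").split(",", 3)
    if len(parts) != 4:
        raise ValueError(f"expected 4 comma-separated fields: {line!r}")
    # Assumption: surrounding whitespace in each field is not significant.
    tag_id, container, stamp, name = (p.strip() for p in parts)
    # Assumption: English month abbreviation and a literal "GMT" suffix.
    when = datetime.strptime(stamp, "%d %b %Y %H:%M:%S GMT")
    return ManifestEntry(tag_id, container, when.replace(tzinfo=timezone.utc), name)


entry = parse_manifest_line(
    "0, imagecontainer, 6 Sep 2016 20:31:45 GMT, images/a/cat.scan"
)
print(entry.container, entry.name)
```

Splitting with a maximum of three splits is what keeps a name such as dir/a,b.csv intact. A general-purpose CSV reader would break such a name into extra fields unless the name were quoted.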
You can use the mmcloudmanifest tool to parse the manifest file that is created by the mmcloudgateway files export command or by any other means. By looking at the manifest files, an application can download the desired files from the cloud.
Before you use the mmcloudmanifest tool, complete the following prerequisites:
- Install Python version 2.7.5.
- Install pip. For more information, see https://packaging.python.org/install_requirements_linux/
- Install the apache-libcloud package by running the sudo pip install apache-libcloud command.
The mmcloudmanifest tool has the following syntax:
mmcloudmanifest ManifestName [--cloud --properties-file PropertiesFile --manifest-container ManifestContainer
                [--persist-path PersistPath]]
                [--tag-filter TagFilter] [--container-filter ContainerFilter]
                [--from-time FromTime] [--path-filter PathFilter]
                [--help]
where,
- ManifestName: Specifies the name of the manifest object on the cloud. To use a local manifest file, specify the full path name of the manifest file.
- --properties-file PropertiesFile: Specifies the location of the properties file to be used when retrieving the manifest file from the cloud. A template properties file is located at /opt/ibm/MCStore/scripts/provider.properties. This file includes details such as the name of the cloud storage provider, credentials, and URL.
- --persist-path PersistPath: Stores a local copy of the manifest file that is retrieved from the cloud in the specified location.
- --manifest-container ManifestContainer: Name of the container in which the manifest is located.
- --tag-filter TagFilter: Lists only the entries whose tag ID matches the specified regular expression (regex).
- --container-filter ContainerFilter: Lists only the entries whose container name matches the specified regex.
- --from-time FromTime: Lists only the entries whose time stamp is at or after the specified time stamp. The time stamp must be enclosed in quotation marks, and it must be in the 'DD MON YYYY HH:MM:SS GMT' format. Example: '21 Aug 2016 06:23:59 GMT'
- --path-filter PathFilter: Lists only the entries whose path name matches the specified regex.
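The way the four filter options combine can be sketched in Python as follows. This is an illustration only, not the tool's implementation: the function name matches is hypothetical, the sample entries and their time stamps are invented for the example, and the sketch assumes that every supplied filter must pass and that a regex may match anywhere in the field (the text above does not say whether the tool anchors its matches).

```python
import re
from datetime import datetime

TIME_FORMAT = "%d %b %Y %H:%M:%S GMT"


def matches(entry, tag_filter=None, container_filter=None,
            from_time=None, path_filter=None):
    """Return True if a (tag, container, timestamp, path) tuple passes every filter given."""
    tag, container, stamp, path = entry
    # Assumption: unanchored regex search on each field.
    if tag_filter is not None and not re.search(tag_filter, tag):
        return False
    if container_filter is not None and not re.search(container_filter, container):
        return False
    if path_filter is not None and not re.search(path_filter, path):
        return False
    if from_time is not None:
        # Keep only entries at or after the cutoff time stamp.
        cutoff = datetime.strptime(from_time, TIME_FORMAT)
        if datetime.strptime(stamp, TIME_FORMAT) < cutoff:
            return False
    return True


# Hypothetical sample entries, invented for this illustration.
entries = [
    ("us-weather", "arn8781724981111500553", "20 Aug 2016 10:00:00 GMT", "weather_data/us_old.csv"),
    ("us-weather", "arn8781724981111500553", "22 Aug 2016 10:00:00 GMT", "weather_data/us_new.csv"),
    ("uk-weather", "arn8781724981111500553", "22 Aug 2016 10:00:00 GMT", "weather_data/uk_new.csv"),
]

recent_us = [e[3] for e in entries
             if matches(e, tag_filter="us-weather",
                        from_time="21 Aug 2016 06:23:59 GMT")]
print(recent_us)
```

Time stamps are compared as parsed dates rather than as strings, because a string comparison of "DD MON YYYY" values does not sort chronologically.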
The following command exports five CSV files tagged with "us-weather", along with the manifest file, "manifest.txt", to the cloud:
mmcloudgateway files export --container arn8781724981111500553 --manifest-file manifest.txt \
--tag us-weather /gpfs/weather_data/MetData_Oct06-2016-Oct07-2016-ALL.csv \
/gpfs/weather_data/MetData_Oct07-2016-Oct08-2016-ALL.csv \
/gpfs/weather_data/MetData_Oct08-2016-Oct09-2016-ALL.csv \
/gpfs/weather_data/MetData_Oct09-2016-Oct10-2016-ALL.csv \
/gpfs/weather_data/MetData_Oct10-2016-Oct11-2016-ALL.csv
The following command exports five CSV files tagged with "uk-weather", along with the manifest file, "manifest.txt", to the cloud:
mmcloudgateway files export --container arn8781724981111500553 --manifest-file manifest.txt \
--tag uk-weather /gpfs/weather_data/MetData_Oct06-2016-Oct07-2016-ALL.csv \
/gpfs/weather_data/MetData_Oct07-2016-Oct08-2016-ALL.csv \
/gpfs/weather_data/MetData_Oct08-2016-Oct09-2016-ALL.csv \
/gpfs/weather_data/MetData_Oct09-2016-Oct10-2016-ALL.csv \
/gpfs/weather_data/MetData_Oct10-2016-Oct11-2016-ALL.csv
So, the container "arn8781724981111500553" contains both US and UK weather data.
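The following command uses the manifest to import only the files tagged with "us-weather" from the container into the /gpfs directory:
mmcloudmanifest parse-manifest manifest.txt --tag-filter us-weather \
| xargs mmcloudgateway files import --directory /gpfs --container arn8781724981111500553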
You can verify these files by using the following command:
ls -l /gpfs
The system displays output similar to this:
total 64
drwxr-xr-x. 2 root root 4096 Oct 5 07:09 automountdir
-rw-r--r--. 1 root root 7859 Oct 18 02:15 MetData_Oct06-2016-Oct07-2016-ALL.csv
-rw-r--r--. 1 root root 7859 Oct 18 02:15 MetData_Oct07-2016-Oct08-2016-ALL.csv
-rw-r--r--. 1 root root 14461 Oct 18 02:15 MetData_Oct08-2016-Oct09-2016-ALL.csv
-rw-r--r--. 1 root root 14382 Oct 18 02:15 MetData_Oct09-2016-Oct10-2016-ALL.csv
-rw-r--r--. 1 root root 14504 Oct 18 02:15 MetData_Oct10-2016-Oct11-2016-ALL.csv
drwxr-xr-x. 2 root root 4096 Oct 17 14:12 weather_data
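The same check can be scripted instead of read from the ls output. The following Python sketch is one possible approach, not part of the product: the function name missing_files is hypothetical, and it assumes, as the listing above suggests, that each imported file appears directly in the target directory under its base name.

```python
import os


def missing_files(directory, exported_paths):
    """Return the base names of exported files that are not present in directory."""
    present = set(os.listdir(directory))
    # Assumption: imported files land directly in the directory
    # under their base names, as in the listing above.
    return [os.path.basename(p) for p in exported_paths
            if os.path.basename(p) not in present]
```

For example, calling missing_files("/gpfs", [...]) with the five /gpfs/weather_data/MetData_*.csv paths from the export command would return an empty list if every file was imported.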