Manage Your Cloud Object Storage Data with the MinIO Client and rclone

Simple access to your S3-based data on IBM Cloud.

I am a regular user of cloud object storage (COS). I love how I can integrate it with my apps and solutions, whether they run remotely (mostly in the cloud) or locally. Today, I am going to share how I manage my storage buckets and access the data, using my data scraping project and the archive logs of the IBM Cloud Activity Tracker as examples.

Most cloud object storage services, including IBM Cloud Object Storage, support the Amazon S3 (Simple Storage Service) API. As a result, many tools and SDKs are available. In this blog post, I focus on the MinIO client and rclone, which I use in my setup.

Overview

For many solutions, I store file data in IBM Cloud Object Storage (COS). COS is a highly available and secure platform for storing objects. Objects can be all kinds of files. They are organized in buckets, which act like top-level directories. The buckets can be protected by your own encryption keys. The data in COS is accessible in many ways: through the IBM Cloud console, through the IBM Cloud command line interface (CLI), or through its S3 API and the SDKs based on it.

In my data scraping project with Code Engine jobs, I am using the CLI to upload local files to the storage bucket. Thanks to the regularly executed jobs, the project-related bucket holds many files. How do I make those files available to my scripts and Jupyter notebooks for data analysis?

For my IBM Cloud account, I set up and configured the Activity Tracker along with log archiving to COS. Over time, there are many log files. How do I manage them? How do I make them available for analysis and integration with other log data?

I am using two open-source command line tools to process files in S3-based object storage — the MinIO client "mc" and rclone. The documentation for IBM Cloud Object Storage even has details on how to set up and use the MinIO client and rclone. Both tools work with HMAC credentials of a COS instance. Thus, an advantage over using the IBM Cloud CLI is that those tools work without logging in to your IBM Cloud account. Both tools also mimic the typical UNIX/Linux shell commands, so there is not much learning required. 

MinIO client mc

MinIO is a high-performance, S3-compatible object storage server. Its client, mc, lets you interact with S3-compatible storage services and provides typical UNIX/Linux commands like ls, cat, cp or mv. It can be configured through the command itself or by editing a configuration file. The standard configuration even includes access to a "play" server offered by MinIO.

To configure access to IBM Cloud Object Storage, you need to obtain HMAC credentials for your service instance: an access key and its related secret. You also need to specify which endpoint to use. Keep in mind that, depending on the storage resiliency class and service type, not every endpoint gives access to every storage bucket.
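
With mc, this configuration comes down to a single mc alias set call. The following is a minimal sketch under stated assumptions, not necessarily the exact setup used here: COS_ACCESS_KEY, COS_SECRET_KEY and the function name configure_coseu are placeholders chosen for illustration, and the URL assumes the public regional endpoint for eu-de. The alias name coseu is the one used in the rest of this post.

```shell
# Sketch: register a COS instance with mc under the alias "coseu".
# Assumptions: COS_ACCESS_KEY and COS_SECRET_KEY are environment variables
# (names chosen here) holding the HMAC credentials; the URL is the public
# regional endpoint for eu-de. Pick the endpoint that matches your bucket.
COS_ENDPOINT="https://s3.eu-de.cloud-object-storage.appdomain.cloud"

configure_coseu() {
  mc alias set coseu "$COS_ENDPOINT" "$COS_ACCESS_KEY" "$COS_SECRET_KEY"
}

# Usage, once both variables are exported:
#   configure_coseu && mc ls coseu
```

Reading the keys from environment variables keeps them out of scripts that end up in version control.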

I configured access to my COS service in "eu-de" and named it coseu. Then, I could list the buckets like this:

mc ls coseu

To find all files matching a certain pattern in the bucket mybucket and copy them over to my local data folder, I could issue the following command:

mc find coseu/mybucket --name "*20211115_*json.gz" --exec "mc cp {} data/"
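
When the same download is needed for different days, the date part of the pattern can be turned into a parameter. This is only a sketch: fetch_day is a hypothetical helper name, and it assumes the file names carry the date in the same form as in the command above.

```shell
# Sketch: download all archives of one day into the local data folder.
# fetch_day is a hypothetical helper; alias, bucket and target folder
# are the ones used in the command above.
fetch_day() {
  day="$1"            # date as it appears in the file names, e.g. 20211115
  mkdir -p data       # make sure the local target folder exists
  mc find coseu/mybucket --name "*${day}_*json.gz" --exec "mc cp {} data/"
}

# Usage:
#   fetch_day 20211115
```

Running the same mc find without --exec only prints the matching objects, which is a cheap way to check the pattern before copying anything.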

Instead of copying over individual files, I could simply mirror the remote, COS-based directory to a local one:

mc mirror coseu/mybucket data
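
As far as I can tell from the mc documentation, a plain mc mirror neither overwrites local files that differ from the remote object nor removes local files that are gone from the bucket. If the local folder should be an exact copy, the documented flags --overwrite and --remove can be added. A minimal sketch, with mirror_exact as a hypothetical helper name:

```shell
# Sketch: make the local folder an exact copy of the bucket.
# --overwrite replaces target files that differ from the source object,
# --remove deletes target files that no longer exist in the source.
mirror_exact() {
  src="$1"   # e.g. coseu/mybucket
  dst="$2"   # e.g. data
  mc mirror --overwrite --remove "$src" "$dst"
}

# Usage:
#   mirror_exact coseu/mybucket data
```

Because --remove deletes local files, it is worth pointing it at a dedicated folder rather than a working directory.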

Overall, setting up and operating the client is easy, and it simplifies handling data located in cloud object storage. In the following screen capture, I first list the available folders for my Activity Tracker data, then mirror the available data to a local directory:

List files and mirror them with mc.

rclone

The tool rclone was designed to manage data located in cloud object storage. It supports many vendors and even features an interactive setup. Again, you need the HMAC credentials and endpoint information. After the setup, I could use the following to list the available buckets for my service coseu:

rclone lsd coseu:/
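
The remote coseu does not have to come from the interactive setup. For automation, rclone config create can define it in one call. The sketch below is an assumption about how such a setup could look, not the configuration used here: the environment variable names and the function name are placeholders, and the endpoint assumes the public regional endpoint for eu-de.

```shell
# Sketch: create the rclone remote "coseu" without the interactive wizard.
# Assumptions: COS_ACCESS_KEY and COS_SECRET_KEY (names chosen here) hold
# the HMAC credentials; the endpoint is the public one for eu-de.
configure_rclone_coseu() {
  rclone config create coseu s3 \
    provider IBMCOS \
    access_key_id "$COS_ACCESS_KEY" \
    secret_access_key "$COS_SECRET_KEY" \
    endpoint s3.eu-de.cloud-object-storage.appdomain.cloud
}

# Usage, once both variables are exported:
#   configure_rclone_coseu && rclone lsd coseu:/
```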

To get to the files in mybucket, use the following command:

rclone ls coseu:/mybucket

Similarly, to sync (mirror) two directories, this command does the job:

rclone sync coseu:/mybucket data
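
Note that rclone sync makes the destination identical to the source, which includes deleting destination files that do not exist in the source. Before the first run against a folder that already holds data, a preview with --dry-run shows what would be copied or deleted without changing anything. A small sketch, with preview_sync as a hypothetical helper name:

```shell
# Sketch: show what a sync would do without touching any files.
preview_sync() {
  src="$1"   # e.g. coseu:/mybucket
  dst="$2"   # e.g. data
  rclone sync --dry-run "$src" "$dst"
}

# Usage:
#   preview_sync coseu:/mybucket data
```

If local files should never be deleted, rclone copy takes the same arguments and only adds or updates files.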

There are many more commands to discover. The following even launches a browser-based UI (officially in experimental state):

rclone rcd --rc-web-gui

While I typically prefer the command line, the UI comes in handy for drilling down into hierarchically organized files like the archived Activity Tracker logs. They are organized by year, month and day. The following screenshot shows how I use the filter to only list files starting with the specified pattern:

Manage files in the rclone browser UI.
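
A similar drill-down works on the command line by listing only the objects below a given path. The sketch below assumes a hypothetical prefix such as 2021/11/15; the actual folder names in an Activity Tracker archive may look different, so check them with rclone lsd first. list_logs is a helper name made up for this example.

```shell
# Sketch: list only the objects below one prefix of the bucket.
# The prefix layout is an assumption; adapt it to your archive.
list_logs() {
  prefix="$1"   # hypothetical example: 2021/11/15
  rclone ls "coseu:/mybucket/${prefix}"
}

# Usage:
#   list_logs 2021/11/15
```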

Conclusions

I use IBM Cloud Object Storage for many apps and solutions. Tools like the MinIO client and rclone help to manage the stored data. Both tools are fairly easy to set up and simplify daily operations on the command line or for automation. rclone even offers a browser UI. 

Want to get started and set those tools up? The documentation for IBM Cloud Object Storage covers the setup and use of both the MinIO client and rclone.

If you have feedback, suggestions, or questions about this post, please reach out to me on Twitter (@data_henrik) or LinkedIn.
