Simple access to your S3-based data on IBM Cloud.

I am a regular user of cloud object storage (COS). I love how I can integrate it with my apps and solutions, in both remote (mostly cloud) and local environments. Today, I am going to share with you how I manage my storage buckets and access the data. My data scraping project and the archive logs for the IBM Cloud Activity Tracker serve as examples.

Most cloud object storage services, including IBM Cloud Object Storage, support the Amazon S3 (Simple Storage Service) API. This has led to many tools and SDKs being available. In this blog post, I focus on the MinIO client and rclone, which I use in my setup.

Overview

For many solutions, I store file data in IBM Cloud Object Storage (COS). COS is a highly available and secure platform for storing objects. Objects can be all kinds of files. They are organized in buckets, which are comparable to directories. The buckets can be protected by your own encryption keys. The data in COS is accessible in many ways: through the IBM Cloud console, through the IBM Cloud command line interface (CLI), or through its S3 API and the SDKs based on it.

In my data scraping project with Code Engine jobs, I am using the CLI to upload local files to the storage bucket. Thanks to the regularly executed jobs, the project-related bucket holds many files. How do I make those files available to my scripts and Jupyter notebooks for data analysis?

For my IBM Cloud account, I set up and configured the Activity Tracker along with log archiving to COS. Over time, there are many log files. How do I manage them? How do I make them available for analysis and integration with other log data?

I am using two open-source command line tools to process files in S3-based object storage: the MinIO client “mc” and rclone. The documentation for IBM Cloud Object Storage even has details on how to set up and use the MinIO client and rclone. Both tools work with the HMAC credentials of a COS instance. Thus, an advantage over the IBM Cloud CLI is that those tools work without logging in to your IBM Cloud account. Both tools also mimic the typical UNIX/Linux shell commands, so there is not much to learn.

MinIO client mc

The MinIO server offers an S3-compatible implementation of a high-performance storage server. The client mc allows you to interact with S3-compatible storage services and provides typical UNIX/Linux commands like ls, cat, cp or mv. It can be configured through the command itself or by editing a configuration file. The standard configuration even includes access to a “play” server offered by MinIO.

To configure access to IBM Cloud Object Storage, you need to obtain HMAC credentials for your service instance. They consist of an access key and its related secret. Moreover, you need to specify which endpoint to use. Remember that, depending on the storage resiliency class and service type, not every endpoint allows access to every storage bucket.
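The configuration step could be scripted roughly as follows. This is a minimal sketch, not necessarily how I set it up: the helper name `register_cos_alias` and the variable names `COS_ACCESS_KEY` and `COS_SECRET_KEY` are made up for this example, and the endpoint in the comment is the public regional endpoint for eu-de.

```shell
# register_cos_alias ALIAS ENDPOINT
# Hypothetical helper: registers a COS endpoint with mc under the given alias.
# Assumption: the HMAC credentials are provided in the shell variables
# COS_ACCESS_KEY and COS_SECRET_KEY (names chosen for this example).
register_cos_alias() {
  if [ -z "${COS_ACCESS_KEY:-}" ] || [ -z "${COS_SECRET_KEY:-}" ]; then
    echo "COS_ACCESS_KEY and COS_SECRET_KEY must be set" >&2
    return 1
  fi
  # "mc alias set" stores the endpoint and credentials in the mc configuration.
  mc alias set "$1" "$2" "$COS_ACCESS_KEY" "$COS_SECRET_KEY"
}

# Example call for the public eu-de regional endpoint:
# register_cos_alias coseu https://s3.eu-de.cloud-object-storage.appdomain.cloud
```

Keeping the credentials in variables rather than typing them as literal arguments avoids leaving the secret in the shell history.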

I configured access to my COS service in “eu-de” and named it coseu. Then, I could list the buckets like this:

mc ls coseu

To find all files of a certain pattern in the bucket mybucket and copy them over to my local data folder, I could issue the following command:

mc find coseu/mybucket --name "*20211115_*json.gz" --exec "mc cp {} data/"

Instead of copying over individual files, I could simply mirror the remote, COS-based directory to a local one:

mc mirror coseu/mybucket data
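Once the data is mirrored, a quick sanity check on the local copy can be useful before loading it into scripts or notebooks. The following sketch defines a hypothetical helper, `count_records`, and assumes that the archives are gzip-compressed, line-delimited JSON files ending in `.json.gz`. Adapt the pattern if your files are named differently.

```shell
# count_records DIR
# Hypothetical helper: prints "<line count> <file>" for every *.json.gz
# file below DIR. Assumption: one JSON record per line in each archive.
count_records() {
  find "$1" -type f -name '*.json.gz' | sort | while IFS= read -r f; do
    # gzip -dc decompresses to stdout and leaves the archive untouched.
    printf '%s %s\n' "$(gzip -dc "$f" | wc -l | tr -d ' ')" "$f"
  done
}

# Example call for the mirrored folder:
# count_records data
```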

Overall, setting up and operating the client is easy, and it simplifies the handling of data located in cloud object storage. In the following screen capture, I first list the available folders for my Activity Tracker data, then mirror the available data to a local directory:

List files and mirror them with mc.

rclone

The tool rclone was designed to manage data located in cloud object storage. It supports many vendors and even features interactive setup. Again, you need the HMAC credentials and endpoint information. After the setup, I could use the following to list the available buckets for my service coseu:

rclone lsd coseu:/
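For automation, a remote like coseu could also be defined without the interactive setup. rclone reads remote definitions from environment variables named `RCLONE_CONFIG_<REMOTE>_<OPTION>`. The sketch below is an alternative to the setup described above; it assumes the eu-de public endpoint and takes the HMAC credentials from the hypothetical variables `COS_ACCESS_KEY` and `COS_SECRET_KEY`.

```shell
# Define an rclone remote named "coseu" through environment variables.
# Assumption: COS_ACCESS_KEY and COS_SECRET_KEY (names chosen for this
# example) hold the HMAC credentials of the COS instance.
export RCLONE_CONFIG_COSEU_TYPE=s3
export RCLONE_CONFIG_COSEU_PROVIDER=IBMCOS
export RCLONE_CONFIG_COSEU_ENDPOINT=s3.eu-de.cloud-object-storage.appdomain.cloud
export RCLONE_CONFIG_COSEU_ACCESS_KEY_ID="${COS_ACCESS_KEY:-}"
export RCLONE_CONFIG_COSEU_SECRET_ACCESS_KEY="${COS_SECRET_KEY:-}"
```

With these variables in place, commands that refer to `coseu:` should work in the same shell session without an entry in the rclone configuration file.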

To get to the files in mybucket, use the following command:

rclone ls coseu:/mybucket

Similarly, to sync (mirror) two directories, this command does the job:

rclone sync coseu:/mybucket data
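Keep in mind that `rclone sync` makes the destination match the source, which includes deleting local files that do not exist in the bucket. When only a subset of files is needed, `rclone copy` with an `--include` filter is a gentler option, and `--dry-run` previews the transfer without copying anything. The helper `copy_matching` below is a hypothetical wrapper for illustration, reusing the file pattern from the mc example.

```shell
# copy_matching SRC PATTERN DEST [extra rclone flags...]
# Hypothetical helper: copies only the objects whose names match PATTERN.
# Unlike "rclone sync", "rclone copy" never deletes files in DEST.
copy_matching() {
  src=$1
  pattern=$2
  dest=$3
  shift 3
  # Any remaining arguments (for example --dry-run) are passed through to rclone.
  rclone copy --include "$pattern" "$@" "$src" "$dest"
}

# Preview first, then copy for real:
# copy_matching coseu:mybucket "*20211115_*json.gz" data --dry-run
# copy_matching coseu:mybucket "*20211115_*json.gz" data
```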

There are many more commands to discover. The following even launches a browser-based UI (officially in experimental state):

rclone rcd --rc-web-gui

While I typically prefer the command line, the UI comes in handy for drilling down into hierarchically organized files like the archived Activity Tracker logs. They are organized by year, month and day. The following screenshot shows how I use the filter to only list files starting with the specified pattern:

Manage files in the rclone browser UI.

Conclusions

I use IBM Cloud Object Storage for many apps and solutions. Tools like the MinIO client and rclone help to manage the stored data. Both tools are fairly easy to set up and simplify daily operations on the command line or for automation. rclone even offers a browser UI. 

Want to get started and set those tools up? The documentation for IBM Cloud Object Storage covers the setup of both the MinIO client and rclone.

If you have feedback, suggestions, or questions about this post, please reach out to me on Twitter (@data_henrik) or LinkedIn.
