Creating A Microservice Data Lake With IBM Cloud Object Storage and IBM SQL Query

Share this post:

In this third part of a four-part series on Operationalizing SQL Query, we’ll take a look at using IBM Cloud Object Storage to access data from multiple microservices and perform some simple queries using IBM SQL Query. Part 1 and Part 2 of this series are available if you need to get up to speed.

Is your application’s data a stream trickling into a puddle or a rising tide overwhelming the levees? Either way, IBM has you covered with tools to store, retrieve, query, and gain insights from data of any size.

We’re going to take a look at how you can query data from multiple sources stored in the same IBM Cloud Object Storage (COS) bucket. We’ll create a demo application using Docker containers running in the IBM Cloud Kubernetes Service and then introduce IBM SQL Query, a fully-managed service you can use to query across data stored in Cloud Object Storage and join together multiple data sources to get a single unified view of data across your application. Finally, we’ll show how IBM SQL Query breaks down barriers to insight by querying data stored across multiple files in multiple formats using different schemas with a single ANSI SQL statement without any pre-processing.

Creating our demo application

We’re going to start off by creating a micro-service application we can use to dump data into Cloud Object Storage (COS). If you’d like to deploy this application into your own IBM Cloud Account, you can check out the code on github.

For this example application, we’ll be using traffic count summary data and official street names from the Los Angeles Department of Transportation that are available on the LADOT open-data website. We’ll cross-reference generation of that data using both a standalone NodeJS application and an OpenWhisk function.

We’ll start by downloading the data sets and uploading them to COS. Open the IBM Cloud Console and create a new bucket for LADOT data by clicking on the Create bucket button.

Give your bucket a name and keep the rest of the settings as default settings.

Next, let’s set up a starting point for our data by uploading the traffic pattern data directly to our COS bucket. You can do this by dragging and dropping the CSV file directly into the web console.

Accessing IBM Cloud Object Storage using NodeJS

Let’s kick things off by spinning up a microservice using the Rancher instance we set up in our previous article. For now, we’ll use a base NodeJS container to test our microservice and demonstrate how to build and host our own custom docker image on IBM Cloud Kubernetes Service.

Remember to inject the environment variables from the ConfigMap and Secrets we set up previously. This gives us a standard way of accessing our COS credentials from within our containers. We’ll also want to add an additional PORT environment variable with a value of 80—this tells the microservice on which port to expose our HTTP API.

Next, let’s run a shell in our container and add the base code for our microservice:

Download the code into your container and run it using the node executable.

cd /opt && git clone && cd code && npm install && node code/index.js

Note: If you changed the name of your environment variables from those we used in the previous article, make sure you update the code from the repository above to reflect that. If you used the same ConfigMap keys, you should be able to run the code above and connect directly without any changes.

The microservice has an exposed HTTP interface which has a few basic methods for retrieving buckets and files from COS, but its purpose is to serve as a basic starting point for your own COS-connected microservice from NodeJS using the ibm-cos-sdk npm library.

Now, let’s access that API from another container. Deploy a new container using the alpine image and execute a shell into that container. We won’t be doing much with this container other than testing out our new microservice.

From here, we can execute a shell into the container and use CURL to connect to the HTTP interface of our node container.

We’ll add the curl package to our alpine container first:

apk update && apk add curl

We’ll then run the curl command on our microservice. If all goes well, we should see a response like the following:

# curl trafficcounts/buckets

Now that we have access to COS from our microservices, we can use the S3 API to add and update objects in our buckets. You can find more information about how to do that in our COS Documentation.

Getting started with IBM Cloud SQL Query

IBM Cloud SQL Query is the glue that allows us to combine multiple separate data files in the same COS bucket into a cohesive whole. It uses a simple RESTful API to submit queries, so any microservice that can make a web call can also access SQL Query.

Before we get started, you’ll want to sign up for the SQL Query service. You can use the linked tutorial to set that up or check out a more comprehensive introduction to analyzing data with SQL Query.

Now that we have our SQL Query account set up, we can begin querying our initial data set—the LADOT Traffic Count Summaries.

Start by creating a new query microservice using the IBM Cloud SQL npm package, which simplifies the process of connecting. In order to connect, we’ll have to get the Cloud Resource Name (CRN) from the SQL Query service. This allows us to connect to a specific running version of SQL Query. You can get this value from the home page of the SQL Query Service by clicking on the Instance CRN button in the lower-right-hand corner of the page:

You can also get that value from the URL bar of the web interface:

Open up Rancher and select the Secrets tab to add our SQL Query CRN to our global list of credentials.

Click on the three-dot menu and click Edit to edit the secret:

Then, add your SQL Query Resource CRN so we can inject it automatically when we create our query container.

Now that you’ve updated your credentials, we can deploy a new node container like we did previously, injecting the environment variables from our ConfigMap and Secret as we did before.

Next, execute a shell and run the following command to download the latest git repo:

cd /opt && git clone && cd code && npm install && node

This will download the ibmcloudsql node package and create a node terminal for you to tinker with. You can run node index.js to run the sample application.

You’ll want to snag the bucket URL of your data by going to the COS web UI and selecting Object SQL URL from the context menu:

This will give you a popup where you can copy the URL for that object (also called the Reference URL):

From there, we can simply drop our Reference URL into any SQL query where we would typically put a reference to a database table. For example, a query selecting all of the data in our LADOT_Traffic_Counts_Summary.csv might look like the following:

select * from cos://us-geo/ladotjoconnor/LADOT_Traffic_Counts_Summary.csv limit 5

Wrapping up

We’re now able to write files from our microservices to a COS bucket and query across those files using the SQL Query API. Having an SQL interface to our data, even when stored in multiple files and formats, allows us to connect to that data and gain insights using standard BI tools. In the final article in this series, we’ll use Looker, an industry standard BI tool, to provide insights from our data stored in COS.

Additional References

Developer Advocate, IBM Watson and Cloud Platform

More How-tos stories
August 13, 2018

CI/CD Pipeline for OpenWhisk Functions Using Whisk Deploy

The article presents a technique for developing a CI/CD pipeline in IBM Cloud for OpenWhisk functions using Whisk Deploy configuration cataloged in GitHub.

Continue reading

August 2, 2018

Connecting to IBM Cloud Object Store in Kubernetes

Operationalizing IBM SQL Query: Part 2. In this article, we'll take a look at the best practices for connecting to IBM Cloud Object Storage from docker containers deployed in the IBM Cloud Kubernetes Service.

Continue reading

August 1, 2018

What’s Included in the IBM Cloud Developer Tools CLI Version 2.1.0

See what's included in version 2.1.0 of the IBM Cloud Developer Tools CLI.

Continue reading