
Querying Your Distributed Application Data Using IBM Cloud Object Storage and IBM SQL Query


Operationalizing SQL Query: Part 3

In this third part of a four-part series on Operationalizing SQL Query, we’ll bring together the microservices we deployed in Part 1 and the techniques we developed in Part 2 to query data in IBM Cloud Object Storage (COS) with IBM SQL Query, with the goal of connecting our application’s data to Business Intelligence (BI) tools.

Is your application’s data a stream trickling into a puddle or a rising tide overwhelming the levees? Either way, IBM has you covered with tools to store, retrieve, query, and gain insights from data of any size.

In this article, we’ll take a look at how you can query data from multiple sources stored in the same IBM Cloud Object Storage bucket. We’ll create a demo application using Docker containers running on IBM Cloud Kubernetes Service and then introduce IBM SQL Query, a fully managed service you can use to query data stored in Cloud Object Storage, joining multiple data sources into a single, unified view of your application’s data. Finally, we’ll show how IBM SQL Query breaks down barriers to insight by querying data stored across multiple files, in multiple formats, and with different schemas, all with a single ANSI SQL statement and no pre-processing.

Let’s get started...

Creating our demo application

First, we’ll want to create a microservice application we can use to dump data into COS. If you’d like to deploy this application into your own IBM Cloud Account, you can check out the code on GitHub.

For this example application, we’ll be using traffic count summary data and official street names from the Los Angeles Department of Transportation available on the LADOT open data website. We’ll then work with that data using both a standalone NodeJS application and an OpenWhisk function.

We’ll start by downloading the data sets and uploading them to Cloud Object Storage. Open the IBM Cloud Console and create a new bucket for LADOT data by clicking on the Create bucket button.

Give your bucket a name; we’ll keep the rest of the settings at their defaults.

Next, let’s set up a starting point for our data. Upload the traffic pattern data to our COS bucket by dragging and dropping the CSV file directly into the web console.

Accessing IBM Cloud Object Storage using NodeJS

Let’s kick things off by spinning up a microservice using the Rancher instance we set up in our previous article. We’ll use a base NodeJS container for now to test our microservice and demonstrate how to build and host our own custom Docker image on IBM Cloud Kubernetes Service.

Remember to inject the environment variables from the ConfigMap and Secrets we set up previously. This gives us a standard way of accessing our COS credentials from within our containers. We’ll also want to add an additional PORT environment variable with a value of 80—this tells the microservice which port to expose our HTTP API on.
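If you prefer the command line to the Rancher UI, the equivalent kubectl commands look roughly like the following. The deployment, ConfigMap, and Secret names here are placeholders for whatever you created in the previous article:

kubectl set env deployment/trafficcounts --from=configmap/cos-config
kubectl set env deployment/trafficcounts --from=secret/cos-credentials
kubectl set env deployment/trafficcounts PORT=80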

Next, let’s run a shell in our container and add the base code for our microservice:

Download the code into your container and run it using the node executable.
cd /opt && git clone https://github.ibm.com/John-OConnor/COSBucketService.git && cd COSBucketService && npm install && node code/index.js
Note: If you changed the names of your environment variables from those we used in the previous article, make sure you update the code from the repository above to reflect that. If you used the same ConfigMap keys, you should be able to run the code above and connect directly without any changes.

The microservice has an exposed HTTP interface which has a few basic methods for retrieving buckets and files from COS, but its purpose is to serve as a basic starting point for your own COS-connected microservice from NodeJS using the ibm-cos-sdk npm library.
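To give a sense of what that starting point looks like, here is a minimal sketch of a COS-connected microservice using Express and the ibm-cos-sdk library. The route and environment variable names are assumptions modeled on this setup; they may differ from the repository’s code:

const express = require('express');
const IBM = require('ibm-cos-sdk');

// Configure the COS client from the environment variables injected by
// our ConfigMap and Secrets (variable names here are assumptions).
const cos = new IBM.S3({
  endpoint: process.env.COS_ENDPOINT,
  apiKeyId: process.env.COS_API_KEY,
  serviceInstanceId: process.env.COS_INSTANCE_ID,
  ibmAuthEndpoint: 'https://iam.cloud.ibm.com/identity/token'
});

const app = express();

// GET /buckets -- list the buckets in our COS service instance
app.get('/buckets', (req, res) => {
  cos.listBuckets().promise()
    .then((data) => res.json(data))
    .catch((err) => res.status(500).json({ error: err.message }));
});

// Listen on the PORT environment variable we set when deploying
app.listen(process.env.PORT || 80);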

Now, let’s access that API from another container, which can reach our microservice by name. Deploy a new container using the alpine image and execute a shell into it. We won’t be doing much with this container other than testing out our new microservice.

From there, we can use curl to connect to the HTTP interface of our node container.

We’ll add the curl package to our alpine container first:

apk update && apk add curl

We’ll then run the curl command on our microservice. If all goes well, we should see a response like the following:

# curl trafficcounts/buckets
{"Buckets":[
  {"Name":"ladotjoconnor",
   "CreationDate":"2018-07-20T23:59:12.087Z"},
  ...],
 "Owner":{...}}

Now that we have access to COS from our microservices, we can use the S3 API to add and update objects in our buckets. You can find more information about how to do that in our COS Documentation.
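For example, adding an object from our microservice is a single putObject call. Here is a hedged sketch using the cos client configured above; the object key and data variable are hypothetical:

// Write (or overwrite) an object in our bucket
cos.putObject({
  Bucket: 'ladotjoconnor',              // the bucket we created earlier
  Key: 'traffic_counts/latest.csv',     // hypothetical object key
  Body: csvData                         // a string or Buffer of CSV rows
}).promise()
  .then(() => console.log('Object written to COS'))
  .catch((err) => console.error(err));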

Getting started with IBM Cloud SQL Query

IBM Cloud SQL Query is the glue that allows us to combine multiple separate data files in the same COS bucket into a cohesive whole. It uses a simple RESTful API to submit queries, so any microservice that can make a web call can also access SQL Query.
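As a rough sketch of what that call looks like (the endpoint path and payload shape here are assumptions; check the SQL Query API documentation for the current values), submitting a query is a single authenticated POST:

curl -X POST "https://api.sql-query.cloud.ibm.com/v2/sql_jobs?instance_crn=<your-instance-crn>" \
  -H "Authorization: Bearer <your-iam-token>" \
  -H "Content-Type: application/json" \
  -d '{"statement": "select * from cos://us-geo/ladotjoconnor/LADOT_Traffic_Counts_Summary.csv limit 5", "resultset_target": "cos://us-geo/ladotjoconnor/results/"}'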

Before we get started, you’ll want to sign up for the SQL Query service. You can use the linked tutorial to set that up or check out a more comprehensive introduction to analyzing data with SQL Query.

Now that we have our SQL Query account set up, we can begin querying our initial data set—the LA DOT Traffic Count Summaries.

Start by creating a new Query microservice using the ibmcloudsql npm package, which simplifies the process of connecting. To connect, we’ll need the Cloud Resource Name (CRN) of the SQL Query service, which identifies the specific running instance of SQL Query. You can get this value from the home page of the SQL Query service by clicking the Instance CRN button in the lower-right corner of the page:

You can also get that value from the URL bar of the web interface:

Open up Rancher and select the Secrets tab to add our SQL Query CRN to our global list of credentials:

Open the three-dot dropdown menu and click Edit to edit the Secret:

Then, add your SQL Query Resource CRN so we can inject it automatically when we create our query container:
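If you’d rather script this step than click through the Rancher UI, one way is to patch the existing Secret with kubectl. The Secret name and key below are placeholders; use the ones from your own setup:

kubectl patch secret cos-credentials \
  -p '{"stringData": {"SQL_QUERY_CRN": "<your-instance-crn>"}}'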

Now that you’ve updated your credentials, we can deploy a new node container as before, injecting the environment variables from our ConfigMap and Secret.

Next, execute a shell and run the following command to download the latest git repo:

cd /opt && git clone https://github.ibm.com/John-OConnor/SQLQueryMicroserviceSamples.git && cd SQLQueryMicroserviceSamples && npm install && node

This installs the ibmcloudsql node package and opens a Node REPL for you to tinker with. You can instead run node index.js to launch the sample application:

You’ll want to snag the SQL URL of your data object by going to the COS web UI and selecting Object SQL URL from the context menu:

That will give you a popup where you can copy the URL for that object (also called the Reference URL):

From there, we can simply drop our Reference URL into any SQL query where we would typically put a reference to a database table. For example, a query selecting all of the data in our LADOT_Traffic_Counts_Summary.csv might look like the following:

select * from cos://us-geo/ladotjoconnor/LADOT_Traffic_Counts_Summary.csv limit 5
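And because each Reference URL acts like a table, joining across objects works the same way. As a hypothetical example (the street names object and the column names below are assumptions; substitute the ones in your own data sets), we could join the traffic counts against the official street names:

select c.street, n.official_name, c.total_count
from cos://us-geo/ladotjoconnor/LADOT_Traffic_Counts_Summary.csv c
join cos://us-geo/ladotjoconnor/LADOT_Official_Street_Names.csv n
  on c.street = n.street
limit 10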

Wrapping up

We’re now able to write files from our microservices to a COS bucket and query across those files using the SQL Query API. Having an SQL interface to our data, even when stored in multiple files and formats, allows us to connect to that data and gain insights using standard BI tools. In the final article in this series, we’ll use Looker, an industry standard BI tool, to provide insights from our data stored in COS.

