August 30, 2018 | Written by: John O'Connor
Categorized: How-tos | Storage
Share this post:
Operationalizing SQL Query: Part 3
In this third part of a four-part series on Operationalizing SQL Query, we’ll bring together the microservices we deployed in Part 1 to query data in IBM Cloud Object Storage (COS) using the techniques we developed in Part 2 using IBM SQL Query with the goal of connecting our application’s data to Business Intelligence (BI) tools.
Is your application’s data a stream trickling into a puddle or a rising tide overwhelming the levees? Either way, IBM has you covered with tools to store, retrieve, query, and gain insights from data of any size.
In this article, we’ll take a look at how you can query data from multiple sources stored in the same IBM Cloud Object Storage bucket. We’ll create a demo application using Docker containers running in IBM Cloud Kubernetes Service and then introduce IBM SQL Query, a fully-managed service you can use to query across data stored in Cloud Object Storage, joining together multiple data sources to get a single unified view of data across your application. Finally, we’ll show how IBM SQL Query breaks down barriers to insight by querying data stored across multiple files in multiple formats using different schemas with a single, ANSI SQL statement without any pre-processing.
Let’s get started . . .
Creating our demo application
First, we’ll want to create a microservice application we can use to dump data into COS. If you’d like to deploy this application into your own IBM Cloud Account, you can check out the code on GitHub.
For this example application, we’ll be using traffic count summary data and official street names from the Los Angeles Department of Transportation available on the LADOT open data website. We’ll cross-reference generation of that data using both a standalone NodeJS application and an OpenWhisk function.
We’ll start by downloading the data sets and uploading them to Cloud Object Storage. Open the IBM Cloud Console and create a new bucket for LADOT data by clicking on the Create bucket button.
Give your bucket a name and we’ll keep the rest of the settings as default settings.
Next, let’s set up a starting point for our data. Upload the traffic pattern data directly to our COS bucket by dragging and dropping the CSV file directly into the web console.
Accessing IBM Cloud Object Storage using NodeJS
Let’s kick things off by spinning up a microservice using the Rancher instance we set up in our previous article. We’ll use a base NodeJS container for now to test our microservice and demonstrate how to build and host our own custom docker image on IBM Cloud Kubernetes Service.
Remember to inject the environment variables from the ConfigMap and Secrets we set up previously. This gives us a standard way of accessing our COS credentials from within our containers. We’ll also want to add an additional
PORT environment variable with a value of
80—this tells the microservice which port to expose our HTTP API on.
Next, let’s run a shell in our container and add the base code for our microservice:
Download the code into your container and run it using the
cd /opt && git clone https://github.ibm.com/John-OConnor/COSBucketService.git && cd code && npm install && node code/index.js
Note: If you changed the name of your environment variables from those we used in the previous article, make sure you update the code from the repository above to reflect that. If you used the same ConfigMap keys then you should be able to run the code above and connect directly without any changes.
The microservice has an exposed HTTP interface which has a few basic methods for retrieving buckets and files from COS, but its purpose is to serve as a basic starting point for your own COS-connected microservice from NodeJS using the
ibm-cos-sdk npm library.
Now, let’s access that API from another container, which we can access by name. Deploy a new container using the
alpine image and execute a shell into that container. We won’t be doing much with this container other than testing out our new microservice.
From here we can execute a shell into the container and using CURL to connect to the HTTP interface of our node container.
We’ll add the
curl package to our alpine container first:
apk update && apk add curl
We’ll then run the curl command on our microservice. If all goes well, we should see a response like the following:
# curl trafficcounts/buckets
Now that we have access to COS from our microservices, we can use the S3 API to add and update objects in our buckets. You can find more information about how to do that in our COS Documentation.
Getting started with IBM Cloud SQL Query
IBM Cloud SQL Query is the glue that allows us to combine multiple separate data files in the same COS bucket into a cohesive whole. It uses a simple RESTful API to submit queries, so any microservice that can make a web call can also access SQL Query.
Before we get started, you’ll want to sign up for the SQL Query service. You can use the linked tutorial to set that up or check out a more comprehensive introduction to analyzing data with SQL Query.
Now that we have our SQL Query account set up, we can begin querying our initial data set—the LA DOT Traffic Count Summaries.
Start by creating a new Query microservice using the IBM Cloud SQL npm package, which simplifies the process of connecting. In order to connect, we’ll have to get the Cloud Resource Name (CRN) from the SQL Query service. This allows us to connect to a specific running version of SQL Query. You can get this value from the home page of the SQL Query Service by clicking on the Instance CRN button in the lower-right-hand corner of the page:
You can also get that value from the URL bar of the web interface:
Open up Rancher and select the Secrets tab to add our SQL Query CRN to our global list of credentials:
Open the three-dot dropdown menu and click Edit to edit the Secret:
Then, add your SQL Query Resource CRN so we can inject it automatically when we create our query container:
Now that you’ve updated your credentials, we can deploy a new
node container like we did previously, injecting the environment variables from our ConfigMap and Secret as we did before.
Next, execute a shell and run the following command to download the latest git repo:
cd /opt && git clone https://github.ibm.com/John-OConnor/SQLQueryMicroserviceSamples.git && cd code && npm install && node
This will download the ibmcloudsql node package and create a node terminal for you to tinker with. You can run
node index.js to run the sample application:
You’ll want to snag the bucket URL of your data by going to the COS web UI and selecting Object SQL URL from the context menu:
That will give you a popup where you can copy the URL for that object (also called the Reference URL):
From there, we can simply drop our Reference URL into any SQL query where we would typically put a reference to a database table. For example, a query selecting all of the data in our
LADOT_Traffic_Counts_Summary.csv might look like the following:
select * from cos://us-geo/ladotjoconnor/LADOT_Traffic_Counts_Summary.csv limit 5
We’re now able to write files from our microservices to a COS bucket and query across those files using the SQL Query API. Having an SQL interface to our data, even when stored in multiple files and formats, allows us to connect to that data and gain insights using standard BI tools. In the final article in this series, we’ll use Looker, an industry standard BI tool, to provide insights from our data stored in COS.