Fast Track your Big Data Development

This article is co-authored by Chris Snow (chris.snow@uk.ibm.com) and Pierre Regazzoni (pierrer@us.ibm.com).

IBM® BigInsights™ on Cloud provides Hadoop-as-a-service on IBM’s SoftLayer® global cloud infrastructure. It supports many ways to process your data and to integrate with other services.

If you are getting started with a cloud data service such as IBM® BigInsights™ on Cloud, you might be asking yourself:

“how do I programmatically perform action X on my data on service Y?”
“how do I programmatically move data between service Y and service Z?”

These questions usually need to be addressed early and quickly in the project’s life-cycle. As such, they are usually addressed during sprint zero to create the basic skeleton and plumbing for the project so that future sprints can truly add incremental value in an efficient way [1].

Starting from scratch, each question can easily take anywhere from a few hours to a few days to answer by the time you have researched the different options and developed some skeleton code.

In this blog post, we would like to introduce you to an open source project [2] that provides working examples of over 30 actions and integrations (and growing). You can run them against your own IBM® BigInsights™ cluster and the services you wish to integrate with. All you need to do is provide the connection details of your IBM® BigInsights™ cluster and the service(s) you wish to connect to, and then run a single command to see the example running against your environment. It is possible to set up the examples and run them in under five minutes!

The current examples are listed below:

HDFS (Using Knox API – WebHDFS)

Ambari

BigR

Connect to BigR

BigSQL

Hive

Spark (run inside a ssh session on the BigInsights cluster)

Oozie (Using Knox API)

HBase

WebHCat/Templeton (Using Knox API)

Knox

Run a knox shell client session

Cloudant

Object Store (Swift, S3)

dashDB

Elasticsearch
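
Several of the examples above reach the cluster through the Knox gateway. To give a rough idea of what such a call involves, here is a minimal Python sketch that lists an HDFS directory over WebHDFS via Knox. It is not taken from the project (whose examples are run through Gradle). The port 8443, the "default" topology name, and the helper names webhdfs_url and list_directory are assumptions made for illustration, so check them against your own cluster details.

```python
import base64
import json
import ssl
import urllib.request
from urllib.parse import quote, urlencode


def webhdfs_url(host, path, op, port=8443, topology="default", **params):
    """Build a WebHDFS URL routed through a Knox gateway.

    The port and topology defaults are assumptions; adjust them to match
    the gateway URL of your own cluster.
    """
    query = urlencode({"op": op, **params})
    return f"https://{host}:{port}/gateway/{topology}/webhdfs/v1{quote(path)}?{query}"


def list_directory(host, path, user, password, cafile=None):
    """Return the names of the entries in an HDFS directory.

    cafile can point at the cluster certificate exported from your browser.
    """
    request = urllib.request.Request(webhdfs_url(host, path, "LISTSTATUS"))
    # Knox authenticates this call with HTTP basic authentication.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    context = ssl.create_default_context(cafile=cafile)
    with urllib.request.urlopen(request, context=context) as response:
        body = json.load(response)
    return [entry["pathSuffix"] for entry in body["FileStatuses"]["FileStatus"]]
```

Calling list_directory("your-cluster-host", "/user/your-user", "your-user", "your-password", cafile="cluster.pem") would return the file and directory names under that path, assuming the host, credentials, and certificate file are replaced with your own.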

Let’s look at a few use cases for moving data between BigInsights and dashDB. As you can see from the list above, there are three examples you can try:

Pull data from a dashDB database to HDFS using Spark
Push data to a dashDB database using Spark
Pull data from a dashDB database using Big SQL

The first two examples use Apache Spark to move data between dashDB and BigInsights (one pulls, the other pushes), and the third example uses Big SQL.
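
To make the Spark-based approach more concrete, the sketch below shows one way to pull a dashDB table into HDFS, and to push data back, using Spark's generic JDBC data source from PySpark. It is an illustration under stated assumptions and not the project's own code: the helper names, the Parquet storage format, the port 50000, and the database name BLUDB are assumptions, and the DB2 JDBC driver must be available on the Spark classpath.

```python
def dashdb_jdbc_options(host, user, password, table, port=50000, database="BLUDB"):
    """Collect the JDBC options Spark needs to reach a dashDB table.

    The port and database defaults are assumptions; use the values from
    your own dashDB connection details.
    """
    return {
        "url": f"jdbc:db2://{host}:{port}/{database}",
        "driver": "com.ibm.db2.jcc.DB2Driver",
        "dbtable": table,
        "user": user,
        "password": password,
    }


def pull_to_hdfs(spark, options, hdfs_path):
    """Read a dashDB table over JDBC and store it in HDFS as Parquet."""
    frame = spark.read.format("jdbc").options(**options).load()
    frame.write.mode("overwrite").parquet(hdfs_path)


def push_to_dashdb(spark, options, hdfs_path):
    """Read Parquet data from HDFS and append it to a dashDB table."""
    frame = spark.read.parquet(hdfs_path)
    frame.write.format("jdbc").options(**options).mode("append").save()
```

Here spark is an existing SparkSession, for example the one provided by a pyspark shell started in an ssh session on the cluster. On older Spark releases that expose only a SQLContext, the same read and write calls are available on that object.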

Running the examples is simple. First, check out and set up the examples:

Clone this repository: git clone https://github.com/snowch/biginsight-examples.git
Copy connection.properties_template to connection.properties
Edit connection.properties to add your connection details for BigInsights and other optional services such as dashDB
Export the cluster certificate from your browser
In your connection.properties uncomment the line # known_hosts:allowAnyHosts
Set up the driver libraries by running ./gradlew DownloadLibs (unix) or gradlew.bat DownloadLibs (windows), which downloads the libraries from the cluster

Now let’s just run one example:

On Unix, run:

./gradlew -p examples/DashDBIntegrationWithBigSQL Example

On Windows, run:

gradlew.bat -p examples/DashDBIntegrationWithBigSQL Example

The above command creates a table in Hadoop and populates it with data from dashDB. If you developed the code for this from scratch, you could easily burn a few hours or a few days on it. Instead, in around five minutes, you have seen example code running against your own environment.

To run the whole set of examples at once, you can run:

./gradlew test (unix)

gradlew.bat test (windows)

Detailed output for the tests can be found in the folder ./build/test/.

For more information and to get the code, visit the project at: https://github.com/snowch/biginsight-examples

We also encourage you to look at the code and to share comments and ideas for future examples you’d like to see.
—

[1] https://www.scrumalliance.org/community/articles/2013/september/what-is-sprint-zero
[2] https://github.com/snowch/biginsight-examples
