Access Files in Object Storage from Streaming Analytics

Do you have a Streams application that you’re interested in running in the cloud? If your application reads files from the local disk, some additional work may be necessary to make your files available in the cloud. This post describes how to access files from the Bluemix Object Storage Service so that they can be ingested by your application.

The “local file” problem for cloud applications

In the most common scenario, you have an application that uses a FileSource operator to read in the contents of a file. However, in the cloud, you do not have access to the local file system and therefore you can’t copy the files your application uses directly to the file system. One solution to this dilemma is to include data files in your Streams application bundle, but this approach is not suitable in every case:

  • It is not practical for very large files, because they significantly increase the size of the bundle, which leads to longer build times and slower application submission.
  • It does not work for data files that will change after the bundle is submitted.

In these cases, you will need an alternative method of accessing your files.

Object Storage Service solves the local file problem

The Bluemix Object Storage service provides you with access to a fully provisioned Swift Object Storage account to manage your data. This data can then be accessed via the OpenStack Object Storage (Swift) API. This service provides you with a solution for getting your files onto the cloud for your Streams application.

Integrating the Object Storage Service with your Streams application

The approach presented in this post is to create a composite operator that downloads a file from Object Storage. You can place this operator in front of any FileSource operator so that the required file is downloaded to local disk space before it is read.

Sample Streams Application using Object Storage

This section provides an overview of the “ObjectStorage” integration sample. To follow along, you can view the source in the Streams Integration Samples repository. The sample application utilizes native functions from the Inet Toolkit 2.7.0.

The Main composite operator is located in the application/Main.spl file and is composed of three operators.

The GetFile operator is an invocation of the GetFileFromObjectStorage custom composite operator which downloads a file from the Object Store. Once the file is downloaded, the operator sends the name of the local file to the ReadFile operator which is an instance of a FileSource operator. The ReadFile operator then sends the contents of the local file to the LogOutput operator which writes the tuples to the console log.

The source for the GetFileFromObjectStorage composite operator can be found in the application/ObjectStore.spl file. It uses native Streams Processing Language (SPL) functions to first authenticate with Object Storage and then download a file from it:

  • An httpPost call is made to the Object Store using the objectStoreUserId, objectStorePassword, and objectStoreProjectID parameters. Upon a successful post, the response headers contain an authentication token required to interact with the Object Store.
  • The authentication token is extracted from the response headers.
  • An httpGet call is made to retrieve the file from the Object Storage using the objectStoreContainer and objectStoreFile parameters.
  • The file is written to the local dataDirectory using the native methods fileOpen and fwriteString.
  • A tuple containing the name of the local file is then sent downstream.
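
The steps above can be sketched outside of SPL as follows. This is a minimal Python illustration, not the source of the sample. It assumes that the service uses OpenStack Keystone v3 password authentication, where the token comes back in the X-Subject-Token response header. The function names are hypothetical, and auth_url and storage_url are placeholders for values that would come from your own service.

```python
import json
import urllib.request
from urllib.parse import quote


def build_auth_payload(user_id, password, project_id):
    """Keystone v3 password-auth body (assumed scheme), scoped to one project."""
    return {
        "auth": {
            "identity": {
                "methods": ["password"],
                "password": {"user": {"id": user_id, "password": password}},
            },
            "scope": {"project": {"id": project_id}},
        }
    }


def extract_token(headers):
    """Return the token from response headers; header names match case-insensitively."""
    for name, value in headers.items():
        if name.lower() == "x-subject-token":
            return value
    raise KeyError("X-Subject-Token header not found")


def object_url(storage_url, container, file_name):
    """Join the storage URL, container, and object name, percent-encoding as needed."""
    return "/".join([storage_url.rstrip("/"), quote(container, safe=""), quote(file_name)])


def download_file(auth_url, storage_url, user_id, password, project_id,
                  container, file_name, local_path):
    """Authenticate, fetch the object, write it to local_path, and return that path."""
    body = json.dumps(build_auth_payload(user_id, password, project_id)).encode("utf-8")
    auth_request = urllib.request.Request(
        auth_url, data=body,
        headers={"Content-Type": "application/json"}, method="POST")
    with urllib.request.urlopen(auth_request) as response:
        token = extract_token(dict(response.headers.items()))

    get_request = urllib.request.Request(
        object_url(storage_url, container, file_name),
        headers={"X-Auth-Token": token})
    with urllib.request.urlopen(get_request) as response, open(local_path, "wb") as out:
        out.write(response.read())
    return local_path
```

Returning the local path mirrors the last step of the operator, which passes the file name downstream instead of the file contents.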

Hands-on with the Streaming Analytics Sample

Create and configure your Object Store instance

  • Log in to Bluemix. In the dashboard, click USE SERVICES OR APIS and then choose Object Storage under the Storage category of the service catalog. Note that there are multiple Object Storage services, so be sure to use the one under the Storage category.
  • Select your space, app, service name, and plan, and click Create. Note that for this integration sample you do not need to bind the service to an application, and you can use the free plan.
  • This will take you to the Object Store Manage page. From the Actions drop-down, select Add container. From the pop-up, give the container a name and click Create.
  • Once created, the container will show up on the Manage page under Storage Containers. From the Actions drop-down, select Add File and use the pop-up window to upload a file of your choice. The sample application is currently written to handle files using the line format, so almost any type of file will work.
  • Finally, navigate to the Service Credentials panel and note the userID, password, and projectId credentials. These will be needed to authenticate to the Object Storage.

Create your Streaming Analytics instance

  • From the Bluemix dashboard, click USE SERVICES OR APIS. Choose Streaming Analytics from the catalog. Choose the default plan and click CREATE to create the service.
  • The Streaming Analytics dashboard will appear. Use the Launch button to launch the Streaming Analytics console.

Download and submit the Streams Application sample

  • Download the zip file from the Streams Integration Samples project in Bluemix DevOps Services. This zip file contains the pre-built Streams application you will deploy, as well as the source code for the application.
  • Extract the zip to your local file system.
  • Go to the Streaming Analytics console that you launched earlier, and select Submit Job.
  • A Submit Job view will appear. Click Browse and navigate to the root of the zip file that you extracted, then into the ObjectStore directory. Select the application.Main.sab file. Then click Next in the Submit Job dialog.
  • On the next view, you are prompted to fill in the submission time parameters required by the application. Set the data directory to /tmp and fill in the values for ObjectStore-UserId, ObjectStore-Password, ObjectStore-ProjectId, ObjectStore-Container, and ObjectStore-File that you saved in the earlier steps.
  • Press Submit, and the sample Streams application will be deployed to your Streaming Analytics instance.

Viewing the Streams Application sample

If you completed the instructions above, you should now have a running Streaming Analytics application. You can use the Streams console to validate that the job is running. From the console’s Application Dashboard, locate the Streams Graph and validate that the operators are healthy.

To validate that the file was successfully downloaded and read by the FileSource operator, use the console’s Log Viewer:

  • Find and select the Log Viewer tab in the left-side navigation.
  • From the Log Navigation Tree view, expand the navigation tree and select the LogOutput PE.
  • In the Log Viewer, select the Console Log tab.
  • Click on Load console messages to load the output for the operator. The output should contain the tuples from the file that was downloaded from the object store.

Additional Scenarios

Now that you know how to read a file from Object Storage, you can modify this basic approach to cover other useful scenarios. In other words, you can create custom composite operators to interact with the Object Store in the following ways.

Identifying when a file changes in Object Storage

To identify when a file has changed, you could download it again and do a file comparison. However, there is an easier way: use the Object HEAD API call to retrieve the file’s metadata, which shows when it was last modified, without downloading its contents.
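
As a rough sketch of that check, the Python below builds an Object HEAD request and compares the ETag and Last-Modified response headers against the values seen on a previous poll. The function names are hypothetical, the URL and token are assumed to come from an earlier authentication step, and the request is only constructed here, not sent.

```python
import urllib.request


def head_request(object_url, token):
    """Build (but do not send) an Object HEAD request carrying the auth token."""
    return urllib.request.Request(
        object_url, headers={"X-Auth-Token": token}, method="HEAD")


def change_marker(headers):
    """Pick the ETag and Last-Modified values out of a set of response headers."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return (lowered.get("etag"), lowered.get("last-modified"))


def has_changed(previous_marker, headers):
    """True when the current headers differ from the marker of the last poll."""
    return previous_marker != change_marker(headers)
```

A polling operator could keep the last marker in state and trigger a fresh download only when has_changed returns True.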

Identifying new files in Object Storage

Similar to the DirectoryScan operator that monitors a directory in the local file system for new files, you may find it useful to know when a new file is available in your Object Storage. Using the Container GET API call, you can query your container for changes and identify when new files have been added.
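
One way to approximate this DirectoryScan-like behavior is to poll the container listing and remember which object names have already been seen. The Python sketch below assumes the listing is requested in Swift's JSON format (the format=json query parameter), where each entry carries a name field. The function names are hypothetical, and the HTTP call itself is left out.

```python
import json


def listing_url(storage_url, container):
    """URL for a Container GET that returns the object listing as JSON."""
    return storage_url.rstrip("/") + "/" + container + "?format=json"


def find_new_objects(listing_json, seen_names):
    """Return names not seen before and record them in seen_names (a set)."""
    new_names = [
        entry["name"]
        for entry in json.loads(listing_json)
        if entry["name"] not in seen_names
    ]
    seen_names.update(new_names)
    return new_names
```

Each name returned could then be sent downstream as a tuple, in the same way that DirectoryScan emits file names.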

Storing a file in Object Storage

Although this post only describes downloading a file, you can also upload a file to Object Storage. The Object PUT API call lets you store a file for future use by your Streams application or even by a Bluemix application.
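
A minimal Python sketch of such an upload is shown below. It assumes that the object URL and token were obtained as in the download case. The function names are hypothetical, and upload is shown only for illustration because it needs a reachable Object Storage endpoint.

```python
import urllib.request


def build_put_request(object_url, token, data):
    """Build an Object PUT request whose body is the bytes to store."""
    return urllib.request.Request(
        object_url, data=data, headers={"X-Auth-Token": token}, method="PUT")


def upload(object_url, token, local_path):
    """Send a local file to Object Storage and return the HTTP status code."""
    with open(local_path, "rb") as source:
        request = build_put_request(object_url, token, source.read())
    with urllib.request.urlopen(request) as response:
        return response.status  # Swift normally answers 201 Created
```

Reading the whole file into memory keeps the sketch short. For very large files a streamed or segmented upload would be a better fit.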
