February 11, 2016 | Written by: Ryan Cradick
Categorized: How-tos | Storage
Do you have a Streams application that you’re interested in running in the cloud? If your application reads files from the local disk, some additional work may be necessary to make your files available in the cloud. This post describes how to access files from the Bluemix Object Storage Service so that they can be ingested by your application.
The “local file” problem for cloud applications
In the most common scenario, you have an application that uses a FileSource operator to read in the contents of a file. However, in the cloud, you do not have access to the local file system and therefore you can’t copy the files your application uses directly to the file system. One solution to this dilemma is to include data files in your Streams application bundle, but this approach is not suitable in every case:
- It is not practical for very large files, as it significantly increases the size of the bundle, leading to longer build times and slower application submission times.
- It does not work for data files that will change after the bundle is submitted.
In these cases, you will need an alternative method of accessing your files.
Object Storage Service solves the local file problem
The Bluemix Object Storage service provides you with access to a fully provisioned Swift Object Storage account to manage your data. This data can then be accessed via the OpenStack Object Storage (Swift) API. This service provides you with a solution for getting your files onto the cloud for your Streams application.
Integrating the Object Storage Service with your Streams application
To use the Object Storage service in your Streams application, this post takes the approach of creating a composite operator that downloads a file from the Object Store. This operator can then be placed in front of any FileSource operator to download the required file to local disk space.
Sample Streams Application using Object Storage
This section provides an overview of the “ObjectStorage” integration sample. To follow along, you can view the source in the Streams Integration Samples repository. The sample application utilizes native functions from the Inet Toolkit 2.7.0.
The Main composite operator is located in the application/Main.spl file and is composed of three operators.
The GetFile operator is an invocation of the GetFileFromObjectStorage custom composite operator, which downloads a file from the Object Store. Once the file is downloaded, the operator sends the name of the local file to the ReadFile operator, which is an instance of a FileSource operator. The ReadFile operator then sends the contents of the local file to the LogOutput operator, which writes the tuples to the console log.
The GetFile operator is an invocation of the custom composite operator named GetFileFromObjectStorage. The source for this operator can be found in the application/ObjectStore.spl file. It uses native Streams Processing Language (SPL) functions to first authenticate and then download a file from the Object Storage.
- An httpPost call is made to the Object Store using the authentication parameters, including objectStoreProjectID. Upon a successful post, the response headers contain an authentication token required to interact with the Object Store.
- The authentication key is extracted from the response headers.
- An httpGet call is made to retrieve the file from the Object Storage.
- The file is written to the local dataDirectory using native methods.
- A tuple containing the name of the local file is then sent downstream.
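The steps above can be sketched outside of SPL as well. The following Python sketch is not the sample's code; it only illustrates the same authenticate-then-download flow. It assumes the service uses OpenStack Keystone v3 password authentication, which returns the token in the X-Subject-Token response header, and a Swift account path of the form AUTH_<projectId>. The two endpoint URLs and all function names are assumptions made for illustration; take the real endpoints from your service credentials.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumed endpoints for illustration only; use the values from your credentials.
AUTH_URL = "https://identity.open.softlayer.com/v3/auth/tokens"
STORAGE_URL = "https://dal.objectstorage.open.softlayer.com/v1"


def build_auth_body(user_id, password, project_id):
    """Build a Keystone v3 password-authentication request body."""
    return {
        "auth": {
            "identity": {
                "methods": ["password"],
                "password": {"user": {"id": user_id, "password": password}},
            },
            "scope": {"project": {"id": project_id}},
        }
    }


def object_url(project_id, container, file_name, base=STORAGE_URL):
    """Build the URL of one object (assumes the AUTH_<projectId> account form)."""
    return f"{base}/AUTH_{project_id}/{quote(container)}/{quote(file_name)}"


def get_token(user_id, password, project_id):
    """POST the credentials and return the token from the response headers."""
    body = json.dumps(build_auth_body(user_id, password, project_id)).encode("utf-8")
    req = urllib.request.Request(
        AUTH_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.headers["X-Subject-Token"]


def download(token, project_id, container, file_name, local_path):
    """GET the object with the token and write it to local disk."""
    req = urllib.request.Request(
        object_url(project_id, container, file_name),
        headers={"X-Auth-Token": token},
    )
    with urllib.request.urlopen(req) as resp, open(local_path, "wb") as out:
        out.write(resp.read())
    return local_path
```

Calling get_token and then download with the credentials from the Service Credentials panel mirrors what the composite operator does before it sends the local file name downstream.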
Hands-on with the Streaming Analytics Sample
Create and configure your Object Store instance
- Log in to Bluemix. In the dashboard, click USE SERVICES OR APIS and then choose Object Storage under the Storage category of the service catalog. Note that there are multiple Object Storage services; be sure to use the one under the Storage category.
- Select your space, app, service name, and plan, and click Create. Note that for this integration sample you do not need to bind the service to an application, and you can use the free plan.
- This will take you to the Object Store Manage page. From the Actions drop-down, select Add container. From the pop-up, give the container a name and click Create.
- Once created, the container will show up on the Manage page under Storage Containers. From the Actions drop-down, select Add File and use the pop-up window to upload a file of your choice. The sample application is currently written to handle files using the line format, so almost any type of file will work.
- Finally, navigate to the Service Credentials panel and note the userID, password, and projectId credentials. These will be needed to authenticate to the Object Storage.
Create your Streaming Analytics instance
- From the Bluemix dashboard, click USE SERVICES OR APIS. Choose Streaming Analytics from the catalog. Choose the default plan and click CREATE to create the service.
- The Streaming Analytics dashboard will appear. Use the Launch button to launch the Streaming Analytics console.
Download and submit the Streams Application sample
- Download the zip file from the Streams Integration Samples project in Bluemix DevOps Services. This zip file contains the pre-built Streams application you will deploy, as well as the source code for the application.
- Extract the zip to your local file system.
- Go to the Streaming Analytics console that you launched earlier, and select Submit Job.
- A Submit Job view will appear. Click Browse and navigate to the root of the zip file that you extracted, then into the ObjectStore directory. Select the application.Main.sab file. Then click Next in the Submit Job dialog.
- On the next view, you are prompted to fill in the submission time parameters required by the application. Set the data directory to /tmp and fill in the values for ObjectStore-UserId, ObjectStore-Password, ObjectStore-ProjectId, ObjectStore-Container, and ObjectStore-File that you saved in the earlier steps.
- Press Submit, and the sample Streams application will be deployed to your Streaming Analytics instance.
Viewing the Streams Application sample
If you completed the instructions above successfully, you should now have a running Streaming Analytics application. You can use the Streams console to validate that the job is running. From the console’s Application Dashboard, locate the Streams Graph and validate that the operators are healthy.
To validate that the file was successfully downloaded and read by the FileSource operator, use the console’s Log Viewer:
- Find and select the Log Viewer tab on the left side navigation.
- From the Log Navigation Tree view, expand the navigation tree and select the LogOutput PE.
- In the Log Viewer, select the Console Log tab.
- Click on Load console messages to load the output for the operator. The output should contain the tuples from the file that was downloaded from the object store.
Now that you know how to read a file from Object Storage, you can modify this basic approach to cover other useful scenarios. For example, you can create custom composite operators to interact with the Object Store in the following ways.
Identifying when a file changes in Object Storage
To identify when a file has changed, you could download it again and do a file comparison. However, there is an easier way: use the Object HEAD API call to retrieve the file’s metadata and identify when it was last modified.
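As a rough illustration of this idea, the Python sketch below builds an Object HEAD request and compares two metadata snapshots. It assumes the standard Swift behavior of returning ETag and Last-Modified headers on a HEAD response and of accepting the token in an X-Auth-Token header. The function names are hypothetical, and the object URL and token are expected to come from an earlier authentication step.

```python
import urllib.request


def head_request(object_url, token):
    """Build an Object HEAD request; it returns headers only, not the file body."""
    return urllib.request.Request(
        object_url, headers={"X-Auth-Token": token}, method="HEAD"
    )


def fetch_metadata(object_url, token):
    """Send the HEAD request and keep the two headers used for change detection."""
    with urllib.request.urlopen(head_request(object_url, token)) as resp:
        return {
            "etag": resp.headers.get("ETag"),
            "last_modified": resp.headers.get("Last-Modified"),
        }


def has_changed(previous, current):
    """Report a change when there is no earlier snapshot or either header differs."""
    return previous is None or previous != current
```

A polling loop would call fetch_metadata periodically, pass the previous and current snapshots to has_changed, and download the file again only when it returns True.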
Identifying new files in Object Storage
Similar to the DirectoryScan operator that monitors a directory in the local file system for new files, you may find it useful to know when a new file is available in your Object Storage. Using the Container GET API call, you can query your container for changes and identify when new files have been added.
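One possible way to approximate this behavior is to poll the container listing and remember which object names have already been seen. The Python sketch below assumes the Swift Container GET call returns a JSON array of entries with a name field when format=json is requested. All function names are hypothetical, and the container URL and token are assumed to come from an earlier authentication step.

```python
import json
import urllib.request


def parse_listing(payload):
    """Extract object names from a Swift JSON container listing."""
    return [entry["name"] for entry in json.loads(payload)]


def list_object_names(container_url, token):
    """GET the container listing as JSON and return the object names."""
    req = urllib.request.Request(
        container_url + "?format=json", headers={"X-Auth-Token": token}
    )
    with urllib.request.urlopen(req) as resp:
        return parse_listing(resp.read())


def new_files(seen, listing):
    """Return names not seen before, in listing order, and record them as seen."""
    fresh = [name for name in listing if name not in seen]
    seen.update(fresh)
    return fresh
```

Keeping the seen set between polls means each object name is reported once, which is roughly how a directory scan hands each new file downstream a single time.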
Storing a file in Object Storage
Although this post only describes downloading a file, you can also upload a file to the Object Storage. See the Object PUT API call, which allows you to store a file for future use by your Streams application or by another Bluemix application.
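A minimal Python sketch of such an upload follows. It assumes the Swift Object PUT call accepts the raw file bytes as the request body along with the token in an X-Auth-Token header, and that it answers with status 201 when the object is created. The function names and the generic content type are assumptions made for illustration.

```python
import urllib.request


def put_request(object_url, token, data):
    """Build an Object PUT request that stores data (bytes) at object_url."""
    return urllib.request.Request(
        object_url,
        data=data,
        headers={
            "X-Auth-Token": token,
            "Content-Type": "application/octet-stream",
        },
        method="PUT",
    )


def upload_file(object_url, token, local_path):
    """Read a local file and upload it; returns the HTTP status of the response."""
    with open(local_path, "rb") as src:
        req = put_request(object_url, token, src.read())
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

This version reads the whole file into memory, which keeps the sketch short; very large files would need a streamed or segmented upload instead.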