Get started with Streaming Analytics and BigInsights on Bluemix using HDFS


Streams applications can integrate with HDFS in on-premises BigInsights clusters using the streamsx.hdfs toolkit. In the cloud, however, an extra layer of security means a special toolkit is needed to access the BigInsights service in Bluemix. The HDFS for Bluemix toolkit contains Streams operators that connect through the Knox Gateway. This article shows how to use these operators to write files to and read files from HDFS on Bluemix. The operators work both in a local Streams install and in the Streaming Analytics Bluemix service.

Why integrate with Hadoop?

Integrating with Hadoop in Bluemix extends what Streams applications can do in two ways:

  • Reading from Hadoop lets Streams ingest data that is already in Hadoop, as well as new data as it is stored to HDFS.
  • Writing to Hadoop prepares the data for future data-at-rest analytics.


Write to HDFS

Import HDFSBluemixDemo into Streams Studio from the streamsx.hdfs/samples/ directory. If the project doesn’t build successfully, make sure you have added the HDFS for Bluemix toolkit to Streams Studio.  We’ll start with the TestWrite application in the demo project to demonstrate writing to HDFS.

In Streams Studio, launch the TestWrite application. Fill in the username, password, and url submission time values with your BigInsights credentials and WebHDFS URL. The URL will look similar to webhdfs://[Ambari console ip]:8443. Leave the file parameter as-is to write to the /tmp directory in HDFS.
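A malformed url value is an easy mistake to make at submission time. The following is a minimal Python sketch, not part of the toolkit, that checks a value has the webhdfs://host:port shape described above before you paste it into the submission dialog. The function name check_webhdfs_url and the address 203.0.113.10 are placeholders invented for this illustration.

```python
from urllib.parse import urlparse


def check_webhdfs_url(url):
    """Return (host, port) if url has the form webhdfs://host:port.

    Hypothetical helper for illustration; raises ValueError otherwise.
    """
    parsed = urlparse(url)
    if parsed.scheme != "webhdfs":
        raise ValueError(f"expected webhdfs:// scheme, got {parsed.scheme!r}")
    if not parsed.hostname or parsed.port is None:
        raise ValueError("url must include both a host and a port")
    return parsed.hostname, parsed.port


# 203.0.113.10 is a documentation-only address standing in for the
# Ambari console IP of your own BigInsights cluster.
host, port = check_webhdfs_url("webhdfs://203.0.113.10:8443")
print(host, port)
```

The check only looks at the shape of the value; it does not contact the cluster or verify the credentials.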

Read more about submission time values here.

The TestWrite application writes to HDFS:


  • Input (Beacon) generates the rstring tuples that are written to the HDFS file.
  • Sink (HDFS2FileSink) writes the tuples it receives to the remote HDFS file write_test_[timestamp].txt.
  • Log (Custom) prints the name of the file written by the HDFS2FileSink operator.

Viewing PE console output

The TestWrite application prints the name of the file to the Streams PE console.

To see the demo’s output, open the instance graph and show the PE console of the operator that prints the file name. You should see the name and size of the newly created file:

Wrote 450 bytes to file hdfs_test_20151211_172531.txt
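The file name in this console line is the value you need later for the file parameter of the TestRead sample. As a small convenience, here is a Python sketch that pulls the size, name, and timestamp out of a line in the format shown above. The function names are invented for this example, and the timestamp layout (YYYYMMDD_HHMMSS) is inferred from the single sample line, so treat it as an assumption.

```python
import re
from datetime import datetime

# Matches console lines such as:
#   Wrote 450 bytes to file hdfs_test_20151211_172531.txt
WRITE_LINE = re.compile(r"Wrote (\d+) bytes to file (\S+)")
# Assumed timestamp layout, inferred from the sample line only.
TIMESTAMP = re.compile(r"(\d{8}_\d{6})\.txt$")


def parse_write_log(line):
    """Return (size_in_bytes, file_name), or None if the line does not match."""
    match = WRITE_LINE.search(line)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


def file_timestamp(file_name):
    """Return the datetime embedded in the file name, or None if absent."""
    match = TIMESTAMP.search(file_name)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y%m%d_%H%M%S")


size, name = parse_write_log("Wrote 450 bytes to file hdfs_test_20151211_172531.txt")
print(size, name, file_timestamp(name))
```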

Viewing the file written to HDFS

The demo creates write_test_[timestamp].txt on the Hadoop filesystem in the /tmp directory.

To view the file, log onto the Hadoop system via SSH and run:

$ hadoop fs -ls /tmp
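If SSH access is not convenient, the same listing is available over the WebHDFS REST API, whose LISTSTATUS operation returns a JSON document with a FileStatuses.FileStatus array. The Python sketch below only builds the request URL and filters a response; it makes no network call. The /gateway/default/ path segment assumes a Knox topology named "default", which may differ on your cluster, and the helper names, the host address, and the sample response are all made up for illustration.

```python
import json
from urllib.parse import quote, urlencode


def liststatus_url(host, hdfs_path, port=8443, topology="default"):
    """Build a WebHDFS LISTSTATUS URL routed through a Knox gateway.

    Assumption: the gateway exposes WebHDFS under /gateway/<topology>/webhdfs/v1.
    """
    path = quote(hdfs_path.lstrip("/"))
    query = urlencode({"op": "LISTSTATUS"})
    return f"https://{host}:{port}/gateway/{topology}/webhdfs/v1/{path}?{query}"


def demo_files(liststatus_json, prefix="write_test_"):
    """Return names of regular files in a LISTSTATUS response that start with prefix."""
    statuses = json.loads(liststatus_json)["FileStatuses"]["FileStatus"]
    return [
        status["pathSuffix"]
        for status in statuses
        if status["type"] == "FILE" and status["pathSuffix"].startswith(prefix)
    ]


print(liststatus_url("203.0.113.10", "/tmp"))

# Hand-written sample response, trimmed to the two fields used here.
sample = json.dumps({"FileStatuses": {"FileStatus": [
    {"pathSuffix": "write_test_example.txt", "type": "FILE"},
    {"pathSuffix": "hive", "type": "DIRECTORY"},
]}})
print(demo_files(sample))
```

To run the request for real you would send an authenticated HTTPS GET to the generated URL with your BigInsights username and password.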

Read from HDFS

After verifying that the HDFS file was written, we’ll read it back into Streams with the TestRead sample.

The TestRead application retrieves a file from HDFS and prints its contents:


  • FromHDFS (HDFS2FileSource) reads the file back into Streams from HDFS, and outputs each line as an rstring tuple.
  • PrintFileContents (Custom) prints each incoming rstring to the PE console.

Launch the application from Streams Studio, specifying the same credentials and url as the TestWrite application.  Set the file parameter to the same value as was printed by the TestWrite sample.

After launching the TestRead application, view its output in the PE console of the PrintFileContents operator. You should see the file contents, ten numbered lines that each read:

Output read from remote file: HDFS and Streams on Bluemix Test: New LINE #

Running the samples on Bluemix Streaming Analytics

The demo applications can also be deployed to the Streaming Analytics cloud service.

  1. Create a Streaming Analytics service by following the “Finding the service” section of Introduction to Bluemix Streaming Analytics.
  2. Click Launch in the Streaming Analytics Bluemix dashboard to launch the Streams console.
  3. From the Streams console, select “Submit Job” under the “play” icon.
  4. Browse for and select the .sab file in your workspace directory: workspace/HDFSBluemixDemo/output/hdfsexample.TestWrite/Distributed
  5. Click Next, enter the HDFS url, username, password, and file name as submission time values, and submit.
  6. After the operators start up, they will show a green circle in the Streams graph view. If they do not, verify your submission time values and resubmit.

After launching the TestWrite application, repeat the steps above to launch the TestRead sample. View its output by loading the Console Log of the PrintFileContents operator.

The file is printed “backwards” in the logs because the log viewer shows most recent logs first.
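If you copy the lines out of the log viewer and want them in file order again, reversing the list is enough. A minimal Python sketch follows; it assumes each copied line is just the console text shown earlier (real log lines may carry extra timestamps or severity fields), and the function name is invented for this example.

```python
# Prefix printed in front of every line of file content.
PREFIX = "Output read from remote file: "


def restore_file_order(newest_first):
    """Reverse newest-first log lines and strip the console prefix if present."""
    oldest_first = reversed(newest_first)
    return [
        line[len(PREFIX):] if line.startswith(PREFIX) else line
        for line in oldest_first
    ]


copied_from_log = [
    PREFIX + "HDFS and Streams on Bluemix Test: New LINE 3",
    PREFIX + "HDFS and Streams on Bluemix Test: New LINE 2",
    PREFIX + "HDFS and Streams on Bluemix Test: New LINE 1",
]
for line in restore_file_order(copied_from_log):
    print(line)
```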


Bluemix HDFS operators enable Streams to read and write HDFS files on BigInsights. The HDFS for Bluemix toolkit works with Streams on-premise and in the Streaming Analytics Bluemix service.

chris snow

On BigInsights on cloud, the SSL certificate presented with WebHDFS connections is self-signed. Does Streams bypass HTTPS certificate verification when connecting to WebHDFS?



By default, Streams will accept all server certificates. If you need to validate SSL certificates, export the certificate and add it to your truststore. See step 4 of the documentation.


Jacques Roy

It looks like the API has changed since this article was published. Any plans on updating the article?
