
Connecting to streams: What you need to know

Using Streams Operators that listen for a connection in the Streaming Analytics Bluemix Service

Overview:

The Streams product provides operators that can listen on a port for incoming connections and then read or write data on that connected port. These operators are popular in on-premise applications, but they do not work in the Streaming Analytics Service unless you perform some additional tasks. When running in the Streaming Analytics Service, incoming connections are only allowed from an application running in the secure Bluemix environment. This article describes three options for dealing with Streaming Analytics Service applications that currently depend on incoming connections.

  • Option 1: Use different Streams operators that initiate connections from within the Streams application in place of those listening for connections
  • Option 2: Use a proxy servlet running in Bluemix
  • Option 3: Use the Bluemix Secure Gateway

What operators listen on ports and why are they useful?

The Streams product provides numerous toolkits. Some of these toolkits provide operators that listen for connections on specific ports and then either read or write data once the connection is established. Two examples are TCPSource and TCPSink when configured with the role: server parameter. Other examples are operators in the com.ibm.streamsx.inet toolkit, which contains a variety of operators and functions for interacting with streams over HTTP.

The techniques described in this article apply to any Streams operator that listens for connections on a port, but this article focuses specifically on the com.ibm.streamsx.inet.rest operators, which provide HTTP REST APIs for interacting with streaming data.
These operators require a connection to be initiated from outside the Streams application. The client connects to the application resource where the operator is running, on the port where the operator is listening.

Access to streaming data is provided through the HTTPTupleView and HTTPXMLView operators. These two operators support HTTP GET requests to fetch the latest version of a window of the streaming data in the operator’s state.
Data can be injected into a stream through HTTPTupleInjection, HTTPJSONInjection and HTTPXMLInjection operators. Each operator extracts data from an HTTP POST and submits a tuple on an output port.
WebContext provides a mechanism to serve files at a fixed web context, a fixed path relative to the root URL of the web server.
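
From a client's point of view, the view and injection operators behave like an ordinary HTTP endpoint. The sketch below is only an illustration in Python: it talks to a local stand-in server rather than a real Streams operator, and the /tuples path, the JSON payload shape, and every function name are assumptions made for the example, not the toolkit's documented URLs.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class StandInOperator(BaseHTTPRequestHandler):
    """Local stand-in for a REST operator (NOT the real toolkit code)."""

    def do_GET(self):
        # Like a view operator: return the current window contents as JSON.
        body = json.dumps(self.server.window).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_POST(self):
        # Like an injection operator: turn the POST body into one tuple.
        length = int(self.headers.get("Content-Length", 0))
        self.server.window.append(json.loads(self.rfile.read(length)))
        self.send_response(204)
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the example quiet


def start_stand_in():
    """Start the stand-in on a free local port and return the server."""
    server = HTTPServer(("127.0.0.1", 0), StandInOperator)
    server.window = []
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def inject_tuple(host, port, path, tup):
    """POST one tuple as JSON (hypothetical path) and return the status."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("POST", path, body=json.dumps(tup).encode("utf-8"),
                     headers={"Content-Type": "application/json"})
        resp = conn.getresponse()
        resp.read()
        return resp.status
    finally:
        conn.close()


def fetch_window(host, port, path):
    """GET the latest window of tuples (hypothetical path) as a list."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        return json.loads(conn.getresponse().read().decode("utf-8"))
    finally:
        conn.close()


if __name__ == "__main__":
    server = start_stand_in()
    port = server.server_address[1]
    inject_tuple("127.0.0.1", port, "/tuples", {"sensor": "a", "value": 1})
    print(fetch_window("127.0.0.1", port, "/tuples"))
    server.shutdown()
    server.server_close()
```

In both calls the client opens the connection to the host and port where the operator listens, which is exactly the direction the Streaming Analytics Service restricts.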

The Inet REST toolkit is often used to develop simple front end user interfaces for Streams demo applications.

So what happens when you have or want to write one of these simple demo apps and run it in the Streaming Analytics Service?

Bluemix Security Architecture

The IBM® Bluemix® platform has layered security controls across network and infrastructure. You can read more about this at Bluemix Security.

For Bluemix application users, secure access is provided through a firewall with intrusion prevention and network security in place. Bluemix applications run in this secure environment.
The Streaming Analytics Service applications run behind a second secure gateway in a Services area. External applications are prevented from initiating connections directly into the Streaming Analytics resources in this Services area. When running in the Streaming Analytics Service, incoming connections are only allowed from an application running in the secure Bluemix environment.

So a Streams application written using the com.ibm.streamsx.inet toolkit has the problem that external applications do not have the ability to directly call these REST APIs because they cannot access resources in the Services area. Fortunately there are two important items that will help us deal with the security restrictions:

    • Streams applications can initiate connections to systems outside the Services area.
    • Applications running in the secure Bluemix environment can initiate connections into the Streaming Analytics resources.

This provides us with some options if you wish to use the INET REST operators.

Option 1: Use different streams operators

The first option is to replace the use of the com.ibm.streamsx.inet toolkit with different Streams operators.

OK, so I know this is kind of cheating to say the way to deal with them is not to use them, but the techniques for actually using them suffer from some drawbacks.

Using the INET REST operators can be convenient, but it presents some drawbacks in Bluemix. Most importantly, these operators tie the web applications in your overall solution to the specific IP address where the operator is hosted. This makes it difficult to achieve application high availability without additional work. If the operator listening for connections fails, it can be restarted on a different application node, but that node has a different IP address, and clients may not expect to, or be able to, work out dynamically which address to connect to. Depending on the IP address is brittle for the same reason: stopping and restarting a Streaming Analytics Service instance potentially yields a different set of application resources with different IP addresses.

It may be more appropriate to use a different Streams adapter operator, one that does not rely on external connectivity into the Services area but instead initiates its connections from within the Streams application. You might consider using Kafka, MQ, MQTT or HTTP operators. This leads to a best practice: do not use the INET REST operators for robust production applications.

Our two sample applications, NYCTraffic Sample and Event Detection Sample, demonstrate how to use the Streaming Analytics Service. Both use the HTTPPost operator to initiate connections from within the Streams application and push data to a web server running as a Bluemix application that can be accessed from the public internet. This approach can be configured to provide a highly available and secure production environment.
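
The push pattern these samples rely on can be sketched as follows. This is an illustration in Python, not the samples' code and not the Streams operator itself: the producer function stands in for the Streams application and opens the outbound connection, while a small local server stands in for the public Bluemix web application. The /results path and all names are assumptions.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class StandInWebApp(BaseHTTPRequestHandler):
    """Hypothetical public web app that accepts pushed results."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        # Keep only the most recent batch, as a simple dashboard might.
        self.server.latest = json.loads(self.rfile.read(length))
        self.send_response(204)
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the example quiet


def start_web_app():
    """Start the stand-in web app on a free local port."""
    server = HTTPServer(("127.0.0.1", 0), StandInWebApp)
    server.latest = None
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def push_batch(host, port, path, tuples):
    """Producer side: open an outbound connection and POST one batch."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("POST", path, body=json.dumps(tuples).encode("utf-8"),
                     headers={"Content-Type": "application/json"})
        resp = conn.getresponse()
        resp.read()
        return resp.status
    finally:
        conn.close()


if __name__ == "__main__":
    app = start_web_app()
    status = push_batch("127.0.0.1", app.server_address[1], "/results",
                        [{"speed": 30}])
    print(status, app.latest)
    app.shutdown()
    app.server_close()
```

Because the producer only needs the address of the web application, nothing in this arrangement depends on the IP address of the resource where the Streams operator happens to run.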

Further, if you have a Streams application that was originally written using the INET REST operators, replacing these operators is a fairly straightforward exercise. The article Streaming Analytics RSS Text Analytics Demo shows an example and includes a link to instructions for modifying an existing on-premise application to remove the INET REST operators for use in the cloud. The specific information about replacing the INET REST operators can be found in the Streams Application in the Streams RSS Demo in the Cloud pdf file.

Other technologies and adapters to consider are Kafka or MQ, where connections are initiated from within the Streams application to separate non-Streams servers. Get Started with Streaming Analytics + Message Hub provides a tutorial using Kafka in Message Hub.

Option 2: Use a proxy servlet

If the information you are serving does not require a secure connection, you can run a proxy servlet in a Bluemix Liberty runtime. The proxy sits between the public internet and the Streams INET REST operators and relays requests to them.

The Streaming Analytics Airport Sentiment Demo article describes a demo application that uses a Streams application to read from multiple sources, do analytics and make the results available via a browser. The Streams application uses the HTTPTupleView and WebContext operators to provide the data. It uses a Bluemix Liberty application that runs a proxy servlet to provide the public web interface and interact with the Streams INET REST operators.

The nice thing about this approach is that no Streams application changes are required. It does, however, use an Apache-licensed servlet that you run in a Bluemix Liberty application, and that servlet does not provide security. Further, that application has the IP address where the INET REST operator is running hard-coded in its deployed configuration file, so you need to rebuild and redeploy the Bluemix application every time that IP address changes.
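
To make the proxy idea concrete, here is a minimal forwarding sketch in Python. It is not the Apache-licensed servlet used in the demo and it runs entirely on localhost: the backend class is a stand-in for an INET REST operator, and the handler names, the /tuples path, and the fixed JSON body are assumptions. Like the servlet, it provides no security, and the backend address is fixed when the proxy is created.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class StandInView(BaseHTTPRequestHandler):
    """Stand-in for an operator serving a window of tuples."""

    def do_GET(self):
        body = b'[{"id": 1}]'
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def make_proxy_handler(backend_host, backend_port):
    """Build a handler that relays GET requests to one fixed backend."""

    class ProxyHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            try:
                conn = http.client.HTTPConnection(
                    backend_host, backend_port, timeout=5)
                conn.request("GET", self.path)
                upstream = conn.getresponse()
                status, body = upstream.status, upstream.read()
                ctype = upstream.getheader(
                    "Content-Type", "application/octet-stream")
                conn.close()
            except (OSError, http.client.HTTPException):
                # Backend moved or is down: the fixed address is now stale.
                self.send_error(502, "Backend unreachable")
                return
            self.send_response(status)
            self.send_header("Content-Type", ctype)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass  # keep the example quiet

    return ProxyHandler


def serve_in_background(handler_class):
    """Start an HTTP server on a free local port in a daemon thread."""
    server = HTTPServer(("127.0.0.1", 0), handler_class)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

A proxy started with make_proxy_handler keeps pointing at the backend address it was given, which mirrors the redeploy-on-IP-change limitation described above.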

Despite these limitations, this approach is probably the quickest way to get an existing Streams application dependent on the INET REST operators running in the Streaming Analytics Service and available on the public internet.

Option 3: Use Secure Gateway

The IBM Bluemix Secure Gateway service allows connectivity from Bluemix applications to local/on-premise destinations. However, a lesser-known scenario is that it can also be used to connect back from your on-premise system to cloud-based destinations. This second scenario can be used to provide connectivity from an on-premise network to Streams operators that listen for connections.
The article Using the Secure Gateway to Connect from Your Local Network to the Cloud provides a description of how to use the Secure Gateway to allow a restrictive on-premise network to access a public site.
The same basic steps described in that article can be followed to connect to a Streams operator instead of the public site. When defining the Cloud destination in the gateway, you would provide the connectivity information (server and port) that the Streams operator uses instead of the example’s loripsum.net server. See details below.

One advantage of the Secure Gateway approach is that no Streams application changes are required.
One aspect is both an advantage and a disadvantage relative to the other approaches: it requires specific code to be installed on the clients. This provides a level of security, since you can restrict who can connect to Streams operators, but it is not suitable for demos that are intended to use the INET REST operators to serve data to the general public.
Similar to the proxy servlet solution, the Secure Gateway approach requires the IP address and port where the Streams operator is running to be configured in the Secure Gateway, and this configuration must be modified every time that IP address changes.

The video below shows the end to end flow of creating and configuring a Secure Gateway to connect to a Streams application.

One of the key steps is locating the Streams IP address used in the Secure Gateway configuration.
You can find the IP address the operator is running on in the Streams Console.

Operator Hover

If you hover over the operators in the Streams Console’s Streams Graph, you can locate the operator you are interested in: Operator com.ibm.streamsx.inet.rest::HTTPTupleView. Note the port it was configured to listen on, serverPort: 8080. Also note the PE this operator is running in, PE: 0.

In the Streams Console’s Streams Tree you can locate this PE to find the IP address of the application resource it is currently running on.

Processing Element Hover

In this case it is running on Resource: 10.121.196.72.

This information needs to be specified in your Secure Gateway Destination Resource Hostname and Resource Port fields.

Secure gateway Destination Dialog
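
Since these two values have to be re-entered whenever the operator moves, it can help to treat them as a small piece of checked configuration. The following Python sketch is purely illustrative: the Destination class and both helper functions are hypothetical names, and nothing here calls the Secure Gateway service. It validates the values read from the Streams Console and reports when a stored destination no longer matches what the console shows.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Destination:
    """Values for the Resource Hostname and Resource Port fields."""
    resource_hostname: str
    resource_port: int


def destination_from_console(resource_ip, server_port):
    """Validate values read from the Streams Console and package them."""
    ipaddress.ip_address(resource_ip)  # raises ValueError if malformed
    port = int(server_port)
    if not 1 <= port <= 65535:
        raise ValueError("port out of range: %d" % port)
    return Destination(resource_ip, port)


def needs_update(configured, observed):
    """True when the gateway destination no longer matches the console."""
    return configured != observed


if __name__ == "__main__":
    # Values taken from the console hovers described above.
    configured = destination_from_console("10.121.196.72", 8080)
    print(configured)
```

Comparing the stored destination with freshly read console values after every restart of the service instance is one way to notice a stale gateway configuration early.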

Summary

Streams has many source and sink adapters suitable for getting data into and out of Streams applications. When running Streams applications in the Bluemix Streaming Analytics Service, you must take the security environment into account so that these data interactions succeed.
