Streaming Analytics Updates: IBM Streams Runner for Apache Beam
The IBM Streaming Analytics service is a cloud-based service for IBM Streams. Streams is an analytics platform that allows you to create applications that analyze data from a variety of sources in real time. Streaming Analytics continues to add enhancements to make it easy for you to create streaming applications however you choose. Previously, we announced integration with DSX to allow creating Streams applications in Python. Now, you can run a Beam application/pipeline in Streaming Analytics.
Imagine you are tasked with writing an application for a website. The application needs to look at online users and their activity to identify popular content. You’ll need to look at logs, user clickstreams, and existing user data stored in a database. Which platform are you going to use to write this application: Apache Spark, Apache Flink, or IBM Streams? Why not write the app against a single interface and choose where to run it later?
This is the goal of Apache Beam, a unified programming model for both batch and streaming data processing. Like Streams, Beam lets you develop data processing applications from a set of functions that manipulate your data. Beam, however, provides only the programming model and leaves it up to you to select a runtime platform, via a runner, when you launch your application.
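To make the "write once, pick a runner at launch" idea concrete, here is a minimal toy sketch in plain Java. It is not the Apache Beam API: `ToyPipeline`, `ToyRunner`, the two runner classes, and the runner names `element` and `step` are all hypothetical names invented for illustration. The point is only that the pipeline definition carries no execution logic, so the same definition can be handed to different runners.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Toy model only: every type here is hypothetical and is NOT the Apache Beam API.
final class PortablePipelineSketch {

    /** A pipeline is just an ordered list of element-wise steps, with no execution logic. */
    static final class ToyPipeline {
        final List<Function<String, String>> steps = new ArrayList<>();

        ToyPipeline apply(Function<String, String> step) {
            steps.add(step);
            return this;
        }
    }

    /** A runner decides how the steps of a pipeline are executed. */
    interface ToyRunner {
        List<String> run(ToyPipeline pipeline, List<String> input);
    }

    /** Pushes each element through all steps before touching the next element. */
    static final class ElementAtATimeRunner implements ToyRunner {
        public List<String> run(ToyPipeline pipeline, List<String> input) {
            List<String> out = new ArrayList<>();
            for (String element : input) {
                String value = element;
                for (Function<String, String> step : pipeline.steps) {
                    value = step.apply(value);
                }
                out.add(value);
            }
            return out;
        }
    }

    /** Applies each step to the whole data set before moving on to the next step. */
    static final class StepAtATimeRunner implements ToyRunner {
        public List<String> run(ToyPipeline pipeline, List<String> input) {
            List<String> current = new ArrayList<>(input);
            for (Function<String, String> step : pipeline.steps) {
                List<String> next = new ArrayList<>();
                for (String element : current) {
                    next.add(step.apply(element));
                }
                current = next;
            }
            return current;
        }
    }

    /** Chooses a runner by name at launch time; the pipeline code never changes. */
    static ToyRunner runnerFor(String name) {
        switch (name) {
            case "element": return new ElementAtATimeRunner();
            case "step": return new StepAtATimeRunner();
            default: throw new IllegalArgumentException("Unknown runner: " + name);
        }
    }

    /** The application logic, written once: normalize page paths from a clickstream. */
    static ToyPipeline buildPipeline() {
        return new ToyPipeline().apply(String::trim).apply(String::toLowerCase);
    }

    public static void main(String[] args) {
        String runnerName = args.length > 0 ? args[0] : "element";
        List<String> clicks = Arrays.asList(" /Home ", "/Docs", " /HOME");
        // Prints [/home, /docs, /home] with either runner.
        System.out.println(runnerFor(runnerName).run(buildPipeline(), clicks));
    }
}
```

Running the class with `element` or `step` as its only argument executes the same pipeline under a different strategy and produces the same result. In Beam itself the runner is likewise chosen when you launch the application rather than in the pipeline code.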
We’ve added the IBM Streams Runner for Apache Beam to the Streaming Analytics service so that you can run your Beam application on the Streams platform.
Beam on the Industry-leading IBM Streams Platform
IBM Streams offers a continuous, complete, and connected solution. If you use IBM Streams as your Beam runner, you’ll get a fast, stable, industry-leading platform. In addition, since the Streams runner can run in the cloud, you can develop Beam applications locally using the direct runner and then later deploy the applications to the Bluemix cloud.
No Streams Installation Required — The Streams runner allows you to directly send your applications to the Streaming Analytics service to be compiled and executed. This means there’s no need to install Streams on your system.
Interact with Beam pipelines with the newly updated Streams Console — Beam applications appear just like they are laid out in your source code. Additionally, you can view all custom metrics, console logs, data stream flow rates, and even congested streams.
Download today — The Streams Runner is now available to download through your existing Streaming Analytics service. Don’t have an existing service? Create one here.
IBM Streams Runner for Apache Beam Features
- Support for Beam 2.0 Java SDK
- Support for primitive and custom composite Beam transforms
- Support for custom Beam metrics
  - Counter, Distribution, and Gauge types
  - Watermark metrics are automatically created for you
- Support for processing-time and event-time timers and window triggers
- Support for stateful processing
- Support for custom parameters specified at application runtime
- Integration into the Streams Platform
  - Submit Beam applications to a Streaming Analytics service with no local Streams installation required
  - Specify local data files to be available for your application in the Streaming Analytics service
  - Support for canceling a Streams job from the Beam application
  - View Beam pipeline layouts in the Streams Graph
- Specialized Beam SDK for Streams
  - Publish data streams for other Streams applications to use, or subscribe to data streams for your application to consume
  - Read files from and write files to an IBM Object Storage OpenStack Swift for Bluemix service
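
If the three custom metric types in the list above are unfamiliar, the following toy sketch in plain Java shows how they differ in what they report. It does not use the Beam metrics API: `ToyCounter`, `ToyDistribution`, and `ToyGauge` are hypothetical classes written only to illustrate the semantics. A counter is a running total, a distribution summarizes every reported value, and a gauge keeps only the latest one.

```java
// Toy model only: hypothetical classes, NOT the Apache Beam metrics API.
final class MetricTypesSketch {

    /** Counter: a running total that can be incremented or decremented. */
    static final class ToyCounter {
        private long value;

        void inc(long n) { value += n; }
        void dec(long n) { value -= n; }
        long value() { return value; }
    }

    /** Distribution: summary statistics over every value reported so far. */
    static final class ToyDistribution {
        private long count;
        private long sum;
        private long min = Long.MAX_VALUE;
        private long max = Long.MIN_VALUE;

        void update(long v) {
            count++;
            sum += v;
            min = Math.min(min, v);
            max = Math.max(max, v);
        }

        long count() { return count; }
        long sum() { return sum; }
        long min() { return min; }
        long max() { return max; }
        double mean() { return count == 0 ? 0.0 : (double) sum / count; }
    }

    /** Gauge: remembers only the most recently reported value. */
    static final class ToyGauge {
        private long latest;

        void set(long v) { latest = v; }
        long value() { return latest; }
    }

    public static void main(String[] args) {
        ToyCounter pageViews = new ToyCounter();
        ToyDistribution latencyMs = new ToyDistribution();
        ToyGauge lastLatencyMs = new ToyGauge();

        // Made-up request latencies, used only to exercise the three metric types.
        long[] latencies = {12, 40, 8};
        for (long ms : latencies) {
            pageViews.inc(1);
            latencyMs.update(ms);
            lastLatencyMs.set(ms);
        }

        System.out.println("pageViews=" + pageViews.value());
        System.out.println("latency min=" + latencyMs.min() + " max=" + latencyMs.max()
                + " mean=" + latencyMs.mean());
        System.out.println("lastLatencyMs=" + lastLatencyMs.value());
    }
}
```

As a rule of thumb, use a counter for totals such as processed records, a distribution when the spread of values matters, and a gauge for a current reading.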