October 26, 2017 By Oriana Zambrano 2 min read

Streaming Analytics Updates: IBM Streams Runner for Apache Beam

The IBM Streaming Analytics service is a cloud-based service for IBM Streams. Streams is an analytics platform that lets you create applications that analyze data from a variety of sources in real time. Streaming Analytics continues to add enhancements that make it easy to create streaming applications however you choose. Previously, we announced integration with DSX for creating Streams applications in Python. Now, you can also run an Apache Beam application (pipeline) in Streaming Analytics.

Imagine you are tasked with writing an application for a website. The application needs to look at online users and their activity to identify popular content. You’ll need to look at logs, user clickstreams, and existing user data stored in a database. Which platform are you going to use to write this application: Apache Spark, Apache Flink, or IBM Streams? Why not write the app against a single interface and choose where to run it later?

This is the goal of Apache Beam, a unified programming model for both batch and streaming data processing. Similar to Streams, Beam lets you develop data processing applications using a set of functions that manipulate your data. Beam itself, however, provides only the programming model; you select the runtime platform by choosing a runner when you launch your application.
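
To make the "write once, pick a runner at launch" idea concrete, here is a toy sketch in plain Java. It is not the Apache Beam API: the `CountPipeline`, `Runner`, `SequentialRunner`, and `ParallelRunner` types, the runner names, and the `user,page` event format are all hypothetical and exist only for this illustration. The point is simply that the pipeline definition (count views per page) is written once, and the execution engine is chosen by name when the program starts.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Toy model only: every type here is hypothetical and is NOT part of Apache Beam.
class RunnerSketch {

    /** The pipeline definition: how to extract the key to count from a raw event. */
    static final class CountPipeline {
        final Function<String, String> keyOf;

        CountPipeline(Function<String, String> keyOf) {
            this.keyOf = keyOf;
        }
    }

    /** A runner decides how a pipeline definition is actually executed. */
    interface Runner {
        Map<String, Long> run(CountPipeline pipeline, List<String> events);
    }

    /** Executes the pipeline with a simple loop on the calling thread. */
    static final class SequentialRunner implements Runner {
        public Map<String, Long> run(CountPipeline pipeline, List<String> events) {
            Map<String, Long> counts = new TreeMap<>();
            for (String event : events) {
                counts.merge(pipeline.keyOf.apply(event), 1L, Long::sum);
            }
            return counts;
        }
    }

    /** Executes the same pipeline using a parallel stream. */
    static final class ParallelRunner implements Runner {
        public Map<String, Long> run(CountPipeline pipeline, List<String> events) {
            return events.parallelStream().collect(
                Collectors.groupingBy(pipeline.keyOf, TreeMap::new, Collectors.counting()));
        }
    }

    /** Picks the runner by name at launch time (names are assumptions for this sketch). */
    static Runner runnerFor(String name) {
        switch (name) {
            case "sequential": return new SequentialRunner();
            case "parallel":   return new ParallelRunner();
            default: throw new IllegalArgumentException("Unknown runner: " + name);
        }
    }

    public static void main(String[] args) {
        // The runner is selected from the command line, defaulting to "sequential".
        String runnerName = args.length > 0 ? args[0] : "sequential";

        // Defined once, independent of the runner: count views per page.
        CountPipeline pipeline = new CountPipeline(event -> event.split(",")[1]);
        List<String> events = Arrays.asList("alice,/home", "bob,/pricing", "carol,/home");

        System.out.println(runnerFor(runnerName).run(pipeline, events));
    }
}
```

Both runners produce the same counts for the same pipeline, which is the property that lets you develop against one runner and deploy on another. In real Beam code the pipeline is built with the Beam SDK and the runner is selected through pipeline options rather than a hand-written switch.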

We’ve added the IBM Streams Runner for Apache Beam to the Streaming Analytics service so that you can run your Beam application on the Streams platform.

Beam on the Industry-leading IBM Streams Platform

IBM Streams offers a continuous, complete, and connected solution. If you use IBM Streams as your Beam runner, you’ll get a fast, stable, industry-leading platform. In addition, since the Streams runner can run in the cloud, you can develop Beam applications locally using the direct runner and then later deploy the applications to the Bluemix cloud.

No Streams Installation Required — The Streams runner allows you to directly send your applications to the Streaming Analytics service to be compiled and executed. This means there’s no need to install Streams on your system.

Interact with Beam pipelines with the newly updated Streams Console — Beam applications appear just like they are laid out in your source code. Additionally, you can view all custom metrics, console logs, data stream flow rates, and even congested streams.

Download today — The Streams Runner is now available to download through your existing Streaming Analytics service. Don’t have an existing service? Create one here.

IBM Streams Runner for Apache Beam Features

  • Support for Beam 2.0 Java SDK

  • Support for primitive and custom composite Beam transforms

  • Support for custom Beam metrics

    • Counter, Distribution, and Gauge types

    • Watermark metrics are automatically created for you

  • Support for processing-time and event-time timers and window triggers

  • Support for stateful processing

  • Support for custom parameters specified at application runtime

  • Integration into the Streams Platform

    • Submit Beam applications to a Streaming Analytics service with no local Streams installation required

    • Specify local data files to be available for your application in the Streaming Analytics service

    • Support for canceling a Streams job from the Beam application

    • View Beam Pipeline layouts in the Streams Graph

  • Specialized Beam SDK for Streams

    • Publish data streams for other Streams applications to use, or subscribe to data streams for your own application to consume

    • Read and write files in an IBM Object Storage OpenStack Swift for Bluemix service
