IBM Streams features and architecture

IBM® Streams consists of a programming language, an API, and a runtime system that runs applications on a single resource or on a distributed set of resources.

The Streams architecture represents a significant change in how computing systems are organized and what they can do. Streams provides a runtime platform, a programming model, and tools for applications that must process continuous data streams. Such applications are needed in environments where information from one or more data streams can be used to alert people or other systems, or to populate knowledge bases for later queries.

Streams is designed to address the following data processing platform objectives:

Parallel, high-performance stream processing software platform that can scale across a range of hardware environments
Automated deployment of stream processing applications on configured hardware
Incremental deployment that extends stream processing applications without restarting them
Secure and auditable runtime environment

Streams is especially powerful in environments where traditional batch or transactional systems might not be sufficient, for example:

The environment must process many data streams at high rates.
Complex processing of the data streams is required.
Low latency is needed when processing the data streams.

Cloud-based architecture

Streams is a cloud-native application runtime environment. You can install and deploy Streams on Red Hat OpenShift or Kubernetes environments by using Kubernetes operators and Docker images. You can also install and deploy Streams as a service for IBM Cloud Pak for Data environments.

OpenShift is an enterprise-ready Kubernetes container platform with full-stack automated operations to manage hybrid cloud and multicloud deployments.

OpenShift has two main components: a container manager (Docker) and a container orchestrator (Kubernetes). Other components of an OpenShift cluster work alongside these main components to provide services such as authentication, storage, networking, logging, and monitoring. You can also use the OpenShift management console, which serves as a centralized management location for these services. For more information about OpenShift, see the OpenShift web pages and product documentation.

Streams is deployed into an OpenShift environment as a Kubernetes operator. This deployment has the following implications:

When you create a Streams instance custom resource, the Kubernetes operator creates and starts a Streams runtime environment.
The name of the Streams instance is the name of the custom resource object.
The Streams instance consists of a set of management services that interact across one or more resources.
You can deploy multiple Streams instances within the same OpenShift project.
Application resources are not started until you submit a job to the instance.
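Because the instance name is simply the name of the custom resource object, creating an instance amounts to submitting one small object to the cluster. The following Python sketch builds such an object as JSON. It is illustrative only: the apiVersion, kind, and empty spec are hypothetical placeholders, and the real values are defined by the Streams instance operator's custom resource definition.

```python
import json

# HYPOTHETICAL values: the real API group/version, kind, and spec fields are
# defined by the custom resource definition (CRD) of the Streams instance operator.
API_VERSION = "example.com/v1"
KIND = "StreamsInstance"


def streams_instance_manifest(name: str, namespace: str) -> dict:
    """Build a minimal, hypothetical custom resource for a Streams instance.

    The operator uses metadata.name as the name of the Streams instance.
    The spec is left empty because its fields are operator-specific.
    """
    return {
        "apiVersion": API_VERSION,
        "kind": KIND,
        "metadata": {"name": name, "namespace": namespace},
        "spec": {},
    }


if __name__ == "__main__":
    # Two instances in the same OpenShift project (Kubernetes namespace).
    for instance in ("orders", "sensors"):
        print(json.dumps(streams_instance_manifest(instance, "my-project"), indent=2))
```

Saving one of these manifests to a file and running oc apply -f with that file would ask the cluster to create the custom resource, after which the operator starts the instance. No application resources exist until a job is submitted.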

For more information about the Streams instance Kubernetes operators and Docker images, see Deploying the IBM Streams instance operator.

Developing and deploying applications

After you install Streams, you get your own instance of Streams running on a cluster, ready to run your Streams applications. Streams does not include a development environment, but you can develop applications locally by using IBM Streams Version 4.3 and then deploy them to your instance. In addition, you can install and deploy a Streams build service. With the build service, you can compile your Streams application on the same cluster where the application will run.

When you use Streams, you are responsible for writing the applications that will run in the Streams instance and for ensuring that these applications function correctly and meet performance requirements. You are also responsible for any application-specific monitoring. You can get started right away with Streams by running a starter application. However, to develop more complex Streams applications, you must use an on-premises Streams installation. Applications that you develop and compile locally can then be seamlessly deployed as a bundle to a Streams instance. You can use one of the following methods to submit the application bundle (.sab file) that is associated with your Streams application:

Use the Submit Job button in the IBM Streams Console.
Submit and control your application by using the IBM Streams REST API.
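For the REST API route, the following Python sketch only builds an upload request for a .sab bundle; it does not send it. The endpoint path, the bearer-token header, and the raw octet-stream body are assumptions for illustration, not the documented IBM Streams REST API contract, so check the REST API reference before adapting it.

```python
import urllib.request


def build_submit_request(base_url: str, token: str, bundle: bytes) -> urllib.request.Request:
    """Build, but do not send, a POST request that uploads a .sab bundle.

    HYPOTHETICAL: the "/jobs" path, the bearer-token header, and the raw
    octet-stream body are placeholders, not the documented Streams REST API.
    """
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/jobs",
        data=bundle,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/octet-stream",
        },
    )
```

To send it, you would read the bundle with pathlib.Path("MyApp.sab").read_bytes(), pass the bytes to build_submit_request, and hand the result to urllib.request.urlopen().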

Some of the features in previous versions of Streams are no longer supported. For a detailed list of features that are not supported, see Considerations for IBM Streams Version 5.2.0.

Deploying stream processing applications results in the creation of a dataflow graph, which runs across the distributed runtime environment. As new workloads are submitted, Streams determines where best to deploy the operators so that the resource requirements of both newly submitted and already running applications are met. Streams continuously monitors the state and utilization of its computing resources. While stream processing applications are running, you can monitor them across a distributed collection of resources by using the Streams Console, streamtool commands, and REST APIs.
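To make the placement idea concrete, the following Python sketch places operators on resources with a simple greedy heuristic (largest CPU demand first, onto the resource with the most free capacity), while counting load from jobs that are already running. This is not the scheduling algorithm that Streams uses; the Resource class, the CPU units, and the heuristic are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    name: str
    capacity: float  # hypothetical CPU units on this resource
    used: float = 0.0  # load from jobs that are already running
    operators: list = field(default_factory=list)

    def free(self) -> float:
        return self.capacity - self.used


def place(operators: dict, resources: list) -> dict:
    """Greedily place operators (name -> CPU demand) on resources.

    Largest demands are placed first, each on the resource with the most
    free capacity that can still hold it. Raises ValueError if one does not fit.
    """
    placement = {}
    for op, demand in sorted(operators.items(), key=lambda kv: kv[1], reverse=True):
        candidates = [r for r in resources if r.free() >= demand]
        if not candidates:
            raise ValueError(f"no resource can host {op} (needs {demand})")
        target = max(candidates, key=lambda r: r.free())
        target.used += demand
        target.operators.append(op)
        placement[op] = target.name
    return placement


if __name__ == "__main__":
    # node-b is already busy with a running job, so most new work goes to node-a.
    nodes = [Resource("node-a", 2.0), Resource("node-b", 4.0, used=3.0)]
    print(place({"Source": 1.0, "Filter": 0.5, "TCPSink": 0.5}, nodes))
    # prints {'Source': 'node-a', 'Filter': 'node-a', 'TCPSink': 'node-b'}
```

Greedy placement is used here only because it is short and easy to follow; a production scheduler would also weigh factors such as communication cost between operators and host constraints.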

Results from running applications can be made available to applications that run outside Streams by using Sink operators or edge adapters. For example, an application might use a TCPSink operator to send its results to an external application that visualizes the results on a map. Alternatively, it might alert an administrator to unusual or interesting events. Streams also provides many edge adapters that can connect to external data sources for consuming or storing data.
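On the receiving side, the external application can be an ordinary TCP listener. The following Python sketch assumes that the TCPSink operator is configured to connect to this listener as a client and to write one comma-separated tuple per line; the address, port, and tuple layout are hypothetical and must match your operator configuration.

```python
import csv
import socketserver


def iter_tuples(lines):
    """Yield each non-empty CSV line as a list of attribute strings."""
    for row in csv.reader(lines):
        if row:
            yield row


class TupleHandler(socketserver.StreamRequestHandler):
    """Handle one connection from the sink and process tuples as they arrive."""

    def handle(self):
        # self.rfile yields lines as bytes; decode them for the csv module.
        lines = (raw.decode("utf-8") for raw in self.rfile)
        for row in iter_tuples(lines):
            # Replace this with real work, such as plotting a point on a map.
            print("received tuple:", row)


if __name__ == "__main__":
    # HYPOTHETICAL address and port: they must match the TCPSink configuration.
    with socketserver.TCPServer(("0.0.0.0", 9999), TupleHandler) as server:
        server.serve_forever()
```

Keeping the parsing in iter_tuples, separate from the socket handling, lets you test the tuple format without opening a connection.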

OpenShift, Kubernetes, operators, and Docker in Streams

For the best experience with Streams, you should understand how OpenShift, Kubernetes, Kubernetes operators, and Docker work. You can learn more about these components in the following documentation:

Get started in the OpenShift documentation
Kubernetes operators in the OpenShift documentation
Kubernetes Basics in the Kubernetes documentation
Get Started, Part 1: Orientation and setup in the Docker documentation