IBM Streams 4.2

IBM Streams features and architecture

IBM® Streams consists of a programming language, an API, and an integrated development environment (IDE) for building applications, together with a runtime system that can run those applications on a single resource or on a distributed set of resources. The Streams Studio IDE includes tools for authoring streams processing applications and for creating visual representations of them.

IBM Streams is designed to address the following data processing platform objectives:
  • A parallel, high-performance streams processing software platform that can scale over a range of hardware environments
  • Automated deployment of streams processing applications on configured hardware
  • Incremental deployment that extends running streams processing applications without restarting them
  • A secure and auditable runtime environment

The IBM Streams architecture represents a significant change in computing system organization and capability. IBM Streams provides a runtime platform, programming model, and tools for applications that must process continuous data streams. The need for such applications arises in environments where information from one or more data streams can be used to alert humans or other systems, or to populate knowledge bases for later queries.

IBM Streams is especially powerful in environments where traditional batch or transactional systems might not be sufficient, for example:
  • The environment must process many data streams at high rates.
  • Complex processing of the data streams is required.
  • Low latency is needed when processing the data streams.

IBM Streams offers the IBM Streams Processing Language (SPL) interface for users to operate on data streams. SPL provides a language and runtime framework to support streams processing applications. Users can create applications without needing to understand the lower-level stream-specific operations. SPL provides numerous operators, the ability to import data from outside IBM Streams and export results outside the system, and a facility to extend the underlying system with user-defined operators. Many of the SPL built-in operators provide powerful relational functions such as Join and Aggregate.
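To make the idea of a relational operator over a stream concrete, the following is a minimal Python sketch of what an Aggregate-style operator does conceptually. It is not SPL and does not use any IBM Streams API. The function name `tumbling_aggregate`, the count-based tumbling window, and the sample data are all assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def tumbling_aggregate(
    readings: Iterable[Tuple[str, float]], window_size: int
) -> Iterator[Dict[str, float]]:
    """Yield the per-key average for each full window of `window_size` tuples.

    Hypothetical helper: a conceptual stand-in for an Aggregate operator,
    not the SPL implementation.
    """
    window: List[Tuple[str, float]] = []
    for reading in readings:
        window.append(reading)
        if len(window) == window_size:
            sums: Dict[str, float] = defaultdict(float)
            counts: Dict[str, int] = defaultdict(int)
            for key, value in window:
                sums[key] += value
                counts[key] += 1
            yield {key: sums[key] / counts[key] for key in sums}
            window = []  # tumbling: the window is emptied after it fires


# Illustrative input: (sensor id, value) tuples arriving in order.
stream = [("a", 1.0), ("b", 4.0), ("a", 3.0), ("b", 6.0), ("a", 10.0)]
averages = list(tumbling_aggregate(stream, window_size=4))
```

In this sketch the fifth tuple stays in an incomplete window and produces no output, which mirrors how a windowed operator emits results only when its window condition is met.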

Starting with IBM Streams Version 4.1, users can also develop streams processing applications in other supported languages, such as Java™ or Scala. The Java Application API (Topology Toolkit) supports creating streaming applications for IBM Streams in these programming languages.

Deploying streams processing applications results in the creation of a dataflow graph, which runs across the distributed runtime environment. As new workloads are submitted, IBM Streams determines where best to deploy the operators to meet the resource requirements of both newly submitted and already running applications. IBM Streams continuously monitors the state and utilization of its computing resources. While streams processing applications are running, they can be dynamically monitored across a distributed collection of resources by using the Streams Console, Streams Studio, and streamtool commands.
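The placement decision described above can be pictured with a small Python sketch. This is only an illustration of the general idea of fitting new operators around existing load. The greedy strategy, the function name `place_operators`, the cost units, and the host names are all assumptions, and nothing here reflects how the IBM Streams scheduler is actually implemented.

```python
from typing import Dict


def place_operators(
    costs: Dict[str, int], capacity: Dict[str, int], load: Dict[str, int]
) -> Dict[str, str]:
    """Greedy sketch: place the most expensive operator first, each time on
    the resource with the most free capacity. Hypothetical strategy only."""
    current = dict(load)  # copy so the caller's view of existing load is untouched
    placement: Dict[str, str] = {}
    for op in sorted(costs, key=lambda name: (-costs[name], name)):
        # Ties between hosts are resolved by name order.
        host = max(sorted(capacity), key=lambda h: capacity[h] - current.get(h, 0))
        if capacity[host] - current.get(host, 0) < costs[op]:
            raise RuntimeError(f"no resource has room for operator {op!r}")
        current[host] = current.get(host, 0) + costs[op]
        placement[op] = host
    return placement


# host1 already carries load from running applications; host2 is idle.
new_operators = {"source": 2, "aggregate": 4, "sink": 1}
placement = place_operators(
    new_operators,
    capacity={"host1": 8, "host2": 8},
    load={"host1": 5, "host2": 0},
)
```

Here the two heavier operators land on the idle host and the lightest one fills the remaining room on the busy host, so the new workload is placed without disturbing what is already running.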

Results from running applications can be made available to applications that run outside IBM Streams by using Sink operators or edge adapters. For example, an application might use a TCPSink operator to send its results to an external application that visualizes the results on a map. Alternatively, the application might alert an administrator to unusual or interesting events. IBM Streams also provides many edge adapters that can connect to external data sources for consuming or storing data.
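As a rough picture of what sending results over TCP involves, the sketch below writes result tuples as text lines to a connected socket. It is plain Python, not the TCPSink operator. The function name `send_results`, the comma-separated line format, and the sample tuples are assumptions, and a local socket pair stands in for a real network connection so that the example is self-contained.

```python
import socket
from typing import Iterable, Tuple


def send_results(
    conn: socket.socket, results: Iterable[Tuple[str, float, float]]
) -> int:
    """Write each (name, latitude, longitude) tuple as one comma-separated
    line and return the number of tuples sent. Hypothetical wire format."""
    sent = 0
    for name, lat, lon in results:
        conn.sendall(f"{name},{lat},{lon}\n".encode("utf-8"))
        sent += 1
    return sent


# A connected socket pair stands in for the link to the external viewer.
sink_end, viewer_end = socket.socketpair()
count = send_results(
    sink_end, [("truck-7", 41.1, -73.7), ("truck-9", 40.7, -74.0)]
)
sink_end.close()  # closing the sending side signals end of stream

with viewer_end.makefile("r", encoding="utf-8") as reader:
    received = reader.read()
viewer_end.close()
```

Against a real external application, the socket pair would be replaced by a connection opened with `socket.create_connection((host, port))`, where `host` and `port` are whatever address that application listens on.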