IBM InfoSphere Streams provides a highly scalable platform for analyzing structured and unstructured data while it is in motion. InfoSphere Streams provides an intuitive and extensible development environment for creating, compiling, and deploying streaming applications.
Streaming applications are composed of streams (reliable, ordered, one-way message flows), operators (configurable functions that filter, aggregate, enrich, or transform the messages in streams), and adapters (specialized operators that continuously ingest data and output analysis results).
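As a rough illustration of these three roles, the following Python sketch models a stream as a generator, operators as functions that consume one stream and produce another, and adapters as the functions at either end of the chain. It is an analogy only, not InfoSphere Streams code. All names in it (`sensor_source`, `keep_hot`, `running_average`, `collect_sink`, `run_pipeline`) are hypothetical and chosen for this example.

```python
from typing import Iterable, Iterator, List


def sensor_source(readings: Iterable[float]) -> Iterator[float]:
    """Source adapter (hypothetical): ingests data and emits it as a stream."""
    for value in readings:
        yield value


def keep_hot(stream: Iterable[float], threshold: float) -> Iterator[float]:
    """Filter operator: passes along only values above the threshold."""
    for value in stream:
        if value > threshold:
            yield value


def running_average(stream: Iterable[float]) -> Iterator[float]:
    """Aggregate operator: emits the mean of all values seen so far."""
    total = 0.0
    count = 0
    for value in stream:
        total += value
        count += 1
        yield total / count


def collect_sink(stream: Iterable[float]) -> List[float]:
    """Sink adapter (hypothetical): outputs results, here into a list."""
    return list(stream)


def run_pipeline(readings: Iterable[float], threshold: float) -> List[float]:
    """Connect adapter -> filter -> aggregate -> adapter, one message at a time."""
    source = sensor_source(readings)
    hot = keep_hot(source, threshold)
    averaged = running_average(hot)
    return collect_sink(averaged)
```

Because each stage is a generator, a message flows through the whole chain before the next one is read, which loosely mirrors how data is processed while it is in motion rather than after it has been stored.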
InfoSphere Streams provides a rich set of general-purpose operators, plus containers for reusing existing C/C++ and Java® code as streaming operators. InfoSphere Streams can also be extended with toolkits of domain-specific operators.
Streaming applications are declared as a data flow graph in the Streams Processing Language (SPL). The flow graph specifies the data types that the application's streams carry, the adapters and operators that process the data as it flows through the application, and the streams that interconnect the operators. Figure 1 illustrates the data flow graph for a streaming application.
Figure 1. Streaming application flow graph
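To make the idea of a declared flow graph more concrete, here is a minimal Python sketch in which the graph is plain data: each node names its processing function and the upstream nodes whose output it consumes. This is not the InfoSphere Streams language or API. The names `Graph`, `run_graph`, and `example_graph` are hypothetical, and the sketch processes finite lists in one pass rather than continuous streams. It assumes Python 3.9 or later for the standard `graphlib` module.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Tuple

# Each node maps to (function, names of the upstream nodes feeding it).
Graph = Dict[str, Tuple[Callable[..., List[int]], List[str]]]


def run_graph(graph: Graph) -> Dict[str, List[int]]:
    """Evaluate every node only after all of its upstream nodes."""
    dependencies = {name: set(inputs) for name, (_, inputs) in graph.items()}
    results: Dict[str, List[int]] = {}
    for name in TopologicalSorter(dependencies).static_order():
        func, inputs = graph[name]
        results[name] = func(*(results[upstream] for upstream in inputs))
    return results


# Hypothetical graph: one source fans out to two operators, then merges.
example_graph: Graph = {
    "source": (lambda: [1, 2, 3, 4, 5, 6], []),
    "evens": (lambda xs: [x for x in xs if x % 2 == 0], ["source"]),
    "squares": (lambda xs: [x * x for x in xs], ["source"]),
    "merged": (lambda a, b: sorted(a + b), ["evens", "squares"]),
}
```

Calling `run_graph(example_graph)` returns the output of every node keyed by name. The point of the sketch is that the application is described by what connects to what, and the order of execution is derived from those connections.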
Large streaming applications can span more than a hundred Linux server machines. When developing applications, however, you may find it more convenient to install InfoSphere Streams on a virtual machine. Doing so enables you to design and test streaming applications on your regular laptop or workstation computer.
This tutorial guides you through a step-by-step procedure for creating a self-contained InfoSphere Streams development environment on a virtual machine. To accomplish this, you install and configure these four software products:
- VMware provides a virtual machine capability for Microsoft Windows and Apple Mac computers. (Refer to http://www.vmware.com/products/.)
- Red Hat Enterprise Linux Server provides the operating system for IBM InfoSphere Streams. (Refer to https://www.redhat.com/rhel/server/.)
- IBM InfoSphere Streams provides a streaming runtime and application development tools. (Refer to http://www.ibm.com/software/data/infosphere/streams/.)
- Eclipse provides the integrated application development platform for the InfoSphere Streams Studio tools. (Refer to http://www.eclipse.org/.)
This tutorial outlines the specific installation steps you need to take with each product and suggests specific values for many configuration steps. However, you should refer to the official documentation for each product for details, options, and clarification. Refer to the Resources section of this tutorial for links to the products' documentation.
The tutorial covers the following main tasks:
- Obtain product distribution packages
- Install VMware
- Install and configure Red Hat Enterprise Linux
- Install IBM InfoSphere Streams
- Install Eclipse and InfoSphere Streams Studio
- Verify the installation
Many of the steps depend on previous steps, so you should execute all the steps in the order in which they are presented.