Before you start
IBM InfoSphere Streams is a high-performance computing system designed for streaming applications that span many kinds of business, scientific, engineering, and environmental disciplines. InfoSphere Streams is a highly scalable analytics platform that can be used to analyze and solve many problems in real time, using a wide variety of data streams from thousands of live data sources as well as reference data from data warehouses. InfoSphere Streams provides an application development and execution environment tailored to building applications that ingest, filter, analyze, correlate, classify, transform, and otherwise process large volumes of continuous data streams. These applications enable faster and better-informed decision making in diverse sectors such as healthcare, transportation, finance, manufacturing, communications, energy, and security.
Version 2.0 of the IBM InfoSphere Streams product introduces a variety of new features and enhancements over the previous 1.x versions. The most significant is the change of programming language from the Streams Processing Application Declarative Engine (SPADE) to the Streams Processing Language (SPL). The primary goals of this change are to make the language simpler, more intuitive, and more uniform. The most visible change in SPL is its syntax, whose structure and feel are closer to C or Java™. This makes SPL applications easier for programmers to read and understand, and helps them be more productive when writing them.
Because the programming language model changed from SPADE to
SPL in InfoSphere Streams Version 2.0, users with SPADE applications
from previous versions need to migrate (port) their applications
to SPL when upgrading their installation. The shipped product includes
initial information to help users do this, primarily in the IBM
Streams Processing Language SPADE Application Migration and
Interoperability Guide. This guide is located in the doc
subdirectory below the directory where InfoSphere Streams is
installed, or in the IBM InfoSphere Streams Information Center
(see Resources).
The IBM Streams Processing Language SPADE Application Migration and Interoperability Guide contains basic information and instructions for migrating SPADE applications, but presents them at a fairly high level, without specific porting examples. This tutorial complements that guide by providing examples of SPADE-to-SPL application migration. The migration example in this tutorial is based on a SPADE sample application that is shipped as part of a SPADE installation. However, you should understand the overall migration procedures described in the guide before tackling the more detailed step-by-step procedures in this tutorial.
Finally, you should have at least a fundamental knowledge of, and
experience with, the SPADE programming language before proceeding
with this tutorial. To perform the exercises, you also need access
to a version of InfoSphere Streams earlier than Version 2.0 (referred
to as Version 1.2.x herein, because that is the version used to obtain
and work with the samples) as well as a Version 2.0 installation.
You also need at least a basic understanding of SPL, which you can
acquire by reading the IBM Streams Processing Language Introductory
Tutorial and trying out the examples it contains. The SPL introductory
tutorial, as well as several other documents covering more advanced
aspects of SPL, operator models, operator toolkits, and
so on, can be found in the
below the directory where InfoSphere Streams is installed or in the
IBM InfoSphere Streams Information Center (see Resources).
Before proceeding with any of the migration and porting examples, there are some general prerequisites that must be satisfied. These prerequisites should be set up as follows:
- Ensure an InfoSphere Streams Version 1.2.x installation is
available with the original (unaltered) sample code located under
the installation directory within the
- Ensure an InfoSphere Streams Version 2.0 installation is available.
- Ensure that at least one terminal session is available for each installation version, with the environment set up appropriately for that version.
- Create a base working directory, located off your home directory,
in which to place and work with the migration samples. The
suggested name for this directory, which is used throughout this
tutorial, is migrationsamples. If the directory does not already exist, issue the following commands to create it:
cd ~
mkdir migrationsamples
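As a minimal sketch, the same setup can be made safe to rerun in either terminal: mkdir -p creates the directory only if it is missing and does not report an error when it already exists. The variable name MIGRATION_DIR is introduced here for illustration only and is not part of InfoSphere Streams.

```shell
#!/bin/sh
# Base working directory for the migration samples (suggested name).
# MIGRATION_DIR is a hypothetical variable name used only in this sketch.
MIGRATION_DIR="$HOME/migrationsamples"

# -p: create the directory if it is missing; succeed silently if it exists.
mkdir -p "$MIGRATION_DIR"

# Move into the directory and confirm the location.
cd "$MIGRATION_DIR" || exit 1
echo "Working directory: $(pwd)"
```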
There are a number of common items that pertain to each of the migration and porting examples discussed in this tutorial. Some of these items are environmental, while others are suggested methods and techniques for addressing particular steps of the porting processes.
Note that both InfoSphere Streams Version 1.2.x and Version 2.0 are used in the examples throughout this tutorial. Working with SPADE applications requires Version 1.2.x, whereas working with the migration procedures and SPL applications requires Version 2.0. The spade-to-spl command, the primary command used to translate SPADE code to SPL, is available in Version 2.0 installations. Thus, when you reach the point in the migration process where the spade-to-spl command is issued, you switch from the Version 1.2.x (SPADE) installation to the Version 2.0 (SPL) installation.
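Because the spade-to-spl command is provided by Version 2.0 installations, checking whether it is on the PATH offers a quick way to confirm which environment a terminal is set up for before you issue it. The following is a minimal sketch. The function name check_streams_env is hypothetical, and the check assumes that a Version 2.0 environment places spade-to-spl on the PATH while a Version 1.2.x environment does not.

```shell
#!/bin/sh
# Hypothetical helper (not part of InfoSphere Streams).
# Assumption: a Version 2.0 environment puts spade-to-spl on the PATH,
# and a Version 1.2.x environment does not.
check_streams_env() {
    if command -v spade-to-spl >/dev/null 2>&1; then
        echo "spade-to-spl found: terminal appears set up for Version 2.0"
        return 0
    fi
    echo "spade-to-spl not found: terminal appears set up for Version 1.2.x"
    return 1
}
```

For example, run check_streams_env in a terminal before issuing spade-to-spl. A nonzero return status means the command is not available in that terminal's environment.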
One of the main steps of the migration process is running the spade-to-spl translation command. The translation can report a list of errors, warnings, and informational messages. You can address these items in any order, but it is generally easier to resolve the simplest issues first, because they often require no explicit changes (although in some cases you might make changes based on preference). You can then evaluate the remaining issues and resolve them in order of increasing difficulty. This tutorial follows that methodology, so translation issues are not necessarily discussed in the numerical order in which the translation process reports them.
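One way to apply this ordering is to keep a simple, hand-written worklist of the reported messages with your own difficulty estimate for each, then sort it by difficulty rather than by message number. The following is a minimal sketch. The file name worklist.txt, its three-field layout, and the sample entries are hypothetical and do not represent spade-to-spl output.

```shell
#!/bin/sh
# Hypothetical hand-maintained worklist (not spade-to-spl output).
# Fields: message-number  difficulty (1 = easiest)  short note
cat > worklist.txt <<'EOF'
1 3 requires rewriting application logic
2 1 informational only, no change needed
3 2 small code edit
4 1 optional change based on preference
EOF

# Sort by difficulty (field 2, numeric), breaking ties by message
# number (field 1, numeric), so the easiest issues come first.
sort -k2,2n -k1,1n worklist.txt
```

With these sample entries, the messages are listed in the order 2, 4, 3, 1.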
Throughout this tutorial, SPADE and SPL applications are compiled differently. Both types of applications can be compiled using either the distributed or the standalone method. The approaches used here were chosen to reduce the overall number of steps and commands needed to demonstrate particular aspects of the migration process.
For the SPADE application samples, using the makefile process to compile the samples and generate easy-to-use execution scripts makes running and validating them relatively straightforward, even though they use the distributed application environment. For the SPL applications migrated from the SPADE samples (and, in general, when first developing any new SPL application), it is usually quicker and easier to use the standalone method of compilation. The distributed method is best suited to final production tuning, when the application workload must be spread across multiple cluster nodes to achieve the best performance. Therefore, to keep this tutorial as simple as possible, the standalone approach is used for SPL applications.