Migrating InfoSphere Streams SPADE applications to Streams Processing Language, Part 5: Migrate SPADE user-defined built-in operator (UBOP) applications

An introductory guide by example

The most significant new feature of Version 2.0 of the IBM InfoSphere® Streams product is the programming language model transformation from Streams Processing Application Declarative Engine (SPADE) to Streams Processing Language (SPL). Users with SPADE applications from previous versions will need to migrate and port their applications to SPL when upgrading their installations to Version 2.0. This tutorial is Part 5 of a 5-part series that uses actual SPADE samples to demonstrate a series of step-by-step procedures for migrating and porting different types of SPADE application content. Part 5 demonstrates the migration of SPADE user-defined built-in operator (UBOP) applications.

Kevin Erickson (kjerick@us.ibm.com), Senior Software Engineer, IBM

Kevin Erickson joined IBM in Rochester, MN, in 1981 as an electrical engineer working on development and test of hard disk drives. He has contributed to a wide range of IBM products and technologies, from disk drive actuator and spindle servo control systems, to AS/400 and successor operating system kernel components. More recently Kevin has worked on performance testing of the Roadrunner supercomputer, the first computer to break the 1 petaflop barrier. In 2009, he transferred to Software Group to work with the InfoSphere Streams team, primarily in a variety of test-related roles. Over his IBM career, Kevin has been issued 14 U.S. patents, plus several in other countries. He holds a B.S. degree in Electrical Engineering from the University of Minnesota.



Richard P. King (rpk@us.ibm.com), Senior Programmer, IBM

Richard King has been a part of the InfoSphere Streams team for a number of years. When he joined IBM in 1977, it was as an associate programmer working on the development of System/38. Since transferring to IBM's Research Division at IBM Thomas J. Watson Research Center in Yorktown in 1981, he has worked on a wide variety of projects, including the design and development of what became the IBM Sysplex Coupling Facility. He has a bachelor's degree in industrial engineering and operations research from Cornell University and a master's in operations research and industrial engineering from Northwestern University.



16 June 2011

Before you start

Introduction

More in the series

Look for the other parts of the Migrating InfoSphere Streams SPADE applications to SPL series.

Each part is self-contained and does not rely on the previous example. Therefore, you can learn how to migrate these types of SPADE applications in any order. However, the examples are arranged in what might be considered easiest to most difficult, so completing each part in order might be beneficial.

The "Before you start" section is repeated in each tutorial so that you can complete each part of the series independently.

IBM InfoSphere Streams is a high-performance computing system designed for streaming applications that span many different kinds of business, scientific, engineering, and environmental disciplines. InfoSphere Streams is a highly scalable and powerful analytics platform that can rapidly analyze data and solve many problems in real time, using a wide variety of data streams from thousands of live data sources, as well as reference data from data warehouses. InfoSphere Streams provides an application development and execution environment that is tailored to users developing specific applications that ingest, filter, analyze, correlate, classify, transform, and otherwise process large volumes of continuous data streams. These powerful applications result in faster and more intelligent decision-making outcomes within diverse business sectors such as healthcare, transportation, financial, manufacturing, communications, energy, and security.

Version 2.0 of the IBM InfoSphere Streams product introduces a variety of new features and enhancements to the previous 1.x versions, the most significant of which is the programming language model transformation from Streams Processing Application Declarative Engine (SPADE) to Streams Processing Language (SPL). The primary reasons for this change are to make the language simpler, more intuitive, and more uniform. The most visible change with SPL is the syntax, which has a structure and feel that is more in the realm of C or Java™, thus making it easier for programmers to read, understand, and be more productive when writing SPL applications.

Because the programming language model was transformed from SPADE to SPL in InfoSphere Streams Version 2.0, users with SPADE applications from the previous versions need to migrate and port their applications to SPL when upgrading their installation. Initial information to help users do this is made available as part of the shipped product, which is provided primarily in the IBM Streams Processing Language SPADE Application Migration and Interoperability Guide. This guide is located in the doc subdirectory below the directory where InfoSphere Streams is installed or in the IBM InfoSphere Streams Information Center (see Resources).

The IBM Streams Processing Language SPADE Application Migration and Interoperability Guide contains basic information and instructions for migrating SPADE applications, but the information is presented at a fairly high level without specific porting examples. This tutorial complements the high-level guide by providing examples of SPADE-to-SPL application migration. You will use a SPADE sample application that is shipped as part of a SPADE installation as the basis for the migration example in this tutorial. However, it is recommended that you understand the overall migration procedures contained in the high-level guide before tackling the more detailed step-by-step procedures outlined in this tutorial.

Finally, you should have at least a fundamental knowledge of, and experience with, the SPADE programming language prior to proceeding with this tutorial. You should also have access to a version of InfoSphere Streams prior to Version 2.0 (generally referred to as Version 1.2.x herein, because that is the version used to obtain and work with the sample examples) as well as a Version 2.0 installation in order to perform the exercises in this tutorial. You also need at least a basic understanding of SPL, which you can acquire by reading the IBM Streams Processing Language Introductory Tutorial and trying out the various examples given there. The SPL introductory tutorial, as well as several other documents helpful in learning about more advanced aspects of SPL, operator models, operator toolkits, and so on, can be found in the doc subdirectory below the directory where InfoSphere Streams is installed or in the IBM InfoSphere Streams Information Center (see Resources).

Setting up

Before proceeding with any of the migration and porting examples, there are some general prerequisites that must be satisfied. These prerequisites should be set up as follows:

  1. Ensure an InfoSphere Streams Version 1.2.x installation is available with the original (unaltered) sample code located under the installation directory within the samples subdirectory.
  2. Ensure an InfoSphere Streams Version 2.0 installation is available.
  3. Ensure availability of at least one terminal access point connection for each installation version with the appropriate environment set up to work with each version.
  4. Create a base working directory, located off your home directory, in which to place and work with the migration samples. The suggested name for this directory, which is used throughout this example, is migrationsamples. If not already created, issue the following commands to create this directory:
    cd ~
    mkdir migrationsamples

Understanding general concepts for the examples

There are a number of common items that pertain to each of the migration and porting examples discussed in this tutorial. Some of these items are environmental, while others are suggested methods and techniques for addressing particular steps of the porting processes.

Dual installation

Note that both InfoSphere Streams Version 1.2.x and Version 2.0 are used in the examples throughout this tutorial. When working with the various elements of SPADE applications, the use of Version 1.2.x is required, whereas when working with the various aspects of migration procedures and SPL applications, the use of Version 2.0 is required. The spade-to-spl command, which is the primary command used to translate SPADE code to SPL, is available in Version 2.0 installations. Thus, when you reach the point in the migration process where the spade-to-spl command is issued, you will switch from using a SPADE installation to using an SPL installation.

Order of post-migration fix-ups and porting

One of the main steps of the migration process is the execution of the spade-to-spl translation command. After this translation, a list of errors, warnings, and informational messages could result. The items in this list can be addressed in any order, but it is generally easier to resolve the simpler issues first because they often don't require any explicit changes, although in some cases changes might be made based on user preference. You can continue to evaluate the issues and work on resolving them in order of increasing difficulty. This is the philosophy and methodology used in this tutorial, which means particular translation issues might be presented out of numerical order from that reported by the translation process.

Compilation choices for SPADE and SPL applications

Throughout this tutorial, the approach for the compilation of applications is different for SPADE and SPL. For both types of applications, either the distributed or standalone methods of compilation can be used. The approaches used in this tutorial were selected to reduce the overall number of steps and commands that would have to be executed to demonstrate certain aspects of the migration processes.

For the SPADE application samples, using the makefile process to compile and generate easy-to-use execution scripts makes running and validating the samples relatively straightforward, even though they use the distributed application environment. For the SPL applications being migrated from the SPADE application samples (and in general when initially developing any new SPL applications), it is usually quicker and easier to initially employ the standalone application method of compilation. The distributed method is best for final production tuning when the application workload needs to be distributed across multiple cluster nodes to produce the best performance characteristics. Therefore, to keep the content of this tutorial as simple as possible, the standalone approach is used for SPL applications.


Creating and setting up a working UBOP SPADE application

This section describes the process for migrating and porting a SPADE application containing user-defined built-in operator (UBOP) capabilities. In SPL, UBOPs are called generic operators. For this tutorial, the SPADE sumubop application (which is not provided as a sample application shipped with Version 1.2.x of InfoSphere Streams, but is instead a simple application created specifically for this tutorial) is used to demonstrate the step-by-step migration and porting procedures.

Because the SPADE application used for this tutorial is not a shipped sample, you must create and set it up in the tutorial directory structure by hand. The instructions in this section are used to describe the contents of each SPADE application file and where they exist in the directory structure.

All of the SPADE application files will reside under a main directory and subdirectories, which you must create.

  1. Change directories to the base working directory.
    cd ~/migrationsamples
  2. Make the main directory for the SPADE UBOP sample.
    mkdir sum_ubop
  3. Make the code generation template related subdirectory for the SPADE UBOP sample. Note that there is a particular directory path structure for this directory, which is discussed in the “User-Defined Built-in Operators” chapter of the Programming Model and Language Reference shipped with Version 1.2.x of InfoSphere Streams. This reference is in a file named IBMInfoSphereStreams-LangRef.pdf and is located in the <V1.2.x_Installation>/doc directory, where <V1.2.x_Installation> indicates the base file system location path name for the installation.
    cd sum_ubop
    mkdir -p opt/com.ibm.spade.Summation/generic

Defining the main application sumubop.dps

The main application file resides under the main directory for the SPADE UBOP sample. Complete the following steps to create the file there.

  1. Open an edit session for the main application file, and give it the name sumubop.dps.
    my_editor sumubop.dps &
  2. Copy the application code shown in Listing 1 into this file, save the file, and exit the edit session.

    Notice the usage of the Summation operator in the program. This is not a SPADE built-in operator, but it is instead the UBOP that is being used to perform a particular function: the addition of two numbers of any type. The definition and implementation of the Summation operator consists of a bit of design work, including some code generation templates and a restriction model, as shown in the following sections.

    Listing 1. Application code (sumubop.dps)
    [Application]
    sumubop debug
    
    [Program]
    stream SrcInt(numOne : Integer, numTwo: Integer)
      := Source()["file:///AddInt.dat", csvFormat, noDelays]{}
    
    stream SrcFloat(numOne : Float, numTwo: Float)
      := Source()["file:///AddFloat.dat", csvFormat, noDelays]{}
    
    
    stream AddInt(sum : Integer)
      := Summation(SrcInt)[]{}
    
    stream AddFloat(sum : Float)
      := Summation(SrcFloat)[]{}
    
    
    Nil
      := Sink(AddInt)["file:///SummedInt.dat", csvFormat, noDelays]{}
    
    Nil
      := Sink(AddFloat)["file:///SummedFloat.dat", csvFormat, noDelays]{}

Creating an application make file

The application make file resides under the main directory for the SPADE UBOP sample. Complete the following steps to create the make file there.

  1. Open an edit session for the application make file and name it Makefile.
    my_editor Makefile &
  2. Copy the application Makefile code shown in Listing 2 into this file, then save the file and exit the edit session.
    Listing 2. Code for Makefile
    DPSFLAGS = -a -l -n -z
    DPS = $(STREAMS_INSTALL)/bin/spadec -t opt
    
    .PHONY: clean distclean force
    
    CMD_LINE_ARGS = 
    APP = sumubop
    
    all: start_streams_$(APP).sh
    
    start_streams_$(APP).sh: $(APP).dps force
    	$(DPS) $(DPSFLAGS) -f $< $(CMD_LINE_ARGS)
    
    clean: $(APP).dps
    	$(DPS) -C -f $^
    
    distclean: $(APP).dps
    	$(DPS) -R -f $^

Creating code generation templates

Part of the process of developing a new UBOP is to create code generation templates: one to generate interface definitions in a header file and one to generate implementation code. A special SPADE script called spademkop.pl can be used to create the code generation templates and initially populate them with skeleton code for starting the design. Complete the following steps to create the code generation templates.

  1. Change directories to the code generation template subdirectory.
    cd opt/com.ibm.spade.Summation/generic
  2. Run the script to create the code generation templates for the operator you are developing.
    spademkop.pl
    [Answer ‘y’ when asked to create <op-name>_h.cgt]
    [Answer ‘y’ when asked to create <op-name>_cpp.cgt]

The spademkop.pl script actually creates four files: two code generation templates, named Summation_h.cgt and Summation_cpp.cgt, and two Perl module backend code generators, named Summation_h.pm and Summation_cpp.pm, which are generated from the templates. All of these files take the base name Summation from the name that follows com.ibm.spade in the directory path, which is Summation to match the name of the UBOP used in the main application file.

Of the four files created, only the two code generation templates potentially need to be updated with interface definitions and functional code implementations. The Perl module backend code generators are produced as part of the compilation process and are not to be manually modified in any way. The process of updating the code generation templates is discussed in the next subsections.

Operator header file code generation template - Summation_h.cgt

For this relatively simple UBOP example, there is no need to change the header file code generation template. The skeleton code that was populated into this template will work as it is, as shown in Listing 3.

Listing 3. Contents of header file code generation template (Summation_h.cgt)
<% my $opername = $context->getOperator()->getName(); %>

#ifndef <%=uc($opername."_H_")%>
#define <%=uc($opername."_H_")%>

#include <DPS/PH/DpsOP.h>
#include <DPS/PH/DpsSignal.h>

<% CodeGen::createInterfaceIncludes($context);%>

namespace DPS {

class <%=$opername%> : public DpsOP {

public:
  <%=$opername%> (DpsPE & ppe, DpsOPContext & cx);
  virtual ~<%=$opername%> (){}

// For operators with input ports
<% for(my $i=0; $i<$context->getNumberOfInputPorts(); ++$i) { %>
  void process<%=$i%>(const <%=$context->getInputStreamAt($i)->getType()%> & tuple);
<% }%>

// For source operators with no ports
// void process();

  void processPunctuation(const Punctuation::Value & value, 
                          unsigned int input);

  void allPortsReady();
  void prepareToTerminate();  
  
public:
  <% CodeGen::createSubmitSignals($context); %>
  <% CodeGen::createPunctuateSignals($context); %>
  <% CodeGen::createRuntimeConstants($context); %>
};

};

#endif /* <%=uc($opername."_H_")%> */

Operator implementation code generation template - Summation_cpp.cgt

For this relatively simple UBOP example, only one segment of functional code needs to be added to the implementation code generation template: the code that adds two numbers of any numeric type. The skeleton code that was pre-populated by the creation script might also contain sections that are not needed and can be deleted. Complete the following necessary and optional changes.

  1. Edit the application implementation code generation template.
    my_editor Summation_cpp.cgt &
  2. In the // For operators with input ports section, comment out the const declaration (the second of the two lines shown here):
    // rename input tuple so that output assignment expressions are valid
    const <%=$istream->getType()%> & <%=$istream->getCppName()%> = tuple;

    This template variable is not used and might result in a warning, or it might result in a failure if warnings are treated as errors.
  3. In the // For operators with input ports section, add the custom code for this operator.
    <%=$context->getOutputStreamAt(0)->getType()%> otuple;
    otuple._<%=$context->getOutputStreamAt(0)->getAttributeAt(0)->getName()%> 
      = tuple._<%=$context->getInputStreamAt($i)->getAttributeAt(0)->getName()%>
       + tuple._<%=$context->getInputStreamAt($i)->getAttributeAt(1)->getName()%>;
    submit0(otuple);
  4. Save and exit.

The result of the implementation code generation template after all the indicated changes are made is shown in Listing 4.

Listing 4. Contents of implementation code generation template (Summation_cpp.cgt)
#include <DPS/PH/DpsPE.h>
#include <DPS/PH/DpsFunctions.h>

<% CodeGen::createImplementationIncludes($context); %>

using namespace std;
using namespace DPS;
UTILS_NAMESPACE_USE;
// for hash_map and hash_set
using namespace estd; 

<%=$opername%>::<%=$opername%>(DpsPE & ppe, DpsOPContext & cx)
  : DpsOP(ppe,cx)
  <% CodeGen::initializeSubmitSignals($context); %>
  <% CodeGen::initializePunctuateSignals($context); %>
{ <% CodeGen::initializeRuntimeConstants($context); %>
  // custom code here 
}

// For operators with input ports
<% for(my $i=0; $i<$context->getNumberOfInputPorts(); ++$i) {
     my $istream = $context->getInputStreamAt($i); %>
void <%=$opername%>::process<%=$i%>(const <%=$istream->getType()%> & tuple) {
  SPCDBG(L_DEBUG, "Processing tuple " << tuple, "dpsop");
  // rename input tuple so that output assignment expressions are valid
  // const <%=$istream->getType()%> & <%=$istream->getCppName()%> = tuple;
  // custom code here
  <%=$context->getOutputStreamAt(0)->getType()%> otuple;
  otuple._<%=$context->getOutputStreamAt(0)->getAttributeAt(0)->getName()%> 
    = tuple._<%=$context->getInputStreamAt($i)->getAttributeAt(0)->getName()%>
      + tuple._<%=$context->getInputStreamAt($i)->getAttributeAt(1)->getName()%>;
  submit0(otuple);
}
<% }%>

// For source operators with no ports
/*
void <%=$opername%>::process() {
  SPCDBG(L_DEBUG, "Process...", "dpsop");
  while(!(peHandle.getShutdownRequested())) {
    // custom code here
    // Note: use peHandle.blockUntilShutdownRequest for blocking
  }
}
*/

void <%=$opername%>::processPunctuation(const Punctuation::Value & value,
                                        unsigned int input) {
  SPCDBG(L_DEBUG, "Processing punctuation " << value << " from input "
                                            << input << "...", "dpsop");
  // custom code here
}

void <%=$opername%>::allPortsReady() {
  // custom code here
}

void <%=$opername%>::prepareToTerminate() {
  // custom code here
}

Creating an operator restriction model file

The next part of the process of developing a new UBOP is to create a restriction model file, which is located in the same subdirectory as the code generation templates. A special SPADE script called spadecfop.pl can be used to create the restriction model file and populate it with XML code that describes the various attributes of the operator. The process for creating the restriction model file is shown in this section.

  1. Run the script to create the restriction model file for the operator being developed.
    spadecfop.pl
  2. Answer the operator restriction model configuration questions that correspond to the operator being designed, as shown in Listing 5.
Listing 5. Summation operator restriction model configuration process
          ###################################################################
          #                                                                 #
          #           Welcome to SPADE's UBOP configuration tool            #
          #                                                                 #
          ###################################################################

Configuring input ports...

-> Configuring input port sets...
Note: Consecutive ports sharing the same configuration are configured in sets. You can  
    configure multiple sets of ports.
--> Port set 0: Please enter the number of input ports to configure (0 for configuring  
    variable number of ports, -1 for exiting and completing port configuration): 1
--> Configuring port 0 ...
---> Please enter the punctuation mode (Expecting [E] or Oblivious [O]): o
---> Please enter the window mode (NonWindowed [N], OptionallyWindowed [O] or 
    Windowed [W]): n
---> Is this port optional? (Y or N): n

--> Port set 1: Please enter the number of input ports to configure (0 for configuring 
    variable number of ports, -1 for exiting and completing port configuration): -1


Configuring output ports...

-> Please enter the output mode (Standard [S] or Aggregation [A]): s
-> Please enter the assignment mode (Expression [E], Attribute [A] or Constant [C]): e
-> Please enter if you want the output assignments to be verified (Y or N): n
-> Configuring output port sets...
Note: Consecutive ports sharing the same configuration are configured in sets. You can 
    configure multiple sets of ports.
--> Port set 0: Please enter the number of output ports to configure (0 for 
    configuring variable number of ports, -1 for exiting and completing port 
    configuration): 1
--> Configuring port 0 ...
---> Please enter the punctuation mode (Generating [G], Free [F] or Preserving [P]): p
---> Is this port optional? (Y or N): n

--> Port set 1: Please enter the number of output ports to configure (0 for 
    configuring variable number of ports, -1 for exiting and completing port 
    configuration): -1

Configuring parameters ...

-> Do you want to configure a parameter? (Y or N): n
-> Do you want to allow arbitrary parameters? (Y or N): n

Configuring libdefs ...

-> Do you want to configure a library? (Y or N): n
New restriction file has been validated successfully

The spadecfop.pl script creates a file named restriction.xml. The content of this file is shown in Listing 6.

Listing 6. Contents of operator restriction model file (restriction.xml)
<?xml version="1.0" ?>
<operator_restriction
  xmlns="http://www.ibm.com/Spade" 
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.ibm.com/Spade operatorRestriction.xsd">
  <input_streams>
    <input_stream_set>
      <cardinality>1</cardinality>
      <optional>0</optional>
      <punctuation_mode>Oblivious</punctuation_mode>
      <window_mode>NonWindowed</window_mode>
    </input_stream_set>
  </input_streams>
  <libdefs></libdefs>
  <output_streams>
    <assignment_mode>Expression</assignment_mode>
    <assignment_verification>false</assignment_verification>
    <output_mode>Standard</output_mode>
    <output_stream_set>
      <cardinality>1</cardinality>
      <optional>0</optional>
      <punctuation_mode>Preserving</punctuation_mode>
    </output_stream_set>
  </output_streams>
  <parameters>
    <allow_all>0</allow_all>
  </parameters>
</operator_restriction>

Creating and populating application input data

The last part of developing this particular sample application consists of creating and populating the input files expected by the sumubop.dps application. The expected files, AddInt.dat and AddFloat.dat, are read by the two Source operators in Listing 1. Unless otherwise specified by absolute or relative path, by default the location of these files is the data directory under the main application directory.

  1. Change directories to the main application directory.
    cd ~/migrationsamples/sum_ubop
  2. Create the data subdirectory where the input data files are to be created.
    mkdir data
  3. Open an edit session for the integer data input file, giving it the name AddInt.dat.
    my_editor data/AddInt.dat &
  4. Copy the integer data contents shown in Listing 7 into this file, then save the file and exit the edit session.
    Listing 7. Input data file AddInt.dat
    1,1
    2,2
    3,3
    4,4
    5,5
  5. Open an edit session for the float data input file, giving it the name AddFloat.dat.
    my_editor data/AddFloat.dat &
  6. Copy the float data contents shown in Listing 8 into this file, then save the file and exit the edit session.
    Listing 8. Input data file AddFloat.dat
    1.0,1.0
    2.2,2.2
    3.3,3.3
    4.4,4.4
    5.5,5.5
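
As an alternative to an interactive editor, steps 2 through 6 can be sketched with shell here-documents. This is only a convenience sketch: the function name create_input_files is hypothetical, and the default target of data assumes the script is run from the main application directory, as in step 1.

```shell
#!/bin/sh
# Sketch: write the input files from Listings 7 and 8 without an editor.
# create_input_files is an illustrative helper, not part of InfoSphere Streams.
create_input_files() {
  data_dir="$1"
  mkdir -p "$data_dir" || return 1

  # Integer pairs (Listing 7)
  cat > "$data_dir/AddInt.dat" <<'EOF'
1,1
2,2
3,3
4,4
5,5
EOF

  # Float pairs (Listing 8)
  cat > "$data_dir/AddFloat.dat" <<'EOF'
1.0,1.0
2.2,2.2
3.3,3.3
4.4,4.4
5.5,5.5
EOF
}

# Default to the data subdirectory of the current directory.
create_input_files "${1:-data}"
```

Quoting the here-document delimiter ('EOF') keeps the shell from expanding anything inside the data, so the files are written exactly as listed.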

The created sample application is now complete and ready for subsequent processing. The next step is to compile and test the application.


Compiling and testing the UBOP SPADE application

The initial step of the migration process is to ensure that the SPADE application being migrated is working correctly. If the SPADE application doesn’t even compile or produce the correct results, a migration to SPL would likely fail during the translation, or the resulting SPL would likely fail to compile or produce the correct results. The following procedure outlines the steps used to ensure that the SPADE application is suitably prepared for the migration and porting process.

  1. Compile the application.
    make
  2. Run the compiler-generated script start_streams_sumubop.sh to make and start the default instance named spade@your_userid.
    ./start_streams_sumubop.sh
  3. Run the compiler-generated script submitjob_sumubop.sh to submit the sumubop job for execution.
    ./submitjob_sumubop.sh
  4. Monitor the output files, which by default are written to the data subdirectory, to verify that the application is writing data to them: their sizes should be greater than 0 bytes and might grow as you continue to check. There are two output files, one for each input file.
    ls -l data/Summed*
  5. Optionally, look at the actual contents of the output files.
    my_editor data/Summed* &
  6. Run the compiler-generated script canceljob_sumubop.sh to cancel and end the sumubop job.
    ./canceljob_sumubop.sh
  7. Run the compiler-generated script stop_streams_sumubop.sh to stop and remove the default instance named spade@your_userid.
    ./stop_streams_sumubop.sh
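
To judge whether the output files are correct rather than merely non-empty, it can help to compute the expected sums independently of InfoSphere Streams. The following is a minimal sketch using awk. The function name expected_sums is hypothetical, and the exact numeric formatting that the SPADE Sink operator writes (for example, trailing zeros on floats) is not covered here, so compare values rather than exact text.

```shell
#!/bin/sh
# Sketch: print the sum of the two comma-separated fields on each input line.
# expected_sums is an illustrative helper, not part of InfoSphere Streams.
expected_sums() {
  awk -F, 'NF == 2 { print $1 + $2 }' "$1"
}

# Run from the main application directory after the input files exist.
for f in data/AddInt.dat data/AddFloat.dat; do
  if [ -f "$f" ]; then
    echo "Expected sums for $f:"
    expected_sums "$f"
  fi
done
```

The printed values can then be compared by eye against the contents of data/SummedInt.dat and data/SummedFloat.dat.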

Translating the UBOP SPADE application to SPL

The next step of migration is to set up for and execute the translation script using a Version 2.0 installation.

  1. Create an SPL working directory, located off the sum_ubop application subdirectory, in which to place and work with the migrated code. This tutorial uses the directory myspl.
    cd ~/migrationsamples/sum_ubop
    mkdir myspl
  2. Convert the SPADE source to SPL using the spade-to-spl translation command.
    spade-to-spl -t myspl -f sumubop.dps

After the translation, a list of errors, warnings, and informational messages similar to the following could result:

Listing 9. Informational message
sumubop.dps:13:6: CDISP0645I A stub will be generated for user
defined operator 'Summation'.
Listing 10. Informational message
sumubop.dps:16:6: CDISP0645I A stub will be generated for user
defined operator 'Summation'.
Listing 11. Warning message
sumubop.dps:20:6: CDISP0724W WARNING: Parameter
'writePunctuations' with a value of 'true' has been specified in
this sink because 'dropPunct' was not specified in the
corresponding SPADE code. This may not be semantically equivalent
if the SPADE code was compiled with '-U' or
'--preserve-punctuations'. You may wish to remove the
'writePunctuations' parameter or give it a value of 'false'.
Listing 12. Warning message
sumubop.dps:23:6: CDISP0724W WARNING: Parameter
'writePunctuations' with a value of 'true' has been specified in
this sink because 'dropPunct' was not specified in the
corresponding SPADE code. This may not be semantically equivalent
if the SPADE code was compiled with '-U' or
'--preserve-punctuations'. You may wish to remove the
'writePunctuations' parameter or give it a value of 'false'.

Each of these items must be addressed and evaluated to determine the course of action required to produce a comparable working application in SPL. The result of addressing these items could be a number of porting activities, such as making changes in the migrated sumubop.spl file and possibly other generated SPL files. There might also be a need for other implementation additions and changes to build up the required environment for an SPL application.
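
Because this tutorial works through the messages from easiest to most difficult, a quick tally by message code can help with planning. The sketch below assumes the translator output was captured in a file; the file name translate.log and the function name tally_messages are hypothetical. Whether the translator writes its messages to standard output or standard error is not stated here, so capturing both (for example, by appending > translate.log 2>&1 to the spade-to-spl command) is the safer assumption.

```shell
#!/bin/sh
# Sketch: count translator messages by code. The last letter of each code
# reflects the severity seen in Listings 9 through 12: I (informational) or
# W (warning). E (error) is assumed by analogy.
# tally_messages is an illustrative helper, not part of InfoSphere Streams.
tally_messages() {
  grep -o 'CDISP[0-9][0-9]*[IWE]' "$1" | sort | uniq -c | awk '{ print $2, $1 }'
}

# translate.log is a hypothetical file holding the captured translator output.
if [ -f translate.log ]; then
  tally_messages translate.log
fi
```

For the four messages in Listings 9 through 12, this reports CDISP0645I and CDISP0724W with a count of 2 each.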


Migrating the UBOP SPADE application to SPL

The next step of the migration process is the post-translation fix-ups and porting of the migrated code and environment. As shown in the previous section, a number of errors, warnings, and informational messages could occur as a result of migration. Addressing and evaluating these items, and fixing them where necessary, is the first major step in the porting process. Once you complete these initial fix-ups, you need to perform any other porting that was not automatically handled or identified as needed by the spade-to-spl translator. The following procedure outlines the steps to do this, addressing the messages from easiest to most difficult.

  1. Evaluate identical warning messages in Listing 11 and Listing 12 from the list of migration messages generated by the spade-to-spl translation command.

    In this case the translator chose to include the line ‘writePunctuations : true;’ as part of the parameter list of the FileSink operator because the original SPADE application did not specifically indicate that punctuation should be dropped. The original SPADE application did not include any punctuation in the resulting output file. The choice of whether to include punctuation depends on how the output file will be parsed and used. For this tutorial example, in order to more closely match the original SPADE application behavior, delete or comment out this punctuation line in each of the places it occurs in the migrated SPL application source, then save and exit the file.

    cd myspl
    my_editor sumubop.spl &
    Delete each ‘writePunctuations : true;’ line, or comment it out as ‘// writePunctuations : true;’
  2. Evaluate identical informational messages in Listing 9 and Listing 10.

    Completing the migration for these items involves a substantial amount of work. It requires the porting of code from the SPADE code generation templates and operator model files to the equivalent SPL files that the translation process of the migration generated. The translator creates a skeleton with this directory structure, code files, and function model, while you might need to create other portions manually. The detailed steps for all of this are broken into several tasks, which are outlined in the following sections.
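The edit in step 1 can also be scripted instead of done by hand. The following is a minimal shell sketch, not part of the original procedure: the function name comment_out_write_punctuations is hypothetical, and the sketch assumes that each parameter sits on its own line, as it does in the translator output shown in this tutorial. It keeps a .bak copy of the file so the change is easy to undo.

```shell
#!/bin/sh
# Hypothetical helper: comment out every active 'writePunctuations : true;'
# line in an SPL source file, keeping a .bak copy of the original.
comment_out_write_punctuations() {
  spl_file=$1
  cp "$spl_file" "$spl_file.bak" || return 1
  # \1 captures the leading indentation, \2 the parameter itself.
  # Lines that are already commented out do not match and stay unchanged.
  sed 's|^\([[:space:]]*\)\(writePunctuations[[:space:]]*:[[:space:]]*true;\)|\1// \2|' \
    "$spl_file.bak" > "$spl_file"
}

# Usage (from the myspl directory):
#   comment_out_write_punctuations sumubop.spl
```

Commenting the lines out rather than deleting them leaves a visible record of what the translator generated, which can help if punctuation needs to be re-enabled later.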

Updating the operator model file

At this point the translator has already created a directory structure and some skeleton stubs for the Summation operator, including the Summation.xml operator model file. This skeleton file must now be updated with information that describes the operator such that its attributes will correspond to those of the original SPADE application.

  1. Change directories to the SPL application subdirectory where the stubs were generated.
    cd ubopStubs/Summation
  2. Edit the operator model, which enables the compiler to perform syntactic and semantic checks on instances of the primitive operator used in an SPL application. Altering the operator model can also be desirable when it improves the performance of the operator. The default operator model is the most permissive model and avoids most of the compile-time checks performed by the SPL compiler. For this tutorial, only one modification is needed to better match the intent of the SPADE Summation operator, which is defined in the files BIOP_AddInt.h and BIOP_AddFloat.h, located in ~/migrationsamples/sum_ubop/etc/src. In these SPADE Summation operator definitions, tuples that are processed as inputs are denoted as const, so the default operator model Summation.xml requires the following modification.
    my_editor Summation.xml &
  3. In the <inputPortOpenSet> section, change <tupleMutationAllowed>true</tupleMutationAllowed> to
    <tupleMutationAllowed>false</tupleMutationAllowed>
  4. Save and exit.
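
The same change to the operator model can be made non-interactively. This is a rough sketch under stated assumptions: the function name set_input_mutation_false is hypothetical, and the sketch assumes that the <inputPortOpenSet> start tag, its end tag, and the <tupleMutationAllowed> element each appear on their own line in the generated Summation.xml. If the generated file is laid out differently, edit it by hand as described above.

```shell
#!/bin/sh
# Hypothetical helper: set tupleMutationAllowed to false, but only inside
# the <inputPortOpenSet> ... </inputPortOpenSet> block of an operator model.
set_input_mutation_false() {
  model_file=$1
  cp "$model_file" "$model_file.bak" || return 1
  # The address range limits the substitution to the input port section,
  # so a tupleMutationAllowed element elsewhere in the file is untouched.
  sed '/<inputPortOpenSet>/,/<\/inputPortOpenSet>/s|<tupleMutationAllowed>true</tupleMutationAllowed>|<tupleMutationAllowed>false</tupleMutationAllowed>|' \
    "$model_file.bak" > "$model_file"
}

# Usage (from the ubopStubs/Summation directory):
#   set_input_mutation_false Summation.xml
```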

Porting the operator header file code generation template

The next step in the porting process is to determine which elements of the SPADE UBOP operator header file need to be ported to the SPL generic operator header file code generation template. The corresponding SPADE UBOP header file for this application is located in the ~/migrationsamples/sum_ubop/opt/com.ibm.spade.Summation/generic subdirectory and is named Summation_h.cgt. SPL greatly simplifies what is needed in the operator header file. For this tutorial, there is nothing specific that is portable from SPADE to SPL. However, because SPL requires the application writer to handle synchronization, the following steps add the mutex member that the implementation uses to make tuple processing thread safe.

  1. Edit the SPL operator header file code generation template, and add the missing member data.
    my_editor Summation_h.cgt &
  2. In the private: section under // Members, add the member
    Mutex mutex_;
  3. Save and exit.

Porting the operator implementation code generation template

The last step in the porting process is to determine which elements of the SPADE UBOP operator implementation file need to be ported to the SPL generic code generation template. The corresponding SPADE UBOP implementation file for this application is located in the ~/migrationsamples/sum_ubop/opt/com.ibm.spade.Summation/generic subdirectory and is named Summation_cpp.cgt. For this tutorial, the function from the SPADE UBOP implementation file needs to be ported to the SPL implementation code generation template, which usually requires a degree of programming art, some experimentation, and a bit of trial and error. One such port that resulted in a working SPL application is given below. Other ports and implementations are possible, such as the one that is discussed in Going above and beyond.

  1. Edit the SPL operator implementation file code generation template.
    my_editor Summation_cpp.cgt &
  2. In the // Tuple processing for non-mutating ports section, within the void MY_OPERATOR::process(Tuple const & tuple, uint32_t port) block, add the content of Listing 13.
    Listing 13. Porting to the SPL implementation code generation template
    // Prevent update collisions
    AutoPortMutex apm(mutex_, *this);
    			
    OPort0Type otuple;
    IPort0Type const & ituple = static_cast<IPort0Type const &>(tuple);
    	
    otuple.set_<%=$model->getOutputPortAt(0)->getAttributeAt(0)->getName()%>(
    ituple.get_<%=$model->getInputPortAt(0)->getAttributeAt(0)->getName()%>()
    + ituple.get_<%=$model->getInputPortAt(0)->getAttributeAt(1)->getName()%>()
    );
    
    submit(otuple, 0); // submit to output port 0
  3. Save and exit.

Finishing and testing the SPL application

The final step of the migration process is the compilation and testing of the completed SPADE-to-SPL migration and porting solution. Complete the following steps.

  1. Compile the SPL application. This compilation should succeed, because you adequately addressed all of the migration issues through fix-ups and porting.
    cd ~/migrationsamples/sum_ubop/myspl
    sc -T -M sumubop

    Note that the compilation process automatically creates a data subdirectory for the SPL application under the directory where the migrated SPL source was placed.

  2. Copy the original source input files AddInt.dat and AddFloat.dat from the SPADE data subdirectory into the SPL data subdirectory.
    cp ~/migrationsamples/sum_ubop/data/Add*.dat data/
  3. Execute the standalone application.
    ./output/bin/standalone -d 0
  4. Check the contents of the data subdirectory to verify that the application generated the expected output files and that their sizes are not 0.
    ls -l data/Summed*.dat
  5. Optionally inspect the contents of the output files to verify that the application generated the expected output data, which should be similar to that of the original SPADE application.
    my_editor data/Summed*.dat &
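
The size check in step 4 can be made explicit so that it fails loudly when a file is missing or empty. The following is a minimal sketch; the function name check_nonempty is hypothetical, and the file names in the usage comment are the sink file names used by the sumubop application in this tutorial.

```shell
#!/bin/sh
# Hypothetical helper: report whether each named file exists and has a
# size greater than zero. Returns non-zero if any file fails the check.
check_nonempty() {
  status=0
  for f in "$@"; do
    if [ -s "$f" ]; then
      echo "ok: $f"
    else
      echo "missing or empty: $f" >&2
      status=1
    fi
  done
  return $status
}

# Usage (from the myspl directory):
#   check_nonempty data/SummedInt.dat data/SummedFloat.dat
```

A non-empty file shows only that the sink wrote something; the contents still need to be compared with the output of the original SPADE application.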

Going above and beyond

The procedures outlined in the previous sections should result in a successful migration and generation of comparable output results. To better understand the capabilities and limitations of SPL generic operators, you can optionally go back to the previous sections where the porting to the header file and implementation code generation templates was done for the Summation generic operator and implement the code in a different way.

SPL generic operators are capable of significant flexibility in terms of attributes and elements of the operator model. Although the generic operator in this tutorial's SPADE-to-SPL migration example has a single input port and a single output port, you can code the operator to handle multiple ports. This extra credit exercise demonstrates how to do this with additional modifications to both the header file and implementation file code generation templates.

Porting the operator header file code generation template (alternative method)

In addition to the mutex data member that was added to the generic operator header file code generation template as described in Porting the operator header file code generation template, you can also declare a helper method to process tuples from multiple, non-mutating ports. The following procedure outlines the steps to do this. Note that in addition to the method declaration, the implementation of this helper function, as well as other code changes, must be incorporated in the implementation code generation template.

  1. Edit the SPL operator header file code generation template and add the processing helper member function declaration.
    cd ~/migrationsamples/sum_ubop/myspl/ubopStubs/Summation
    my_editor Summation_h.cgt &
  2. Below the // Tuple processing for non-mutating ports section, add a new section
    // Tuple processing helper(s) for multiple non-mutating ports
  3. In this new section, add the following code.
    <%for (my $i=0; $i<$model->getNumberOfInputPorts(); $i++)
    {%>
      void processInput<%=$i%>(IPort<%=$i%>Type const & tuple);
    <%}%>
  4. Save and exit.

Porting the operator implementation code generation template (alternative method)

As an alternative to the hard-coded single port design that was implemented for the generic operator code generation template, as described in Porting the operator implementation code generation template, it is possible to implement a design that would allow for additional flexibility on the number of ports the operator can have. The following procedure outlines one possible set of steps to do this, which would replace the code from the procedure described in Porting the operator implementation code generation template.

  1. Edit the SPL operator implementation file code generation template.
    my_editor Summation_cpp.cgt &
  2. In the // Tuple processing for non-mutating ports section within the void MY_OPERATOR::process(Tuple const & tuple, uint32_t port) block, add the code in Listing 14.
    Listing 14. Porting using the alternative method
    switch (port) {
      <%for (my $i = 0; $i < $model->getNumberOfInputPorts(); $i++) {%>
        case <%=$i%> : {
          processInput<%=$i%>(static_cast<IPort<%=$i%>Type const &>(tuple));
          break;
        }
      <%}%>
        default : assert(!"Cannot happen"); break;
    }
  3. Below the // Tuple processing for non-mutating ports section, add a new section
    // Tuple processing helper(s) for multiple non-mutating ports
  4. In this new section add the code from Listing 15.
    Listing 15. Adding section for multiple non-mutating ports (alternative method)
    <%for (my $i = 0; $i < $model->getNumberOfInputPorts(); $i++) {%>
    void MY_OPERATOR::processInput<%=$i%>(IPort<%=$i%>Type const & tuple) {
      // $i is the port index
    			
      // Prevent update collisions
      AutoPortMutex apm(mutex_, *this);
    			
      OPort<%=$i%>Type otuple;
      otuple.set_<%=$model->getOutputPortAt($i)->getAttributeAt(0)->getName()%>
      (
          tuple.get_<%=$model->getInputPortAt($i)->getAttributeAt(0)->getName()%>()
        + tuple.get_<%=$model->getInputPortAt($i)->getAttributeAt(1)->getName()%>()
      );
      submit(otuple, <%=$i%>);
    }
    <%}%>
  5. Save and exit.

To check whether this multiple port processing code actually works, employ an application that uses the multiple port capabilities of the Summation operator. Table 1 demonstrates one way to modify the original sumubop.spl application (generated by the translator) to use multiple input ports and output ports.

Table 1. Multiple port version of sumubop.spl application
Line #  Code
     1  // sumubop.dps:2
     2  composite sumubop {
     3    graph
     4  // sumubop.dps:5
     5      stream<int32 numOne, int32 numTwo> SrcInt = FileSource() {
     6        param
     7          file : "AddInt.dat";
     8          format : csv;
     9      }
    10      stream<int32 numOne, int32 numTwo> SrcInt2 = FileSource() {
    11        param
    12          file : "AddInt2.dat";
    13          format : csv;
    14      }
    15  // sumubop.dps:8
    16      stream<float32 numOne, float32 numTwo> SrcFloat = FileSource() {
    17        param
    18          file : "AddFloat.dat";
    19          format : csv;
    20      }
    21      stream<float32 numOne, float32 numTwo> SrcFloat2 = FileSource() {
    22        param
    23          file : "AddFloat2.dat";
    24          format : csv;
    25      }
    26  // sumubop.dps:12
    27      (stream<int32 sum> AddInt; stream<int32 sum> AddInt2)
    28      = ubopStubs::Summation(SrcInt; SrcInt2) {}
    29  // sumubop.dps:15
    30      (stream<float32 sum> AddFloat; stream<float32 sum> AddFloat2)
    31      = ubopStubs::Summation(SrcFloat; SrcFloat2) {}
    32  // sumubop.dps:19
    33      () as SinkOp1 = FileSink(AddInt) {
    34        param
    35          file : "SummedInt.dat";
    36          format : csv;
    37          flush : 1u;
    38          // writePunctuations : true;
    39      }
    40      () as SinkOp12 = FileSink(AddInt2) {
    41        param
    42          file : "SummedInt2.dat";
    43          format : csv;
    44          flush : 1u;
    45          // writePunctuations : true;
    46      }
    47  // sumubop.dps:22
    48      () as SinkOp2 = FileSink(AddFloat) {
    49        param
    50          file : "SummedFloat.dat";
    51          format : csv;
    52          flush : 1u;
    53          // writePunctuations : true;
    54      }
    55      () as SinkOp22 = FileSink(AddFloat2) {
    56        param
    57          file : "SummedFloat2.dat";
    58          format : csv;
    59          flush : 1u;
    60          // writePunctuations : true;
    61      }
    62    config
    63  // sumubop.dps:2
    64      logLevel : debug;
    65  }

Complete the following changes.

  1. Include FileSource operators for supplying input data to the additional inputs of the Summation operator. The included sources are shown as the SrcInt2 stream in lines 10 through 14 and the SrcFloat2 stream in lines 21 through 25.
  2. Modify the Summation operator itself to have two input ports and two output ports, as shown in lines 27, 28, 30, and 31.
  3. Include FileSink operators to write the additional output data to files for verification. The included sinks are shown as SinkOp12 in lines 40 through 46 and SinkOp22 in lines 55 through 61.
  4. Compile the SPL application.
    cd ~/migrationsamples/sum_ubop/myspl
    sc -T -M sumubop

    Before executing the application, the additional input data must be created.

  5. Copy the existing data files to new files and then edit the new files to have different data if desired. Examples of the copy and edit commands are shown in Listing 16.
    Listing 16. New data
    cp data/AddInt.dat data/AddInt2.dat
    cp data/AddFloat.dat data/AddFloat2.dat
    my_editor data/*2.dat &
  6. Change the data in the input data file AddInt2.dat to
    -1,-1
    -2,-2
    -3,-3
    -4,-4
    -5,-5
  7. Change the data in the input data file AddFloat2.dat to
    -1.0,-1.0
    -2.2,-2.2
    -3.3,-3.3
    -4.4,-4.4
    -5.5,-5.5
  8. Save and exit.
  9. Execute the standalone application.
    ./output/bin/standalone -d 0
  10. Check the contents of the data subdirectory to verify that the application generated the expected output files, and that their sizes are not 0.
    ls -l data/Summed*.dat
  11. Optionally inspect the contents of the output files to verify that the application generated the expected output data.
    my_editor data/Summed*.dat &
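
For the integer data, the inspection in step 11 can be backed by an independent calculation of the expected sums. The following is a minimal sketch; the function name expected_int_sums is hypothetical. It reads a two-column CSV input file and prints one sum per line. Comparing that against a sink file with diff assumes the sink writes one sum per line with no punctuation markers, which is an assumption about the output format rather than something this tutorial states. The float files are left out because the textual formatting of float32 values in the output is not specified here.

```shell
#!/bin/sh
# Hypothetical helper: print the sum of the two comma-separated integers
# on each line of an input data file, one sum per line.
expected_int_sums() {
  awk -F, 'NF == 2 { print $1 + $2 }' "$1"
}

# Usage (from the myspl directory), assuming one sum per output line:
#   expected_int_sums data/AddInt2.dat | diff - data/SummedInt2.dat
```

With the AddInt2.dat contents given in step 6, the function prints -2, -4, -6, -8, and -10 on successive lines.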

Conclusion

This tutorial presented migration strategies for a SPADE user-defined built-in operator (UBOP) application. You can continue your education with all 5 parts of this tutorial series.

Keep in mind that the tutorial series examples can only serve as a general guide for how to approach the conversion of a SPADE program to SPL. Each SPADE program is different, and what is encountered during the conversion process for each can vary widely. For example, in the course of converting many other SPADE programs, you might see any of the following:

  • Failures to produce any SPL source
  • Source that doesn't compile
  • Compiled programs that don't run properly
  • Running programs that don't produce the correct results

Of significant importance is the availability of known good results from the SPADE version of the program. Once the SPL program is running, even apparently successfully, be sure to compare the former results with the new. Unfortunately, the new results will not always be good. A divide-and-conquer approach works fairly well in these cases. Both SPADE and SPL programs are easily modified through the addition of temporary file sinks. In both the SPADE and SPL versions of the program, add a new file sink to some operator in the middle of the program to record its output stream. The new file sink produces either good or bad results, which narrows the area of your search to either the part of the program before or after this new file sink. This process, repeated just a few times, usually helps you focus in on the erroneous computation fairly quickly.
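
One step of this divide-and-conquer search can be sketched in shell. The function name compare_sink_outputs and the file names in the usage comment are hypothetical; the sketch assumes that the temporary file sinks in the SPADE and SPL versions write to two separate files that are expected to be byte-for-byte identical when the computation up to that point is correct.

```shell
#!/bin/sh
# Hypothetical helper: compare the output of a temporary file sink in the
# SPADE run with the output of the matching sink in the SPL run.
compare_sink_outputs() {
  spade_out=$1
  spl_out=$2
  if cmp -s "$spade_out" "$spl_out"; then
    # Results are still good here, so search the part after this sink.
    echo "match: look after this sink"
    return 0
  fi
  # Results are already bad here, so search the part before this sink.
  echo "differ: look at or before this sink"
  diff "$spade_out" "$spl_out" | head -n 5
  return 1
}

# Usage, with hypothetical file names for the two temporary sinks:
#   compare_sink_outputs spade_run/data/Mid.dat myspl/data/Mid.dat
```

Moving the temporary sinks according to the result and repeating the comparison roughly halves the region of the program that remains under suspicion each time.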

None of this should frighten you away from the translation tool. It is, by far, the easiest and fastest way for someone to go from having a SPADE program to having a remarkably similar program that's now expressed in the SPL idiom. Just stay flexible and be prepared to be an active part of the migration process.

Resources

Get products and technologies

  • Build your next development project with IBM trial software, available for download directly from developerWorks.

Zone=Big data and analytics, Information Management
ArticleID=680869
ArticleTitle=Migrating InfoSphere Streams SPADE applications to Streams Processing Language, Part 5: Migrate SPADE user-defined built-in operator (UBOP) applications
publish-date=06162011