Migrating InfoSphere Streams SPADE applications to Streams Processing Language, Part 4: Migrate SPADE user-defined operator (UDOP) applications

An introductory guide by example

The most significant new feature of Version 2.0 of the IBM InfoSphere® Streams product is the programming language model transformation from Streams Processing Application Declarative Engine (SPADE) to Streams Processing Language (SPL). Users with SPADE applications from previous versions will need to migrate and port their applications to SPL when upgrading their installations to Version 2.0. This tutorial is Part 4 of a 5-part series that uses actual SPADE samples to demonstrate a series of step-by-step procedures for migrating and porting different types of SPADE application content. Part 4 demonstrates the migration of SPADE user-defined operator (UDOP) applications.

Kevin Erickson (kjerick@us.ibm.com), Senior Software Engineer, IBM

Kevin Erickson joined IBM in Rochester, MN, in 1981 as an electrical engineer working on development and test of hard disk drives. He has contributed to a wide range of IBM products and technologies, from disk drive actuator and spindle servo control systems, to AS/400 and successor operating system kernel components. More recently Kevin has worked on performance testing of the Roadrunner supercomputer, the first computer to break the 1 petaflop barrier. In 2009, he transferred to Software Group to work with the InfoSphere Streams team, primarily in a variety of test-related roles. Over his IBM career, Kevin has been issued 14 U.S. patents, plus several in other countries. He holds a B.S. degree in Electrical Engineering from the University of Minnesota.



Richard P. King (rpk@us.ibm.com), Senior Programmer, IBM

Richard King has been a part of the InfoSphere Streams team for a number of years. When he joined IBM in 1977, it was as an associate programmer working on the development of System/38. Since transferring to IBM's Research Division at IBM Thomas J. Watson Research Center in Yorktown in 1981, he has worked on a wide variety of projects, including the design and development of what became the IBM Sysplex Coupling Facility. He has a bachelor's degree in industrial engineering and operations research from Cornell University and a master's in operations research and industrial engineering from Northwestern University.



09 June 2011

Before you start

Introduction

More in the series

Look for the other parts of the Migrating InfoSphere Streams SPADE applications to SPL series.

Each part is self-contained and does not rely on the previous example. Therefore, you can learn how to migrate these types of SPADE applications in any order. However, the examples are arranged roughly from easiest to most difficult, so completing each part in order might be beneficial.

The "Before you start" section is repeated in each tutorial so that you can complete each part of the series independently.

IBM InfoSphere Streams is a high-performance computing system designed for streaming applications that span many different kinds of business, scientific, engineering, and environmental disciplines. InfoSphere Streams is a highly scalable and powerful analytics platform that can rapidly analyze and solve many problems in real time, using a wide variety of data streams from thousands of live data sources, as well as reference data from data warehouses. InfoSphere Streams provides an application development and execution environment that is tailored to users developing specific applications that ingest, filter, analyze, correlate, classify, transform, and otherwise process large volumes of continuous data streams. These powerful applications result in faster and more intelligent decision-making outcomes within diverse business sectors such as healthcare, transportation, finance, manufacturing, communications, energy, and security.

Version 2.0 of the IBM InfoSphere Streams product introduces a variety of new features and enhancements to the previous 1.x versions, the most significant of which is the programming language model transformation from Streams Processing Application Declarative Engine (SPADE) to Streams Processing Language (SPL). The primary reasons for this change are to make the language simpler, more intuitive, and more uniform. The most visible change with SPL is the syntax, which has a structure and feel that is more in the realm of C or Java™, thus making it easier for programmers to read, understand, and be more productive when writing SPL applications.

Because the programming language model was transformed from SPADE to SPL in InfoSphere Streams Version 2.0, users with SPADE applications from the previous versions need to migrate and port their applications to SPL when upgrading their installation. Initial information to help users do this is made available as part of the shipped product, which is provided primarily in the IBM Streams Processing Language SPADE Application Migration and Interoperability Guide. This guide is located in the doc subdirectory below the directory where InfoSphere Streams is installed or in the IBM InfoSphere Streams Information Center (see Resources).

The IBM Streams Processing Language SPADE Application Migration and Interoperability Guide contains basic information and instructions for migrating SPADE applications, but the information is presented at a fairly high level without specific porting examples. This tutorial complements the high-level guide by providing examples of SPADE-to-SPL application migration. You will use a SPADE sample application that is shipped as part of a SPADE installation as the basis for the migration example in this tutorial. However, it is recommended that you understand the overall migration procedures contained in the high-level guide before tackling the more detailed step-by-step procedures outlined in this tutorial.

Finally, you should have at least a fundamental knowledge of, and experience with, the SPADE programming language prior to proceeding with this tutorial. You should also have access to a version of InfoSphere Streams prior to Version 2.0 (generally referred to as Version 1.2.x herein, because that is the version used to obtain and work with the sample examples) as well as a Version 2.0 installation in order to perform the exercises in this tutorial. You also need at least a basic understanding of SPL, which you can acquire by reading the IBM Streams Processing Language Introductory Tutorial and trying out the various examples given there. The SPL introductory tutorial, as well as several other documents helpful in learning about more advanced aspects of SPL, operator models, operator toolkits, and so on, can be found in the doc subdirectory below the directory where InfoSphere Streams is installed or in the IBM InfoSphere Streams Information Center (see Resources).

Setting up

Before proceeding with any of the migration and porting examples, there are some general prerequisites that must be satisfied. These prerequisites should be set up as follows:

  1. Ensure an InfoSphere Streams Version 1.2.x installation is available with the original (unaltered) sample code located under the installation directory within the samples subdirectory.
  2. Ensure an InfoSphere Streams Version 2.0 installation is available.
  3. Ensure availability of at least one terminal access point connection for each installation version with the appropriate environment set up to work with each version.
  4. Create a base working directory, located off your home directory, in which to place and work with the migration samples. The suggested name for this directory, which is used throughout this example, is migrationsamples. If not already created, issue the following commands to create this directory:
    cd ~
    mkdir migrationsamples

Understanding general concepts for the examples

There are a number of common items that pertain to each of the migration and porting examples discussed in this tutorial. Some of these items are environmental, while others are suggested methods and techniques for addressing particular steps of the porting processes.

Dual installation

Note that both InfoSphere Streams Version 1.2.x and Version 2.0 are used in the examples throughout this tutorial. When working with the various elements of SPADE applications, the use of Version 1.2.x is required, whereas when working with the various aspects of migration procedures and SPL applications, the use of Version 2.0 is required. The spade-to-spl command, which is the primary command used to translate SPADE code to SPL, is available in Version 2.0 installations. Thus, when you reach the point in the migration process where the spade-to-spl command is issued, you will switch from using a SPADE installation to using an SPL installation.

Order of post-migration fix-ups and porting

One of the main steps of the migration process is the execution of the spade-to-spl translation command. After this translation, a list of errors, warnings, and informational messages could result. The items in this list can be addressed in any order, but it is generally easier to resolve the simpler issues first because they often don't require any explicit changes, although in some cases changes might be made based on user preference. You can continue to evaluate the issues and work on resolving them in order of increasing difficulty. This is the philosophy and methodology used in this tutorial, which means particular translation issues might be presented out of numerical order from that reported by the translation process.

Compilation choices for SPADE and SPL applications

Throughout this tutorial, the approach for the compilation of applications is different for SPADE and SPL. For both types of applications, either the distributed or standalone methods of compilation can be used. The approaches used in this tutorial were selected to reduce the overall number of steps and commands that would have to be executed to demonstrate certain aspects of the migration processes.

For the SPADE application samples, using the makefile process to compile and generate easy-to-use execution scripts makes running and validating the samples relatively straightforward, even though they use the distributed application environment. For the SPL applications being migrated from the SPADE application samples (and in general when initially developing any new SPL applications), it is usually quicker and easier to initially employ the standalone application method of compilation. The distributed method is best for final production tuning when the application workload needs to be distributed across multiple cluster nodes to produce the best performance characteristics. Therefore, to keep the content of this tutorial as simple as possible, the standalone approach is used for SPL applications.


Configuring and testing the SPADE UDOP application

This section describes the process for migrating and porting a SPADE application containing user-defined operator (UDOP) capabilities. In SPL, UDOPs are called non-generic operators. For this tutorial, the SPADE portgenericudop application (which is provided in the port_generic_udop sample directory shipped with Version 1.2.x of InfoSphere Streams, and which is copied here into a pg_udop working directory) is used to demonstrate the step-by-step migration and porting procedures.

The initial step of the migration process is to ensure that the SPADE application being migrated is working correctly. If the SPADE application doesn't even compile or produce the correct results, a migration to SPL would likely fail during the translation, or the resulting SPL would likely fail to compile or produce the correct results. The following procedure outlines the steps used to ensure that the SPADE application is suitably prepared for the migration and porting process.

  1. Change directories to the base working directory.
    cd ~/migrationsamples
  2. Copy the portgenericudop sample code from the InfoSphere Streams Version 1.2.x installation into a pg_udop subdirectory under the current working directory, where <V1.2.x_Installation> indicates the base file system path of the installation.
    cp -r ~/<V1.2.x_Installation>/samples/atwork/port_generic_udop pg_udop
  3. Compile the application.
    cd pg_udop
    make
  4. Run the compiler-generated script start_streams_portgenericudop.sh to make and start the default instance named spade@your_userid.
    ./start_streams_portgenericudop.sh
  5. Run the compiler-generated script submitjob_portgenericudop.sh to submit the portgenericudop job for execution.
    ./submitjob_portgenericudop.sh
  6. Monitor the output files, which by default are written to the data subdirectory, to verify that the application is writing data. Run the following command several times and check that the file sizes are greater than 0 bytes and possibly still growing. There are four output files: three files correspond to their respective input files and one file contains combined output data.
    ls -l data/outdata*
  7. Optionally, look at the actual contents of the output files.
    my_editor data/outdata* &
  8. Run the compiler-generated script canceljob_portgenericudop.sh to cancel and end the portgenericudop job.
    ./canceljob_portgenericudop.sh
  9. Run the compiler-generated script stop_streams_portgenericudop.sh to stop and remove the default instance named spade@your_userid.
    ./stop_streams_portgenericudop.sh
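
The size check in step 6 can also be scripted. The following is a minimal sketch rather than part of the shipped sample: the function name check_nonempty is hypothetical, and the only assumption is that the output files share a common name prefix such as data/outdata.

```shell
# check_nonempty PREFIX
# Hypothetical helper: succeeds only if at least one file whose name
# starts with PREFIX exists and every such file is larger than 0 bytes.
check_nonempty() (
    prefix=$1
    found=0
    for f in "$prefix"*; do
        [ -e "$f" ] || continue   # the glob matched nothing
        found=1
        if [ -s "$f" ]; then
            echo "OK    $f"
        else
            echo "EMPTY $f"
            exit 1                # leaves the subshell, so the function fails
        fi
    done
    if [ "$found" -eq 0 ]; then
        echo "no files match $prefix*"
        exit 1
    fi
)
```

Run from the pg_udop directory, a call such as check_nonempty data/outdata lists each output file and returns a nonzero status if any file is missing or empty.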

Translating the UDOP SPADE application to SPL

The next step of migration is to set up for and execute the translation script using a Version 2.0 installation.

  1. Create an SPL working directory, located off the pg_udop application subdirectory, in which to place and work with the migrated code. This tutorial uses the directory myspl.
    mkdir myspl
  2. Convert the SPADE source to SPL using the spade-to-spl translation command.
    spade-to-spl -t myspl -f portgenericudop.dps

After the translation, a list of errors, warnings, and informational messages similar to the following could result:

Listing 1. Warning message
portgenericudop.dps:30:17: CDISP0691W WARNING: The SPL 
source operators do not support workload generation.
Listing 2. Warning message
portgenericudop.dps:34:17: CDISP0691W WARNING: The SPL 
source operators do not support workload generation.
Listing 3. Warning message
portgenericudop.dps:38:17: CDISP0691W WARNING: The SPL 
source operators do not support workload generation.
Listing 4. Informational message
portgenericudop.dps:46:8: CDISP0645I A stub will be generated 
for user defined operator 'Unifier'.
Listing 5. Warning message
portgenericudop.dps:48:8: CDISP0724W WARNING: Parameter 
'writePunctuations' with a value of 'true' has been specified 
in this sink because 'dropPunct' was not specified in 
the corresponding SPADE code.  This may not be semantically 
equivalent if the SPADE code was compiled with '-U' or 
'--preserve-punctuations'.  You may wish to remove the 
'writePunctuations' parameter or give it a value of 'false'.
Listing 6. Warning message
portgenericudop.dps:49:8: CDISP0724W WARNING: Parameter 
'writePunctuations' with a value of 'true' has been specified 
in this sink because 'dropPunct' was not specified in the 
corresponding SPADE code.  This may not be semantically equivalent 
if the SPADE code was compiled with '-U' or '--preserve-punctuations'.  
You may wish to remove the 'writePunctuations' parameter or give it 
a value of 'false'.
Listing 7. Warning message
portgenericudop.dps:50:8: CDISP0724W WARNING: Parameter 
'writePunctuations' with a value of 'true' has been specified 
in this sink because 'dropPunct' was not specified in the 
corresponding SPADE code.  This may not be semantically 
equivalent if the SPADE code was compiled with '-U' or 
'--preserve-punctuations'.  You may wish to remove the 
'writePunctuations' parameter or give it a value of 'false'.
Listing 8. Warning message
portgenericudop.dps:51:8: CDISP0724W WARNING: Parameter 
'writePunctuations' with a value of 'true' has been specified 
in this sink because 'dropPunct' was not specified in the 
corresponding SPADE code.  This may not be semantically 
equivalent if the SPADE code was compiled with '-U' or 
'--preserve-punctuations'.  You may wish to remove the 
'writePunctuations' parameter or give it a value of 'false'.

Each of these items must be addressed and evaluated to determine the course of action required to produce a comparable working application in SPL. The result of addressing these items may be a number of porting activities, such as making changes to the migrated portgenericudop.spl file and possibly other generated SPL files. There might also be a need for other implementation additions and changes to build up the required environment for an SPL application.
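
When the translator reports many messages, a tally by message ID can help you decide which items to evaluate first. The sketch below assumes that you saved the translator output to a file; the file name translate.log and the function name summarize_messages are hypothetical, and the pattern simply matches IDs of the form shown in the listings above (CDISP, digits, then a severity letter).

```shell
# summarize_messages LOGFILE
# Hypothetical helper: counts how often each CDISP message ID appears
# in a saved copy of the translator output, most frequent first.
summarize_messages() {
    grep -o 'CDISP[0-9]*[A-Z]' "$1" | sort | uniq -c | sort -rn
}
```

One way to capture the output is to redirect both output streams when running the translation, then summarize the result:

    spade-to-spl -t myspl -f portgenericudop.dps > translate.log 2>&1
    summarize_messages translate.log

For a log containing exactly the eight messages in Listings 1 through 8, the tally would show CDISP0724W four times, CDISP0691W three times, and CDISP0645I once.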


Migrating the UDOP SPADE application to SPL

The next step of the migration process is the post-translation fix-ups and porting of the migrated code and environment. As shown in the previous section, a number of errors, warnings, and informational messages could occur as a result of migration. Addressing and evaluating these items, and fixing them where necessary, is the first major step in the porting process. Once you complete these initial fix-ups, you need to perform any other porting that was not automatically handled or identified as needed by the spade-to-spl translator. The following procedure outlines the steps to do this, addressing the messages from easiest to most difficult.

  1. Evaluate the identical warning messages in Listing 1, Listing 2, and Listing 3 from the list of migration messages generated by the spade-to-spl translation command.

    Because workload generators are not supported in SPL, the input files need to be manually created. However, you can take advantage of the input files that were already generated for the SPADE application by copying them from the SPADE data subdirectory into the SPL data subdirectory. Because the SPL compilation process creates a data directory, you can copy the files later in the migration process.

  2. Evaluate the identical warning messages in Listing 5, Listing 6, Listing 7, and Listing 8.

    In this case the translator chose to include the line ‘writePunctuations : true;’ as part of the parameter list of the FileSink operator because the original SPADE application did not specifically indicate that punctuation should be dropped. The original SPADE application did not include any punctuation in the resulting output file. The choice of whether to include punctuation depends on how the output file will be parsed and used. For this tutorial example, in order to more closely match the original SPADE application behavior, delete or comment out this punctuation line in the migrated SPL application source, then save and exit the file.

    cd myspl
    my_editor portgenericudop.spl &
    Delete the ‘writePunctuations : true;’ lines, or comment out each one as ‘// writePunctuations : true;’.
  3. Evaluate the informational message in Listing 4.

    Addressing this item involves a substantial amount of work. It requires the creation or existence of a specific directory structure that contains code generation templates for C++ header files, implementation code files, an XML operator model, and so on. The translator creates a skeleton with this directory structure, code files, and operator model, while you might need to create other portions manually. The detailed steps are broken into several tasks, which are outlined in the following sections.
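
If you prefer not to edit the file by hand in step 2, the writePunctuations lines can be commented out with sed. This is a sketch under one stated assumption: each ‘writePunctuations : true;’ parameter sits at the start of its own line in the generated SPL, apart from leading whitespace. The function name comment_out_punct is hypothetical.

```shell
# comment_out_punct FILE
# Hypothetical helper: prefixes every line that begins with
# "writePunctuations : true;" with "// ", keeping the indentation.
# The untouched original is kept as FILE.orig.
comment_out_punct() (
    file=$1
    tmp=$(mktemp) || exit 1
    sed 's|^\([[:space:]]*\)\(writePunctuations[[:space:]]*:[[:space:]]*true[[:space:]]*;\)|\1// \2|' \
        "$file" > "$tmp" &&
        cp "$file" "$file.orig" &&
        mv "$tmp" "$file"
)
```

From the myspl directory, comment_out_punct portgenericudop.spl applies the change. Lines that are already commented out no longer match the pattern, so running the helper a second time leaves the source unchanged.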

Updating the operator model file

At this point the translator has already created a directory structure and some skeleton stubs for the Unifier operator, including the Unifier.xml operator model file. This skeleton file must now be updated with information that describes the operator such that its attributes correspond to those of the original SPADE application.

  1. Change directories to the SPL application subdirectory where the stubs were generated.
    cd ~/migrationsamples/pg_udop/myspl/ubopStubs/Unifier
  2. Edit the Unifier.xml operator model. The compiler uses operator models to perform syntactic and semantic checks on instances of primitive operators used in SPL applications. For non-generic operators, the default operator model can be used for convenience, because non-generic operators are not designed to be reusable and do not require strict error checking. The default operator model is the most permissive model, and it avoids most of the compile-time checks that the SPL compiler performs. However, in some cases, altering the operator model is desirable because it improves the performance of the operator or reinforces the intended characteristics of the original SPADE operator definition. For this tutorial, modify Unifier.xml to better match the intended C++ type safety characteristics of the SPADE Unifier operator definition in ~/migrationsamples/pg_udop/src/UDOP_Unifier.h, where tuples that are processed as inputs are declared const and are thus expected to be non-mutable.
    my_editor Unifier.xml &
  3. In the <inputPortOpenSet> section, change <tupleMutationAllowed>true</tupleMutationAllowed> to
    <tupleMutationAllowed>false</tupleMutationAllowed>
  4. Save and exit.
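
The same change can be made with sed, restricted to the input port section so that any tupleMutationAllowed setting for output ports is left alone. This sketch assumes that the <inputPortOpenSet> opening and closing tags appear on separate lines, as is typical for generated XML; the function name disallow_input_mutation is hypothetical.

```shell
# disallow_input_mutation FILE
# Hypothetical helper: sets tupleMutationAllowed to false, but only on
# lines between <inputPortOpenSet> and </inputPortOpenSet>.
# The untouched original is kept as FILE.orig.
disallow_input_mutation() (
    file=$1
    tmp=$(mktemp) || exit 1
    sed '/<inputPortOpenSet>/,/<\/inputPortOpenSet>/s|<tupleMutationAllowed>true</tupleMutationAllowed>|<tupleMutationAllowed>false</tupleMutationAllowed>|' \
        "$file" > "$tmp" &&
        cp "$file" "$file.orig" &&
        mv "$tmp" "$file"
)
```

Calling disallow_input_mutation Unifier.xml from the stub directory makes the edit, and comparing Unifier.xml with Unifier.xml.orig shows exactly which line changed.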

Porting the operator header file code generation template

The next step in the porting process is to determine which elements of the SPADE UDOP operator header files need to be ported to the SPL non-generic operator header file code generation template. The corresponding SPADE UDOP header files for this application are located in the ~/migrationsamples/pg_udop/src subdirectory. The header files are named UDOP_Unifier.h and UDOP_Unifier_members.h. SPL greatly simplifies what is needed in the operator header file. For this tutorial, it is necessary to port only the two data members from the SPADE UDOP header files to the SPL code generation template.

  1. Edit the SPL operator header file code generation template and add the missing member data.
    my_editor Unifier_h.cgt &
  2. In the private section under // Members, add the members
    int64_t seq_;
    Mutex mutex_;
  3. Save and exit.

Porting the operator implementation file code generation template

The last step in the porting process is to determine which elements of the SPADE UDOP operator implementation file need to be ported to the SPL non-generic operator implementation file code generation template. The corresponding SPADE UDOP implementation file for this application is UDOP_Unifier.cpp, which is located in the ~/migrationsamples/pg_udop/src subdirectory. For this tutorial, significant portions of the function from the SPADE UDOP implementation file need to be ported to the SPL code generation template, which usually requires a degree of programming art, some experimentation, and a bit of trial and error. Following are the results of one such effort that produced a working SPL application. Other ports and implementations are possible.

  1. Edit the SPL operator implementation file code generation template.
    my_editor Unifier_cpp.cgt &
  2. In the section labeled /* Additional includes go here */, add
    #include <SPL/Runtime/Operator/Port/OperatorOutputPort.h>
    #include <SPL/Runtime/Type/List.h>
  3. In the section labeled // Constructor, within the code block for the constructor, add
    seq_ = 0;
  4. In the section labeled // Tuple processing for non-mutating ports, within the void MY_OPERATOR::process(Tuple const & tuple, uint32_t port) block, add the code in Listing 9, then save and exit.
Listing 9. SPL non-generic operator process() method implementation
void MY_OPERATOR::process(Tuple const & tuple, uint32_t port)
{
     // Prevent update collisions
     AutoPortMutex apm(mutex_, *this);

     // *** Section to send tuple to corresponding output port *** // 
     std::auto_ptr<Tuple> otuple = getOutputPortAt(port).createTuple();
     otuple->assignFrom(tuple, false);
     int64 & seq = otuple->getAttributeValue("seq");
     seq = seq_++;
     submit(*otuple, port);

     // *** Section to send all tuple info to last output port *** //
     uint32_t loport = getNumberOfOutputPorts()-1;
     std::auto_ptr<Tuple> lotuple = getOutputPortAt(loport).createTuple();

     list<rstring> & names =
          (list<rstring> &)(List &)lotuple->getAttributeValue("names");
     list<rstring> & types =
          (list<rstring> &)(List &)lotuple->getAttributeValue("types");
     list<rstring> & values =
          (list<rstring> &)(List &)lotuple->getAttributeValue("values");

     const std::tr1::unordered_map<std::string,unsigned> & attrbs =
                              tuple.getAttributeNames();
     for(std::tr1::unordered_map<std::string,unsigned>::const_iterator
                              it=attrbs.begin(); it!=attrbs.end(); ++it) {
          const rstring & aname = it->first;
          ConstValueHandle vhandle = tuple.getAttributeValue(aname);
          const rstring atype = vhandle.getMetaType().toString();
          const rstring avalue = vhandle.toString();
          names.push_back(aname);
          types.push_back(atype);
          values.push_back(avalue);
     }
     submit(*lotuple, loport);
}

Finishing and testing the SPL application

The final step of the migration process is to compile and test the completed SPADE-to-SPL migration and porting solution. Complete the following steps.

  1. Compile the SPL application. This compilation should succeed, because you addressed all of the migration issues through fix-ups and porting.
    cd ~/migrationsamples/pg_udop/myspl
    sc -T -M portgenericudop

    The compilation process automatically creates a data subdirectory for the SPL application under the directory where the migrated SPL source was placed.

  2. Copy the original source input files indata1.dat, indata2.dat, and indata3.dat from the SPADE data subdirectory into the SPL data subdirectory.
    cp ~/migrationsamples/pg_udop/data/indata*.dat data/
  3. Execute the standalone application.
    ./output/bin/standalone -d 0
  4. Check the contents of the data subdirectory to verify that the application generated the expected output files, and that their sizes are not 0.
    ls -l data/outdata*.dat
  5. Optionally, further inspect the contents of the output files to verify that the application generated the expected output data, which should be similar to that of the original SPADE application.
    my_editor data/outdata*.dat &
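
A quick first comparison between the SPADE and SPL runs is to check that matching output files contain the same number of lines. This is only a rough sketch: it assumes that both runs processed the complete input files, that the SPL application writes output files with the same names as the SPADE application, and that punctuation is not written to either set of files. The function name compare_line_counts is hypothetical.

```shell
# compare_line_counts SPADE_DATA_DIR SPL_DATA_DIR
# Hypothetical helper: for each outdata* file in the SPADE data
# directory, prints the line count of that file and of the file with
# the same name in the SPL data directory. Fails on any mismatch.
compare_line_counts() (
    spade_dir=$1
    spl_dir=$2
    status=0
    found=0
    for f in "$spade_dir"/outdata*; do
        [ -f "$f" ] || continue
        found=1
        name=$(basename "$f")
        if [ ! -f "$spl_dir/$name" ]; then
            echo "$name: missing from $spl_dir"
            status=1
            continue
        fi
        # $(( )) strips the padding that some wc versions print
        spade_n=$(( $(wc -l < "$f") ))
        spl_n=$(( $(wc -l < "$spl_dir/$name") ))
        echo "$name: SPADE=$spade_n SPL=$spl_n"
        [ "$spade_n" -eq "$spl_n" ] || status=1
    done
    [ "$found" -eq 1 ] || status=1
    exit "$status"
)
```

From ~/migrationsamples/pg_udop, the call would be compare_line_counts data myspl/data. Equal counts do not prove that the contents match, so a closer inspection of the files is still worthwhile.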

Conclusion

This tutorial presented migration strategies for a SPADE user-defined operator (UDOP) program. You can continue your education with the other parts of this five-part tutorial series. As you tackle progressively more difficult parts of the series, your effort goes up from checking a few minor language differences (such as punctuation handling) to dealing with ancillary objects, such as function models and C++ implementation files for UBOPs. Part 5 of this tutorial series demonstrates more complex migration procedures that incorporate UBOPs.

Keep in mind that these tutorial examples can serve only as a general guide for how to approach the conversion of a SPADE program to SPL. Each SPADE program is different, and what is encountered during the conversion process for each can vary widely. For example, in the course of converting many other SPADE programs, you might see any of the following:

  • Failures to produce any SPL source
  • Source that doesn't compile
  • Compiled programs that don't run properly
  • Running programs that don't produce the correct results

Of significant importance is the availability of known good results from the SPADE version of the program. Once the SPL program is running, even apparently successfully, be sure to compare the former results with the new. Unfortunately, the new results will not always be good. A divide-and-conquer approach works fairly well in these cases. Both SPADE and SPL programs are easily modified through the addition of temporary file sinks. In both the SPADE and SPL versions of the program, add a new file sink to some operator in the middle of the program to record its output stream. The new file sink produces either good or bad results, which narrows the area of your search to either the part of the program before or after this new file sink. This process, repeated just a few times, usually helps you focus in on the erroneous computation fairly quickly.
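
Each round of this divide-and-conquer search comes down to comparing the file written by a temporary SPADE file sink with the file written by the matching temporary SPL file sink. The sketch below shows one way to do that comparison. It assumes that tuple order is not significant (both files are sorted before comparing) and that both sinks write the same textual format; the function name streams_match is hypothetical.

```shell
# streams_match SPADE_SINK_FILE SPL_SINK_FILE
# Hypothetical helper: compares the two captured streams, ignoring line
# order, and reports which side of the temporary sink to search next.
streams_match() (
    spade_out=$1
    spl_out=$2
    a=$(mktemp) && b=$(mktemp) || exit 2
    sort "$spade_out" > "$a"
    sort "$spl_out" > "$b"
    if cmp -s "$a" "$b"; then
        echo "match: look downstream of this sink"
        rc=0
    else
        echo "mismatch: look at or upstream of this sink"
        diff "$a" "$b" | head -n 5   # show the first few differences
        rc=1
    fi
    rm -f "$a" "$b"
    exit "$rc"
)
```

If the order of tuples matters for your application, drop the two sort steps and compare the files directly.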

None of this should frighten you away from the translation tool. It is, by far, the easiest and fastest way for someone to go from having a SPADE program to having a remarkably similar program that's now expressed in the SPL idiom. Just stay flexible and be prepared to be an active part of the migration process.

Resources

Learn

Get products and technologies

  • Build your next development project with IBM trial software, available for download directly from developerWorks.
