Migrating InfoSphere Streams SPADE applications to Streams Processing Language, Part 2: Migrate SPADE mixed-mode applications

An introductory guide by example

The most significant new feature of Version 2.0 of the IBM InfoSphere® Streams product is the programming language model transformation from Streams Processing Application Declarative Engine (SPADE) to Streams Processing Language (SPL). Users with SPADE applications from previous versions will need to migrate and port their applications to SPL when upgrading their installations to Version 2.0. This tutorial is Part 2 of a 5-part series that uses actual SPADE samples to demonstrate a series of step-by-step procedures for migrating and porting different types of SPADE application content. Part 2 demonstrates the migration of SPADE mixed-mode applications.

Kevin Erickson (kjerick@us.ibm.com), Senior Software Engineer, IBM

Kevin Erickson joined IBM in Rochester, MN, in 1981 as an electrical engineer working on development and test of hard disk drives. He has contributed to a wide range of IBM products and technologies, from disk drive actuator and spindle servo control systems, to AS/400 and successor operating system kernel components. More recently, Kevin has worked on performance testing of the Roadrunner supercomputer, the first computer to break the 1 petaflop barrier. In 2009, he transferred to Software Group to work with the InfoSphere Streams team, primarily in a variety of test-related roles. Over his IBM career, Kevin has been issued 14 U.S. patents, plus several in other countries. He holds a B.S. degree in Electrical Engineering from the University of Minnesota.



Richard P. King (rpk@us.ibm.com), Senior Programmer, IBM

Richard King has been a part of the InfoSphere Streams team for a number of years. When he joined IBM in 1977, it was as an associate programmer working on the development of System/38. Since transferring to IBM's Research Division at IBM Thomas J. Watson Research Center in Yorktown in 1981, he has worked on a wide variety of projects, including the design and development of what became the IBM Sysplex Coupling Facility. He has a bachelor's degree in industrial engineering and operations research from Cornell University and a master's in operations research and industrial engineering from Northwestern University.



26 May 2011

Before you start

Introduction

More in the series

Look for more in the Migrating InfoSphere Streams SPADE applications to SPL series:

Each part is self-contained and does not rely on the previous example. Therefore, you can learn how to migrate these types of SPADE applications in any order. However, the examples are arranged in what might be considered easiest to most difficult, so completing each part in order might be beneficial.

The "Before you start" section is repeated in each tutorial so that you can complete each part of the series independently.

IBM InfoSphere Streams is a high-performance computing system designed for streaming applications that span many different kinds of business, scientific, engineering, and environmental disciplines. InfoSphere Streams is a highly scalable analytics platform that can be used to analyze and solve many problems in real time, using a wide variety of data streams from thousands of live data sources, as well as reference data from data warehouses. InfoSphere Streams provides an application development and execution environment that is tailored to users developing specific applications that ingest, filter, analyze, correlate, classify, transform, and otherwise process large volumes of continuous data streams. These applications support faster and more intelligent decision making within diverse business sectors such as healthcare, transportation, financial, manufacturing, communications, energy, and security.

Version 2.0 of the IBM InfoSphere Streams product introduces a variety of new features and enhancements to the previous 1.x versions, the most significant of which is the programming language model transformation from Streams Processing Application Declarative Engine (SPADE) to Streams Processing Language (SPL). The primary reasons for this change are to make the language simpler, more intuitive, and more uniform. The most visible change with SPL is the syntax, which has a structure and feel that is closer to C or Java™. This makes SPL applications easier for programmers to read and understand, and helps programmers be more productive when writing them.

Because the programming language model was transformed from SPADE to SPL in InfoSphere Streams Version 2.0, users with SPADE applications from the previous versions need to migrate and port their applications to SPL when upgrading their installation. Initial information to help users do this is made available as part of the shipped product, which is provided primarily in the IBM Streams Processing Language SPADE Application Migration and Interoperability Guide. This guide is located in the doc subdirectory below the directory where InfoSphere Streams is installed or in the IBM InfoSphere Streams Information Center (see Resources).

The IBM Streams Processing Language SPADE Application Migration and Interoperability Guide contains basic information and instructions for migrating SPADE applications, but the information is presented at a fairly high level without specific porting examples. This tutorial complements the high-level guide by providing examples of SPADE-to-SPL application migration. You will use a SPADE sample application that is shipped as part of a SPADE installation as the basis for the migration example in this tutorial. However, it is recommended that you understand the overall migration procedures contained in the high-level guide before tackling the more detailed step-by-step procedures outlined in this tutorial.

Finally, you should have at least a fundamental knowledge of, and experience with, the SPADE programming language prior to proceeding with this tutorial. You should also have access to a version of InfoSphere Streams prior to Version 2.0 (generally referred to as Version 1.2.x herein, because that is the version used to obtain and work with the samples) as well as a Version 2.0 installation in order to perform the exercises in this tutorial. You also need at least a basic understanding of SPL, which you can acquire by reading the IBM Streams Processing Language Introductory Tutorial and trying out the various examples given there. The SPL introductory tutorial, as well as several other documents helpful in learning about more advanced aspects of SPL, operator models, operator toolkits, and so on, can be found in the doc subdirectory below the directory where InfoSphere Streams is installed or in the IBM InfoSphere Streams Information Center (see Resources).

Setting up

Before proceeding with any of the migration and porting examples, there are some general prerequisites that must be satisfied. These prerequisites should be set up as follows:

  1. Ensure an InfoSphere Streams Version 1.2.x installation is available with the original (unaltered) sample code located under the installation directory within the samples subdirectory.
  2. Ensure an InfoSphere Streams Version 2.0 installation is available.
  3. Ensure availability of at least one terminal access point connection for each installation version with the appropriate environment set up to work with each version.
  4. Create a base working directory, located off your home directory, in which to place and work with the migration samples. The suggested name for this directory, which is used throughout this example, is migrationsamples. If not already created, issue the following commands to create this directory:
    cd ~
    mkdir migrationsamples

Understanding general concepts for the examples

There are a number of common items that pertain to each of the migration and porting examples discussed in this tutorial. Some of these items are environmental, while others are suggested methods and techniques for addressing particular steps of the porting processes.

Dual installation

Note that both InfoSphere Streams Version 1.2.x and Version 2.0 are used in the examples throughout this tutorial. When working with the various elements of SPADE applications, the use of Version 1.2.x is required, whereas when working with the various aspects of migration procedures and SPL applications, the use of Version 2.0 is required. The spade-to-spl command, which is the primary command used to translate SPADE code to SPL, is available in Version 2.0 installations. Thus, when you reach the point in the migration process where the spade-to-spl command is issued, you will switch from using a SPADE installation to using an SPL installation.

Order of post-migration fix-ups and porting

One of the main steps of the migration process is the execution of the spade-to-spl translation command. After this translation, a list of errors, warnings, and informational messages could result. The items in this list can be addressed in any order, but it is generally easier to resolve the simpler issues first because they often don't require any explicit changes, although in some cases changes might be made based on user preference. You can continue to evaluate the issues and work on resolving them in order of increasing difficulty. This is the philosophy and methodology used in this tutorial, which means particular translation issues might be presented out of numerical order from that reported by the translation process.

Compilation choices for SPADE and SPL applications

Throughout this tutorial, the approach for the compilation of applications is different for SPADE and SPL. For both types of applications, either the distributed or standalone methods of compilation can be used. The approaches used in this tutorial were selected to reduce the overall number of steps and commands that would have to be executed to demonstrate certain aspects of the migration processes.

For the SPADE application samples, using the makefile process to compile and generate easy-to-use execution scripts makes running and validating the samples relatively straightforward, even though they use the distributed application environment. For the SPL applications being migrated from the SPADE application samples (and in general when initially developing any new SPL applications), it is usually quicker and easier to initially employ the standalone application method of compilation. The distributed method is best for final production tuning when the application workload needs to be distributed across multiple cluster nodes to produce the best performance characteristics. Therefore, to keep the content of this tutorial as simple as possible, the standalone approach is used for SPL applications.


Configuring and testing the SPADE mixed-mode application

This section describes the process for migrating and porting the mixed-mode type of SPADE application, which is one that adds the capability to parameterize applications through the use of the Perl scripting language mixed with the SPADE language. The application file for this type has the name format <name>.dmm. For this tutorial, the relatively simple SPADE wc application (which is provided as a sample application shipped with Version 1.2.x of InfoSphere Streams) is used to demonstrate the step-by-step migration and porting procedures. In addition to mixing in Perl statements, this sample application also includes a bit more migration complexity to work through because it contains a user-defined source operator.

The initial step of the migration process is to ensure that the SPADE application being migrated is working correctly. If the SPADE application doesn't even compile or produce the correct results, then a migration to SPL would likely fail during the translation, or the resulting SPL would likely fail to compile or produce the correct results. The following procedure outlines the steps used to ensure that the SPADE application is suitably prepared for the migration and porting process.

  1. Change directories to the base working directory.
    cd ~/migrationsamples
  2. Copy the wc sample code subdirectory from the InfoSphere Streams Version 1.2.x installation into a wc subdirectory under the current working directory, where <V1.2.x_Installation> indicates the base file system location path name for the installation.
    cp -r ~/<V1.2.x_Installation>/samples/apps/wc .
  3. Compile the application.
    cd wc
    make
  4. Run the compiler-generated script start_streams_wc.sh to make and start the default instance named spade@your_userid.
    ./start_streams_wc.sh
  5. Run the compiler-generated script submitjob_wc.sh to submit the wc job for execution.
    ./submitjob_wc.sh
  6. Monitor the output file (which by default is written to the data subdirectory) to verify that the application is writing data to the file by checking that its size is greater than 0 bytes. It might grow as you continue to look at its size using the following command:
    ls -l data
  7. Optionally view the contents of the output file.
    cat data/wordcount.dat  [or]
    my_editor data/wordcount.dat &
  8. Run the compiler-generated script canceljob_wc.sh to cancel and end the wc job.
    ./canceljob_wc.sh
  9. Run the compiler-generated script stop_streams_wc.sh to stop and remove the default instance named spade@your_userid.
    ./stop_streams_wc.sh
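
The output check in step 6 can be scripted if you prefer not to re-run ls by hand. The following is a minimal shell sketch, not part of the shipped sample: the function name wait_for_output and the 30-second default are assumptions made for illustration. It polls once per second until the named file exists with a size greater than 0 bytes.

```shell
# Hypothetical helper: wait until a file exists and is non-empty.
# $1: file to watch
# $2: maximum number of seconds to wait (assumed default: 30)
wait_for_output() {
    file=$1
    tries=${2:-30}
    while [ "$tries" -gt 0 ]; do
        # -s is true when the file exists and its size is greater than 0
        if [ -s "$file" ]; then
            return 0
        fi
        sleep 1
        tries=$((tries - 1))
    done
    # Final check after the last sleep; its status is the function's status
    [ -s "$file" ]
}
```

For example, after submitting the job you might run the following from the wc directory:

    wait_for_output data/wordcount.dat 60 && ls -l data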

Translating the mixed-mode SPADE application to SPL

The next migration step is to set up for and execute the translation script. The following procedure outlines the steps to do this, which must be performed using a Version 2.0 installation.

  1. Create an SPL working directory, located off the wc application subdirectory, in which to place and work with the migrated code. The directory name used in this tutorial is myspl.
    mkdir myspl
  2. Convert the SPADE source to SPL using the spade-to-spl translation command. Note that the original command-line argument ($PWD/Makefile) that was passed to the SPADE compiler in the makefile for the SPADE application is also passed as an argument to the SPL translation command.
    spade-to-spl -t myspl -f wc.dmm $PWD/Makefile

After the translation, a list of errors, warnings, and informational messages similar to the following could result:

Listing 1. Warning message
wc.dmm:10:10:
CDISP0704W WARNING: A 'Custom' operator and associated functions
have been synthesized to parse data with a user-defined format.
You need to migrate the parsing code from your SPADE
application.

Listing 2. Informational message
wc.dmm:6:10:
CDISP0452I The output stream 'PerLine' was renamed to 'PerLine_'
due to the insertion of a 'Custom' operator.

Listing 3. Warning message
wc.dmm:20:10:
CDISP0724W WARNING: Parameter 'writePunctuations' with a value of
'true' has been specified in this sink because 'dropPunct' was not
specified in the corresponding SPADE code. This may not be
semantically equivalent if the SPADE code was compiled with '-U'
or '--preserve-punctuations'. You may wish to remove the
'writePunctuations' parameter or give it a value of 'false'.

Listing 4. Error message
wc.dmm:20:10:
CDISP0443E ERROR: SPL FileSource and FileSink operators do not
support the 'type-annotated' file format. The translated operator
format has been set to 'txt'.

Each of these items must be addressed and evaluated to determine the course of action required to produce a comparable working application in SPL. The result of addressing these items might be a number of porting activities, such as changes in the migrated wc.spl file and possibly other generated SPL files. There might also be a need for other implementation additions and changes to build up the required environment for an SPL application.
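
If the translator output is saved to a file, the message identifiers can be grouped by severity to help plan the order of fix-ups. The sketch below assumes the identifiers follow the pattern seen in Listings 1-4 (CDISP, digits, then I, W, or E). The function name triage_messages and the log file are hypothetical, and how you capture the translator output into that file is left to you.

```shell
# Hypothetical helper: list translator message IDs grouped by severity.
# $1: file holding saved spade-to-spl output (name chosen by the caller)
# Prints one "<severity> <message-id>" pair per line, ordered I, W, E.
triage_messages() {
    grep -o 'CDISP[0-9][0-9]*[IWE]' "$1" |
    awk '{
        sev = substr($0, length($0), 1)                   # trailing letter
        rank = (sev == "I") ? 1 : (sev == "W") ? 2 : 3    # I < W < E
        print rank, sev, $0
    }' |
    sort -k1,1n -k3,3 |
    cut -d" " -f2-
}
```

Severity is only a starting point for ordering the work. As the next section shows, an error can turn out to be simpler to resolve than a warning.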


Migrating the mixed-mode SPADE application to SPL

The next step of the migration process is the post-translation fix-ups and porting of the migrated code and environment. As shown in the previous section, a number of errors, warnings, and informational messages could occur as a result of migration. Addressing and evaluating these items, and fixing them where necessary, is the first major step in the porting process. Once you complete these initial fix-ups, you need to perform any other porting that was not automatically handled or identified as needed by the spade-to-spl translator. The following procedure outlines the steps to do this, addressing the messages from easiest to most difficult.

  1. Change directories to the SPL working directory.
    cd myspl
  2. Evaluate the informational message in Listing 2 from the list of migration messages generated by the spade-to-spl translation command.

    Because the translator found it necessary to break up the SPADE user-defined source operator into two separate operators in SPL (a FileSource and Custom combination), it had to create a new name for the intermediate stream to avoid name conflicts. Thus the intermediate stream name was given the original stream name with an underscore suffix. This addition is benign, and you do not need to make any other changes because of it. However, if the new stream name is not acceptable, change it to something you prefer. Edit the migrated SPL application source and then save and exit the file.

    (Optional)
    my_editor wc.spl &
    Global change ‘PerLine_’ to ‘someothername’
  3. Evaluate the error message in Listing 4 from the list of migration messages generated by the spade-to-spl translation command.

    Because the type-annotated data format is not supported in SPL, the translator uses the txt format for the FileSink operator. This is the closest possible porting choice the translator could make. The output differs only slightly in format: the colon (:) separator between label and value is replaced with an equal sign (=), and each complete output line is enclosed in braces ({…}). This change is mostly benign, and you do not need to make any changes to wc.spl because of it, but the consumer of the output might need to change how it parses the data. If the chosen format is not acceptable, you can change it to something you prefer.

    To optionally change to the csv format, you can select it by editing the migrated SPL application source and then saving and exiting the file.

    my_editor wc.spl &
    Change ‘format : txt;’ to ‘format : csv;’
  4. Evaluate the warning message in Listing 3.

    In this case, the translator chose to include the line 'writePunctuations : true;' as part of the parameter list of the FileSink operator because the original SPADE application did not specifically indicate that punctuation should be dropped. The original SPADE application did not include any punctuation in the resulting output file. The choice of whether to include punctuation depends on how the output file will be parsed and used. To more closely match the original SPADE application behavior, delete or comment out this punctuation line in the migrated SPL application source and then save and exit the file.

    my_editor wc.spl &
    Delete the ‘writePunctuations : true;’ line, or
    comment it out as ‘// writePunctuations : true;’
  5. Evaluate the warning message in Listing 1.

    Resolving this particular warning actually requires the most in-depth analysis and involvement in the overall porting process. Because the translator is unable to determine the intent of the complex code contained in the original SPADE word count parser, it chooses to simply return tuples with the attributes set to default initialization values. To complete the porting process, examine the original SPADE word count parser code, and translate the logic into the SPL word count parser.

    1. Examine the logic in the original SPADE word count parser.
      cd ~/migrationsamples/wc/src
      my_editor WCParser_members.h &
    2. Determine the function performed by WCParser() and exit the editor.
    3. Change the default code in the SPL word count parser to match the logic in the original SPADE word count parser.
      cd ~/migrationsamples/wc/myspl
      my_editor wc.spl &
    4. Change ‘return {linecount = 0, wordcount = 0, charcount = 0};’ to the following four lines of code, then save and exit.
      int32 mylinecount = 1;
      int32 mywordcount = size(tokenize(data, " ", false));
      int32 mycharcount = length(data) + 1; // Count EOL
      return {linecount=mylinecount, wordcount=mywordcount, charcount=mycharcount};
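
To sanity-check the ported parser logic outside of Streams, the same three counts can be computed with standard tools. This is a rough sketch that mirrors the four SPL lines above, assuming words are separated by space characters only and that each input line ends with one end-of-line character. The function name expected_counts is hypothetical.

```shell
# Hypothetical helper: compute per-line counts that mirror the SPL parser logic.
# Reads text on standard input and prints "linecount wordcount charcount"
# for each input line.
expected_counts() {
    awk '{
        line = $0
        # gsub returns how many runs of non-space characters it replaced,
        # which equals the number of space-separated, non-empty tokens
        words = gsub(/[^ ]+/, "&", line)
        # length($0) excludes the end-of-line character, so add 1 for it
        print 1, words, length($0) + 1
    }'
}
```

Running it over the same input file that the application reads gives values you can compare, line by line, with the contents of the output file.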

Finishing and testing the mixed-mode SPL application

The final step of the migration process is to compile and test the completed SPADE-to-SPL migration and porting solution.

  1. Compile the SPL application. This compilation should succeed because all of the migration issues were adequately addressed through fix-ups and porting.
    sc -T -M wc
  2. Execute the standalone application.
    ./output/bin/standalone -d 0
  3. Check the contents of the data subdirectory to verify that the application generated the expected output file and that its size is not 0.
    ls -l data/wordcount.dat
  4. Optionally, further inspect the contents of the wordcount.dat file to verify that the application generated the expected output data, which should be similar to that of the original SPADE application.
    my_editor data/wordcount.dat &

Going above and beyond

The procedures outlined in the previous sections should result in a successful migration and generation of comparable output results. To better understand the capabilities and limitations of the migration translator, optionally go back to Migrating the mixed-mode SPADE application to SPL, Steps 4-5 and remove the fix-ups and porting changes from the migrated SPL application file to see what would happen if these changes were not made.

  1. Save a copy of the good working SPL application code for later restoration.
    cp wc.spl wc.spl.save
  2. Undo the porting in Migrating the mixed-mode SPADE application to SPL, Steps 4-5. The easiest way to do this is to re-run the translator.
    cd ..
    spade-to-spl -t myspl -f wc.dmm $PWD/Makefile
    cd myspl
  3. Compile and execute the standalone SPL application. These steps should succeed.
    sc -T -M wc
    ./output/bin/standalone -d 0
  4. Inspect the contents of the wordcount.dat file to verify that the application generated output data, but notice that this time it has punctuation markers, and the data produced are all zeros.
    my_editor data/wordcount.dat &
  5. Restore the good working SPL application code that you saved previously.
    cp wc.spl.save wc.spl
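
A quick check for the all-zeros symptom described in step 4 can be sketched in shell. It assumes the txt format writes each attribute as label=value, as described earlier for the FileSink output, and that the values are non-negative integers. The function name all_values_zero is hypothetical, and punctuation markers are ignored because their exact textual form is not assumed here.

```shell
# Hypothetical helper: succeed only if a txt-format output file contains
# at least one label=value pair and every such value is zero.
# $1: output file to inspect
all_values_zero() {
    # Fail when no numeric label=value pair is present at all
    grep -q '=[0-9]' "$1" || return 1
    # Fail when any value contains a nonzero leading digit after optional zeros
    ! grep -q '=0*[1-9]' "$1"
}
```

For example, `all_values_zero data/wordcount.dat && echo "parser logic was not ported"` could be run after step 3 and again after step 5 to see the difference.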

Conclusion

This tutorial presented migration strategies for a mixed-mode SPADE program. You can continue your education with the other parts of this 5-part tutorial series. As you tackle progressively more difficult parts of the series, your effort will increase from checking a few minor language differences (such as punctuation handling) to dealing with ancillary objects, such as function models and C++ implementation files for UDFs, UDOPs, and UBOPs. Parts 3-5 demonstrate progressively more complex migration procedures, which range from procedures with user-defined functions (UDFs) to procedures that incorporate different kinds of user-defined operators (UDOPs and UBOPs).

Keep in mind the fact that these tutorial examples can only serve as a general guide for how to approach the conversion of a SPADE program to SPL. In other words, each SPADE program is different, and what is encountered during the conversion process for each can vary widely. For example, in the course of converting many other SPADE programs, you might see any of the following:

  • Failures to produce any SPL source
  • Source that doesn't compile
  • Compiled programs that don't run properly
  • Running programs that don't produce the correct results

Of significant importance is the availability of known good results from the SPADE version of the program. Once the SPL program is running, even apparently successfully, be sure to compare the former results with the new. Unfortunately, the new results will not always be good. A divide-and-conquer approach works fairly well in these cases. Both SPADE and SPL programs are easily modified through the addition of temporary file sinks. In both the SPADE and SPL versions of the program, add a new file sink to some operator in the middle of the program to record its output stream. The new file sink produces either good or bad results, which narrows the area of your search to either the part of the program before or after this new file sink. This process, repeated just a few times, usually helps you focus in on the erroneous computation fairly quickly.
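
The comparison of a temporary file sink's output from the two versions can be sketched as a small shell helper. It assumes both files have already been brought to the same textual format, because the SPADE and SPL sinks may format tuples differently. The function name first_difference is hypothetical. It prints the number of the first line at which the files differ, or nothing when they match.

```shell
# Hypothetical helper: report the first line at which two text files differ.
# $1: known-good output (for example, from the SPADE temporary file sink)
# $2: new output (for example, from the SPL temporary file sink)
first_difference() {
    awk -v other="$2" '
        {
            # Read the matching line from the second file; a missing line
            # or different text marks the first difference
            if ((getline theirs < other) <= 0 || theirs != $0) {
                print NR
                found = 1
                exit
            }
        }
        END {
            # The first file ended cleanly; any extra line in the second
            # file is a difference at the next line number
            if (!found && (getline theirs < other) > 0) print NR + 1
        }
    ' "$1"
}
```

If the helper prints nothing for a sink placed in the middle of the program, the error lies downstream of that sink; otherwise it lies upstream, and the sink can be moved earlier for the next round.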

None of this should frighten you away from the translation tool. It is, by far, the easiest and fastest way for someone to go from having a SPADE program to having a remarkably similar program that's now expressed in the SPL idiom. Just stay flexible and be prepared to be an active part of the migration process.

Resources

Learn

Get products and technologies

  • Build your next development project with IBM trial software, available for download directly from developerWorks.
