Migrating InfoSphere Streams SPADE applications to Streams Processing Language, Part 3: Migrate SPADE user-defined function applications

An introductory guide by example

The most significant new feature of Version 2.0 of the IBM InfoSphere® Streams product is the programming language model transformation from Streams Processing Application Declarative Engine (SPADE) to Streams Processing Language (SPL). Users with SPADE applications from previous versions will need to migrate and port their applications to SPL when upgrading their installations to Version 2.0. This tutorial is Part 3 of a 5-part series that uses actual SPADE samples to demonstrate a series of step-by-step procedures for migrating and porting different types of SPADE application content. Part 3 demonstrates the migration of SPADE user-defined function applications.

Richard P. King (rpk@us.ibm.com), Senior Programmer, IBM

Richard King has been a part of the InfoSphere Streams team for a number of years. When he joined IBM in 1977, it was as an associate programmer working on the development of System/38. Since transferring to IBM's Research Division at IBM Thomas J. Watson Research Center in Yorktown in 1981, he has worked on a wide variety of projects, including the design and development of what became the IBM Sysplex Coupling Facility. He has a bachelor's degree in industrial engineering and operations research from Cornell University and a master's in operations research and industrial engineering from Northwestern University.



Kevin Erickson (kjerick@us.ibm.com), Senior Software Engineer, IBM

Kevin Erickson joined IBM in Rochester, MN, in 1981 as an electrical engineer working on development and test of hard disk drives. He has contributed to a wide range of IBM products and technologies, from disk drive actuator and spindle servo control systems, to AS/400 and successor operating system kernel components. More recently Kevin has worked on performance testing of the Roadrunner supercomputer, the first computer to break the 1 petaflop barrier. In 2009, he transferred to Software Group to work with the InfoSphere Streams team, primarily in a variety of test-related roles. Over his IBM career, Kevin has been issued 14 U.S. patents, plus several in other countries. He holds a B.S. degree in Electrical Engineering from the University of Minnesota.



18 July 2011

Before you start

Introduction

More in the series

Look for the other parts of the Migrating InfoSphere Streams SPADE applications to SPL series on developerWorks.

Each part is self-contained and does not rely on the previous example. Therefore, you can learn how to migrate these types of SPADE applications in any order. However, the examples are arranged in what might be considered easiest to most difficult, so completing each part in order might be beneficial.

The "Before you start" section is repeated in each tutorial so that you can complete each part of the series independently.

IBM InfoSphere Streams is a high-performance computing system designed for streaming applications that span many different kinds of business, scientific, engineering, and environmental disciplines. InfoSphere Streams is a highly scalable and powerful analytics platform that can be used to rapidly analyze and solve many problems in real time, using a wide variety of data streams from thousands of live data sources, as well as reference data from data warehouses. InfoSphere Streams provides an application development and execution environment that is tailored to users developing specific applications that ingest, filter, analyze, correlate, classify, transform, and otherwise process large volumes of continuous data streams. These powerful applications result in faster and more intelligent decision-making outcomes within diverse business sectors such as healthcare, transportation, financial, manufacturing, communications, energy, and security.

Version 2.0 of the IBM InfoSphere Streams product introduces a variety of new features and enhancements to the previous 1.x versions, the most significant of which is the programming language model transformation from Streams Processing Application Declarative Engine (SPADE) to Streams Processing Language (SPL). The primary reasons for this change are to make the language simpler, more intuitive, and more uniform. The most visible change with SPL is the syntax, which has a structure and feel that is more in the realm of C or Java™, thus making it easier for programmers to read, understand, and be more productive when writing SPL applications.

Because the programming language model was transformed from SPADE to SPL in InfoSphere Streams Version 2.0, users with SPADE applications from the previous versions need to migrate and port their applications to SPL when upgrading their installation. Initial information to help users do this is made available as part of the shipped product, which is provided primarily in the IBM Streams Processing Language SPADE Application Migration and Interoperability Guide. This guide is located in the doc subdirectory below the directory where InfoSphere Streams is installed or in the IBM InfoSphere Streams Information Center (see Resources).

The IBM Streams Processing Language SPADE Application Migration and Interoperability Guide contains basic information and instructions for migrating SPADE applications, but the information is presented at a fairly high level without specific porting examples. This tutorial complements the high-level guide by providing examples of SPADE-to-SPL application migration. You will use a SPADE sample application that is shipped as part of a SPADE installation as the basis for the migration example in this tutorial. However, it is recommended that you understand the overall migration procedures contained in the high-level guide before tackling the more detailed step-by-step procedures outlined in this tutorial.

Finally, you should have at least a fundamental knowledge of, and experience with, the SPADE programming language prior to proceeding with this tutorial. You should also have access to a version of InfoSphere Streams prior to Version 2.0 (generally referred to as Version 1.2.x herein, because that is the version used to obtain and work with the sample examples) as well as a Version 2.0 installation in order to perform the exercises in this tutorial. You also need at least a basic understanding of SPL, which you can acquire by reading the IBM Streams Processing Language Introductory Tutorial and trying out the various examples given there. The SPL introductory tutorial, as well as several other documents helpful in learning about more advanced aspects of SPL, operator models, operator toolkits, and so on, can be found in the doc subdirectory below the directory where InfoSphere Streams is installed or in the IBM InfoSphere Streams Information Center (see Resources).

Setting up

Before proceeding with any of the migration and porting examples, there are some general prerequisites that must be satisfied. These prerequisites should be set up as follows:

  1. Ensure an InfoSphere Streams Version 1.2.x installation is available with the original (unaltered) sample code located under the installation directory within the samples subdirectory.
  2. Ensure an InfoSphere Streams Version 2.0 installation is available.
  3. Ensure that at least one terminal session is available for each installation version, with the appropriate environment set up to work with that version.
  4. Create a base working directory, located off your home directory, in which to place and work with the migration samples. The suggested name for this directory, which is used throughout this example, is migrationsamples. If not already created, issue the following commands to create this directory:
    cd ~
    mkdir migrationsamples

Understanding general concepts for the examples

There are a number of common items that pertain to each of the migration and porting examples discussed in this tutorial. Some of these items are environmental, while others are suggested methods and techniques for addressing particular steps of the porting processes.

Dual installation

Note that both InfoSphere Streams Version 1.2.x and Version 2.0 are used in the examples throughout this tutorial. When working with the various elements of SPADE applications, the use of Version 1.2.x is required, whereas when working with the various aspects of migration procedures and SPL applications, the use of Version 2.0 is required. The spade-to-spl command, which is the primary command used to translate SPADE code to SPL, is available in Version 2.0 installations. Thus, when you reach the point in the migration process where the spade-to-spl command is issued, you will switch from using a SPADE installation to using an SPL installation.

Order of post-migration fix-ups and porting

One of the main steps of the migration process is the execution of the spade-to-spl translation command. After this translation, a list of errors, warnings, and informational messages could result. The items in this list can be addressed in any order, but it is generally easier to resolve the simpler issues first because they often don't require any explicit changes, although in some cases changes might be made based on user preference. You can continue to evaluate the issues and work on resolving them in order of increasing difficulty. This is the philosophy and methodology used in this tutorial, which means particular translation issues might be presented out of numerical order from that reported by the translation process.

Compilation choices for SPADE and SPL applications

Throughout this tutorial, the approach for the compilation of applications is different for SPADE and SPL. For both types of applications, either the distributed or standalone methods of compilation can be used. The approaches used in this tutorial were selected to reduce the overall number of steps and commands that would have to be executed to demonstrate certain aspects of the migration processes.

For the SPADE application samples, using the makefile process to compile and generate easy-to-use execution scripts makes running and validating the samples relatively straightforward, even though they use the distributed application environment. For the SPL applications being migrated from the SPADE application samples (and in general when initially developing any new SPL applications), it is usually quicker and easier to initially employ the standalone application method of compilation. The distributed method is best for final production tuning when the application workload needs to be distributed across multiple cluster nodes to produce the best performance characteristics. Therefore, to keep the content of this tutorial as simple as possible, the standalone approach is used for SPL applications.


Configuring and testing the SPADE user-defined function application

This section describes the process for migrating and porting a SPADE application containing user-defined function capabilities. For this tutorial, the SPADE udf_at_work application (which is provided as a sample application shipped with Version 1.2.x of InfoSphere Streams) is used to demonstrate the step-by-step migration and porting procedures.

In SPL, user-defined functions are called native functions. User-defined functions and native functions can be implemented in one of two different ways: as code libraries or as inline functions. This tutorial describes how to port to both types of implementations.
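
The two implementation styles differ only in where the C++ code lives; from the SPL source, both forms are called in exactly the same way. The following sketch illustrates the difference. It is not the code shipped with the sample; the file and function names simply follow the sample's conventions, and the tax rates are placeholders.

    // Inline style: the complete implementation sits in a header file that the
    // SPL compiler includes directly into the generated operator code, so no
    // separate library has to be built or linked.
    //
    //   CalculateStateTaxInline.h (sketch)
    #include <SPL/Runtime/Function/SPLFunctions.h>

    namespace functions {
        inline SPL::float32 getStateTaxPercentage(const SPL::rstring & stateName)
        {
            return stateName == "MN" ? 6.875f : 5.0f;   // placeholder rates
        }
    }

    // Library style: the header only declares the functions ...
    //
    //   CalculateStateTaxLibrary.h (sketch)
    namespace functions {
        SPL::float32 libGetStateTaxPercentage(const SPL::rstring & stateName);
    }
    //
    // ... and the definitions live in a .cpp file that is compiled into a shared
    // library (for example, libCalculateStateTax.so), which the function model
    // later points to through its <cmn:lib> and <cmn:libPath> entries.

Either way, the SPL compiler learns about the functions only through the function model file (function.xml), which is the subject of several of the translation errors discussed later in this tutorial.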

The initial step of the migration process is to ensure that the SPADE application being migrated is working correctly. If the SPADE application doesn't even compile or produce the correct results, a migration to SPL would likely fail during the translation, or the resulting SPL would likely fail to compile or produce the correct results. The following procedure outlines the steps used to ensure that the SPADE application is suitably prepared for the migration and porting process.

  1. Change directories to the base working directory.
    cd ~/migrationsamples
  2. Create a copy of the udf_at_work sample code from the InfoSphere Streams Version 1.2.x installation into a udf subdirectory under the current working directory, where <V1.2.x_Installation> indicates the base file system location path name for the installation.
    cp -r ~/<V1.2.x_Installation>/samples/atwork/user_defined_functions udf
  3. Compile the application.
    cd udf
    make
  4. Run the compiler-generated script start_streams_udf_at_work.sh to make and start the default instance named spade@your_userid.
    ./start_streams_udf_at_work.sh
  5. Run the compiler-generated script submitjob_udf_at_work.sh to submit the udf_at_work job for execution.
    ./submitjob_udf_at_work.sh
  6. Monitor the output files, which by default are written to the data subdirectory, to verify that the application is writing data to them: check that their sizes are greater than 0 bytes and continue to grow as you repeat the check. There are two files: one corresponding to the inline implementation and one corresponding to the library implementation.
    ls -l data/*.result
  7. Optionally, look at the actual contents of the output files.
    my_editor data/*.result &
  8. Run the compiler-generated script canceljob_udf_at_work.sh to cancel and end the udf_at_work job.
    ./canceljob_udf_at_work.sh
  9. Run the compiler-generated script stop_streams_udf_at_work.sh to stop and remove the default instance named spade@your_userid.
    ./stop_streams_udf_at_work.sh

Translating the user-defined function SPADE application to SPL

The next step of migration is to set up for and execute the translation script using a Version 2.0 installation.

  1. Create an SPL working directory, located off the udf application subdirectory, in which to place and work with the migrated code. This tutorial uses the directory myspl.
    mkdir myspl
  2. Convert the SPADE source to SPL using the spade-to-spl translation command.
    spade-to-spl -t myspl -f udf_at_work.dps

After the translation, a list of errors, warnings, and informational messages similar to the following could result:

Listing 1. Error message
udf_at_work.dps:68:9: CDISP0695E ERROR: Functions defined 
in headers specified using the 'package' keyword must now 
be specified in the 'function.xml' file.  Refer to the 'SPL 
Toolkit Development Reference' for specifics.
Listing 2. Error message
udf_at_work.dps:69:9: CDISP0695E ERROR: Functions defined 
in headers specified using the 'package' keyword must now 
be specified in the 'function.xml' file.  Refer to the 'SPL 
Toolkit Development Reference' for specifics.
Listing 3. Error message
udf_at_work.dps:71:1: CDISP0697E ERROR: Library paths specified 
using the 'libpath' keyword must now be specified in the 
'function.xml' file.  Refer to the 'SPL Toolkit Development 
Reference' for specifics.
Listing 4. Error message
udf_at_work.dps:74:1: CDISP0696E ERROR: Libraries specified 
using the 'libs' keyword must now be specified in the 
'function.xml' file.  Refer to the 'SPL Toolkit Development 
Reference' for specifics.
Listing 5. Warning message
udf_at_work.dps:109:8: CDISP0703W WARNING: SPL does not 
support column selectors.  Ensure that the source's output
schema accounts for all attributes in the input data, 
otherwise a runtime exception may occur.
Listing 6. Informational message
udf_at_work.dps:122:36: CDISP0478I The 'toFloat' function is 
not provided by the SPL run-time library.  It was replaced 
by a cast to 'float32'.
Listing 7. Informational message
udf_at_work.dps:150:36: CDISP0478I The 'toFloat' function is 
not provided by the SPL run-time library.  It was replaced 
by a cast to 'float32'.
Listing 8. Warning message
udf_at_work.dps:175:8: CDISP0724W WARNING: Parameter 
'writePunctuations' with a value of 'true' has been 
specified in this sink because 'dropPunct' was not 
specified in the corresponding SPADE code.  This may
not be semantically equivalent if the SPADE code was 
compiled with '-U' or '--preserve-punctuations'.  You may 
wish to remove the 'writePunctuations' parameter or give 
it a value of 'false'.
Listing 9. Warning message
udf_at_work.dps:187:8: CDISP0724W WARNING: Parameter 
'writePunctuations' with a value of 'true' has been 
specified in this sink because 'dropPunct' was not 
specified in the corresponding SPADE code.  This may 
not be semantically equivalent if the SPADE code was 
compiled with '-U' or '--preserve-punctuations'.  You
may wish to remove the 'writePunctuations' parameter 
or give it a value of 'false'.
Listing 10. Error message
udf_at_work.dps:193:7: CDISP0063E ERROR: Func debug not 
supported in SPL.

Each of these items must be addressed and evaluated to determine the course of action required to produce a comparable working application in SPL. The result of addressing these items could be a number of porting activities, such as making changes to the migrated udf_at_work.spl file, the function.xml file, and possibly other generated SPL files. There might also be a need for other implementation additions and changes to build up the required environment for an SPL application.


Migrating the user-defined function SPADE application to SPL

The next step of the migration process is the post-translation fix-ups and porting of the migrated code and environment. As shown in the previous section, a number of errors, warnings, and informational messages could occur as a result of migration. Addressing and evaluating these items, and fixing them where necessary, is the first major step in the porting process. Once you complete these initial fix-ups, you need to perform any other porting that was not automatically handled or identified as needed by the spade-to-spl translator. The following procedure outlines the steps to do this, addressing the messages from easiest to most difficult.

  1. Evaluate the warning message in Listing 5 from the list of migration messages generated by the spade-to-spl translation command.

    Because column selectors are not supported in SPL, the translator warns about their use in the SPADE application. When using the comma-separated value (csv) format, the FileSource operator is expected to extract all of the columns in each line exactly as formatted and to present each column as an attribute in the output tuple. In this case, even though column selectors are coded, there are no gaps in the column specification, the columns match the output schema, and there are no extra columns in the input data. Therefore, this warning is benign, and you do not need to make any changes because of it.

  2. Evaluate the informational messages in Listing 6 and Listing 7.

    The replacement of toFloat with a cast to float32 is appropriate. Nothing needs to be done for these items.

  3. Evaluate error message in Listing 10.

    Although this item was flagged as an error, it cannot be migrated because SPL has no equivalent of the SPADE compile-time function debug feature. Therefore, nothing needs to be done for this item.

  4. Evaluate warning messages in Listing 8 and Listing 9.

    In this case, the translator chose to include the line ‘writePunctuations : true;’ in the parameter list of the FileSink operator because the original SPADE application did not specifically indicate that punctuation should be dropped. However, the original SPADE application did not include any punctuation in its output file. The choice of whether to include punctuation depends on how the output file will be parsed and used. For this tutorial example, in order to more closely match the original SPADE application behavior, delete or comment out this punctuation line in the migrated SPL application source, then save and exit the file.

    cd myspl
    my_editor udf_at_work.spl &
    Delete each ‘writePunctuations : true;’ line, or comment it out as ‘// writePunctuations : true;’.
  5. Evaluate error messages Listing 1, Listing 2, Listing 3, and Listing 4.

    Correcting the errors for these items involves a substantial amount of work. It requires creating a certain directory structure that contains C++ header files and implementation code files, compiled library code, an xml function model, and so on. The translator creates a skeleton with this directory structure, code files, and function model, while you might need to create other portions. The detailed steps are broken into several tasks, which are outlined in the following sections.

Updating the function model file

At this point, the translator has already flagged several errors indicating that certain items must now be specified in the function model file. A skeleton of this file was created during the translation process, and you must now update it with information corresponding to the original SPADE application. All of the issues found during the translation have now been addressed except for these function model errors, so the next step is to investigate them further. The best way to obtain additional information about them is to attempt to compile the SPL application.

  1. Change directories to the SPL application directory.
    cd ~/migrationsamples/udf/myspl
  2. Compile the application.
    sc -T -M udf_at_work

    After the compilation, a list of errors similar to the following could result. The first three errors are associated with the inline form of the user-defined function, while the remaining errors are associated with the library form. Both sets of errors are handled at the same time when updating the function model file.

    Listing 11. Error message
    udf_at_work.spl:97:100: CDISP0052E ERROR: Unknown callee 
    function 'getStateTaxPercentage(rstring)'.
    Listing 12. Error message
    udf_at_work.spl:97:149: CDISP0052E ERROR: Unknown callee 
    function 'getTotalTax(rstring, float32, int32)'.
    Listing 13. Error message
    udf_at_work.spl:97:243: CDISP0052E ERROR: Unknown callee 
    function 'getShipmentTrackingNumber(rstring, rstring)'.
    Listing 14. Error message
    udf_at_work.spl:126:101: CDISP0052E ERROR: Unknown callee 
    function 'libGetStateTaxPercentage(rstring)'.
    Listing 15. Error message
    udf_at_work.spl:126:153: CDISP0052E ERROR: Unknown callee 
    function 'libGetTotalTax(rstring, float32, int32)'.
    Listing 16. Error message
    udf_at_work.spl:126:250: CDISP0052E ERROR: Unknown callee 
    function 'libGetShipmentTrackingNumber(rstring, rstring)'.

Table 1 shows the skeleton function.xml file that the translator generates.

Table 1. Skeleton function.xml file
Line #  Code
1   <functionModel
2  xmlns="http://www.ibm.com/xmlns/prod/streams/spl/function"
3  xmlns:cmn="http://www.ibm.com/xmlns/prod/streams/spl/common"
4  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
5  xsi:schemaLocation="http://www.ibm.com/xmlns/prod/streams/spl/function functionModel.xsd">
6  <functionSet>
7    <!-- header file to include from within C++ code -->
8    <headerFileName>Sample.h</headerFileName>
9    <!-- functions lists the SPL prototypes of the functions implemented in this library -->
10    <functions>
11      <function>
12        <!-- use of CDATA allows easy use of <> in the prototypes -->
13        <prototype><![CDATA[ public void replaceMe() ]]></prototype>
14      </function>
15    </functions>
16    <dependencies>
17      <!-- This library can have several dependencies. Only one is used here. -->
18      <library>
19        <!-- A description for this library -->
20        <cmn:description>Sample-Functions</cmn:description>
21        <cmn:managedLibrary>
22          <!-- the name of the library for linking. Will be used as -lSample -->
23          <cmn:lib>Sample</cmn:lib>
24          <!-- Where to find the library. Relative to the current directory.
25            Will be used as -L<dir>/lib -->
26          <cmn:libPath>lib</cmn:libPath>
27          <!-- Where to find the include file. Relative to the current directory.
28            Will be used as -I<dir> -->
29          <cmn:includePath>./</cmn:includePath>
30        </cmn:managedLibrary>
31      </library>
32    </dependencies>
33  </functionSet>
34  </functionModel>

This file is placed under the SPL application directory at the path functions/native.function/function.xml. You need to enter specific information in several sections of the file to describe the functions being implemented. In this particular case, both the inline form and the library form of the functions are described within the same file to demonstrate how different sets of functions are handled within the same toolkit namespace. To do this, you need two <functionSet> sections: one for the inline include implementation and another for the code library implementation. Complete the following steps to create the two <functionSet> sections.

  1. Edit the function model file function.xml to provide <functionSet> sections for both inline and library implementations.
    my_editor functions/native.function/function.xml &
  2. Copy and paste the code from lines 6 through 33, so that the replicated section initially becomes lines 34 through 61 and the </functionModel> tag moves to line 62.
  3. Save and exit from my_editor.

The information that goes into each of these sections is described below. Note that further modifications will result in more line number changes, but this is a starting point.


Updating the function model inline section

Table 2 shows the skeleton of the function.xml file section pertaining to the inline function form.

Table 2. Inline function set section skeleton
Line #  Code
6  <functionSet>
7    <!-- header file to include from within C++ code -->
8    <headerFileName>Sample.h</headerFileName>
9    <!-- functions lists the SPL prototypes of the functions implemented in this library -->
10    <functions>
11      <function>
12        <!-- use of CDATA allows easy use of <> in the prototypes -->
13        <prototype><![CDATA[ public void replaceMe() ]]></prototype>
14      </function>
15    </functions>
16    <dependencies>
17      <!-- This library can have several dependencies. Only one is used here. -->
18      <library>
19        <!-- A description for this library -->
20        <cmn:description>Sample-Functions</cmn:description>
21        <cmn:managedLibrary>
22          <!-- the name of the library for linking. Will be used as -lSample -->
23          <cmn:lib>Sample</cmn:lib>
24          <!-- Where to find the library. Relative to the current directory.
25            Will be used as -L<dir>/lib -->
26          <cmn:libPath>lib</cmn:libPath>
27          <!-- Where to find the include file. Relative to the current directory.
28            Will be used as -I<dir> -->
29          <cmn:includePath>./</cmn:includePath>
30        </cmn:managedLibrary>
31      </library>
32    </dependencies>
33  </functionSet>

Complete the following steps to modify this section. Because the line numbers change as you make modifications, each modification is made with reference to the original skeleton line numbers.

  1. Edit the function model file function.xml to add function prototypes that were declared as an Unknown callee function by the previous compilation.
    my_editor functions/native.function/function.xml &
  2. On line 8, replace Sample.h with the following as the <headerFileName>.
    CalculateStateTaxInline.h
  3. Replicate lines 11 through 14 twice to achieve three sections, which become lines 11 through 22. Because three callee functions were identified as unknown, you are creating three <function> tag sections.
  4. Replace the <prototype> section for each of the three lines corresponding to replicated line 13 as follows:
    1. Replace line 13 with
      <![CDATA[ public float32 getStateTaxPercentage(rstring stateName) ]]>
    2. Replace the new line 17 with
      <![CDATA[ public float32 getTotalTax(rstring stateName, float32 unitPrice, 
      int32 quantity) ]]>
    3. Replace the new line 21 with
      <![CDATA[ public int32 getShipmentTrackingNumber(rstring buyerName, rstring 
      productName) ]]>
  5. In line 20 (new line 28), replace Sample-Functions with
    UDF-Inline Functions

    as the <cmn:description>. The name can be any descriptive text.
  6. In lines 23 and 26 (new lines 31 and 34), comment out the <cmn:lib> and <cmn:libPath> tag sections. The inline include implementation does not have any true library artifacts.
  7. In line 29 (new line 37), replace ./ with
    ../../opt/include

    as the <cmn:includePath>.
  8. Save and exit from my_editor.

Table 3 shows the results of the inline function set section after you have made all those modifications.

Table 3. Inline function set section filled in
Line #  Code
6  <functionSet>
7    <!-- header file to include from within C++ code -->
8    <headerFileName>CalculateStateTaxInline.h</headerFileName>
9    <!-- functions lists the SPL prototypes of the functions implemented in this library -->
10    <functions>
11      <function>
12        <!-- use of CDATA allows easy use of <> in the prototypes -->
13        <prototype><![CDATA[ public float32 getStateTaxPercentage(rstring
        stateName) ]]></prototype>
14      </function>
15      <function>
16        <!-- use of CDATA allows easy use of <> in the prototypes -->
17        <prototype><![CDATA[ public float32 getTotalTax(rstring stateName, float32
        unitPrice, int32 quantity) ]]></prototype>
18      </function>
19      <function>
20        <!-- use of CDATA allows easy use of <> in the prototypes -->
21        <prototype><![CDATA[ public int32 getShipmentTrackingNumber(rstring
        buyerName, rstring productName) ]]></prototype>
22      </function>
23    </functions>
24    <dependencies>
25      <!-- This library can have several dependencies. Only one is used here. -->
26      <library>
27        <!-- A description for this library -->
28        <cmn:description>UDF-Inline Functions</cmn:description>
29        <cmn:managedLibrary>
30          <!-- the name of the library for linking. Will be used as -lSample -->
31          <!-- <cmn:lib>Sample</cmn:lib> -->
32          <!-- Where to find the library. Relative to the current directory.
33            Will be used as -L<dir>/lib -->
34          <!-- <cmn:libPath>lib</cmn:libPath> -->
35          <!-- Where to find the include file. Relative to the current directory.
36            Will be used as -I<dir> -->
37          <cmn:includePath>../../opt/include</cmn:includePath>
38        </cmn:managedLibrary>
39      </library>
40    </dependencies>
41  </functionSet>
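
The SPL prototypes entered in Table 3 must line up with the C++ functions that the header actually declares; SPL rstring corresponds to SPL::rstring, float32 to SPL::float32, and int32 to SPL::int32. As a rough guide only (the real header is ported from the SPADE sample in a later section, and its exact signatures may differ), the declarations being described here look something like this:

    // Hypothetical C++ view of the three prototypes in Table 3.
    #include <SPL/Runtime/Function/SPLFunctions.h>

    namespace functions {   // the porting steps later rename the C++ namespace to 'functions'

        SPL::float32 getStateTaxPercentage(const SPL::rstring & stateName);

        SPL::float32 getTotalTax(const SPL::rstring & stateName,
                                 SPL::float32 unitPrice,
                                 SPL::int32 quantity);

        SPL::int32 getShipmentTrackingNumber(const SPL::rstring & buyerName,
                                             const SPL::rstring & productName);
    }

    // For the inline form, the function bodies are defined directly in the header,
    // because there is no separate library to link against.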

Updating the function model library section

Table 4 shows the skeleton of the function.xml file section pertaining to the library function form.

Table 4. Library function set section skeleton
Line #  Code
42  <functionSet>
43    <!-- header file to include from within C++ code -->
44    <headerFileName>Sample.h</headerFileName>
45    <!-- functions lists the SPL prototypes of the functions implemented in this library -->
46    <functions>
47      <function>
48        <!-- use of CDATA allows easy use of <> in the prototypes -->
49        <prototype><![CDATA[ public void replaceMe() ]]></prototype>
50      </function>
51    </functions>
52    <dependencies>
53      <!-- This library can have several dependencies. Only one is used here. -->
54      <library>
55        <!-- A description for this library -->
56        <cmn:description>Sample-Functions</cmn:description>
57        <cmn:managedLibrary>
58          <!-- the name of the library for linking. Will be used as -lSample -->
59          <cmn:lib>Sample</cmn:lib>
60          <!-- Where to find the library. Relative to the current directory.
61            Will be used as -L<dir>/lib -->
62          <cmn:libPath>lib</cmn:libPath>
63          <!-- Where to find the include file. Relative to the current directory.
64            Will be used as -I<dir> -->
65          <cmn:includePath>./</cmn:includePath>
66        </cmn:managedLibrary>
67      </library>
68    </dependencies>
69  </functionSet>
70  </functionModel>

Complete the following steps to modify this section. Because the line numbers change as you make modifications, each modification is made with reference to the original skeleton line numbers.

  1. Edit the function model file function.xml to add function prototypes that the previous compilation declared as an Unknown callee function.
    my_editor functions/native.function/function.xml &
  2. On line 44, replace Sample.h with
    CalculateStateTaxLibrary.h

    as the <headerFileName>.
  3. Replicate lines 47 through 50 twice to achieve three sections, which become lines 47 through 58. Because three callee functions were identified as unknown, you are creating three <function> tag sections.
  4. Replace the <prototype> section for each of the three lines corresponding to replicated line 49 as follows:
    1. Replace line 49 with
      <![CDATA[ public float32 libGetStateTaxPercentage(rstring) ]]>
    2. Replace the new line 53 with
      <![CDATA[ public float32 libGetTotalTax(rstring, float32, int32) ]]>
    3. Replace the new line 57 with
      <![CDATA[ public int32 libGetShipmentTrackingNumber(rstring, rstring) ]]>
  5. On line 56 (new line 64), replace Sample-Functions with
    UDF-Functions

    as the <cmn:description>. This name can be any descriptive text.
  6. On line 59 (new line 67), replace Sample with
    CalculateStateTax

    as the <cmn:lib>.
  7. On line 62 (new line 70), replace lib with
    ../../opt/lib

    as the <cmn:libPath>.
  8. On line 65 (new line 73), replace ./ with
    ../../opt/include

    as the <cmn:includePath>.
  9. Save and exit from my_editor.

Table 5 shows the results of the library function set section after you have made all those modifications.

Table 5. Library function set section filled in
Line #  Code
42  <functionSet>
43    <!-- header file to include from within C++ code -->
44    <headerFileName>CalculateStateTaxLibrary.h</headerFileName>
45    <!-- functions lists the SPL prototypes of the functions implemented in this library -->
46    <functions>
47      <function>
48        <!-- use of CDATA allows easy use of <> in the prototypes -->
49        <prototype><![CDATA[ public float32
        libGetStateTaxPercentage(rstring) ]]></prototype>
50      </function>
51      <function>
52        <!-- use of CDATA allows easy use of <> in the prototypes -->
53        <prototype><![CDATA[ public float32 libGetTotalTax(rstring, float32,
        int32) ]]></prototype>
54      </function>
55      <function>
56        <!-- use of CDATA allows easy use of <> in the prototypes -->
57        <prototype><![CDATA[ public int32 libGetShipmentTrackingNumber(rstring,
        rstring) ]]></prototype>
58      </function>
59    </functions>
60    <dependencies>
61      <!-- This library can have several dependencies. Only one is used here. -->
62      <library>
63        <!-- A description for this library -->
64        <cmn:description>UDF-Functions</cmn:description>
65        <cmn:managedLibrary>
66          <!-- the name of the library for linking. Will be used as -lSample -->
67          <cmn:lib>CalculateStateTax</cmn:lib>
68          <!-- Where to find the library. Relative to the current directory.
69            Will be used as -L<dir>/lib -->
70          <cmn:libPath>../../opt/lib</cmn:libPath>
71          <!-- Where to find the include file. Relative to the current directory.
72            Will be used as -I<dir> -->
73          <cmn:includePath>../../opt/include</cmn:includePath>
74        </cmn:managedLibrary>
75      </library>
76    </dependencies>
77  </functionSet>
78  </functionModel>

As a result of these changes to the function.xml file, the errors from the previous compilation should now be corrected. However, because so much function model information was missing, those errors stopped the compilation early and may have masked additional problems. Therefore, you need to compile the application again to determine whether there are any additional errors.

sc -T -M udf_at_work

After the compilation, a list of errors similar to the following could result. The first two messages indicate an error due to a missing header file while the other messages indicate a namespace error. You can handle these errors when you port the SPADE implementation-related files to SPL, as described in the next sections.

Listing 17. Error message
In file included
from src/operator/OrdersWithFinalPrice1.cpp:3:
Listing 18. Error message
src/operator/OrdersWithFinalPrice1.h:16:37: error:
CalculateStateTaxInline.h: No such file or directory
Listing 19. Error message
src/operator/OrdersWithFinalPrice1.cpp: In member function
‘virtual void SPL::_Operator::OrdersWithFinalPrice1::process(const
SPL::Tuple&, uint32_t)’:
Listing 20. Error message
src/operator/OrdersWithFinalPrice1.cpp:23: error: ‘::functions’
has not been declared
Listing 21. Error message
src/operator/OrdersWithFinalPrice1.cpp:23: error: ‘::functions’
has not been declared
Listing 22. Error message
src/operator/OrdersWithFinalPrice1.cpp:23: error: ‘::functions’
has not been declared

Porting header, implementation, and makefile files from SPADE

The next step in the porting process is to copy the various implementation-related files from SPADE into the SPL application directory structure, and then change their content as necessary to conform to the SPL environment. Complete the following steps.

  1. Create opt, opt/lib, and opt/include directories, and change directories to the opt directory.
    cd ~/migrationsamples/udf/myspl
    mkdir -p opt/lib
    mkdir -p opt/include
    cd opt
  2. Copy the C++ header files, implementation file, and makefile from the SPADE implementation directory to the current working directory.
    cp ~/migrationsamples/udf/opt/CalculateStateTax*.h .
    cp ~/migrationsamples/udf/opt/CalculateStateTax*.cpp .
    cp ~/migrationsamples/udf/opt/Makefile .
  3. Complete the steps in the following sections.

Porting the inline function file

  1. Edit CalculateStateTaxInline.h.
    my_editor CalculateStateTaxInline.h &
  2. Replace
    // include this to get access to spade functions and types
    #include <DPS/PH/DpsFunctions.h>

    with
    // Include this to get access to SPL functions and types
    #include <SPL/Runtime/Function/SPLFunctions.h>
  3. For the namespace and the end-of-namespace comment, replace TAX_CALCULATION with
    functions
  4. For the getShipmentTrackingNumber() function, replace DPS::hashCode with
    SPL::Functions::Utility::hashCode
  5. For the getShipmentTrackingNumber() function, replace DPS::absval with
    SPL::Functions::Math::abs
  6. Optionally, delete all of the @spadeudf lines in the file.
  7. Save and exit from my_editor. (A sketch of the ported code follows this list.)
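
Assuming the sample follows the structure described above, the ported getShipmentTrackingNumber() in CalculateStateTaxInline.h ends up looking roughly like the following sketch. The arithmetic shown is illustrative only; the shipped sample's exact computation may differ.

    // Include this to get access to SPL functions and types
    #include <SPL/Runtime/Function/SPLFunctions.h>

    namespace functions {                       // formerly TAX_CALCULATION

        inline SPL::int32 getShipmentTrackingNumber(const SPL::rstring & buyerName,
                                                    const SPL::rstring & productName)
        {
            // DPS::hashCode was replaced by SPL::Functions::Utility::hashCode, and
            // DPS::absval by SPL::Functions::Math::abs.
            SPL::int64 hash = (SPL::int64) SPL::Functions::Utility::hashCode(buyerName)
                            + (SPL::int64) SPL::Functions::Utility::hashCode(productName);
            return (SPL::int32)(SPL::Functions::Math::abs(hash) % 100000);
        }

    }   // end of namespace functions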

Porting the library function files

  1. Edit CalculateStateTaxLibrary.h
    my_editor CalculateStateTaxLibrary.h &
  2. For the namespace and the end-of-namespace comment, replace TAX_CALCULATION_LIBRARY with
    functions
  3. Optionally, delete all of the @spadeudf lines in the file.
  4. Save and exit from my_editor.
  5. Edit CalculateStateTaxLibrary.cpp
    my_editor CalculateStateTaxLibrary.cpp &
  6. Replace #include <DPS/PH/DpsFunctions.h> with
    // Include this to get access to SPL functions and types
    #include <SPL/Runtime/Function/SPLFunctions.h>
  7. For the libGetStateTaxPercentage() function, the libGetTotalTax() function, and the libGetShipmentTrackingNumber() function, replace TAX_CALCULATION_LIBRARY with
    functions
  8. For the libGetShipmentTrackingNumber() function, replace DPS::hashCode with
    SPL::Functions::Utility::hashCode
  9. For the libGetShipmentTrackingNumber() function, replace DPS::absval with
    SPL::Functions::Math::abs
  10. Save and exit from my_editor. (A sketch of the ported implementation file follows this list.)
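
After the same kind of edits, the library implementation file has the same overall shape. The following is a minimal sketch of CalculateStateTaxLibrary.cpp; again, this is not the sample's exact code, and the tax arithmetic is only illustrative.

    // Include this to get access to SPL functions and types
    #include <SPL/Runtime/Function/SPLFunctions.h>
    #include "CalculateStateTaxLibrary.h"

    namespace functions {                       // formerly TAX_CALCULATION_LIBRARY

        SPL::float32 libGetTotalTax(const SPL::rstring & stateName,
                                    SPL::float32 unitPrice,
                                    SPL::int32 quantity)
        {
            SPL::float32 rate = libGetStateTaxPercentage(stateName);
            return unitPrice * (SPL::float32) quantity * (rate / 100.0f);
        }

    }   // end of namespace functions

Because these definitions are compiled into the CalculateStateTax library by the Makefile rather than being included inline, they do not need to be declared inline.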

Porting the makefile file

  1. Edit the Makefile.
    my_editor Makefile &
  2. Globally replace dst-pe-install with
    dst-spl-pe-install
  3. Save and exit from my_editor.

Finishing and testing the native function SPL application

The final step of the migration process is to compile and test the completed SPADE-to-SPL migration and porting solution. Complete the following steps.

  1. Compile the library and install include files that you updated in the previous section.
    make
  2. Compile the SPL application. This compilation should succeed, because you addressed all of the migration issues through fix-ups and porting.
    cd ..
    sc -T -M udf_at_work

    The compilation process automatically creates a data subdirectory for the SPL application under the directory where the migrated SPL source was placed.

  3. Copy the original source input file shopping_cart.dat from the SPADE data subdirectory into the SPL data subdirectory.
    cp ~/migrationsamples/udf/data/shopping_cart.dat data/
  4. Execute the standalone application.
    ./output/bin/standalone -d 0
  5. Check the contents of the data subdirectory to verify that the application generated the expected output files and that their sizes are not 0.
    ls -l data/*.result
  6. Optionally, further inspect the contents of the output files to verify that the application generated the expected output data, which should be similar to that of the original SPADE application.
    my_editor data/*.result &

Conclusion

This tutorial presented migration strategies for a user-defined function SPADE program. You can continue your education with all five parts of this tutorial series. As you tackle progressively more difficult parts of the series, the effort grows from checking a few minor language differences (such as punctuation handling) to dealing with ancillary objects, such as function models and C++ implementation files. Parts 4 and 5 of this series demonstrate progressively more complex migration procedures that incorporate different kinds of user-defined operators (UDOPs and UBOPs).

Keep in mind that these tutorial examples can serve only as a general guide for how to approach the conversion of a SPADE program to SPL. Each SPADE program is different, and what is encountered during the conversion process can vary widely. For example, in the course of converting other SPADE programs, you might see any of the following:

  • Failures to produce any SPL source
  • Source that doesn't compile
  • Compiled programs that don't run properly
  • Running programs that don't produce the correct results

It is important to have known good results from the SPADE version of the program available. Once the SPL program is running, even apparently successfully, be sure to compare its results with the originals; unfortunately, the new results will not always be correct. A divide-and-conquer approach works fairly well in these cases. Both SPADE and SPL programs are easily modified through the addition of temporary file sinks. In both the SPADE and SPL versions of the program, add a new file sink to some operator in the middle of the program to record its output stream. That file sink produces either good or bad results, which narrows your search to the part of the program either before or after it. Repeated just a few times, this process usually lets you zero in on the erroneous computation fairly quickly.

None of this should frighten you away from the translation tool. It is, by far, the easiest and fastest way for someone to go from having a SPADE program to having a remarkably similar program that's now expressed in the SPL idiom. Just stay flexible and be prepared to be an active part of the migration process.

Resources

Get products and technologies

  • Build your next development project with IBM trial software, available for download directly from developerWorks.
