Migrating user-defined extensions (UDXs) based on C++ and Lua from IBM PureData® System for Analytics (Netezza) to Db2 Warehouse

To migrate your PureData® System for Analytics user-defined functions (UDFs), user-defined aggregates (UDAs), or user-defined shared libraries that are written in C++ or LUA, you must recompile and reregister them in the Db2® Warehouse environment.

About this task

In PureData System for Analytics, you compile and register the UDX library by using the nzudxcompile command, which is executed directly in the Netezza shell. After the compilation and registration are completed successfully, the libraries for the Snippet Processing Units (SPUs) and the host are generated and copied to the SPUs. Then, the function is registered.

On Db2 Warehouse, the command line processor plus (CLPPlus) takes the package with one or more source files and optional other content, uploads it to your project directory on the server, creates the folder for the specified project, unpacks the package there, and takes necessary steps, such as file compilation and linking or registering the function.

The following example shows a UDX in C++ that runs on PureData System for Analytics by recompiling and deploying it on theDb2 Warehouse system.

Procedure

  • To compile and register the same set of UDXs on Db2 Warehouse, each in a separate project, take the following steps:
    1. Add the following line (if it does not exist) in the udf_basic_example.cpp file:
      #include <limits.h>

      This header file was implicitly included with dependencies of udxinc.h on PureData System for Analytics, but not in the updated version on Db2 Warehouse.

    2. Create packages for the UDXs by entering the following commands:
      tar -czf udsf.tar.gz udf_basic_example.cpp
      tar -czf udaf.tar.gz uda_basic_example.cpp
      tar -czf udtf.tar.gz udtf_basic_example.cpp
      Note: Standard library headers from one compiler version to another might have changed due to tightening name spaces, implicit header inclusions, and stricter type checking. Therefore, you might have to fine-tune #include for headers, and using for namespace or symbol.
    3. Connect to the Db2 Warehouse server by using CLPPlus:
      clpplus -nw <user>/<password>@<host or ip of server>:50000/bludb
    4. To compile the UDXs, use the ida deployudx CLPPlus command on Db2 Warehouse:
      SQL> ida deployudx project udsfProject source udsf.tar.gz class 
      CMyProduct signature "myproduct(integer, integer)" return integer
      SQL> ida deployudx project udafProject source udaf.tar.gz class 
      CAvg signature "myavg(integer) aggregate with (double, integer)" 
      return double
      The state is provided by using the aggregate with (double, integer) signature parameter on Db2 Warehouse.
      SQL> ida deployudx project udtfProject source udtf.tar.gz class 
      OneUdtf signature "myudtf(integer)" return "table(res integer)"

      There is no version parameter because Db2 Warehouse supports UDXs only in version 2. To port UDXs of version 1, follow the steps that are described in Migrating UDXs from API version 1 to API version 2. Then, deploy and retest the ported libraries on PureData System for Analytics before you use them on Db2 Warehouse.

      The ida deployudx command automatically determines the language based on the extensions of the files contained in the compressed input file. However, if the input file contains more than one type of UDX source file (for example, both .cpp and .nzl files), specify the LANG parameter, as in the following example:
      IDA DEPLOYUDX PROJECT udsfProject SOURCE udsf.tar.gz LANG CPP
      For more information on ida deployudx command, see https://www.ibm.com/support/knowledgecenter/en/SS6NHC/com.ibm.swg.im.dashdb.clpplus.doc/doc/r0061924.html.
  • To compile and register the same set of functions as a single project, take the following steps:
    1. Add the following line (if it does not exist) in the udf_basic_example.cpp file:
      #include <limits.h>

      In the example, each function is registered in its own project. However, all UDXs can be deployed within a single package, which is compiled to a single .so file, and then be registered independently.

      tar -czf allinone.tar.gz udf_basic_example.cpp 
      uda_basic_example.cpp udtf_basic_example.cpp
    2. Connect to the Db2 Warehouse server by using CLPPlus:
      clpplus -nw <user>/<password>@<host or ip of server>:50000/bludb
    3. To upload, compile, and link the library, use the ida buildlibCLPPlus command on Db2 Warehouse:
      SQL> ida buildlib project allinone src allinone.tar.gz

      The information about the created library looks as follows:

      library Generated: release/liballinone.so

    4. To register the UDXs, use the ida registerudx CLPPlus command on Db2 Warehouse:
      SQL> ida registerudx project allinone udxlib release/liballinone.so 
      class CMyProduct signature "aio_myproduct(integer, integer)" return 
      integer
      SQL> ida registerudx project allinone udxlib release/liballinone.so 
      class CAvg signature "aio_myavg(integer) aggregate with (double, 
      integer)" return double
      SQL> ida registerudx project allinone udxlib release/liballinone.so 
      class OneUdtf signature "aio_myudtf(integer)" return "table(res 
      integer)"

Results

After the migration is completed, your UDXs are available on Db2 Warehouse.

You can now use these UDXs in queries in the same way as you did on PureData System for Analytics:
SQL> select col2, myavg(col1) from table1 group by col2;
SQL> select myproduct(col1, col2) from table1 where col2 < 100;
SQL> select table1.col1, tf.res from table1, table(myudtf(col1)) as tf;