Analytic executables

Analytic Executables (AEs) are powerful tools for analytics development and processing. While Netezza's UDX functionality enables user-supplied C++ code callable from SQL functions, AEs are freestanding executable programs that can be called from SQL functions.

Like SQL functions and UDXs, there are different types of AEs to handle different function types: function and aggregate. That is, scalar and table SQL functions are used to invoke their respective function AEs, while aggregate SQL functions are used to invoke aggregate AEs.

AEs are unique in that they can be written in any language with the ability to call C libraries, including C, C++, Java, Python, Fortran, Perl, and R. For R installation information, see the IBM developerWorks Netezza Developer Network (NDN) community. You need to register first at developerWorks https://www.ibm.com/developerWorks. Search for “NDN” to locate the Netezza Developer Network community. Follow the instructions in the overview page to get access to the private part of the community. When an AE is written, it can be published and registered as a function on the nodes of a supporting IBM Netezza appliance. It can then be executed to carry out its intended purpose. An AE can operate either on the host or on the parallel nodes of the appliance. The figure highlights this process.
Figure 1. AE Operation on Host and S-Blades (SPUs)AE operation on host and s-blades (SPUs)
AEs provide a number of new capabilities in the development of analytics. Benefits of using AEs include:
  • Providing a method for diverting the data stream for a specific operation or purpose
  • Freedom to perform algorithmic operations on the data stream
  • Flexibility to write functions in C, C++, Java, Fortran, Perl, Python, and R
  • Allowing significantly computationally intensive algorithms where the data resides
  • Use of libraries or languages that are not part of the NPS
While the Netezza system already supports calls to user C++ functions that are called within an Netezza system process, AEs allow SQL functions to invoke application code in non-Netezza system processes running on the host and SPUs. Therefore, AEs provide more flexibility to build applications inside the Netezza system, including the ability to select programming languages and third-party software libraries that best achieve objectives.

AEs also provide greater control over the application lifespan. The lifespan of a UDX is controlled by the NPS system and is always less than the lifespan of a query. The lifespan on an AE can optionally be controlled by the Netezza system or by the AE itself, and can be indefinite.

Function AEs

A function AE generally uses a file I/O or “standard input/output” data flow paradigm. A SQL scalar function or table function that returns exactly one output row per input row and whose return row contains exactly one column has the following general structure, which varies depending on the programming language:
getDataConnectionToNPS();
while (getNextInputRow)
{
getInputColumns();
setOutputColumn(); // only one
outputResultRow();
}
closeDataConnection();
The same AE can be invoked from a SQL scalar function and a table function. Some AE programming language implementations also support a callback paradigm that can be used only when returning exactly one output row per input row. Often, SQL table functions return more than one output row per input row and define an output row with more than one column. In this case, an AE has the following general structure:
getDataConnectionToNPS();
while (getNextInputRow)
{
getInputColumns();
limit = { 0 or a positive integer, app specific };
for (int i = 0; i < limit; i++)
{
setOutputColumns();
outputResultRow();
}
}
limit = { 0 or a positive integer, app specific };
for (int i = 0; i < limit; i++)
{
setOutputColumns();
outputResultRow();
}
closeDataConnection();
Notice that there can be more than one column per output row and any number of output rows per input row. After the end of input, the AE can return output rows that are not associated with any input row. This type of AE can only be invoked by a SQL table function.

Aggregate AEs

An aggregate AE uses a different API than a function AE. This difference exists because the structure of an aggregation algorithm is different. An aggregate AE has the following general structure, which varies depending on the programming language:
getDataConnectionToNPS();
while (getNextMessage())
{
case INTIALIZE:
initialize aggregation state
break;
case ACCUMULATE:
optionally set aggregation state based on application algorithm
break;
case MERGE:
merge results from different data
slices break;
case FINAL_RESULT:
set final result based on application
algorithm break;
}
closeDataConnection();
Languages such as C++ and Java represent this structure as callbacks, for instance, an initialize function is called on initialization, an accumulate function is called on accumulate, and so on.