R language aggregates 2

This example uses the following file name: sampleuda.R.

This example finds the longest string and largest number in the given group and returns the sum of the numbers and the length of the string.

All examples are run on a standard R data set with the name iris. You can load this data set into the Netezza appliance by using the Netezza R Library client-side package function as.nz.data.frame(). For more details, see the Netezza Package for R Developer's Guide.

Code

Enter the following code in the /tmp/sampleuda.R file:

nz.mode <- 'aggregate'
nz.init <- function () {
setOutputString(0, '')
setOutputDouble(1, 0)
}
nz.accum <- function () {
inputstr <- getInputColumn(0)
statestr <- getOutputColumn(0)
if (nchar(statestr) < nchar(inputstr))
setOutputString(0, inputstr)
inputnum <- getInputColumn(1)
statenum <- getOutputColumn(1)
if (inputnum > statenum)
setOutputDouble(1, inputnum)
}
nz.merge <- function () {
inputstr <- getInputColumn(0)
inputnum <- getInputColumn(1)
# after running nz.accum there is no possibility to
# get a NULL value in the input state - therefore,
# if NULL is encountered, we can skip this value set
# since it must to be coming from an empty data slice
if (is.null(inputstr) || is.null(inputnum))
return()
statestr <- getOutputColumn(0)
statenum <- getOutputColumn(1)
if (nchar(statestr) < nchar(inputstr))
setOutputString(0, inputstr)
if (inputnum > statenum)
setOutputDouble(1, inputnum)
}
nz.final <- function () {
setOutputDouble(0, nchar(getInputColumn(0)) + getInputColumn(1))
}

Compilation

Compile and register the UDA:

/nz/export/ae/utilities/bin/compile_ae --language r --template compile \
--version 3 --db dev --user nz /tmp/sampleuda.R

Registration

/nz/export/ae/utilities/bin/register_ae --language r --template uda \
--version 3 --db dev --user nz \
--sig 'ruda(VARCHAR(255),DOUBLE)' \
--return 'DOUBLE' --state '(VARCHAR(255),DOUBLE)' --exe sampleuda.R

Running

The following example shows an invocation:

SELECT ruda('text',2);
RUDA
------
6
(1 row)

When the data is distributed among at least two data slices, the result is noteworthy:

SELECT DISTINCT(datasliceid) FROM nza..iris;
DATASLICEID
-------------
4
3
1
2
(4 rows)

This result indicates that the iris table is stored on data slices 1-4. The UDA is then invoked as any other SQL aggregate:

SELECT ruda(CLASS,SEPALLENGTH) FROM nza..iris;
RUDA
------
17.9
(1 row)

You can validate the result by using the following two queries.

SELECT MAX(sepallength) FROM nza..iris;
MAX
-----
7.9
SELECT MAX(LENGTH(class)) FROM nza..iris;
MAX
-----
10

The queries show the same result, a sum of 17.9.