Running user-defined functions
User-defined functions can run either on each row or on each group of rows defined by a grouping column. The first case is covered by nzApply(); the second is handled by nzTApply(). Two more flexible functions, nzRun() and nzRunHost(), let users iterate through the data manually.
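The two modes can be pictured locally before turning to the Netezza functions. The sketch below uses base R on an ordinary data.frame and needs no database connection. It is an analogy only, not how nzApply() or nzTApply() are implemented; row_fun and group_fun are hypothetical names chosen for illustration.

```r
# Local analogy only: base R on an ordinary data.frame, no Netezza connection.
data(iris)

# Row-wise: the function sees one row at a time and returns one value.
row_fun <- function(row) sqrt(row[[1]])
per_row <- vapply(seq_len(nrow(iris)),
                  function(i) row_fun(iris[i, ]),
                  numeric(1))

# Group-wise: the function sees all rows that share a value of the grouping column.
group_fun <- function(rows) mean(rows[[1]])
per_group <- vapply(split(iris, iris$Species), group_fun, numeric(1))

length(per_row)    # one result per input row
names(per_group)   # one result per group
```

The row-wise call yields as many results as there are rows, while the group-wise call yields one result per distinct value of the grouping column.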
nzApply
The nzApply() function applies a user-defined function to each row of a distributed data frame (nz.data.frame). For each processed row, it expects at most one result row (vector, list) that is inserted into the output nz.data.frame.
data(iris)
if (nzExistTable('iris')) {nzDeleteTable('iris')}
d <- as.nz.data.frame(iris)
f <- function(x) { return(sqrt(x[[1]])) }
if (nzExistTable('apply_output')) nzDeleteTable('apply_output')
r <- nzApply(d[,1], NULL, f, output.name='apply_output',
output.signature=list(SQUAREROOT=NZ.DOUBLE))
head(r)
# SQUAREROOT
#1 2.645751
#2 2.626785
#3 2.366432
#4 2.366432
#5 2.366432
#6 2.366432
# this exists also as an overloaded apply method and the following
# returns the same result
nzDeleteTable('apply_output')
r <- apply(d[,1], NULL, f, output.name='apply_output',
output.signature=list(SQUAREROOT=NZ.DOUBLE))
f <- function(x) { return(sqrt(as.numeric(x[[1]]))) }
if (nzExistTable('apply_output')) nzDeleteTable('apply_output')
r <- nzApply(d, NULL, f, output.name='apply_output',
output.signature=list(SQUAREROOT=NZ.DOUBLE))
head(r)
# SQUAREROOT
#1 2.258318
#2 2.213594
#3 2.213594
#4 2.258318
#5 2.258318
#6 2.258318
nzTApply
The nzTApply() function applies a user-defined function to each subset (group of rows) of a distributed data frame (nz.data.frame). The subsets are determined by a specified index column, and the results of applying the function are put into a data frame. The example below uses the iris data set, held in the same nz.data.frame as in the nzApply() example.
print(d)
#SELECT Sepal_Length,Sepal_Width,Petal_Length,Petal_Width, Species FROM nziris
# the following lines do the same - compute the mean value
# in every group
nzTApply(d, d[,5], mean)
nzTApply(d, 'Species', mean)
nzTApply(d, 5, mean)
#   Sepal_Length Sepal_Width Petal_Length Petal_Width Species    Species
#1        6.588       2.974        5.552       2.026     nan  virginica
#2        5.006       3.428        1.462       0.246     nan     setosa
#3        5.936       2.770        4.260       1.326     nan versicolor
Details
The output of these functions depends on whether output.name and output.signature are specified. For nzApply(), an object of class data.frame is returned, with as many columns as there are elements in the sequences returned by FUN. If output.name is not provided, no table is created. For nzTApply(), if output.name is provided, output.signature must also be specified. The output.signature parameter can be used to avoid receiving a sparse table and to set the desired output column types; if it is provided, FUN must return values that can be cast to these types.
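Because the values returned by the user-defined function must be castable to the declared column types, it can help to try the function on a few local rows before submitting it. The sketch below does this in base R only. check_signature is a hypothetical helper that is not part of the Netezza R package, and the base R type names it accepts are stand-ins for constants such as NZ.DOUBLE.

```r
library(methods)  # provides as(); attached by default in recent R versions
data(iris)

# Hypothetical local pre-check; not part of the Netezza R package.
# 'signature' maps output column names to base R type names, standing in
# for constants such as NZ.DOUBLE.
check_signature <- function(result, signature) {
  result <- as.list(result)
  if (length(result) != length(signature)) {
    stop("function returned ", length(result),
         " values, but the signature declares ", length(signature), " columns")
  }
  casted <- Map(function(value, type) {
    # A lossy cast raises a warning (for example, NAs introduced by coercion);
    # turn that warning into an error so the problem surfaces early.
    withCallingHandlers(
      as(value, type),
      warning = function(w) stop("cannot cast value to ", type, ": ",
                                 conditionMessage(w))
    )
  }, result, signature)
  names(casted) <- names(signature)
  casted
}

f <- function(x) sqrt(as.numeric(x[[1]]))
ok <- check_signature(f(iris[1, ]), list(SQUAREROOT = "numeric"))
bad <- try(check_signature("abc", list(SQUAREROOT = "numeric")), silent = TRUE)
```

Here ok holds the cast value under the declared column name, while bad is a try-error because the string cannot be cast to a number.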
If FUN raises errors, debugger mode can be used to investigate the conditions under which they occur. When debugger.mode=TRUE, the result table is not stored in the Netezza system. Instead, a diagnostic test is run for every group, and the environment of the first group that causes an error is transported to the local R client and opened in the R debugger.
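The idea of running a diagnostic test per group can be imitated locally with try(). The sketch below is base R only and reproduces just a per-group outcome table; it does not transport any environment or open the debugger. diagnose_groups and fun5 are hypothetical names, and the function fails on purpose for some groups.

```r
# Local emulation of a per-group diagnostic run; base R only.
# 'diagnose_groups' is a hypothetical helper, not a Netezza R function.
diagnose_groups <- function(data, index, fun) {
  groups <- split(data, data[[index]])
  # Run the function on every group and capture errors instead of stopping.
  outcomes <- lapply(groups, function(g) try(fun(g), silent = TRUE))
  data.frame(
    values = vapply(outcomes, function(o) {
      if (inherits(o, "try-error")) {
        conditionMessage(attr(o, "condition"))
      } else {
        paste(format(o), collapse = " ")
      }
    }, character(1)),
    type = vapply(outcomes, function(o) class(o)[1], character(1)),
    group = names(groups),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

data(iris)
# Fails on purpose for any group whose smallest first-column value is below 4.5,
# because cov(0) is not a valid call.
fun5 <- function(x) if (min(x[, 1]) < 4.5) cov(0) else min(x[, 1])
summary_table <- diagnose_groups(iris, 5, fun5)
failed <- summary_table$group[summary_table$type == "try-error"]
```

Each row of summary_table describes one group: the outcome or error message, the outcome type, and the group name.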
nziris = nz.data.frame('iris')
FUN5 = function(x) {
  # cov(0) is an invalid call, so groups with a minimum below 4.5 raise an error
  if(min(x[,1]) < 4.5) cov(0) else min(x[,1])
}
nzTApply(nziris, 5, FUN5, debugger.mode=TRUE)
While in debug mode, the function nzTApply() returns a summary of the group processing. This summary is presented in a table with the following columns:
- The first column contains the outcome or error description.
- The second column contains the type of outcome (try-error in case of error).
- The third column contains the group name for which the given result is returned.
In this example, there are three groups, one of which produces an error.
Found 1 error
  values                                        type      group
1 101                                           integer   virginica
2 supply both 'x' and 'y' or a matrix-like 'x'  try-error setosa
3 51                                            integer   versicolor
Then, for the first group that caused an error, a dumped environment is downloaded from the remote SPU to the R client and opened in the R debugger.
nzApply(X, MARGIN, FUN, output.name = NULL, output.signature = NULL,
clear.existing = FALSE, ...)
nzTApply(X, INDEX, FUN = NULL, output.name = NULL, output.signature = NULL,
clear.existing = FALSE, debugger.mode = FALSE, ..., simplify = TRUE)
Where:
- X
- Specifies the input data frame.
- MARGIN
- Currently not used but the argument is required; NULL must be passed.
- FUN
- Specifies the user-defined function.
- output.name
- Specifies the name of the output table created on the Netezza system.
- output.signature
- Denotes the data types for output table columns. If not provided, a generic (sparse) table is created.
- clear.existing
- If TRUE, delete the output table if it currently exists.
- debugger.mode
- If TRUE, nzTApply() works in debugger mode.
- ...
- These arguments are passed to FUN.
- simplify
- Not used, included for compatibility.
- INDEX
- The value used to index the data set. INDEX may be supplied as one of the following items:
  - A character string, the value of which must be present among the columns of X.
  - An integer not greater than the number of columns of X.
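The two INDEX forms listed above identify the same grouping column in different ways. The base R sketch below shows one way such a value could be validated and normalised to a column name on a local data.frame. resolve_index is a hypothetical helper for illustration and says nothing about how nzTApply() handles INDEX internally.

```r
# Hypothetical helper showing one way to normalise INDEX locally; base R only.
resolve_index <- function(X, INDEX) {
  if (is.character(INDEX) && length(INDEX) == 1) {
    # A column name must be present among the columns of X.
    if (!INDEX %in% names(X)) stop("no column named '", INDEX, "' in X")
    return(INDEX)
  }
  if (is.numeric(INDEX) && length(INDEX) == 1) {
    # A position must be a whole number no greater than the number of columns.
    if (INDEX < 1 || INDEX > ncol(X) || INDEX != round(INDEX)) {
      stop("INDEX must be an integer between 1 and ", ncol(X))
    }
    return(names(X)[INDEX])
  }
  stop("unsupported INDEX in this sketch")
}

data(iris)
by_name <- resolve_index(iris, "Species")
by_position <- resolve_index(iris, 5)
too_large <- try(resolve_index(iris, 6), silent = TRUE)
```

Both by_name and by_position resolve to the same column, and a position beyond the last column is rejected.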