Creating a UDAF written in R

The example in this section shows how to create, on a Db2® system, a user-defined aggregate function (UDAF) written in R.

The following example creates a scalar function called average that calculates the average of several integers.

Code

Add the following code to the $DB2_HOME/function/routine/average.R file.

nz.mode <- 'aggregate'

nz.init <- function() {
  setOutputNull(0)
  setOutputNull(1)
}

nz.accum <- function() {
  input <- getInputColumn(0)
  if (is.null(input))
    return()

  sum <- getOutputColumn(0)
  count <- getOutputColumn(1)
  if (is.null(count)) {
    setOutputInt64(1, 1)
  } else {
    setOutputInt64(1, count + 1)
  }
  if (is.null(sum)) {
    setOutput(0, input)
  } else {
    setOutput(0, sum + input)
  }
}

nz.merge <- function() {
  inputSum <- getInputColumn(0)
  inputCount <- getInputColumn(1)
  if(is.null(inputSum) || is.null(inputCount))
    return()

  outputSum <- getOutputColumn(0)
  if (is.null(outputSum))
    setOutput(0, inputSum)
  else
    setOutput(0, inputSum + outputSum)

  outputCount <- getOutputColumn(1)
  if (is.null(outputCount))
    setOutput(1, inputCount)
  else
    setOutput(1, inputCount + outputCount)
}

nz.final <- function () {
  sum <- getInputColumn(0)
  if (is.null(sum)) {
    setOutputNull(0)
    return()
  }

  count <- getInputColumn(1)
  if (is.null(count))
  {
    userError("NumberAverage.returnValue(): something is wrong.")
    return()
  }
  setOutputDouble(0, as.numeric(as.numeric(sum) / count))
}

Deployment

Deploy the UDX to a Db2 instance by issuing the following command, where path_to_average.R is to be replaced by the fully qualified path to the average.R file.

db2 "CREATE FUNCTION average(integer) returns double aggregate with (integer, bigint) language R parameter style \
NPSGENERIC  FENCED  NOT THREADSAFE  ALLOW PARALLEL  NO DBINFO  DETERMINISTIC \
NO EXTERNAL ACTION  CALLED ON NULL INPUT  NO SQL external name 'path_to_average.R' "

Running

It is assumed that there is a table 'test_int' with a single integer column 'i':

I          
-----------
          1
          2
          4
          8

Use this function in a query, for example:

db2 "select average(i) from test_int" 

The following result is returned:

1                       
------------------------
  +3.75000000000000E+000

  1 record(s) selected.