R language output mode

The UDX output signature defines the result that a function or aggregate returns. The result can be a scalar (UDF/UDA) or a table (UDTF).

The output signature must be known before a UDX is invoked, which differs from how R typically works. A typical R use case is the apply function, which accepts a data.frame object, a margin that specifies whether the function is applied over rows or columns, and the function to apply. The result can be a vector, a matrix, or a list, depending on what the user-provided function returns, so its shape is known only after the call completes. For example:
apply(iris, 1, function(x) length(x))
apply(iris[,1:4], 1, function(x) c(length(x),sqrt(as.double(x))))
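The variation in result shape can be checked in plain R. The following is a minimal sketch that uses only base R and the built-in iris data set; the variable names are chosen for illustration and are not part of the R Adapter.

```r
# Base R only; iris ships with R in the datasets package.
num <- as.matrix(iris[, 1:4])   # keep the four numeric columns

# One scalar per row: apply() returns a plain vector, one element per row.
lens <- apply(num, 1, function(x) length(x))

# A fixed-length vector per row: apply() returns a matrix
# with one column per input row.
vals <- apply(num, 1, function(x) c(length(x), sqrt(x)))

# A variable-length result per row: apply() falls back to a list.
ragged <- apply(num, 1, function(x) x[x > 3])

str(lens)
dim(vals)
class(ragged)
```

The same call therefore yields three different result structures, none of which is declared in advance. A UDX cannot work this way, because its output signature is fixed at registration.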

In the R Adapter, the output mode is controlled by the OUTPUT_TYPE environment variable that you set during registration.

To set this variable, add one of the following options to the command line of the register_ae script:
  • For sparse mode, add:
    --define "r_ae_output_type=SPARSE"
  • For table mode, add:
    --define "r_ae_output_type=TABLE"

Sparse Output Mode

Generally, the output of the user-provided function cannot be restricted to any predefined form. In the sparse output mode, the R AE returns a table with the definition TABLE(columnid INT4, value VARCHAR(16000)), which means that each R AE output value is converted to a character string. To retrieve the original value, you must manually cast each value to the required data type. However, avoid this practice where possible because it might introduce rounding errors and degrade performance, especially for large data sets.
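The rounding concern can be illustrated in plain R, without the adapter. The following sketch assumes that the sparse layout amounts to one (columnid, value) row per output value and that values are converted in the same way as as.character; the conversion that the R AE actually performs might differ. The to_sparse helper is hypothetical and is not part of the R Adapter API.

```r
# Hypothetical helper (not an R Adapter function): mimics the sparse layout
# by emitting one (columnid, value) row per element, with values as strings.
to_sparse <- function(x) {
  data.frame(columnid = seq_along(x),
             value = as.character(x),
             stringsAsFactors = FALSE)
}

result <- c(1/3, sqrt(2), 150)
sparse <- to_sparse(result)

# Cast the strings back to doubles, as a caller would do after a manual cast.
restored <- as.numeric(sparse$value)

# Compare element by element: the integer-valued 150 survives the round trip,
# but 1/3 does not, because the string form keeps fewer digits than a double.
exact <- restored == result
print(sparse)
print(exact)
```

Under these assumptions, every value pays for a conversion to text and another conversion back, and values that are not exactly representable in the string form come back slightly changed.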

In this mode, there is no difference between returning output data with a general setOutput function and specific setOutput<DataType> functions, where <DataType> is a placeholder for a specific data type identifier. All output data is eventually stored as a character string.

Table Output Mode

The alternative table output mode requires one of the following settings for the output signature that is provided during the registration step:
  • Exact setting that specifies columns with their data types.
  • TABLE(ANY) setting, for which you must define a shaper function that specifies the output signature at run time.

In the table output mode, avoid the general setOutput function. Instead, use the specific setOutput<DataType> functions, where <DataType> is a placeholder for a specific data type identifier.