IBM Streams 4.2.1

Operator RScript

The RScript operator maps input tuple attributes to objects that can be used in R commands. It then runs a script that contains R commands and maps the objects that are output from the script to output tuple attributes.

The RScript operator processes one tuple at a time. When a tuple is received on the required input port, the operator maps the input tuple attributes, which are specified in the streamAttributes parameter, to the objects that are specified in the rObjects parameter. The operator runs the script that is specified in the rScriptFileName parameter and processes the results. The operator uses the custom output function fromR to map the values that are produced by the output statements in the R script to output tuple attributes.

You can optionally provide a script for initializing the R environment. There is also an optional input port that you can use to dynamically refresh the analytic code in the initialization or processing scripts. You can use the optional error port to monitor for errors that occur during the processing of a tuple.

Behavior in a consistent region

  • The RScript operator can participate in a consistent region.
  • The operator cannot be the start of a consistent region.
  • The control port (input port 1) is not considered during checkpointing or resetting.
  • On checkpoint, the operator saves the current R environment to a file (*.rdat) in the data directory. The name of this file is saved in the checkpoint.
  • On reset, the operator gets the R environment filename from the checkpoint, and restores the R session with the environment saved in the file.
  • On retire checkpoint, the operator deletes the R environment file from the data directory. If the operator crashes, or some unexpected error occurs, the R environment files could be left behind in the data directory.

Exceptions

If the operator initialization fails with an unrecoverable error, the operator throws an RScriptException, which is based on std::exception and causes the processing element (PE) to stop.

If an error occurs while the operator is processing a tuple, the failedTuples metric is incremented. If the optional error port is specified, an error tuple is written to the port. If the optional port is not configured, the error is logged.

Tip: The operator detects errors by running the R scripts inside a tryCatch() call. To raise additional errors with your own messages, call functions such as stop() within your R scripts. For example:

if (in1 == 1) stop("The in1 object contains the value 1, which is invalid.");
out1 <- in1
out2 <- in2 * 2

If an exception occurs while the operator is running the R script during tuple processing, the operator captures any error information that is included with the exception.

When the trace level is set to debug, the operator can log the information that is returned from stderr and stdout of the process that is running R.

Examples
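
The following is a minimal sketch of an invocation, not an example taken from the product documentation. It assumes a hypothetical script etc/score.r that assigns the R objects out1 and out2, for instance with the statements shown in the tip above. The composite name, the stream names, the attribute names, and the Beacon source are also assumptions made for illustration.

```spl
use com.ibm.streams.rproject::RScript;

composite RScriptSketch {
    graph
        // Hypothetical source: emits ten tuples with fixed values.
        stream<float64 in1, float64 in2> Readings = Beacon() {
            param
                iterations : 10u;
            output
                Readings : in1 = 2.0, in2 = 3.0;
        }

        // streamAttributes and rObjects pair up by position:
        // attribute in1 -> R object "in1", attribute in2 -> R object "in2".
        stream<float64 in1, float64 in2, float64 out1, float64 out2> Scored = RScript(Readings) {
            param
                rScriptFileName : "etc/score.r"; // hypothetical script path
                streamAttributes : in1, in2;
                rObjects : "in1", "in2";
            output
                // fromR names the R object that the script is assumed to create.
                Scored : out1 = fromR("out1"), out2 = fromR("out2");
        }

        // Print each result tuple.
        () as Printer = Custom(Scored) {
            logic
                onTuple Scored : println(Scored);
        }
}
```

Because the two parameter lists are matched by position, they must stay the same length and in the same order when attributes are added or removed.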

Summary

Ports
This operator has 2 input ports and 2 output ports.
Windowing
This operator does not accept any windowing configurations.
Parameters
This operator supports 5 parameters.

Required: rObjects, rScriptFileName, streamAttributes

Optional: initializationScriptFileName, rCommand

Metrics
This operator reports 1 metric.

Properties

Implementation
C++
Threading
Always - Operator always provides a single-threaded execution context.

Input Ports

Ports (0)

The RScript operator has one required input port.

The required input port provides tuples that contain the attributes that are used as input for the R script, as specified in the streamAttributes parameter. The required input port is non-mutating and its punctuation mode is Oblivious.

Ports (1)

The RScript operator has one optional input port.

The optional input port accepts an rstring attribute that specifies the path name of an R script. The path must be an absolute path. To specify a file within your toolkit, build the path with an expression of the form getThisToolkitDir() + path_to_script.

The script is run once. You can use the script to update or replace the analytic code in the initialization or processing scripts. For example, you can run R commands that refresh the model that is used for scoring or you can replace an R function definition.
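
As a rough sketch of how the optional input port might be wired, the following composite feeds a single control tuple to input port 1. The script names etc/score.r and etc/refresh_model.r, the attribute name scriptPath, the stream names, and the 30-second delay are all hypothetical.

```spl
use com.ibm.streams.rproject::RScript;

composite RScriptRefreshSketch {
    graph
        // Hypothetical data stream: one tuple per second.
        stream<float64 in1> Readings = Beacon() {
            param
                period : 1.0;
            output
                Readings : in1 = 2.0;
        }

        // Hypothetical control stream: one tuple that carries an
        // absolute path to a script inside this toolkit.
        stream<rstring scriptPath> Refresh = Beacon() {
            param
                iterations : 1u;
                initDelay : 30.0;
            output
                Refresh : scriptPath = getThisToolkitDir() + "/etc/refresh_model.r";
        }

        // Input ports are separated by a semicolon:
        // port 0 receives data, port 1 receives script paths.
        stream<float64 in1, float64 out1> Scored = RScript(Readings; Refresh) {
            param
                rScriptFileName : "etc/score.r";
                streamAttributes : in1;
                rObjects : "in1";
            output
                Scored : out1 = fromR("out1");
        }
}
```

In a real application the control stream would more likely come from a source that reacts to a new model becoming available, rather than from a timed Beacon.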

Output Ports

Output Functions
ROutputs
<any T> T fromInput()

Default function. Returns the attributes from the input tuple that are listed on the output port.

<any T> T fromR(rstring)

Return output attributes from R objects that are created in the R script.

Ports (0)

The RScript operator has one required output port.

The required output port is non-mutating and its punctuation mode is Preserving. Attributes from the input tuple are passed to the output tuple if matching attributes exist, and additional attributes can be populated by using the fromR output function.

Assignments
This port set allows any SPL expression of the correct type to be assigned to output attributes.

Ports (1)

The RScript operator has one optional output port.

The optional output port submits a tuple when an error occurs while the operator is running the script that is specified in the rScriptFileName parameter. The resulting tuple can contain up to two attributes. Both attributes are optional. The first attribute of type list<rstring> contains any error information that the operator captures from the failed operation. The second attribute is an embedded tuple that contains all the attributes from the input tuple.
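
One way the error port could be declared is sketched below. The attribute names errorInfo and failedInput, the stream names, and the script etc/score.r are hypothetical; the description above fixes only the attribute types and their order.

```spl
use com.ibm.streams.rproject::RScript;

composite RScriptErrorPortSketch {
    graph
        // Hypothetical source stream.
        stream<float64 in1, float64 in2> Readings = Beacon() {
            param
                iterations : 10u;
            output
                Readings : in1 = 1.0, in2 = 3.0;
        }

        // Output port 0 carries results; output port 1 carries error tuples.
        // The second error attribute embeds the schema of the input tuple.
        (stream<float64 in1, float64 in2, float64 out1> Scored;
         stream<list<rstring> errorInfo, tuple<float64 in1, float64 in2> failedInput> Errors)
            = RScript(Readings) {
            param
                rScriptFileName : "etc/score.r";
                streamAttributes : in1, in2;
                rObjects : "in1", "in2";
            output
                Scored : out1 = fromR("out1");
        }

        // Print each error tuple so that failures are visible.
        () as ErrorPrinter = Custom(Errors) {
            logic
                onTuple Errors : println(Errors);
        }
}
```

Keeping the embedded input tuple on the error stream makes it possible to replay or inspect the exact input that failed, at the cost of a larger error tuple.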

Assignments
This port set requires that assignments made to output attributes must evaluate at compile-time to a constant.

Parameters

Required: rObjects, rScriptFileName, streamAttributes

Optional: initializationScriptFileName, rCommand

initializationScriptFileName

This optional parameter specifies the path to the R script that is run during the initialization of the operator. The recommended location for storing this file is in the etc directory in the toolkit. If a relative path is specified, the path is relative to the application directory.

rCommand

This optional parameter specifies the command that is used to start the R program. The default value is /usr/bin/R --vanilla.
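
The two optional parameters might be combined as in the sketch below. It assumes that rCommand accepts a string literal, which the description above does not state explicitly. The R installation path /opt/R/bin/R and the script names etc/init.r and etc/score.r are hypothetical.

```spl
use com.ibm.streams.rproject::RScript;

composite RScriptInitSketch {
    graph
        // Hypothetical source stream.
        stream<float64 in1> Readings = Beacon() {
            param
                iterations : 10u;
            output
                Readings : in1 = 2.0;
        }

        stream<float64 in1, float64 out1> Scored = RScript(Readings) {
            param
                // Run once when the operator initializes, for example to load a model.
                initializationScriptFileName : "etc/init.r";
                // Run for each incoming tuple.
                rScriptFileName : "etc/score.r";
                // Assumed form: a non-default R location, keeping the --vanilla option.
                rCommand : "/opt/R/bin/R --vanilla";
                streamAttributes : in1;
                rObjects : "in1";
            output
                Scored : out1 = fromR("out1");
        }
}
```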

rObjects

This mandatory parameter specifies a list of rstring values, which represent the names of objects that must be populated before the R script is run. The data types for the objects must be compatible with the data types for the corresponding expression fields in the streamAttributes parameter. The rObjects parameter must also have the same number of elements as the streamAttributes parameter.

rScriptFileName

This mandatory parameter specifies the path to the R script that is run for each incoming tuple. The recommended location for storing this file is in the etc directory in the toolkit. If a relative path is specified, the path is relative to the application directory.

streamAttributes

This mandatory parameter specifies a list of expressions. Each expression must produce a value that can be passed to the R script as an input value and its data type must be compatible with the matching field in the rObjects parameter. There must be a one-to-one mapping between the entries in this list and the entries that are specified in the rObjects parameter.

Code Templates

RScript
stream<${schema}> ${outputStream} = RScript(${inputStream}) {
    param
        rScriptFileName : "${filename}";
        streamAttributes : ${attributes};
        rObjects : "${objects}";
    output
        ${outputStream} : ${outputAttribute} = fromR("${object}");
}

Metrics

failedTuples - Counter

The number of input tuples that result in a failure when the R script runs.