Batch deployment input details for R scripts in Watson Machine Learning

Follow these rules when you are specifying input details for batch deployments of R scripts.

Data type summary table:

Data          Description
Type          Data references
File formats  Any

Data sources

Input or output data references:

  • Local or managed assets from the space
  • Connected (remote) assets: Cloud Object Storage and Storage Volumes

Notes:

If you are specifying input/output data references programmatically:

  • The data source reference type depends on the asset type. For details, see the Data source reference types section in Adding data assets to a deployment space.
  • You can specify the environment variables that the R script requires as 'key': 'value' pairs in scoring.environment_variables. Each key is the name of an environment variable, and each value is that variable's value.
  • The deployment job's payload is saved as a JSON file in the deployment container where the R script runs. The R script can get the full path of this JSON file from the JOBS_PAYLOAD_FILE environment variable.
  • If input data is referenced as a local or managed data asset, the deployment service downloads the input data and places it in the deployment container where the R script runs. You can access the location (path) of the downloaded input data through the BATCH_INPUT_DIR environment variable.
  • For remote input data references (connected data assets or connection assets), the R script must handle downloading the data. If a connected data asset or a connection asset is present in the deployment job's payload, the script can read its details from the payload JSON file. The full path of that file is stored in the JOBS_PAYLOAD_FILE environment variable.
  • If output data must be persisted as a local or managed data asset in a space, specify the name of the asset to create in scoring.output_data_reference.location.name. The R script writes its output data to the path specified by the BATCH_OUTPUT_DIR environment variable. The deployment service compresses the data in that location to .zip format and uploads it as the data asset named in scoring.output_data_reference.location.name.
  • If output data must be saved in a remote data store, specify the output data reference (for example, a data asset or a connected data asset) in scoring.output_data_reference.location.href. The R script must upload the output data to the remote data store itself. If a connected data asset or a connection asset reference is present in the deployment job's payload, the script can read it from the payload JSON file. The full path of that file is stored in the JOBS_PAYLOAD_FILE environment variable.
  • If the R script does not require any input or output data references in the deployment job payload, omit the scoring.input_data_references and scoring.output_data_reference objects from the payload.
  • R scripts are currently supported only with the default software specification default_r3.6. Custom software specifications are not supported.
  • Deploying a script to run on a Hadoop environment is not supported.
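The following Python sketch builds a job payload that uses the scoring fields named in these notes. It sets environment variables, adds input data references, and names a managed output asset. If the script needs no input or output, the sketch leaves the reference objects out. Several values are assumptions for illustration only: the "type" value "data_asset", the placeholder href format, the asset name r_script_output.zip, and the THRESHOLD variable. A real request also identifies the deployment, and that part is omitted here.

```python
import json
from typing import Dict, List, Optional


def build_job_payload(
    input_hrefs: List[str],
    output_asset_name: Optional[str],
    env_vars: Dict[str, str],
) -> dict:
    """Assemble a hypothetical batch job payload for an R script deployment.

    Field paths follow this topic: scoring.environment_variables,
    scoring.input_data_references, scoring.output_data_reference.location.name.
    The "type" value "data_asset" is an assumption; check the Data source
    reference types section for the type that matches your asset.
    """
    scoring: dict = {"environment_variables": dict(env_vars)}
    # Leave the reference objects out entirely when the script needs no input/output.
    if input_hrefs:
        scoring["input_data_references"] = [
            {"type": "data_asset", "location": {"href": href}} for href in input_hrefs
        ]
    if output_asset_name:
        scoring["output_data_reference"] = {
            "type": "data_asset",
            "location": {"name": output_asset_name},
        }
    return {"scoring": scoring}


payload = build_job_payload(
    input_hrefs=["/v2/assets/<input-asset-id>?space_id=<space-id>"],  # placeholder href
    output_asset_name="r_script_output.zip",  # hypothetical asset name
    env_vars={"THRESHOLD": "0.5"},  # hypothetical environment variable
)
print(json.dumps(payload, indent=2))
```

Calling build_job_payload([], None, {}) yields a payload with only scoring.environment_variables. This matches the rule for scripts that need no data references.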
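Inside the container, an R script would typically read the job payload with Sys.getenv("JOBS_PAYLOAD_FILE") and jsonlite::fromJSON(). The Python sketch below expresses the same steps for illustration only, because deployed scripts must be R. It locates the payload through JOBS_PAYLOAD_FILE, collects the input references the script must download itself, and looks up a remote output href. To simulate the container, it writes a sample payload to a temporary file. The payload contents and placeholder hrefs are hypothetical.

```python
import json
import os
import tempfile
from typing import List, Optional


def load_job_payload() -> dict:
    """Read the job payload JSON whose full path is in JOBS_PAYLOAD_FILE."""
    with open(os.environ["JOBS_PAYLOAD_FILE"], encoding="utf-8") as f:
        return json.load(f)


def input_references(payload: dict) -> List[dict]:
    """Return scoring.input_data_references (empty list if absent)."""
    return payload.get("scoring", {}).get("input_data_references", [])


def output_href(payload: dict) -> Optional[str]:
    """Return scoring.output_data_reference.location.href, or None if absent."""
    ref = payload.get("scoring", {}).get("output_data_reference", {})
    return ref.get("location", {}).get("href")


# Simulated container: a hypothetical payload with placeholder hrefs.
sample = {
    "scoring": {
        "input_data_references": [{"location": {"href": "/v2/assets/<input-asset-id>"}}],
        "output_data_reference": {"location": {"href": "/v2/assets/<output-asset-id>"}},
    }
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "payload.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(sample, f)
    os.environ["JOBS_PAYLOAD_FILE"] = path  # set by the deployment service in a real job
    payload = load_job_payload()

print(input_references(payload))
print(output_href(payload))
```

The helpers return an empty list or None when a reference is absent, so a script can branch on whether it has remote data to download or upload.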
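For local or managed assets, the script reads downloaded data from BATCH_INPUT_DIR and writes results to BATCH_OUTPUT_DIR. The Python sketch below illustrates that flow. In R, the equivalent calls would include Sys.getenv(), list.files(), and writeLines(). The sketch makes two assumptions. First, BATCH_INPUT_DIR is a directory that contains the downloaded files as CSV. Second, the row-count step is a hypothetical placeholder for real scoring logic, and the file names are illustrative.

```python
import csv
import os
import tempfile
from pathlib import Path


def process_batch() -> int:
    """Write a row count for each CSV in BATCH_INPUT_DIR into BATCH_OUTPUT_DIR.

    Counting rows is a hypothetical stand-in for real scoring logic.
    Returns the number of input files processed.
    """
    input_dir = Path(os.environ["BATCH_INPUT_DIR"])
    output_dir = Path(os.environ["BATCH_OUTPUT_DIR"])
    output_dir.mkdir(parents=True, exist_ok=True)
    processed = 0
    for src in sorted(input_dir.glob("*.csv")):
        with src.open(newline="", encoding="utf-8") as f:
            rows = sum(1 for _ in csv.reader(f))
        # Files written here are what the service compresses and uploads as the output asset.
        (output_dir / f"{src.stem}_summary.txt").write_text(f"{rows}\n", encoding="utf-8")
        processed += 1
    return processed


# Simulated container: hypothetical input file and directories.
with tempfile.TemporaryDirectory() as tmp:
    in_dir = Path(tmp, "in")
    in_dir.mkdir()
    out_dir = Path(tmp, "out")
    (in_dir / "scores.csv").write_text("a,b\n1,2\n3,4\n", encoding="utf-8")
    os.environ["BATCH_INPUT_DIR"] = str(in_dir)
    os.environ["BATCH_OUTPUT_DIR"] = str(out_dir)
    count = process_batch()
    summary = (out_dir / "scores_summary.txt").read_text(encoding="utf-8")

print(count, summary.strip())
```

The script never zips or uploads anything itself in this case. It only writes plain files, and packaging them is left to the deployment service.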
