Important:

IBM Cloud Pak® for Data Version 4.7 will reach end of support (EOS) on 31 July, 2025. For more information, see the Discontinuance of service announcement for IBM Cloud Pak for Data Version 4.X.

Upgrade to IBM Software Hub Version 5.1 before IBM Cloud Pak for Data Version 4.7 reaches end of support. For more information, see Upgrading IBM Software Hub in the IBM Software Hub Version 5.1 documentation.

Batch deployment input details for R scripts in Watson Machine Learning

Follow these rules when you are specifying input details for batch deployments of R scripts.

Data type summary table:

Data           Description
Type           Data references
File formats   Any

Data Sources

Input/output data references:

  • Local/managed assets from the space
  • Connected (remote) assets: Cloud Object Storage and Storage Volumes

Notes:

If you are specifying input/output data references programmatically:

  • The data source reference type depends on the asset type. For details, see the Data source reference types section in Adding data assets to a deployment space.
  • You can specify any environment variables that the R script requires as 'key': 'value' pairs in scoring.environment_variables. Each key must be the name of an environment variable, and each value must be the value of that variable.
  • The deployment job's payload is saved as a JSON file in the deployment container where the R script runs. The R script can read the full path of this JSON file from the JOBS_PAYLOAD_FILE environment variable.
  • If input data is referenced as a local or managed data asset, the deployment service downloads the input data and places it in the deployment container where the R script runs. The R script can read the path of the downloaded input data from the BATCH_INPUT_DIR environment variable.
  • For input data references to a connected data asset or a connection asset, the R script must download the data itself. If a connected data asset or a connection asset is present in the deployment job's payload, the script can find it by reading the payload JSON file whose full path is stored in the JOBS_PAYLOAD_FILE environment variable.
  • If output data must be persisted as a local or managed data asset in a space, specify the name of the asset to be created in scoring.output_data_reference.location.name. The R script writes its output data to the path specified by the BATCH_OUTPUT_DIR environment variable. The deployment service then compresses the contents of BATCH_OUTPUT_DIR into a .zip file and uploads it to the space as the data asset named in scoring.output_data_reference.location.name.
  • If output data must be saved in a remote data store, specify the output data reference (for example, a connected data asset or a connection asset) in scoring.output_data_reference.location.href. The R script is responsible for uploading the output data to the remote data source. If a connected data asset or a connection asset reference is present in the deployment job's payload, the script can find it by reading the payload JSON file whose full path is stored in the JOBS_PAYLOAD_FILE environment variable.
  • If the R script does not require any input or output data references in the deployment job payload, do not include the scoring.input_data_references and scoring.output_data_reference objects in the payload.
  • R scripts are currently supported only with the default software specification, default_r3.6. Custom software specifications are not supported.
  • Deploying a script to run in a Hadoop environment is currently not supported.
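The following is a minimal sketch of how a job payload for an R script deployment might be assembled as a Python dictionary. The keys under scoring (environment_variables, input_data_references, output_data_reference) follow the notes above. The top-level space_id and deployment.id fields, the "data_asset" reference type, and the /v2/assets/... href format are assumptions based on the Watson Machine Learning v4 deployment jobs API, and build_r_script_job_payload is a hypothetical helper. Check the Data source reference types section for the exact reference shape of each asset type.

```python
import json


def build_r_script_job_payload(deployment_id, space_id, input_asset_id=None,
                               output_asset_name=None, env_vars=None):
    """Hypothetical helper: build a batch job payload for an R script deployment.

    Assumption: the request shape follows the WML v4 deployment jobs API.
    Reference objects are omitted entirely when they are not needed.
    """
    scoring = {}
    if env_vars:
        # Each key is an environment variable name; each value is its value.
        scoring["environment_variables"] = dict(env_vars)
    if input_asset_id:
        # Local or managed data asset: the service downloads it into BATCH_INPUT_DIR.
        scoring["input_data_references"] = [{
            "type": "data_asset",  # assumed reference type
            "location": {"href": f"/v2/assets/{input_asset_id}?space_id={space_id}"},
        }]
    if output_asset_name:
        # Local or managed output: name of the data asset the service creates.
        scoring["output_data_reference"] = {
            "type": "data_asset",  # assumed reference type
            "location": {"name": output_asset_name},
        }
    return {
        "space_id": space_id,
        "deployment": {"id": deployment_id},
        "scoring": scoring,
    }


if __name__ == "__main__":
    payload = build_r_script_job_payload(
        "<deployment_id>", "<space_id>", "<input_asset_id>",
        "r_job_output", {"RUN_MODE": "full"},
    )
    print(json.dumps(payload, indent=2))
```

The helper drops scoring.input_data_references and scoring.output_data_reference when no asset is given. This matches the rule that a script that needs no data references should not receive those objects.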
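The container-side contract described in these notes is: read the payload from JOBS_PAYLOAD_FILE, read local inputs from BATCH_INPUT_DIR, and write local outputs to BATCH_OUTPUT_DIR. The sketch below mirrors those steps in Python for illustration only. In an actual R script, the same values come from Sys.getenv() and a JSON parser such as the jsonlite package. The function name run_job_step and the summary.json output file are hypothetical.

```python
import json
import os
from pathlib import Path


def run_job_step():
    """Illustrative sketch of the steps an R script performs inside the container."""
    # The deployment service saves the job payload as JSON; its full path
    # is stored in JOBS_PAYLOAD_FILE.
    with open(os.environ["JOBS_PAYLOAD_FILE"], encoding="utf-8") as f:
        payload = json.load(f)
    scoring = payload.get("scoring", {})

    # Collect input reference hrefs; for connected data assets or connection
    # assets, the script itself must download the data they point to.
    input_hrefs = [
        ref.get("location", {}).get("href")
        for ref in scoring.get("input_data_references", [])
    ]

    # Local or managed input assets are already downloaded into BATCH_INPUT_DIR.
    input_dir = os.environ.get("BATCH_INPUT_DIR")
    input_files = sorted(p.name for p in Path(input_dir).iterdir()) if input_dir else []

    # Files written to BATCH_OUTPUT_DIR are zipped and uploaded by the service
    # when the output is a local or managed data asset.
    output_dir = os.environ.get("BATCH_OUTPUT_DIR")
    if output_dir:
        summary = {"input_files": input_files, "input_hrefs": input_hrefs}
        Path(output_dir, "summary.json").write_text(json.dumps(summary), encoding="utf-8")
    return input_files, input_hrefs
```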
