Batch deployment input details for R scripts in Watson Machine Learning
Follow these rules when you are specifying input details for batch deployments of R scripts.
Data type summary table:
Data | Description |
---|---|
Type | Data references |
File formats | Any |
Data sources
Input or output data references:
- Local or managed assets from the space
- Connected (remote) assets: Cloud Object Storage and Storage Volumes
Notes:
- For connections of type Cloud Object Storage or Cloud Object Storage (infrastructure), you must configure Access key and Secret key, also known as HMAC credentials.
If you are specifying input/output data references programmatically:
- The data source reference `type` depends on the asset type. Refer to the Data source reference types section in Adding data assets to a deployment space.
- You can specify the environment variables that are required for running the R script as `'key': 'value'` pairs in `scoring.environment_variables`. The `key` must be the name of an environment variable and the `value` must be the corresponding value of the environment variable.
- The deployment job's payload is saved as a JSON file in the deployment container where you run the R script. The R script can access the full path of the JSON file by using the `JOBS_PAYLOAD_FILE` environment variable.
- If input data is referenced as a local or managed data asset, the deployment service downloads the input data and places it in the deployment container where the R script runs. You can access the location (path) of the downloaded input data through the `BATCH_INPUT_DIR` environment variable.
- For input data that is referenced as a connected data asset or a connection asset, the R script must handle downloading the data. If a connected data asset or a connection asset is present in the deployment job's payload, you can access it by using the `JOBS_PAYLOAD_FILE` environment variable, which contains the full path to the deployment job's payload that is saved as a JSON file.
- If output data must be persisted as a local or managed data asset in a space, you can specify the name of the asset to be created in `scoring.output_data_reference.location.name`. As part of an R script, output data can be placed in the path that is specified by the `BATCH_OUTPUT_DIR` environment variable. The deployment service compresses the data to .zip format and uploads it to the location that is specified in `BATCH_OUTPUT_DIR`.
- If output data must be saved in a remote data store, you must specify the reference to the output data (for example, a data asset or a connected data asset) in `output_data_reference.location.href`. The R script must take care of uploading the output data to the remote data source. If a connected data asset or a connection asset reference is present in the deployment job's payload, you can access it by using the `JOBS_PAYLOAD_FILE` environment variable, which contains the full path to the deployment job's payload that is saved as a JSON file.
- If the R script does not require any input or output data references to be specified in the deployment job payload, then do not provide the `scoring.input_data_references` and `scoring.output_data_reference` objects in the payload.
- R scripts are currently supported only with the default software specification `default_r3.6`; specifying a custom software specification is not supported.
- Deploying a script to run on a Hadoop environment is not supported.
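
The payload rules above can be sketched as a small client-side helper. This is a minimal illustration in Python, not the service's official client: the function name `build_r_script_job_payload`, the `"data_asset"` type value, the placeholder `href`, and the sample variable and asset names are all assumptions made for the example. Only the field paths (`scoring.environment_variables`, `scoring.input_data_references`, `scoring.output_data_reference.location.name`) come from the text. Check the Data source reference types section for the `type` value that matches your asset.

```python
import json


def build_r_script_job_payload(env_vars=None, input_asset_href=None, output_asset_name=None):
    """Assemble a batch job payload for an R script deployment (hypothetical helper)."""
    scoring = {}

    if env_vars:
        # Each key is an environment variable name; each value is its value.
        scoring["environment_variables"] = dict(env_vars)

    if input_asset_href is not None:
        scoring["input_data_references"] = [
            {
                "type": "data_asset",  # assumption: depends on the asset type
                "location": {"href": input_asset_href},
            }
        ]

    if output_asset_name is not None:
        # Name of the local or managed asset to create in the space.
        scoring["output_data_reference"] = {
            "type": "data_asset",  # assumption: depends on the asset type
            "location": {"name": output_asset_name},
        }

    # When the script needs no data references, the keys are simply omitted.
    return {"scoring": scoring}


payload = build_r_script_job_payload(
    env_vars={"MODEL_THRESHOLD": "0.5"},  # hypothetical variable
    input_asset_href="<href of the input data asset>",  # placeholder, not a real href
    output_asset_name="r_script_output.zip",  # hypothetical asset name
)
print(json.dumps(payload, indent=2))
```

Calling the helper with no arguments yields a payload without `input_data_references` or `output_data_reference`, which matches the rule for scripts that need no data references.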