Batch deployment input details for R scripts in Watson Machine Learning
Follow these rules when you are specifying input details for batch deployments of R scripts.
Data type summary table:
| Data | Description |
|---|---|
| Type | Data references |
| File formats | Any |
Data sources
Input or output data references:
- Local or managed assets from the space
- Connected (remote) assets in cloud storage and Storage Volumes
Notes:
- For cloud storage connections such as Cloud Object Storage or Cloud Object Storage (infrastructure), you must configure Access key and Secret key, also known as HMAC credentials.
If you are specifying input/output data references programmatically:
- Data source reference `type` depends on the asset type. Refer to the Data source reference types section in Adding data assets to a deployment space.
- You can specify the environment variables that are required for running the R script as `'key': 'value'` pairs in `scoring.environment_variables`. The `key` must be the name of an environment variable and the `value` must be the corresponding value of the environment variable.
- The deployment job's payload is saved as a JSON file in the deployment container where you run the R script. The R script can access the full path of the JSON file by using the `JOBS_PAYLOAD_FILE` environment variable.
- If input data is referenced as a local or managed data asset, the deployment service downloads the input data and places it in the deployment container where the R script runs. You can access the location (path) of the downloaded input data through the `BATCH_INPUT_DIR` environment variable.
- For connected input data references (connected data asset or connection asset), the R script must handle downloading the data. If a connected data asset or a connection asset is present in the deployment job's payload, you can access it by using the `JOBS_PAYLOAD_FILE` environment variable, which contains the full path to the deployment job's payload that is saved as a JSON file.
- If output data must be persisted as a local or managed data asset in a space, you can specify the name of the asset to be created in `scoring.output_data_reference.location.name`. As part of an R script, output data can be placed in the path that is specified by the `BATCH_OUTPUT_DIR` environment variable. The deployment service compresses the data to .zip format and uploads it to the location that is specified in `BATCH_OUTPUT_DIR`.
- If output data must be saved in a remote data store, you must specify the output data reference (for example, a data asset or a connected data asset) in `output_data_reference.location`. The R script must take care of uploading the output data to the remote data source. If a connected data asset or a connection asset reference is present in the deployment job's payload, you can access it by using the `JOBS_PAYLOAD_FILE` environment variable, which contains the full path to the deployment job's payload that is saved as a JSON file.
- If the R script does not require any input or output data references to be specified in the deployment job payload, then do not provide the `scoring.input_data_references` and `scoring.output_data_reference` objects in the payload.
- R scripts are currently supported only with the default software specification `default_r3.6`; specifying a custom software specification is not supported.
- Deploying a script to run on a Hadoop environment is not supported.
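
The environment variables described in the list above can be read from inside the R script with base R. The following is a minimal sketch, not a definitive implementation: the helper `env_or`, the fallback to a temporary directory, and the output file name `summary.csv` are hypothetical and only make the script runnable outside a deployment container. Using the `jsonlite` package to parse the payload is also an assumption, and the exact layout of the saved payload file should be checked against a real job.

```r
# Read an environment variable, returning `default` when it is unset or empty.
# (Hypothetical helper, not part of any Watson Machine Learning library.)
env_or <- function(name, default) {
  value <- Sys.getenv(name, unset = "")
  if (nzchar(value)) value else default
}

# Variable names come from the documentation above; the tempdir() fallback
# is an assumption for local experimentation only.
input_dir    <- env_or("BATCH_INPUT_DIR", tempdir())
output_dir   <- env_or("BATCH_OUTPUT_DIR", tempdir())
payload_file <- env_or("JOBS_PAYLOAD_FILE", "")

# Parse the job payload only when the file exists and jsonlite is installed
# (jsonlite is an assumption; any JSON parser would do).
payload <- NULL
if (nzchar(payload_file) && file.exists(payload_file) &&
    requireNamespace("jsonlite", quietly = TRUE)) {
  payload <- jsonlite::fromJSON(payload_file, simplifyVector = FALSE)
}

# Assumed payload layout: a connected output reference, if any, would be
# found here, and the script itself would have to upload to it.
output_ref <- payload$scoring$output_data_reference

# Local or managed input assets are already downloaded into input_dir.
input_files <- list.files(input_dir, full.names = TRUE)

# Place results in output_dir so that the deployment service can collect them.
dir.create(output_dir, showWarnings = FALSE, recursive = TRUE)
result <- data.frame(file  = basename(input_files),
                     bytes = file.size(input_files))
write.csv(result, file.path(output_dir, "summary.csv"), row.names = FALSE)
```

When `payload` is `NULL` (no payload file, or no JSON parser available), `output_ref` is also `NULL`, so the script falls through to the local output path without failing.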