Creating environment definitions (Watson Studio)

You can create custom environment definitions if you do not want to use the defaults provided by Watson Studio.

To create an environment definition, you must have the Admin or Editor role within the project.

You can create environment definitions in Watson Studio to run assets such as notebooks, scripts, and Data Refinery jobs.

To create an environment definition:

  1. From the Environments tab in your project, click New environment definition.
  2. Enter a name and a description.
  3. Select the type. The type specifies the runtime engine and can be one of the following:
    • Default: Select for Python, R, RStudio, or JupyterLab runtimes.
    • Spark: Select for Spark with Python, R, or Scala runtimes.
    • GPU: Select for more computing power to improve model training performance.
    • Remote system: Select to:
      • Run Data Refinery jobs to refine data stored in HDFS, in tables in a Hive warehouse, or in tables in Impala on the Hadoop cluster
      • Run jobs or Jupyter Enterprise Gateway (JEG) sessions on remote systems, such as Hadoop or Spectrum Conductor (JEG only).
  4. For Default or GPU, select the hardware configuration and software version.

    • Specify the amount of CPU, GPU, and RAM to reserve.

      The environment is started on a compute node where the required resources are available, and those resources remain reserved for the environment for as long as it runs. Specify enough resources for your planned workload, especially enough memory, which is important when you run notebooks; the default is 2 GB RAM. (The first sketch after this procedure shows one way to check the resources from a running notebook.)

      Although reserving a fixed amount of resources provides a more predictable experience, a reasonable limit can be hard to estimate, which can lead to situations where active environments reserve resources that they are not actively using.

    • Specify the default software version.

      Note: If you are creating scikit-learn, XGBoost, PyTorch, TensorFlow, Keras, or Caffe models, or are coding Python functions or scripts, select Default Python 3.7. The Default Python 3.7 (legacy) software version contains older versions of these machine learning libraries. (The version-check sketch after this procedure shows one way to confirm which versions a runtime provides.)

  5. For Spark, select the driver and executor size, the number of executors, and the software version. (The Spark sketch after this procedure shows one way to verify these settings from a notebook.)
    • Driver hardware configuration. The driver creates the SparkContext, which distributes the execution of jobs on the Spark cluster. Select from:
      • 1 vCPU and 4 GB RAM
      • 2 vCPU and 8 GB RAM
    • Executor hardware configuration. The executor is the process in charge of running the tasks in a given Spark job. Select from:
      • 1 vCPU and 4 GB RAM
      • 2 vCPU and 8 GB RAM
    • Number of executors. Select from 1 to 10 executors.
    • Spark version. Select from:
      • Spark 3.0
    • Software version. Select the software version to use with the selected Spark version.
  6. For Remote system, select a Hadoop or system configuration.
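
After a Default or GPU environment starts, you can check from a notebook which hardware the runtime reports. The following is a minimal sketch, assuming the psutil package is available in the runtime (install it with pip install psutil if it is not); note that in a containerized runtime these values can reflect the host node rather than the reserved limits:

    # Inspect the resources that are visible to the notebook kernel.
    import os
    import psutil

    # Logical CPUs visible to the process. In a containerized runtime this
    # value can reflect the host node rather than the reserved vCPUs.
    print("CPUs visible:", os.cpu_count())

    # Total and currently available memory, in GiB.
    mem = psutil.virtual_memory()
    print(f"RAM total:     {mem.total / 1024**3:.1f} GiB")
    print(f"RAM available: {mem.available / 1024**3:.1f} GiB")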

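To confirm which machine learning library versions a software version provides, as mentioned in the note in step 4, you can print them from a notebook. This is a minimal sketch; the set of libraries shown is illustrative, and any that are not installed are reported instead of raising an error:

    # Report the installed versions of common machine learning libraries.
    from importlib import import_module

    for name in ("sklearn", "xgboost", "torch", "tensorflow", "keras"):
        try:
            module = import_module(name)
            print(f"{name}: {module.__version__}")
        except ImportError:
            print(f"{name}: not installed in this runtime")
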
Your new environment definition is listed under Environment definitions on the Environments page of your project. From this page, you can update an environment definition, see which runtimes are active, and stop runtimes.
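
For a Spark environment definition, you can verify the driver and executor settings from a notebook that runs in that environment. This is a minimal sketch, assuming a standard PySpark runtime; Spark environments typically create a SparkSession for you, and getOrCreate returns the existing one:

    # Print the driver and executor sizing of the current Spark runtime.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    conf = spark.sparkContext.getConf()

    for key in ("spark.driver.memory",
                "spark.executor.memory",
                "spark.executor.cores",
                "spark.executor.instances"):
        # Passing a default avoids an exception for properties that are unset.
        print(key, "=", conf.get(key, "not set"))

    print("Spark version:", spark.version)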

Limitations

Limitations apply to the following environment types:

  • Notebook environments (Anaconda Python or R distributions)
  • JupyterLab environments
  • Spark environments
  • GPU environments

Promoting an environment definition

If you created an environment definition and associated it with an asset that you promoted to a deployment space, you can promote the environment definition to the same space. This enables the asset to run in the same environment that it used in the project.

You can only promote environment definitions that you created.

To promote an environment definition:

  1. From the Environments page in your project, select the environment definition and click Actions > Promote.
  2. As the target deployment space, select the space that you promoted your asset to, and optionally provide a description.
