Creating environment definitions (Watson Studio)

To create an environment definition, you must have the Admin or Editor role within the project.

You can create environment definitions in Watson Studio for Default (Anaconda Python or R, RStudio, or JupyterLab), Spark, GPU, and Hadoop runtimes.

To create an environment definition:

  1. From the Environments tab in your project, click New environment definition.
  2. Enter a name and a description.
  3. Select the type. The type specifies the runtime engine type. This can be:
    • Default: Select for Anaconda (Python or R), RStudio, or JupyterLab runtimes.
    • Spark: Select for Spark with Python, R, or Scala runtimes.
    • GPU: Select for more computing power to improve model training performance.
    • Hadoop: Select to:
      • Run Data Refinery jobs to refine data stored in HDFS, in tables in a Hive warehouse, or in tables in Impala on the Hadoop cluster
      • Use the Jupyter Enterprise Gateway (JEG) service to run jobs
  4. For Default, select:

    • Elastic if you want Watson Studio to decide and adjust the hardware configuration sizes depending on the compute resources required.

      An Elastic environment consumes any amount of resources that are available on the compute node it is running on. The difference between an Elastic environment and an environment for which you specified the configuration sizes is that no fixed amount of resources is reserved on a compute node while an environment is idle. However, although there is no preset limit on the amount of resources an environment can consume, there is no guarantee that enough resources are available. For example, if two notebooks are running on a compute node and you start a third notebook, that notebook can consume only the resources that remain on the compute node.

    • Specify if you want to set the CPU, GPU, and RAM sizes to reserve.

      The environment is started on a compute node where the required resources are available, and those resources are reserved for the environment for as long as it runs. Be careful to specify enough resources for your planned workload, especially sufficient memory; this is particularly important when you run notebooks. When you switch from Elastic to Specify, the default is 2 GB RAM.

      Although specifying the amount of resources can provide a more predictable experience than selecting Elastic, it can be difficult to predict what a reasonable limit is, which can lead to situations where all the resources are reserved by active environments but aren’t being actively used.

    • Select the software version: Default Python 3.7, Default Python 3.6, Default R 3.6, RStudio 3.6, or Default JupyterLab.

      Note: If you are creating scikit-learn, XGBoost, PyTorch, TensorFlow, Keras, or Caffe models, or are coding Python functions or scripts, select Default Python 3.7 or Default Python 3.6 (see the scikit-learn sketch after these steps).

  5. For Spark, select the driver and executor size, the number of executors, and the software version.
    • Driver hardware configuration. The driver creates the SparkContext, which distributes the execution of jobs on the Spark cluster (see the Spark sketch after these steps).
      • 1 vCPU and 4 GB RAM
      • 2 vCPU and 8 GB RAM
    • Executor hardware configuration. The executor is the process in charge of running the tasks in a given Spark job.
      • 1 vCPU and 4 GB RAM
      • 2 vCPU and 8 GB RAM
    • Number of executors: Select from 1 to 10 executors.
    • Spark version: Spark 2.3 or Spark 2.4 (select Spark 2.4 for Data Refinery environment definitions)
    • Software version:
      • Scala 2.11 (select for models created by using the model builder or for model flows in the flow editor)
      • Python 3.7 or 3.6
      • R 3.6 (select for Data Refinery environment definitions)
  6. For GPU, specify the GPU, CPU, and RAM sizes. The default size is 1 GPU, 1 vCPU, and 2 GB RAM (see the GPU check after these steps).
  7. For Hadoop, select the cluster.
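
The following is a minimal sketch of the kind of workload meant by the note in step 4: a small scikit-learn model trained in a notebook that runs in a Default Python environment. The dataset and estimator shown here are illustrative choices only and are not part of the environment definition.

    # Minimal sketch: training a scikit-learn model in a notebook that runs in a
    # Default Python environment. The dataset and estimator are illustrative only.
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
    print(f"Test accuracy: {model.score(X_test, y_test):.3f}")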
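
To illustrate the driver and executor roles described in step 5, here is a minimal PySpark sketch for a notebook that runs in a Spark with Python environment. The application name is an arbitrary example; the driver and executor sizes come from the environment definition, not from code.

    # Minimal sketch for a notebook running in a Spark with Python environment.
    # The driver process creates the SparkContext (through SparkSession), which
    # distributes the tasks of each job across the executors defined in the
    # environment definition.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("environment-definition-example")  # illustrative name
        .getOrCreate()
    )
    sc = spark.sparkContext

    # A trivial job: the driver splits this range into tasks that the executors run.
    total = sc.parallelize(range(1_000_000)).map(lambda x: x * 2).sum()
    print(total)

    spark.stop()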
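
For a GPU environment (step 6), you can confirm from a notebook that the runtime actually sees the GPU you reserved. The optional check below uses TensorFlow 2.x as one example framework; adjust it for the framework and version installed in your environment.

    # Optional check in a notebook running in a GPU environment:
    # list the GPU devices that the runtime can see (TensorFlow 2.x).
    import tensorflow as tf

    gpus = tf.config.list_physical_devices("GPU")
    print(f"GPUs visible to the runtime: {len(gpus)}")
    for gpu in gpus:
        print(gpu)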

Your new environment definition is listed under Environment definitions on the Environments page of your project. From this page, you can update an environment definition, see which runtimes are active, and stop runtimes.

Limitations

Notebook environments (Anaconda Python or R distributions) have these limitations:

Spark environments have these limitations:

GPU environments have these limitations:

Next steps