Creating non-standard environment templates

If none of the default environments match your needs, you can create custom environment templates.

Required permissions
To create an environment template, you must have the Admin or Editor role within the project.

You can create environment templates in Watson Studio to run the following assets:

  • Notebooks in the Notebook editor
  • Notebooks in JupyterLab
  • Notebooks in RStudio
  • Models created in the Model builder
  • SPSS Modeler
  • Data Refinery flows
  • Jobs that run operational assets, such as Data Refinery flows, SPSS Modeler flows, or notebooks in a project
Note:

If you want additional packages to be automatically installed in your environment template (software customization), you can configure this after the custom environment template is created. For details, refer to Adding customizations.
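For orientation, a software customization uses a conda-style YAML template. The channel and package entries below are illustrative placeholders only; the exact template is shown when you add a customization, as described in Adding customizations:

```yaml
# Illustrative sketch of a software customization
# (see Adding customizations for the exact template).
channels:
  - defaults
dependencies:
  # conda packages to install into the environment
  - pandas
  # pip packages
  - pip:
      - scikit-learn
```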

To create an environment template:

  1. On the Manage tab of your project, select the Environments page and click New template under Templates.

  2. Enter a name and a description.

  3. Select the type. The type specifies the runtime engine type.

    • Default: Select for Python or R, RStudio, or JupyterLab runtimes.
    • Spark: Select for Spark with Python or R runtimes.
    • GPU: Select for more computing power to improve model training performance.
    • Remote system: Select to:
      • Run Data Refinery jobs to refine data stored in HDFS, in tables in a Hive warehouse, or in tables in Impala on the Hadoop cluster
      • Run jobs or Jupyter Enterprise Gateway (JEG) sessions on remote Hadoop systems
  4. For Default or GPU, select the hardware configuration and software version.

    • Specify the size for CPU, GPU, and RAM to reserve.

      The environment is started on a compute node where the required resources are available, and those resources remain reserved for the environment for as long as it runs. Specify enough resources for your planned workload, especially sufficient memory; this is particularly important when you run notebooks. The default is 2 GB RAM.

      Although reserving a fixed amount of resources provides a more predictable experience, a reasonable limit can be difficult to predict. This can lead to situations where active environments reserve all available resources without actively using them.

    • Specify the software version, for example, Python, R, RStudio, or JupyterLab with Python. If your administrator defined custom images, you can find them here, too.

  5. For Spark, select the driver and executor size, the number of executors, and the software version. You can also add environment variables and a Spark configuration.

    • Driver hardware configuration. The driver creates the SparkContext which distributes the execution of jobs on the Spark cluster. Select from:
      • 1 vCPU and 4 GB RAM
      • 2 vCPU and 8 GB RAM
    • Executor hardware configuration. The executor is the process in charge of running the tasks in a given Spark job. Select from:
      • 1 vCPU and 4 GB RAM
      • 2 vCPU and 8 GB RAM
    • Number of executors. Select from 1 to 50 executors.
    • Software version, for example, the Spark and Python version.
    • Optional: Environment variables. For example, SPARK_HOME=/usr/local/spark.
    • Optional: Spark configuration. For example, spark.executor.decommission.forceKillTimeout=60.
  6. For Remote system, select a Hadoop or system configuration.
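
As an illustration of how an environment variable defined in the template (such as SPARK_HOME in the example above) surfaces in the runtime, a notebook cell can read it back from the process environment. This is a minimal sketch; the fallback path is only a placeholder:

```python
import os

# Environment variables defined in the template are exported into the
# runtime's process environment, so notebook code can read them directly.
# The fallback path used here is only an illustrative placeholder.
spark_home = os.environ.get("SPARK_HOME", "/usr/local/spark")
print(f"SPARK_HOME resolved to: {spark_home}")
```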

Note:

In a custom Spark configuration, you can't override these variables and arguments:

Environment variables
PROJECT_ID, SPACE_ID, USER_ACCESS_TOKEN, USER_REFRESH_TOKEN, PROJECT_ACCESS_TOKEN, SPACE_ACCESS_TOKEN, APP_ENV_APSX_API, RUNTIME_ENV_NOTEBOOK, RUNTIME_ENV_REGION, RUNTIME_ENV_STOREFRONT, RUNTIME_ENV_APSX_URL, USER_ID, PROJECT_NAME, SPACE_NAME, HTTP_PROXY, http_proxy, HTTPS_PROXY, https_proxy, NO_PROXY, no_proxy

Spark configuration

  • spark.ui.reverseProxy
  • spark.eventLog.enabled

If the runtime runs as a job and the same custom environment variables are set in both the environment metadata and the job metadata, the variables from the job metadata take priority.
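
This precedence rule can be pictured as a simple dictionary merge in which job-level variables override environment-level ones. The function and variable names here are illustrative, not part of any Watson Studio API:

```python
def resolve_env_vars(env_metadata_vars, job_metadata_vars):
    """Merge custom environment variables; job metadata wins on conflict."""
    resolved = dict(env_metadata_vars)   # variables from the environment template
    resolved.update(job_metadata_vars)   # job-level values take priority
    return resolved

# SPARK_HOME is set in both places; the job's value is the one that applies.
merged = resolve_env_vars(
    {"SPARK_HOME": "/usr/local/spark", "MY_FLAG": "1"},
    {"SPARK_HOME": "/opt/spark"},
)
print(merged)  # {'SPARK_HOME': '/opt/spark', 'MY_FLAG': '1'}
```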

Where to find your custom environment template

Your new environment template is listed under Templates on the Environments page in the Manage tab of your project. From this page, you can:

  • Check which runtimes are active
  • Update custom environment templates
  • Stop active runtimes

Limitations

  • The default environment templates cannot be edited or modified.

Notebook environments:

Note: You can't customize the software configuration of a Spark environment template.
  • You can't add a software customization to the Python and R environment templates included in Watson Studio. You can only add a customization to an environment template that you create.
  • To create a Python with GPU environment, the Jupyter Notebooks with Python for GPU service must be installed.
  • If you add a software customization using conda or mamba, your environment must have at least 2 GB RAM.
  • After you start a notebook in a Watson Studio environment, you can't create another conda environment from inside that notebook and use it. Watson Studio environments do not behave like a conda environment manager.

JupyterLab environments:

  • If you want to add a software customization to an environment template in JupyterLab and use the same environment in a job, you must create a custom environment template and select only a Python version as the software version. Don't select JupyterLab with a Python version as the software version, because JupyterLab environment templates can't be selected when you create a job.

GPU environments:

  • The number of GPU runtimes you can have active at any time can't exceed the number of GPU units in your cluster.

Next steps

Learn more

Create a GPU environment template

Parent topic: Environments