Notebook environments (Watson Studio)

When you run a notebook in the notebook editor in a project, you choose an environment template, which defines the compute resources for the runtime environment. The environment template specifies the type, size, and power of the hardware configuration, plus the software configuration. For notebooks, environment templates include a supported language: Python, R, or Scala.

Included environment templates

The following Python environment templates are included with Watson Studio. The included environments are listed under Templates on the Environments page on the Manage tab of your project.

* Indicates that the environment template is deprecated.

Table 1. Default environment template for Python
Name                           Hardware configuration   Description
Runtime 22.2 on Python 3.10    1 vCPU and 2 GB RAM
Runtime 22.1 on Python 3.9 *   1 vCPU and 2 GB RAM      Available only if you are upgrading from Cloud Pak for Data 4.0.8 or higher, or after a fresh installation of the Jupyter Notebooks with Python 3.9 service

If you have the Runtime 22.2 with R 4.2 or the Runtime 22.1 with R 3.6 service installed, the following default R environment is listed.

* Indicates that the environment template is deprecated.

Table 2. Default environment templates for R
Name Hardware configuration
Runtime 22.2 on R 4.2 1 vCPU and 2 GB RAM
Runtime 22.1 on R 3.6 * 1 vCPU and 2 GB RAM

Notebooks and CPU environments

When you open a notebook in edit mode in a CPU runtime environment, exactly one interactive session connects to a Jupyter kernel for the notebook language and the environment runtime that you select. The runtime is started per user, not per notebook. This means that if you open a second notebook with the same environment template in the same project, a second kernel is started in the same runtime, and the two kernels share the runtime's resources. Runtime resources are also shared if the environment runtime uses a GPU.

If you want to avoid sharing runtimes but want to use the same specifications for multiple notebooks in a project, you should create custom environment templates with the same specifications and associate each notebook with its own template. See Creating environment templates.

If necessary, you can restart or reconnect to the kernel. When you restart a kernel, the kernel is stopped and then started again in the same session, but all execution results are lost. When you reconnect to a kernel after losing a connection, the notebook is connected to the same kernel session, and all execution results that were saved are still available.
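Because restarting a kernel discards all in-memory results, a common pattern is to checkpoint intermediate results to a file in your project storage so you can reload them after a restart. A minimal sketch using only the Python standard library (the file name and result values are illustrative):

```python
import pickle
from pathlib import Path

CHECKPOINT = Path("results_checkpoint.pkl")  # illustrative file name

def save_results(results):
    """Persist intermediate results so they survive a kernel restart."""
    with CHECKPOINT.open("wb") as f:
        pickle.dump(results, f)

def load_results(default=None):
    """Reload the last checkpoint, or fall back to a default after a fresh start."""
    if CHECKPOINT.exists():
        with CHECKPOINT.open("rb") as f:
            return pickle.load(f)
    return default

# In a notebook cell: compute once, checkpoint, and reload after any restart.
results = {"accuracy": 0.92, "epochs": 10}
save_results(results)
restored = load_results()
```

After a restart, rerunning only the `load_results()` cell recovers the checkpointed values without repeating the computation.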

Other environment options for notebooks

You can create notebook environment templates and customize the software configuration. See Creating environment templates.

If you are coding Python notebooks or scripts in the JupyterLab IDE, you can use a JupyterLab environment. See JupyterLab environment templates.

If you have the Execution Engine for Apache Hadoop installed, you can create Hadoop environment templates to run notebooks on your Hadoop cluster. See Hadoop environments.

If you have the Analytics Engine Powered by Apache Spark service installed, you can choose from default Spark environment templates with multiple hardware configurations for Python, R, and Scala. See Spark environments.

If you have the Jupyter Notebooks with Python with GPU service installed, you can create an environment template to run notebooks on GPU clusters. See GPU environments.

File system in Jupyter notebook environments

You must be mindful of the size of the data files that you load into your notebook. Very large files might require more storage (disk space) than is available on the node on which the runtime is started.

Be aware that the file system of each runtime is non-persistent and cannot be shared across environments.

Note:

  • Do not confuse storage space with the memory size of your environment. Selecting a larger environment gives you more memory and CPU, but not more storage space.
  • How much storage space is available depends on the amount of storage that was allocated to the node in the OpenShift cluster where Cloud Pak for Data is running. To increase that limit, you would need to change to another OpenShift cluster with more storage space. You can't increase storage space from within Watson Studio.
  • If your data files are large, consider using Spark or Hadoop to process them. With Spark or Hadoop, the processing workload is spread across multiple nodes.
  • Only the temporary space allocated to the notebook is destroyed when the environment is stopped. Persistent file systems that you referenced in your notebook are not destroyed when the environment is stopped.
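To see the distinction between storage and memory in practice, you can inspect the runtime's disk space and visible CPUs from inside a notebook with the standard library. This is a sketch, not a Watson Studio API: the disk figures come from the node's allocation, while memory and vCPUs are set by the environment template:

```python
import os
import shutil

# Disk space on the file system backing the notebook's working directory.
# This reflects node storage, not the environment's memory allocation.
usage = shutil.disk_usage(os.getcwd())
print(f"Disk total: {usage.total / 1e9:.1f} GB")
print(f"Disk free:  {usage.free / 1e9:.1f} GB")

# vCPUs visible to the runtime (the memory size is defined by the template).
print(f"Visible CPUs: {os.cpu_count()}")
```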

Runtime scope

Environment runtimes are always scoped to an environment template and a user within a project.

For example, if you associate each of your notebooks with its own environment template, each notebook gets its own runtime. However, if you open a notebook in an environment that you also selected for another notebook, and that notebook has an active runtime, both notebook kernels run in the same runtime. In that case, both notebooks use the compute and data resources available in the runtime that they share.

As noted earlier, to avoid sharing runtimes while using the same specifications for multiple notebooks in a project, create multiple custom environment templates with the same specifications and associate each notebook with its own template.

Next steps

Parent topic: Environments