Notebook environments (Watson Studio)
When you run a notebook in the notebook editor in a project, you choose an environment template, which defines the compute resources for the runtime environment. The environment definition specifies the type, size, and power of the hardware configuration, plus the software configuration. For notebooks, environment definitions include a supported language of Python, R, or Scala.
- Included environment definitions
- Other environment options for notebooks
- File system in Jupyter notebook environments
- Runtime scope
Included environment definitions
The following Python environment is included with Watson Studio.
| Name | Hardware configuration |
|---|---|
| IBM Runtime 22.1 on Python 3.9 | 1 vCPU and 2 GB RAM |
If you have the Jupyter Notebooks with R 3.6 service installed, the following default R environments are listed.
* Indicates that the environment is deprecated.
| Name | Hardware configuration |
|---|---|
| IBM Runtime 22.1 on R 3.6 | 1 vCPU and 2 GB RAM |
| Default R 3.6 * | 1 vCPU and 2 GB RAM |
Notebooks and CPU environments
When you open a notebook in edit mode in a CPU runtime environment, exactly one interactive session connects to a Jupyter kernel for the notebook language and the environment runtime that you select. The runtime is started per user and not per notebook. This means that if you open a second notebook with the same environment definition in the same project, a second kernel is started in the same runtime. Runtime resources are shared by all kernels that run in that runtime. The same applies to environments that include GPUs.
If you want to avoid sharing runtimes but want to use the same environment definition for multiple notebooks in a project, you should create custom environment definitions with the same specifications and associate each notebook with its own definition. See Creating environment definitions.
If necessary, you can restart or reconnect to the kernel. When you restart a kernel, the kernel is stopped and then started again in the same session, but all execution results are lost. When you reconnect to a kernel after losing a connection, the notebook is connected to the same kernel session, and all previous execution results that were saved are still available.
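If you want to see how much memory the runtime that your notebooks share actually has, you can read the container's limit from a notebook cell. The following is a minimal sketch, assuming a Linux-based runtime; the cgroup path depends on whether the cluster uses cgroup v1 or v2, so both locations are tried.

```python
# Minimal sketch: read the memory limit of the runtime container from a
# notebook cell. Assumes a Linux-based runtime; both cgroup v2 and v1 paths
# are tried because the layout depends on the cluster configuration.
from pathlib import Path

def runtime_memory_limit_gb():
    """Return the container memory limit in GB, or None if it cannot be read."""
    candidates = [
        Path("/sys/fs/cgroup/memory.max"),                    # cgroup v2
        Path("/sys/fs/cgroup/memory/memory.limit_in_bytes"),  # cgroup v1
    ]
    for path in candidates:
        if path.exists():
            value = path.read_text().strip()
            if value.isdigit():  # cgroup v2 reports "max" when unlimited
                return int(value) / (1024 ** 3)
    return None

limit = runtime_memory_limit_gb()
print(f"Runtime memory limit: {limit:.1f} GB" if limit is not None
      else "Could not determine the memory limit")
```

Keep in mind that every notebook kernel started in the same runtime draws from this same limit.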
Other environment options for notebooks
You can create notebook environment definitions and customize the software configuration. See Creating environment definitions.
If you are coding Python notebooks or scripts in the JupyterLab IDE, you can use a JupyterLab environment. See JupyterLab environment definitions.
If you have the Execution Engine for Apache Hadoop installed, you can create Hadoop environment definitions to run notebooks on your Hadoop cluster. See Hadoop environments.
If you have the Analytics Engine Powered by Apache Spark service installed, you can choose from default Spark environment definitions with multiple hardware configurations for Python, R, and Scala. See Spark environments.
If you have the Jupyter Notebooks with Python with GPU service installed, you can create an environment definition to run notebooks on GPU clusters. See GPU environments.
File system in Jupyter notebook environments
You must be mindful of the size of the data files you load to your notebook. Very large files might require more storage (disk space) than is available on the node on which the runtime is started.
Be aware that the file system of each runtime is non-persistent and cannot be shared across environments.
Note:
- Do not confuse storage space and the memory size of your environment. Selecting a larger environment will give you more memory and CPU, but not more storage space.
- How much storage space is available depends on the amount of storage that was allocated to the node in the OpenShift cluster where Cloud Pak for Data is running. To increase that limit, you would need to change to another OpenShift cluster with more storage space. You can't increase storage space from within Watson Studio.
- If the size of your data files is large, consider switching to using Spark or Hadoop to process these files. With Spark or Hadoop, the processing workload is spread across multiple nodes.
- Only the temporary space allocated to the notebook is destroyed when the environment is stopped. Persistent file systems that you referenced in your notebook are not destroyed when the environment is stopped.
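Before loading a very large file into the runtime's local file system, you can check whether enough disk space is free. The following is a minimal sketch that uses only the Python standard library; the 5 GB file size is a hypothetical example value.

```python
# Minimal sketch: check whether the runtime's local storage can hold a file
# before staging it. shutil.disk_usage reports the free space of the file
# system that the given path is on.
import os
import shutil

def enough_disk_space(required_bytes, path="."):
    """Return True if the file system at `path` has at least `required_bytes` free."""
    return shutil.disk_usage(path).free >= required_bytes

# Hypothetical example: a 5 GB data file.
file_size = 5 * 1024 ** 3
if enough_disk_space(file_size, os.getcwd()):
    print("Enough local storage to stage the file.")
else:
    print("Not enough local storage; consider a Spark or Hadoop environment.")
```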
Runtime scope
Environment runtimes are always scoped to an environment definition and a user within a project.
For example, if you associate each of your notebooks with its own environment, each notebook will get its own runtime. However, if you open a notebook in an environment that you also selected for another notebook, and that notebook has an active runtime, both notebook kernels will be active in the same runtime. In this case, both notebooks will use the compute and data resources available in the runtime that they share.
If you want to avoid sharing runtimes but want to use the same environment definition for multiple notebooks in a project, you should create multiple custom environment definitions with the same specifications and associate each notebook with its own definition.
Next steps
- Creating a notebook
- Creating your own environment definition
- Creating Python GPU environments for more compute power
- Customizing an environment definition
- Changing the environment definition of a notebook
- Stopping active notebook runtimes
Parent topic: Environments