Environments (Watson Studio and Watson Knowledge Catalog)

You run operational assets, create jobs, and launch IDEs like RStudio or JupyterLab in a runtime environment. The runtime environment details are specified by environment definitions.

Environment definitions specify the hardware and software configuration of the environment runtimes:

The hardware configuration specifies the amount of processing power and available RAM.
The software configuration specifies the programming languages, set of pre-installed libraries, and optional libraries or packages that you can specify.

Included environment definitions

You can use the environment definitions that are included in Watson Studio to quickly get started, without having to create your own environment definitions. The included environment definitions are listed on the project's Environments page.

Starting with Cloud Pak for Data 4.0.6, new included environments for notebooks and JupyterLab are associated with a runtime release. Their names are prefixed with IBM Runtime, followed by the release year and the release version.

A runtime release specifies a list of key data science libraries and a language version, for example Python 3.9. All environments of a runtime release are built based on the library versions defined in the release, thus ensuring the consistent use of data science libraries across all data science applications.

Included environment definitions for previous refreshes: The Cloud Pak for Data documentation describes the latest 4.0 refresh. If you are not using the latest refresh of Cloud Pak for Data 4.0, you can open a PDF file for the refresh you are using to view the included environment definitions and to read about any migration steps you need to perform if you want to continue using custom environment definitions or custom images. You will find the topics about the environments and creating custom images in the PDF for the Projects section. See Documentation for previous 4.0.x refreshes.

IBM Runtime releases

The IBM Runtime releases in 2022 are available for Python 3.9 and R 3.6. The runtime release prefix is IBM Runtime 22.1.
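The naming convention can be illustrated with a small parser. This is only a sketch: it assumes that names follow the pattern seen in this topic (for example, IBM Runtime 22.1 on Python 3.9), that the two-digit number before the dot is the release year, and that the number after the dot is the release version. The function and class names are hypothetical and are not part of any IBM API.

```python
import re
from typing import NamedTuple, Optional


class RuntimeName(NamedTuple):
    """Parts of an included environment name (hypothetical helper type)."""
    release_year: int
    release_version: int
    language: str
    language_version: str


# Assumed pattern: "IBM Runtime <YY>.<release> on <language> <version>"
_PATTERN = re.compile(r"^IBM Runtime (\d{2})\.(\d+) on (\w+) (\d+(?:\.\d+)*)$")


def parse_runtime_name(name: str) -> Optional[RuntimeName]:
    """Split an environment name into its parts, or return None if it does not match."""
    match = _PATTERN.match(name)
    if match is None:
        return None
    year, version, language, language_version = match.groups()
    # Assumption: a two-digit year such as 22 stands for 2022.
    return RuntimeName(2000 + int(year), int(version), language, language_version)


print(parse_runtime_name("IBM Runtime 22.1 on Python 3.9"))
print(parse_runtime_name("IBM Runtime 22.1 on R 3.6"))
```

Names that do not carry the IBM Runtime prefix return None, which makes it easy to tell runtime-release environments apart from other environment definitions in a list.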

While IBM Runtime 22.1 is supported, IBM will update the library versions to address security requirements. Note that these updates will not change the <Major>.<Minor> versions of the libraries, but only the <Patch> versions. This ensures that your notebook assets will continue to run.

For example: IBM Runtime 22.1 supports TensorFlow 2.7. In Cloud Pak for Data 4.0.6, IBM Runtime 22.1 will contain TensorFlow 2.7.0. In Cloud Pak for Data 4.0.7, TensorFlow might be updated to version 2.7.1 or 2.7.2, but not to version 2.8.
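The update rule in this example can be expressed as a short check. The following is a minimal sketch, assuming plain dotted numeric version strings. The helper names are made up for illustration.

```python
from typing import Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Return the (<Major>, <Minor>) pair of a dotted version string such as '2.7.1'."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def is_patch_only_update(current: str, candidate: str) -> bool:
    """True if candidate keeps the same <Major>.<Minor> as current."""
    return major_minor(current) == major_minor(candidate)


print(is_patch_only_update("2.7.0", "2.7.1"))  # True: only <Patch> changes
print(is_patch_only_update("2.7.0", "2.8.0"))  # False: <Minor> changes
```

Under this rule, TensorFlow 2.7.0 can move to 2.7.1 or 2.7.2 within IBM Runtime 22.1, but a move to 2.8 would not qualify.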

Libraries in IBM Runtime 22.1

IBM Runtime 22.1 includes the following popular data science library packages for Python 3.9 and R 3.6:

Table 1. Packages and their versions in IBM Runtime 22.1 for Python 3.9 and R 3.6
IBM Runtime 22.1	Library	Version
IBM Runtime 22.1 on Python 3.9
	Dali	1.9
	Horovod	0.23
	Keras	2.7
	Lale	0.6
	LightGBM	3.3
	NumPy	1.20
	ONNX	1.10
	ONNX Runtime	1.10
	OpenCV	4.5
	pandas	1.3
	PyArrow	5.0
	PyTorch	1.10
	scikit-learn	1.0
	SciPy	1.7
	SnapML	1.8
	TensorBoard	2.7
	TensorFlow	2.7
	XGBoost	1.5
IBM Runtime 22.1 on R 3.6
	pandoc	2.12
	python	3.9
	car	3.0
	catools	1.17
	forecast	8.6
	hmisc	4.2
	lme4	1.1
	mvtnorm	1.0
	psych	1.8
	sandwich	2.5
	scikit-learn	1.0
	arrow	5.0
	keras	2.7
	tensorflow	2.7
	xgboost	1.5
	reticulate	1.20
	tidyr	1.1
	caret	6.0
	ggplot2	3.1
	glmnet	2.0
	randomforest	4.6
	spatial	7.3

IBM Runtime 22.1 includes a large set of other useful libraries in addition to the libraries listed in the table. To see the full list, select the IBM Runtime 22.1 on Python 3.9 or the IBM Runtime 22.1 on R 3.6 environment definition on the Environments page of a project, and view the software configuration details.
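You can also check library versions from inside a running Python notebook. The following sketch uses the standard importlib.metadata module (available from Python 3.8) and compares a few installed versions with the <Major>.<Minor> values from Table 1. The distribution names used as keys are assumed PyPI names, and the helper functions are illustrative only.

```python
from importlib import metadata
from typing import Callable, Dict, Optional, Tuple

# A few <Major>.<Minor> versions copied from Table 1 (Python 3.9 runtime).
# The keys are assumed PyPI distribution names, not names from the documentation.
EXPECTED = {
    "numpy": "1.20",
    "pandas": "1.3",
    "scikit-learn": "1.0",
    "tensorflow": "2.7",
}


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def matches_release(installed: Optional[str], expected: str) -> bool:
    """True if the installed version has the expected <Major>.<Minor> components."""
    if installed is None:
        return False
    return installed.split(".")[:2] == expected.split(".")[:2]


def check_environment(
    expected: Dict[str, str],
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> Dict[str, Tuple[Optional[str], bool]]:
    """Map each library name to (installed version, matches expected release)."""
    report = {}
    for name, wanted in expected.items():
        found = lookup(name)
        report[name] = (found, matches_release(found, wanted))
    return report


if __name__ == "__main__":
    for name, (found, ok) in check_environment(EXPECTED).items():
        print(f"{name}: installed={found} matches_release={ok}")
```

The comparison looks only at the first two version components because patch versions can change between Cloud Pak for Data refreshes.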

Getting started

GPU and Execution Engine for Apache Hadoop environments are not available by default:

For Python with GPU environments, the Jupyter Notebooks with Python for GPU service must be installed.
For Execution Engine for Apache Hadoop environments, the Execution Engine for Apache Hadoop service must be installed on the IBM Cloud Pak for Data platform.

After these services are installed, you must create your own environment definitions to use these environments.

Use the following table to find out more about environment definitions by operational asset type.

Operational asset	Programming language	Tool	Environment definition type	Available environment definitions or compute resources
Jupyter notebook	Python	notebook editor	Anaconda Python distribution	Python environments
	Python	notebook editor	Anaconda Python distribution with GPU	GPU environments
	Python	notebook editor	Spark	Spark environments
	Python	notebook editor	Spark	Hadoop cluster
	R	notebook editor	Anaconda R distribution	R environments
	R	notebook editor	Spark	Spark environments
	Scala	notebook editor	Spark	Spark environments
	Python	JupyterLab	Anaconda Python distribution	JupyterLab environments
Script	R	RStudio	Anaconda R distribution	RStudio environments
Shiny app	R	RStudio	Anaconda R distribution	RStudio environments
Data Refinery flow	R	Data Refinery	Spark	Data Refinery environments
	R	Data Refinery	Spark	Hadoop cluster

Learn more