Environments (Watson Studio and Watson Knowledge Catalog)
You run operational assets, create jobs, and launch IDEs like RStudio or JupyterLab in a runtime environment. The runtime environment details are specified by environment definitions.
Environment definitions specify the hardware and software configuration of the environment runtimes:
- The hardware configuration specifies the amount of processing power and available RAM.
- The software configuration specifies the programming languages, set of pre-installed libraries, and optional libraries or packages that you can specify.
Included environment definitions
You can use the environment definitions that are included in Watson Studio to quickly get started, without having to create your own environment definitions. The included environment definitions are listed on the project's Environments page.
Starting with Cloud Pak for Data 4.0.6, new included environments for notebooks and JupyterLab are associated with a runtime release. Their names are prefixed with IBM Runtime, followed by the release year and release version.
A runtime release specifies a list of key data science libraries and a language version, for example Python 3.9. All environments of a runtime release are built based on the library versions defined in the release, thus ensuring the consistent use of data science libraries across all data science applications.
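As an illustration of the naming convention, the following minimal sketch splits an environment name into its release year, release version, and language. The helper `parse_environment_name` and the pattern are hypothetical and not part of any IBM API. The sketch assumes a two-digit release year and the optional `on <language> <version>` suffix seen in names such as IBM Runtime 22.1 on Python 3.9.

```python
import re

# Assumed name shape: "IBM Runtime <YY>.<release>[ on <language> <version>]"
NAME_PATTERN = re.compile(
    r"^IBM Runtime (?P<year>\d{2})\.(?P<version>\d+)"
    r"(?: on (?P<language>\w+) (?P<language_version>[\d.]+))?$"
)


def parse_environment_name(name):
    """Return the parts of an IBM Runtime environment name, or None if it does not match."""
    match = NAME_PATTERN.match(name)
    if match is None:
        return None
    return match.groupdict()


print(parse_environment_name("IBM Runtime 22.1 on Python 3.9"))
```

Names that do not follow the assumed pattern, such as custom environment definitions, return `None`.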
Included environment definitions for previous refreshes: The Cloud Pak for Data documentation describes the latest 4.0 refresh. If you are not using the latest refresh of Cloud Pak for Data 4.0, you can open a PDF file for the refresh you are using to view the included environment definitions and to read about any migration steps you need to perform if you want to continue using custom environment definitions or custom images. You will find the topics about the environments and creating custom images in the PDF for the Projects section. See Documentation for previous 4.0.x refreshes.
IBM Runtime releases
The 2022 IBM Runtime release is available for Python 3.9 and R 3.6. The runtime release prefix is IBM Runtime 22.1.
While IBM Runtime 22.1 is supported, IBM will update the library versions to address security requirements. These updates will not change the <Major>.<Minor> versions of the libraries, only the <Patch> versions. This ensures that your notebook assets will continue to run.
For example, IBM Runtime 22.1 supports TensorFlow 2.7. In Cloud Pak for Data 4.0.6, IBM Runtime 22.1 contains TensorFlow 2.7.0. In Cloud Pak for Data 4.0.7, TensorFlow might be updated to version 2.7.1 or 2.7.2, but not to version 2.8.
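The update policy can be expressed as a simple check. The following is a minimal sketch, not an IBM utility: the function name `is_patch_only_update` is hypothetical, and it assumes plain dotted version strings such as 2.7.1.

```python
def is_patch_only_update(pinned, installed):
    """Return True when `installed` has the same <Major>.<Minor> as `pinned`."""
    # Compare only the first two dot-separated components.
    return pinned.split(".")[:2] == installed.split(".")[:2]


# TensorFlow is pinned at 2.7 in IBM Runtime 22.1
for candidate in ["2.7.0", "2.7.2", "2.8.0"]:
    print(candidate, is_patch_only_update("2.7", candidate))
```

Under this check, 2.7.0 and 2.7.2 are accepted as patch-level updates of 2.7, and 2.8.0 is rejected.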
Libraries in IBM Runtime 22.1
IBM Runtime 22.1 includes the following popular data science library packages for Python 3.9 and R 3.6:
| IBM Runtime 22.1 | Library | Version |
|---|---|---|
| IBM Runtime 22.1 on Python 3.9 | Dali | 1.9 |
| | Horovod | 0.23 |
| | Keras | 2.7 |
| | Lale | 0.6 |
| | LightGBM | 3.3 |
| | NumPy | 1.20 |
| | ONNX | 1.10 |
| | ONNX Runtime | 1.10 |
| | OpenCV | 4.5 |
| | pandas | 1.3 |
| | PyArrow | 5.0 |
| | PyTorch | 1.10 |
| | scikit-learn | 1.0 |
| | SciPy | 1.7 |
| | SnapML | 1.8 |
| | TensorBoard | 2.7 |
| | TensorFlow | 2.7 |
| | XGBoost | 1.5 |
| IBM Runtime 22.1 on R 3.6 | pandoc | 2.12 |
| | python | 3.9 |
| | car | 3.0 |
| | catools | 1.17 |
| | forecast | 8.6 |
| | hmisc | 4.2 |
| | lme4 | 1.1 |
| | mvtnorm | 1.0 |
| | psych | 1.8 |
| | sandwich | 2.5 |
| | scikit-learn | 1.0 |
| | arrow | 5.0 |
| | keras | 2.7 |
| | tensorflow | 2.7 |
| | xgboost | 1.5 |
| | reticulate | 1.20 |
| | tidyr | 1.1 |
| | caret | 6.0 |
| | ggplot2 | 3.1 |
| | glmnet | 2.0 |
| | randomforest | 4.6 |
| | spatial | 7.3 |
IBM Runtime 22.1 includes a large set of other useful libraries in addition to the libraries listed in the table. To see the full list, select the IBM Runtime 22.1 on Python 3.9 or the IBM Runtime 22.1 on R 3.6 environment definition on the Environments page of a project, and view the software configuration details.
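You can also check library versions from inside a running Python notebook. The following minimal sketch uses the standard library module `importlib.metadata`. The helper `installed_versions` is a hypothetical name, and the distribution names passed to it are assumptions that might differ from the display names in the table.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is absent."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


# Illustrative subset of the libraries in the table; names are assumptions
for name, ver in installed_versions(["numpy", "pandas", "scikit-learn"]).items():
    print(f"{name}: {ver or 'not installed'}")
```

The output depends on the environment in which the notebook runs, so no expected values are shown here.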
Getting started
GPU and Execution Engine for Apache Hadoop environments are not available by default:
- For Python with GPU environments, the Jupyter Notebooks with Python for GPU service must be installed.
- For Execution Engine for Apache Hadoop environments, the Execution Engine for Apache Hadoop service must be installed on the IBM Cloud Pak for Data platform.
After these services are installed, you must create your own environment definitions to use these environments.
Use the following table to find out more about environment definitions by operational asset type.
| Operational asset | Programming language | Tool | Environment definition type | Available environment definitions / compute resources |
|---|---|---|---|---|
| Jupyter notebook | Python | notebook editor | Anaconda Python distribution | Python environments |
| | Python | notebook editor | Anaconda Python distribution with GPU | GPU environments |
| | Python | notebook editor | Spark | Spark environments |
| | Python | notebook editor | Spark | Hadoop cluster |
| | R | notebook editor | Anaconda R distribution | R environments |
| | R | notebook editor | Spark | Spark environments |
| | Scala | notebook editor | Spark | Spark environments |
| | Python | JupyterLab | Anaconda Python distribution | JupyterLab environments |
| Script | R | RStudio | Anaconda R distribution | RStudio environments |
| Shiny app | R | RStudio | Anaconda R distribution | RStudio environments |
| Data Refinery flow | R | Data Refinery | Spark | Data Refinery environments |
| | R | Data Refinery | Spark | Hadoop cluster |
Learn more
- Environment definitions for the notebook editor
- Environment definitions for JupyterLab
- Spark environment definitions
- GPU environment definitions
- Environment definitions for RStudio
- Environment definitions for Data Refinery
- Refinery data on the Hadoop cluster
- Creating environment definitions
- Customizing environment definitions
- Stopping active runtimes when no longer needed