Planning your notebooks and scripts experience
To make a plan for using Jupyter notebooks and scripts, first understand the choices that you have, the implications of those choices, and how those choices affect the order of implementation tasks.
You can perform most tasks that are related to notebooks and scripts with the Editor or Admin role in a project.
Most tasks must be completed in a particular order, so consider the following questions before you start working with notebooks and scripts:
- Which programming language do you want to work in?
- Which development tool do you prefer?
- Do you want to collaborate with others through Git?
- What will your notebooks be doing?
- What libraries do you want to work with?
- Do you want to work in the product UI, automate the entire process, or use a mixture of both methods?
- How can you use the notebook or script?
To create a plan for using Jupyter notebooks or scripts, determine which of the following tasks you must complete.
| Task | Mandatory? | Timing |
|---|---|---|
| Adding data assets to the project | Yes | Before you begin creating notebooks |
| Picking a programming language | Yes | Before you select the tool |
| Checking the library packages | Yes | Before you select a runtime environment |
| Choosing an appropriate runtime environment | Yes | Before you open the development environment |
| Automating the lifecycle of a notebook or script | No | You can automate the entire lifecycle or parts of it |
| Managing the notebooks and scripts lifecycle | No | When the notebook is ready |
| Uses for notebooks and scripts after creation | No | When the notebook is ready |
Picking a programming language
You can choose to work in the following languages:
- Python
- R
Python is always included when you install watsonx.ai Studio.
R is not available by default. An administrator must install R-based notebook runtimes (for Jupyter Notebooks) or the RStudio Server Runtimes service (to make RStudio and RStudio runtimes available). To determine whether the RStudio Server Runtimes service is installed, open the Services catalog. If the service is installed and ready to use, the tile in the catalog shows Ready to use. To check which R notebook runtimes are installed, open the Manage tab in your watsonx.ai Studio project, select Environments, and then click Templates.
Selecting a tool
You can work with notebooks and scripts in the notebook editor.
Checking the library packages
When you open a notebook in a runtime environment, you have access to a large selection of preinstalled data science library packages. Many environments also include libraries provided by IBM at no extra charge, such as:
- The Watson Natural Language Processing library in Python environments
- Libraries to help you access project assets
- Libraries for time series or geo-spatial analysis in Spark environments
For a list of the library packages and versions that are included in an environment template, open the Manage tab in your project, select Environments, and then select the template on the Templates page.
If libraries are missing in a template, you can add them:
- Through the notebook or script: You can use familiar package install commands for your environment. For example, in Python notebooks, you can use mamba, conda, or pip.
- By creating a custom environment template: When you create a custom template, you can either add a software customization with your libraries, or a custom runtime image that you build with the libraries you want to include. For details, see Customizing environment templates.
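Before you install anything from a notebook, it can help to check which libraries the environment already provides. The following Python sketch is a minimal illustration, not a product API: the helper names and the package list are hypothetical, and it assumes that each library's import name matches its install name, which is not always the case.

```python
import importlib.util
import sys


def find_missing(modules):
    """Return the top-level module names that cannot be imported here."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def pip_install_command(packages):
    """Build a pip command that targets the interpreter running this code."""
    return [sys.executable, "-m", "pip", "install", *packages]


# Hypothetical requirement list: one standard library module and one made-up name.
required = ["json", "hypothetical_missing_pkg_xyz"]
missing = find_missing(required)
if missing:
    # Only print the command; run it yourself once you have reviewed the list.
    print("Would run:", " ".join(pip_install_command(missing)))
```

The sketch only prints the install command so that you can review it before you change the environment.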
Choosing a runtime environment
Choosing the compute environment for your notebook depends on the amount of data you want to process and the complexity of the data analysis processes.
watsonx.ai Studio offers many default environment templates with different hardware sizes and software configurations to help you get started quickly, without having to create your own templates. To see the included templates, open the Manage tab in your project, select Environments, and then open the Templates page. For more information about the included environments, see Environments.
If the available templates don't suit your needs, you can create custom templates and determine the hardware size and software configuration. For details, see Customizing environment templates.
Working with data
To work with data in a notebook:
- Add the data to your project, which turns the data into a project asset. See Adding data to a project for the different methods for adding data to a project.
- Use generated code that loads data from the asset to a data structure in your notebook. For a list of the supported data types, see Data load support.
- Write your own code to load data if the data source isn't added as a project asset or support for adding generated code isn't available for the project asset.
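When generated code isn't available, hand-written loading code can be short. The following sketch uses only the Python standard library to read a CSV file into a list of dictionaries. The function name, file name, and column names are hypothetical, and the sample file is written to a temporary directory only so that the example is self-contained.

```python
import csv
import tempfile
from pathlib import Path


def load_rows(path, encoding="utf-8"):
    """Read a CSV file that has a header row into a list of dictionaries."""
    with Path(path).open(newline="", encoding=encoding) as handle:
        return list(csv.DictReader(handle))


# Demo with a small hypothetical file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample_sales.csv"
    sample.write_text("region,units\nnorth,12\nsouth,7\n", encoding="utf-8")
    rows = load_rows(sample)

# CSV values are read as strings, so convert them before doing arithmetic.
total_units = sum(int(row["units"]) for row in rows)
print(len(rows), "rows,", total_units, "units")
```

In a real notebook, the path would point to a file that your runtime environment can reach.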
Automating the lifecycle of a notebook or script
You can use CPDCTL, a command-line interface, to manage the lifecycle of a notebook or script. You can automate the entire flow, or only parts of the flow. For details, see Automating the lifecycle of notebooks and scripts.
Managing the notebooks and scripts lifecycle
After you create and test a notebook or script in a project, you can share a read-only copy outside of watsonx.ai Studio so that people who aren't collaborators in your project can see and use it. See Sharing notebooks with a URL.
Uses for notebooks and scripts after creation
The options for a notebook or a script that is created and ready to use in IBM Cloud Pak for Data include:
- [For notebooks and scripts] Running it as a job in a project (platform job). See Creating and managing jobs in a project.
- [For notebooks and scripts] Running it as part of a pipeline. See Configuring pipeline nodes.
To ensure that a notebook or script can be run as a job or in a pipeline (notebooks only):
- Ensure that no cells require interactive input by a user.
- Log enough detail that you can follow the progress and diagnose any failures by looking at the log.
- Use environment variables in the code to access configurations if a notebook or script requires them, for example, the input data file or the number of training runs.
- If you're loading data from data sources as part of your code, make sure to properly handle error cases such as network connection or timeout errors.
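The guidelines above can be sketched in Python as follows. This is an illustration under stated assumptions, not a required pattern: the environment variable names INPUT_FILE and TRAINING_RUNS, the default values, and the helper functions are all hypothetical.

```python
import logging
import os
import time

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("notebook_job")


def read_config(env=os.environ):
    """Read settings from environment variables, with defaults for local runs."""
    return {
        # Hypothetical variable names; use whatever names your job defines.
        "input_file": env.get("INPUT_FILE", "input.csv"),
        "training_runs": int(env.get("TRAINING_RUNS", "3")),
    }


def with_retries(load, attempts=3, delay_seconds=1.0):
    """Call load() and retry when a connection or timeout error is raised."""
    for attempt in range(1, attempts + 1):
        try:
            return load()
        except (ConnectionError, TimeoutError) as err:
            log.warning("Load attempt %d of %d failed: %s", attempt, attempts, err)
            if attempt == attempts:
                raise  # Give up so that the job fails visibly in the log.
            time.sleep(delay_seconds)


config = read_config()
log.info(
    "Using input file %s with %d training runs",
    config["input_file"],
    config["training_runs"],
)
```

To use the retry helper, pass it a function that takes no arguments and performs the load, for example `with_retries(lambda: fetch_table(config["input_file"]))`, where `fetch_table` stands in for your own loading code. Nothing in the sketch prompts for input, so it can run unattended.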