Planning your notebooks and scripts experience

To make a plan for using Jupyter notebooks and scripts, first understand the choices that you have, the implications of those choices, and how those choices affect the order of implementation tasks.

You can perform most tasks that are related to notebooks and scripts with the Editor or Admin role in a project.

Before you start working with notebooks and scripts, consider the following questions, because most tasks must be completed in a particular order:

  • Which programming language do you want to work in?
  • Which development tool do you prefer to work in?
  • Do you want to collaborate with others through Git?
  • What will your notebooks be doing?
  • What libraries do you want to work with?
  • Do you want to work in the product UI, automate the entire process, or use a mixture of both methods?
  • How can you use the notebook or script?

To create a plan for using Jupyter notebooks or scripts, determine which of the following tasks you must complete.

Tasks to complete when starting to use Jupyter notebooks
Task | Mandatory? | Timing
Adding data assets to the project | Yes | Before you begin creating notebooks
Picking a programming language | Yes | Before you select the tool
Checking the library packages | Yes | Before you select a runtime environment
Choosing an appropriate runtime environment | Yes | Before you open the development environment
Automating the lifecycle of a notebook or script | No | You can automate the entire lifecycle or parts of it
Managing the notebooks and scripts lifecycle | No | When the notebook is ready
Uses for notebooks and scripts after creation | No | When the notebook is ready

Picking a programming language

You can choose to work in the following languages:

  • Python
  • R

Python is always included when you install watsonx.ai Studio.

R is not available by default. An administrator must install R-based notebook runtimes (for Jupyter notebooks) or the RStudio Server Runtimes service (to make RStudio and RStudio runtimes available). To determine whether the RStudio Server Runtimes service is installed, open the Services catalog. If the service is installed and ready to use, the tile in the catalog shows Ready to use. To check which R notebook runtimes are installed, open the Manage tab in your watsonx.ai Studio project, select Environments, and then click Templates.

Selecting a tool

You can work with notebooks and scripts in the notebook editor.

Checking the library packages

When you open a notebook in a runtime environment, you have access to a large selection of preinstalled data science library packages. Many environments also include libraries provided by IBM at no extra charge, such as:

  • The Watson Natural Language Processing library in Python environments
  • Libraries to help you access project assets
  • Libraries for time series or geo-spatial analysis in Spark environments

For a list of the library packages and the versions included in an environment template, select the template on the Templates page from the Manage tab on the project's Environments page.

If libraries are missing in a template, you can add them:

  • Through the notebook or script: You can use familiar package install commands for your environment. For example, in Python notebooks, you can use mamba, conda or pip.
  • By creating a custom environment template: When you create a custom template, you can either add a software customization with your libraries, or a custom runtime image that you build with the libraries you want to include. For details, see Customizing environment templates.
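Before you install anything from a notebook, it can help to check which libraries the runtime already provides. The following is a minimal sketch that uses only the Python standard library. The package names in `required` and the helper names are hypothetical examples, not part of the product, and the install step is left commented out.

```python
import importlib.metadata
import subprocess
import sys


def installed_version(package):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        return None


def missing_packages(packages):
    """Return the packages from the list that are not installed in this runtime."""
    return [name for name in packages if installed_version(name) is None]


def install_with_pip(packages):
    """Install packages with pip into the environment of the running interpreter."""
    subprocess.check_call([sys.executable, "-m", "pip", "install", *packages])


# Hypothetical package list; replace it with the libraries your notebook needs.
required = ["numpy", "pandas"]
for name in required:
    print(name, installed_version(name) or "not installed")

# Uncomment to install whatever is missing:
# to_install = missing_packages(required)
# if to_install:
#     install_with_pip(to_install)
```

Running pip through `sys.executable` targets the same Python interpreter that runs the notebook kernel, which avoids installing into a different environment by accident.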

Choosing a runtime environment

Choosing the compute environment for your notebook depends on the amount of data you want to process and the complexity of the data analysis processes.

watsonx.ai Studio offers many default environment templates with different hardware sizes and software configurations to help you quickly get started, without having to create your own templates. These included templates are listed on the Templates page from the Manage tab on the project's Environments page. For more information about the included environments, see Environments.

If the available templates don't suit your needs, you can create custom templates and determine the hardware size and software configuration. For details, see Customizing environment templates.

Important: Make sure that the environment has enough memory to store the data that you load to the notebook. This often means that the environment must have significantly more memory than the total size of the data loaded to the notebook, because some data frameworks, like pandas, can hold multiple copies of the data in memory.
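As a rough illustration of this sizing rule, the following sketch compares an estimated memory requirement with the memory of an environment. The `overhead_factor` of 3 is an assumption chosen for the example, not a figure from the product documentation. Measure your own workload to pick a realistic value.

```python
GIB = 1024 ** 3


def estimated_memory_needed(data_bytes, overhead_factor=3.0):
    """Estimate peak memory as the data size times an assumed overhead factor."""
    return int(data_bytes * overhead_factor)


def fits_in_environment(data_bytes, environment_memory_bytes, overhead_factor=3.0):
    """Return True if the estimated requirement fits in the environment's memory."""
    return estimated_memory_needed(data_bytes, overhead_factor) <= environment_memory_bytes


# 2 GiB of data with a factor of 3 needs about 6 GiB.
print(fits_in_environment(2 * GIB, 4 * GIB))  # False: 6 GiB does not fit in 4 GiB
print(fits_in_environment(2 * GIB, 8 * GIB))  # True: 6 GiB fits in 8 GiB
```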

Working with data

To work with data in a notebook:

  • Add the data to your project, which turns the data into a project asset. See Adding data to a project for the different methods for adding data to a project.
  • Use generated code that loads data from the asset to a data structure in your notebook. For a list of the supported data types, see Data load support.
  • Write your own code to load data if the data source isn't added as a project asset or support for adding generated code isn't available for the project asset.
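When you write your own loading code, a small function that reads from any open text stream keeps the parsing logic independent of where the data comes from. The following sketch uses the Python standard library `csv` module and an in-memory sample. The column names and values are made up for illustration.

```python
import csv
import io


def load_rows(text_stream):
    """Read CSV rows from an open text stream into a list of dictionaries."""
    return list(csv.DictReader(text_stream))


# In-memory sample data; a real notebook would open a file or a remote source.
sample = io.StringIO("id,value\n1,3.5\n2,4.0\n")
rows = load_rows(sample)
print(len(rows), "rows loaded")
```

To read from a local file instead, open it with `open(path, newline="")` and pass the file object to `load_rows`.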

Automating the lifecycle of a notebook or script

You can use CPDCTL, a command-line interface, to manage the lifecycle of a notebook or script. You can automate the entire flow, or only parts of the flow. For details, see Automating the lifecycle of notebooks and scripts.

Managing the notebooks and scripts lifecycle

After you create and test your notebooks or scripts in a project, you can share a read-only copy outside of watsonx.ai Studio so that people who aren't collaborators in your projects can see and use it. See Sharing notebooks with a URL.

Uses for notebooks and scripts after creation

The options for a notebook or a script that is created and ready to use in IBM Cloud Pak for Data include:

To ensure that a notebook or script can be run as a job or in a pipeline (notebooks only):

  • Ensure that no cells require interactive input by a user.
  • Log enough detail that the progress of a run and any failures can be understood from the log alone.
  • Use environment variables in the code to access configurations if a notebook or script requires them, for example the input data file or the number of training runs.
  • If you're loading data from data sources as part of your code, make sure to properly handle error cases such as network connection or timeout errors.
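The checklist above can be sketched in Python as follows. The environment variable names `INPUT_DATA_FILE` and `TRAINING_RUNS`, their default values, and the helper names are hypothetical and chosen only for this example. The retry policy is one possible way to handle connection and timeout errors, not a prescribed one.

```python
import logging
import os
import time

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("notebook_job")


def read_config(env=os.environ):
    """Read settings from environment variables instead of prompting the user."""
    return {
        "input_file": env.get("INPUT_DATA_FILE", "data.csv"),  # hypothetical name
        "training_runs": int(env.get("TRAINING_RUNS", "1")),   # hypothetical name
    }


def with_retries(action, attempts=3, delay_seconds=1.0,
                 retry_on=(ConnectionError, TimeoutError)):
    """Call action(); on a connection or timeout error, log it and try again."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_on as error:
            log.warning("Attempt %d of %d failed: %s", attempt, attempts, error)
            if attempt == attempts:
                log.error("Giving up after %d attempts", attempts)
                raise
            time.sleep(delay_seconds)


config = read_config()
log.info("Starting with input_file=%s, training_runs=%d",
         config["input_file"], config["training_runs"])
```

A data-loading call can then be wrapped as `with_retries(lambda: load_from_source(config["input_file"]))`, where `load_from_source` stands for your own loading function.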