Planning your notebooks and scripts experience

To make a plan for using Jupyter notebooks and scripts, first understand the choices that you have, the implications of those choices, and how those choices affect the order of implementation tasks.

You can perform most tasks that are related to notebooks and scripts with the Editor or Admin role in an analytics project.

Before you start working with notebooks and scripts, consider the following questions, because most tasks must be completed in a particular order:

  • Which programming language do you want to work in?
  • What will your notebooks be doing?
  • What libraries do you want to work with?
  • How can you use the notebook or script in Cloud Pak for Data as a Service?

To create a plan for using Jupyter notebooks or scripts, determine which of the following tasks you must complete.

Task                                           Mandatory?   Timing
Creating a project                             Yes          This must be your very first task
Adding data assets to the project              Yes          Before you begin creating notebooks
Picking a programming language                 Yes          Before you select the tool
Selecting a tool                               Yes          After selecting the language
Checking the library packages                  Yes          Before you select a runtime environment
Choosing an appropriate runtime environment    Yes          Before you open the development environment
Managing the notebooks and scripts lifecycle   No           When the notebook or script is ready
Uses for notebooks and scripts after creation  No           When the notebook is ready

Creating a project

You need to create a project before you can start working in notebooks.

You can create an empty project, or create a project from a file or from a URL. In a project:

  • You can use the Jupyter Notebook and RStudio.
  • Notebooks are assets in the project.
  • Notebook collaboration is based on locking by user at the project level.
  • R scripts and Shiny apps are not assets in the project.
  • There is no collaboration on R scripts or Shiny apps.

Picking a programming language

You can choose to work in the following languages:

Notebooks
Python and R
Scripts
R scripts and R Shiny apps

Selecting a tool

In Cloud Pak for Data as a Service, you can work with notebooks and scripts in the following tools:

Jupyter Notebook editor
In the Jupyter Notebook editor, you can create Python or R notebooks. Notebooks are assets in a project. Collaboration is only at the project level. The notebook is locked by a user when opened and can only be unlocked by the same user or a project admin.
RStudio
In RStudio, you can create R scripts and Shiny apps. R scripts are not assets in a project, which means that there is no collaboration at the project level.

Checking the library packages

When you open a notebook in a runtime environment, you have access to a large selection of preinstalled data science library packages. Many environments also include libraries provided by IBM at no extra charge, such as:

  • The Watson Natural Language Processing library in Python environments
  • Libraries to help you access project assets
  • Libraries for time series or geo-spatial analysis in Spark environments

For a list of the library packages and the versions included in an environment template, select the template on the Templates page from the Manage tab on the project's Environments page.
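
You can also check from inside a running notebook which versions of specific packages are installed. The following is a minimal sketch that uses only the Python standard library; the function name installed_versions and the example package names are illustrative assumptions, not part of the product.

```python
from importlib import metadata


def installed_versions(package_names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in the current environment.
            versions[name] = None
    return versions


# Example package names; replace them with the libraries your notebook needs.
for name, version in installed_versions(["pandas", "numpy"]).items():
    print(f"{name}: {version or 'not installed'}")
```

A package that is reported as not installed is a candidate for one of the installation options described next.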

If libraries are missing in a template, you can add them:

Through the notebook or script
You can use familiar package install commands for your environment. For example, in Python notebooks, you can use mamba, conda or pip.
By creating a custom environment template
When you create a custom template, you can create a software customization and add the libraries that you want to include. For details, see Customizing environment templates.
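
As a rough illustration of the first option, the following sketch builds a pip command for the Python interpreter that runs the notebook. The helper names and the package name example-package are hypothetical. In a notebook cell, the IPython magic %pip install is a common shorthand for the same operation.

```python
import subprocess
import sys


def pip_install_command(packages):
    """Build a pip command that targets the interpreter running this code."""
    return [sys.executable, "-m", "pip", "install", *packages]


def install(packages):
    """Run pip and raise CalledProcessError if the installation fails."""
    subprocess.run(pip_install_command(packages), check=True)


# Print the command only; call install(["example-package"]) to actually run it.
# "example-package" is a placeholder, not a real requirement.
print(" ".join(pip_install_command(["example-package"])))
```

Invoking pip through sys.executable helps ensure that the package is installed into the same environment that the notebook kernel uses.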

Choosing a runtime environment

The compute environment that you choose for your notebook depends on the amount of data that you want to process and the complexity of the data analysis.

watsonx.ai Studio offers many default environment templates with different hardware sizes and software configurations to help you quickly get started, without having to create your own templates. These included templates are listed on the Templates page from the Manage tab on the project's Environments page. For more information about the included environments, see Environments.

If the available templates don't suit your needs, you can create custom templates and determine the hardware size and software configuration. For details, see Customizing environment templates.

Important: Make sure that the environment has enough memory to hold the data that you load into the notebook. This often means that the environment must have significantly more memory than the total size of the loaded data, because some data frameworks, such as pandas, can hold multiple copies of the data in memory.
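
A back-of-the-envelope check can help when you compare data size with environment memory. The sketch below is only an illustration: the overhead factor of 3 is an arbitrary assumption, not a documented value, and the real overhead depends on your data and on the framework that you use.

```python
GIB = 1024 ** 3


def required_memory_bytes(data_size_bytes, overhead_factor=3.0):
    """Estimate the memory needed to work with data of the given size.

    overhead_factor is an assumed multiplier that allows for extra
    in-memory copies; tune it for your own workload.
    """
    return int(data_size_bytes * overhead_factor)


def fits_in_environment(data_size_bytes, environment_memory_bytes, overhead_factor=3.0):
    """Return True if the estimated requirement fits in the environment memory."""
    return required_memory_bytes(data_size_bytes, overhead_factor) <= environment_memory_bytes


# Hypothetical sizes: 2 GiB of data in an environment with 8 GiB of memory.
print(fits_in_environment(2 * GIB, 8 * GIB))
```

For a file on disk, os.path.getsize can supply data_size_bytes, although the size of a file and the size of the same data in memory can differ considerably.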

Working with data

To work with data in a notebook:

  • Add the data to your project, which turns the data into a project asset. See Adding data to a project for the different methods for adding data to a project.
  • Use generated code that loads data from the asset to a data structure in your notebook. For a list of the supported data types, see Data load support.
  • Write your own code to load data if the data source isn't added as a project asset or support for adding generated code isn't available for the project asset.
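
For the last case, where you write your own loading code, a minimal sketch for CSV data might look as follows. It uses only the Python standard library. The function names are illustrative, and the inline sample data stands in for a real file.

```python
import csv
import io


def parse_csv(handle):
    """Read CSV text from an open file-like object into a list of dicts."""
    return list(csv.DictReader(handle))


def load_csv_rows(path, encoding="utf-8"):
    """Open a CSV file and return its rows as a list of dicts keyed by header."""
    with open(path, newline="", encoding=encoding) as handle:
        return parse_csv(handle)


# Inline sample data so that the sketch runs without a file on disk.
sample = io.StringIO("id,value\n1,10\n2,20\n")
for row in parse_csv(sample):
    print(row["id"], row["value"])
```

All values are returned as strings, so convert them to numeric types where your analysis needs it.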

Managing the notebooks and scripts lifecycle

After you create and test a notebook in your tool, you can:

R scripts and Shiny apps can't be published or shared using functionality in a project.

Uses for notebooks and scripts after creation

The options for a notebook that is created and ready to use in Cloud Pak for Data as a Service include:

  • Running it as a job in a project. See Creating and managing jobs in a project.

  • Running it as part of a pipeline. See Configuring pipeline nodes.

    To ensure that a notebook can be run as a job or in a pipeline:

    • Ensure that no cells require interactive input by a user.
    • Ensure that the notebook logs enough detail that you can follow its progress and diagnose any failures from the log alone.
    • Use environment variables in the code to access configurations if a notebook or script requires them, for example the input data file or the number of training runs.
  • Using the watsonx.ai Runtime Python client to build, train and then deploy your models. See watsonx.ai Runtime Python client samples and examples.

  • Using the watsonx.ai Runtime REST API to build, train and then deploy your models.
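
The guidance above on preparing a notebook to run as a job or in a pipeline can be sketched as follows. The environment variable names INPUT_DATA_FILE and NUM_TRAINING_RUNS, their default values, and the function names are hypothetical and chosen only for illustration.

```python
import logging
import os

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("notebook_job")


def read_job_config(environ=os.environ):
    """Read settings from environment variables instead of prompting the user."""
    return {
        # Hypothetical variable names and defaults.
        "input_file": environ.get("INPUT_DATA_FILE", "data.csv"),
        "training_runs": int(environ.get("NUM_TRAINING_RUNS", "1")),
    }


def run(config):
    """Log progress for each step so that the job log shows what happened."""
    logger.info("Starting with input file %s", config["input_file"])
    for i in range(1, config["training_runs"] + 1):
        logger.info("Training run %d of %d", i, config["training_runs"])
        # Placeholder for the actual work of one training run.
    logger.info("Finished %d training run(s)", config["training_runs"])
    return config["training_runs"]


run(read_job_config())
```

Because every setting has a default and nothing calls input(), the code runs unattended, and the log records each step.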

R scripts and Shiny apps can only be created and used in the RStudio IDE in Cloud Pak for Data as a Service. You can't create jobs for R scripts or R Shiny deployments.
