Planning your notebooks and scripts experience
To make a plan for using Jupyter notebooks and scripts, first understand the choices that you have, the implications of those choices, and how those choices affect the order of implementation tasks.
You can perform most tasks that are related to notebooks and scripts with the Editor or Admin role in an analytics project.
Before you start working with notebooks and scripts, consider the following questions, because most tasks must be completed in a particular order:
- Which programming language do you want to work in?
- What will your notebooks be doing?
- What libraries do you want to work with?
- How can you use the notebook or script in Cloud Pak for Data as a Service?
To create a plan for using Jupyter notebooks or scripts, determine which of the following tasks you must complete.
Task | Mandatory? | Timing |
---|---|---|
Creating a project | Yes | This must be your very first task |
Adding data assets to the project | Yes | Before you begin creating notebooks |
Picking a programming language | Yes | Before you select the tool |
Selecting a tool | Yes | After selecting the language |
Checking the library packages | Yes | Before you select a runtime environment |
Choosing a runtime environment | Yes | Before you open the development environment |
Managing the notebooks and scripts lifecycle | No | When the notebook or script is ready |
Uses for notebooks and scripts after creation | No | When the notebook is ready |
Creating a project
You need to create a project before you can start working in notebooks.
You can create an empty project, a project from a file, or a project from a URL. In this project:
- You can use the Jupyter Notebook and RStudio.
- Notebooks are assets in the project.
- Notebook collaboration is based on locking by user at the project level.
- R scripts and Shiny apps are not assets in the project.
- There is no collaboration on R scripts or Shiny apps.
Picking a programming language
You can choose to work in the following languages:
- Notebooks: Python and R
- Scripts: R scripts and R Shiny apps
Selecting a tool
In Cloud Pak for Data as a Service, you can work with notebooks and scripts in the following tools:
- Jupyter Notebook editor: You can create Python or R notebooks. Notebooks are assets in a project. Collaboration is only at the project level. The notebook is locked by a user when opened and can be unlocked only by the same user or a project admin.
- RStudio: You can create R scripts and Shiny apps. R scripts are not assets in a project, which means that there is no collaboration at the project level.
Checking the library packages
When you open a notebook in a runtime environment, you have access to a large selection of preinstalled data science library packages. Many environments also include libraries provided by IBM at no extra charge, such as:
- The Watson Natural Language Processing library in Python environments
- Libraries to help you access project assets
- Libraries for time series or geo-spatial analysis in Spark environments
For a list of the library packages and the versions included in an environment template, select the template on the Templates page from the Manage tab on the project's Environments page.
If libraries are missing in a template, you can add them:
- Through the notebook or script: You can use familiar package install commands for your environment. For example, in Python notebooks, you can use mamba, conda, or pip.
- By creating a custom environment template: When you create a custom template, you can create a software customization and add the libraries that you want to include. For details, see Customizing environment templates.
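Before you install anything, it can help to check which libraries the current environment already provides. The following is a minimal sketch, not an official procedure: it uses the standard `importlib.metadata` module, and the package names in `required` are hypothetical placeholders.

```python
import importlib.metadata as md


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return md.version(package)
    except md.PackageNotFoundError:
        return None


def missing_packages(required):
    """Return the names from `required` that are not installed in this environment."""
    return [name for name in required if installed_version(name) is None]


# Hypothetical list of libraries that a notebook depends on.
required = ["pandas", "a-package-that-does-not-exist-xyz"]
print(missing_packages(required))

# In a notebook cell, a missing library could then be installed with one of the
# package install commands mentioned above, for example: %pip install <package-name>
```

If the same libraries are missing every time you start a runtime, adding them to a custom environment template is likely the better option, because installs made from a notebook cell apply only to the running session.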
Choosing a runtime environment
The compute environment that you choose for your notebook depends on the amount of data that you want to process and the complexity of the data analysis.
watsonx.ai Studio offers many default environment templates with different hardware sizes and software configurations to help you quickly get started, without having to create your own templates. These included templates are listed on the Templates page from the Manage tab on the project's Environments page. For more information about the included environments, see Environments.
If the available templates don't suit your needs, you can create custom templates and determine the hardware size and software configuration. For details, see Customizing environment templates.
Working with data
To work with data in a notebook:
- Add the data to your project, which turns the data into a project asset. See Adding data to a project for the different methods for adding data to a project.
- Use generated code that loads data from the asset to a data structure in your notebook. For a list of the supported data types, see Data load support.
- Write your own code to load data if the data source isn't added as a project asset or support for adding generated code isn't available for the project asset.
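When generated code is not available for an asset, writing your own loading code can be as simple as the following sketch. It assumes a CSV file that the notebook can read from a local path; the file name, column names, and the `load_rows` helper are illustrative only, and the sample file is created on the fly so that the example runs anywhere.

```python
import csv
import tempfile
from pathlib import Path


def load_rows(path):
    """Read a CSV file into a list of dictionaries, one dictionary per row."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


# Create a small sample file so that the sketch is self-contained.
sample = Path(tempfile.mkdtemp()) / "sample.csv"
sample.write_text("city,temperature\nOslo,4\nCairo,29\n", encoding="utf-8")

rows = load_rows(sample)
print(rows[0]["city"], len(rows))
```

All values are read as strings, so numeric columns such as `temperature` must be converted before analysis. For larger data sets, a data frame library from the runtime environment is usually a better fit than the `csv` module.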
Managing the notebooks and scripts lifecycle
After you create and test a notebook in your tool, you can:
- Publish it to a catalog so that other catalog members can use the notebook in their projects. See Publishing assets from a project into a catalog.
- Share a read-only copy outside of watsonx.ai Studio so that people who aren't collaborators in your projects can see and use it. See Sharing notebooks with a URL.
- Publish it to a GitHub repository. See Publishing notebooks on GitHub.
- Publish it as a gist. See Publishing a notebook as a gist.
R scripts and Shiny apps can't be published or shared using functionality in a project.
Uses for notebooks and scripts after creation
The options for a notebook that is created and ready to use in Cloud Pak for Data as a Service include:
- Running it as a job in a project. See Creating and managing jobs in a project.
- Running it as part of a pipeline. See Configuring pipeline nodes.
  To ensure that a notebook can be run as a job or in a pipeline:
  - Ensure that no cells require interactive input by a user.
  - Ensure that the notebook logs enough detail that you can understand the progress and any failures by looking at the log.
  - Use environment variables in the code to access configurations if a notebook or script requires them, for example the input data file or the number of training runs.
- Using the watsonx.ai Runtime Python client to build, train, and then deploy your models. See watsonx.ai Runtime Python client samples and examples.
- Using the watsonx.ai Runtime REST API to build, train, and then deploy your models.
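The guidelines for notebooks that run as jobs or in pipelines can be sketched as follows. This is an illustration under assumptions, not a prescribed pattern: the environment variable names `INPUT_DATA_FILE` and `TRAINING_RUNS`, their defaults, and the placeholder training loop are all hypothetical.

```python
import logging
import os

# Log to the standard log output so that progress and failures are visible in the job log.
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("notebook_job")


def read_config(env=os.environ):
    """Read settings from environment variables, with defaults instead of user prompts."""
    return {
        "input_file": env.get("INPUT_DATA_FILE", "data.csv"),  # hypothetical variable name
        "training_runs": int(env.get("TRAINING_RUNS", "3")),   # hypothetical variable name
    }


def run(config):
    """Placeholder workload that logs each step and returns the number of completed runs."""
    log.info("Starting with input file %s", config["input_file"])
    completed = 0
    for i in range(1, config["training_runs"] + 1):
        log.info("Training run %d of %d", i, config["training_runs"])
        completed += 1
    log.info("Finished %d training runs", completed)
    return completed


run(read_config())
```

Because every setting has a default and nothing calls `input()`, the same cells can run unattended, and a job can override the defaults by setting the environment variables.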
R scripts and Shiny apps can only be created and used in the RStudio IDE in Cloud Pak for Data as a Service. You can't create jobs for R scripts or R Shiny deployments.