Planning your notebooks and scripts experience
To make a plan for using Jupyter notebooks and scripts, first understand the choices that you have, the implications of those choices, and how those choices affect the order of implementation tasks.
You can perform most tasks related to notebooks and scripts with the Editor or Admin role in an analytics project. You need the IBM Cloud Pak for Data administrator role only if you must install services or create custom runtime images.
Before you start working with notebooks and scripts, consider the following questions as most tasks need to be completed in a particular order:
- Which programming language do you want to work in?
- Which tool do you prefer as your development environment?
- Do you want to collaborate with others through Git?
- What will your notebooks be doing?
- What libraries do you want to work with?
- Do you want to work in the product UI, automate the entire process, or use a mixture of both methods?
- How can you use the notebook or script in Cloud Pak for Data?
To create a plan for using Jupyter notebooks or scripts, determine which of the following tasks you must complete.
Task | Mandatory? | Timing |
---|---|---|
Selecting the project type | Yes | This must be your very first task |
Adding data assets to the project | Yes | Before you begin creating notebooks |
Picking a programming language | Yes | Before you select the tool |
Selecting a tool | Yes | After you've picked the language |
Checking the library packages | Yes | Before you select a runtime environment |
Choosing an appropriate runtime environment | Yes | Before you open the development environment |
Automating the lifecycle of a notebook or script | No | You can automate the entire lifecycle or parts of it |
Managing the notebooks and scripts lifecycle | No | When the notebook is ready |
Uses for notebooks and scripts after creation | No | When the notebook is ready |
Selecting the project type
The type of project you create affects the way collaboration works and the tools you can use.
Projects without Git integration You can create an empty project or create one from a file. In projects without Git integration:
- You can use the Jupyter Notebook editor and RStudio.
- Notebooks are run as standalone files, with no direct access to any other notebook or script in the project.
- Notebook collaboration is based on locking by user.
- You can run R scripts and Shiny apps interactively in RStudio.
- There is no collaboration on R scripts or Shiny apps.
- Notebooks can be shared with others by publishing them to a catalog, to a GitHub repository, as a Gist, or by sharing the URL.
- You can't use JupyterLab or the Visual Studio Code editor.
See Creating a project.
Git-integrated projects You can create a project that is associated with a Git repository. In Git-integrated projects:
- You can use JupyterLab, RStudio, and the Visual Studio Code editor.
- Collaboration is available across all files in all branches of the Git repository that is associated with the project.
- When you run notebooks and scripts, you can directly refer to any other notebook or script in the project.
- Notebooks and scripts can't be published to a catalog.
- You can't use the Jupyter Notebook editor.
See Accessing a Git repository for creating a project with Git integration.
Picking a programming language
You can choose to work in the following languages:
- Python
- Python is always included when you install Watson Studio.
- R
- R is not available by default. An administrator must install an R Notebook runtime or the RStudio Server Runtimes service on the IBM Cloud Pak for Data platform. To determine whether an R runtime or RStudio Server Runtimes is installed, open the Services catalog and check whether the service is enabled.
Selecting a tool
You can work with notebooks and scripts in the following tools. Your choice of tool depends on the programming language and the development environment that you want to work in, which in turn determine the type of project you need to create.
Tool | Programming language | Project Type | Collaboration | Why pick this tool? |
---|---|---|---|---|
Jupyter Notebook editor | Python or R | Project without Git integration | Collaboration only at the project level. The notebook is locked by a user and can only be unlocked by the same user or a project admin. | Matter of preference: The Jupyter Notebook editor feels more standalone, as new notebooks are opened in new tabs in the project. The editor is easy to use as it just consists of a file browser and an editor view. |
RStudio | R | Project with or without Git integration | Without Git integration, no collaboration. With Git integration, collaboration across all files. | Ideal environment for writing R scripts, navigating the files on your computer, visualizing your results, and supporting version control, developing packages, and writing Shiny apps. |
JupyterLab | Python | Git-integrated project | Git-based collaboration across files in the associated repository. | JupyterLab is an IDE with a modular structure, where you can open several notebooks or scripts as tabs in the same window. JupyterLab supports useful extensions like Git and Elyra. |
Visual Studio Code editor | Python | Git-integrated project | Git-based collaboration across all files in the repository. You create the notebook or script in the Visual Studio (VS) Code editor on your workstation and then run and debug the code in a Watson Studio runtime directly from the VS Code editor. | VS Code offers a huge ecosystem with more than 30,000 extensions, for example to help you analyze your code, find vulnerabilities, detect bad code patterns, enforce code style guides, and provide code suggestions (AI assisted programming). |
Checking the library packages
When you open a notebook in a runtime environment, you have access to a large selection of preinstalled data science library packages. Many environments also include libraries provided by IBM at no extra charge, such as:
- The Watson Natural Language Processing library in Python environments
- Libraries to help you access project assets
- Libraries for time series or geo-spatial analysis in Spark environments
For a list of the library packages and the versions included in an environment template, select the template on the Templates page, under Environments on the project's Manage tab.
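You can also check programmatically, from inside a running notebook, which packages the runtime provides. A minimal Python sketch (the package names in the list are examples only; substitute the libraries you care about):

```python
from importlib import metadata

# Check whether specific packages are available in the current runtime
# and print their versions. The names below are examples only.
for pkg in ["pip", "setuptools", "wheel"]:
    try:
        print(pkg, metadata.version(pkg))
    except metadata.PackageNotFoundError:
        print(pkg, "is not installed in this runtime")
```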
If libraries are missing in a template, you can add them:
- Through the notebook or script
- You can use familiar package installation commands for your environment. For example, in Python notebooks, you can use `mamba`, `conda`, or `pip`.
- By creating a custom environment template
- When you create a custom template, you can either add a software customization with your libraries, or a custom runtime image that you build with the libraries you want to include. For details, see Customizing environment templates.
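In a notebook cell, installing through the runtime typically means running a command such as `%pip install <package>`. As a plain-Python sketch of the same idea, you can install a package only when importing it fails (the function name and package argument here are illustrative, not a Watson Studio API):

```python
import importlib
import subprocess
import sys

def ensure_package(module_name, pip_name=None):
    """Install a package into the current runtime only if importing it fails."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        # Use the same interpreter that runs the notebook kernel.
        subprocess.check_call(
            [sys.executable, "-m", "pip", "install", pip_name or module_name]
        )

ensure_package("json")  # stdlib module, so no installation happens here
```

Note that packages installed this way last only for the lifetime of the runtime; to make them permanent, add them to a custom environment template instead.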
Choosing a runtime environment
Choosing the compute environment for your notebook depends on the amount of data you want to process and the complexity of the data analysis processes.
Watson Studio offers many default environment templates with different hardware sizes and software configurations to help you get started quickly, without having to create your own templates. These included templates are listed on the Templates page, under Environments on the project's Manage tab. For more information about the included environments, see Environments.
If the available templates don't suit your needs, you can create custom templates and determine the hardware size and software configuration. For details, see Customizing environment templates.
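As an illustration, a software customization for a custom template is typically declared in conda-style YAML. The following is only a sketch under that assumption; the channel and package names are placeholders, and you should check the template editor in your version for the exact format it expects:

```yaml
# Hypothetical software customization for a custom environment template.
# Conda-style syntax; package names and versions below are placeholders.
channels:
  - defaults
dependencies:
  - scikit-learn=1.3
  - pip:
      - some-internal-package==0.1.0
```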
Working with data
To work with data in a notebook, you need to:
- Add the data to your project, which turns the data into a project asset. See Adding data to a project for the different methods for adding data to a project.
- Use generated code that loads data from the asset to a data structure in your notebook. For a list of the supported data types, see Data load support.
- Write your own code to load data if the data source isn't added as a project asset or support for adding generated code isn't available for the project asset.
The following notebook tools support generating code to load data to a data structure:
Tool | Generated code supported? | When to write code |
---|---|---|
Jupyter Notebook editor | Yes | - Generating code for the file type or database connection isn't supported. - The file or database connection isn't a project asset. |
JupyterLab | Yes | - Generating code for the file type or database connection isn't supported. - The file or database connection isn't a project asset. |
Visual Studio Code editor | No | At all times. You can copy the generated code added to a notebook cell from the Jupyter Notebook editor or JupyterLab to a notebook in the Visual Studio Code editor. |
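When generated code isn't supported, the loading code you write yourself is ordinary library code. A minimal sketch using pandas, where the inline CSV text is a stand-in for a real project asset or database connection:

```python
from io import StringIO

import pandas as pd

# Inline stand-in for a data asset; in a real notebook you would read
# from the asset's file path or from a database connection instead.
csv_text = "id,value\n1,10\n2,20\n"
df = pd.read_csv(StringIO(csv_text))
print(df.head())
```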
Automating the lifecycle of a notebook or script
You can use CPDCTL, a command-line interface, to manage the lifecycle of a notebook or script in Cloud Pak for Data. You can automate the entire flow, or only parts of the flow. For details, see Automating the lifecycle of notebooks and scripts.
Managing the notebooks and scripts lifecycle
After you have created and tested your notebooks or scripts in your tool in a project, you can:
- Move notebooks and scripts into a deployment space.
- [Notebooks only] Publish the notebook to a catalog so that other catalog members can use it in their projects. See Publishing assets from a project into a catalog.
- [Notebooks only] Share a read-only copy outside of Watson Studio so that people who aren't collaborators in your projects can see and use it. See Sharing notebooks with a URL.
Uses for notebooks and scripts after creation
The options for a notebook or a script that is created and ready to use in IBM Cloud Pak for Data include:
- [For notebooks and scripts] Running it as a job in a project (platform job). See Creating and managing jobs in a project.
- [For notebooks and scripts] Running it as a job in a deployment space. This does not require Watson Machine Learning to be installed. See Creating jobs in deployment spaces.
- [For scripts only] Running it as a batch deployment with Watson Machine Learning in a space. See Creating a batch deployment job. Notebooks cannot be run as batch deployments.
- [For notebooks and scripts] Running it as part of a Watson Pipeline. See Configuring pipeline nodes.
To ensure that a notebook or script can be run as a job or in a pipeline (notebooks only):
- Ensure that no cells require interactive input by a user.
- Ensure that the code logs enough detail to understand progress and to diagnose any failures from the log.
- Use environment variables in the code to access configuration values that the notebook or script requires, for example the input data file or the number of training runs.
- If you load data from data sources as part of your code, make sure to handle error cases properly, such as network connection failures or timeouts.
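The checklist above can be sketched in Python. The environment variable names (`INPUT_FILE`, `TRAINING_RUNS`) are hypothetical examples that you would define on the job rather than hard-code:

```python
import logging
import os

# Log progress so failures can be diagnosed from the job log alone.
logging.basicConfig(level=logging.INFO)
log = logging.getLogger("notebook-job")

# Hypothetical configuration variables, set on the job, with safe defaults.
input_file = os.environ.get("INPUT_FILE", "data.csv")
n_runs = int(os.environ.get("TRAINING_RUNS", "1"))

log.info("Starting: input=%s, training runs=%d", input_file, n_runs)

# Handle data-access errors explicitly instead of letting the job die silently.
if not os.path.exists(input_file):
    log.error("Input data not found: %s", input_file)
```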
The following table shows the differences between running Python or R scripts as platform jobs or as batch deployments in a deployment space.
Job | Which variables can I pass? | Are mounted storage volumes supported? | Required compute support |
---|---|---|---|
Platform jobs | Can pass environment variables and command-line type arguments | Yes | Environment runtimes without Watson Machine Learning |
Batch deployment jobs | Can only pass parameters that match a predefined pattern | No | Software specifications in Watson Machine Learning |
Parent topic: Notebooks and scripts