Adding notebooks

Add a notebook to register it to IBM® Spectrum Conductor and make it available for selection when you create a new instance group.

Before you begin

  • You must be a cluster administrator or have the Notebook Management Configure permission.
  • You must create the notebook package containing the scripts and binaries required for the notebook to run. See Creating notebook packages.
  • cURL 7.28.0 or later must be installed on all hosts that run the notebook.
  • If you want to enable Docker support for your notebook so that the notebook services run within Docker containers, you must Dockerize the notebook. See Adding Dockerized notebooks.
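
The cURL prerequisite can be checked on each host before you add the notebook. The following Python sketch is illustrative only and is not part of IBM Spectrum Conductor: the function names parse_curl_version, curl_meets_minimum, and local_curl_ok are hypothetical, and the sketch assumes that the first line of the `curl --version` output starts with "curl" followed by a three-part version number.

```python
import re
import subprocess

# Minimum cURL version named in the prerequisites above.
MIN_CURL = (7, 28, 0)


def parse_curl_version(version_output):
    """Return (major, minor, patch) from the text printed by `curl --version`.

    Assumes the output begins with "curl <major>.<minor>.<patch>".
    """
    match = re.search(r"curl (\d+)\.(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("could not find a cURL version in the output")
    return tuple(int(part) for part in match.groups())


def curl_meets_minimum(version_output, minimum=MIN_CURL):
    """Return True if the reported version is at least the minimum."""
    return parse_curl_version(version_output) >= minimum


def local_curl_ok():
    """Run `curl --version` on this host and compare against MIN_CURL."""
    result = subprocess.run(
        ["curl", "--version"], capture_output=True, text=True, check=True
    )
    return curl_meets_minimum(result.stdout)
```

Calling local_curl_ok() on a host returns True when the installed cURL is new enough. Versions are compared as integer tuples, so that 7.100.0 is correctly treated as later than 7.28.0, which a plain string comparison would get wrong.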

About this task

Adding a notebook to IBM Spectrum Conductor registers the notebook to the cluster.

This task is not required for the built-in notebooks that are installed with IBM Spectrum Conductor, unless you removed them and want to re-add them, or you want to add an updated version. The built-in notebooks are typically available when you create an instance group. To add a built-in notebook that you previously deleted or to add an updated version, download the notebook package (and the metadata.yml file if it exists for the notebook) from IBM Fix Central and follow the instructions in the accompanying readme file.

Procedure

  1. From the cluster management console, click Resources > Frameworks > Notebook Management.
  2. Click Add.
  3. Complete the following fields in the Deployment Settings tab. If a metadata.yml file exists for the notebook, you can automatically fill in all of the required fields and some of the optional fields by dragging and dropping the metadata.yml file and the notebook package into the Add Notebook dialog.
    Note: The metadata.yml file exists for only the out-of-box notebooks in IBM Spectrum Conductor 2.5.0 and later. See Supported Spark, Miniconda, Jupyter, and Dask versions.

    You can also manually enter the values as necessary. The notebook requires a name, a version, a package, a start command, a stop command, and a job monitor command. All other fields are optional:

    • Name: Enter a name for the notebook. The notebook name must not exceed 64 characters and can contain any of the following characters: a-z A-Z 0-9
    • Version: Enter the version of the notebook. The notebook version must not exceed 12 characters, must start with a number, and can contain any of the following characters: 0-9 . (cannot contain only periods).
    • Package: Upload the package that contains the scripts and files that are required for the notebook to run. The package name must be unique within a consumer, must not exceed 1024 characters, and can contain any of the following characters: 0-9 A-Z a-z . _ -
    • Run notebook in a Docker container (optional): Select the check box to configure your notebook services to run inside a Docker container, instead of on the host. Running Dockerized notebook services inside a container can simplify the library dependencies for each type of notebook. As a result, you can use Docker’s flexibility and portability to run notebook services on any host, in any environment. If you select this option, the configuration settings for the notebook change. To add a Dockerized notebook, see Adding Dockerized notebooks.
    • Enable monitoring for the notebook (optional): Select the check box to enable monitoring for the notebook. Monitoring provides the number of cores that are used, the amount of memory used for the notebook in MB, and the number of executors.
      Note: Monitoring works best for notebooks that launch in one browser window. It is not supported for Jupyter notebooks, where each notebook opens in a separate browser window.
    • Enable collaboration for the notebook (optional): Select the check box to enable collaboration for the notebook service. Collaboration allows multiple assigned notebook collaborators to create, edit, and delete notebook files at the same time, and to view the changes that other collaborators make to notebook files. For more information, see Notebook collaboration.
    • Supports SSL (optional): Select the check box to indicate that SSL is supported for the notebook, if SSL is configured for the cluster.
    • Supports user impersonation (optional): Select the check box to indicate that the notebook supports running notebook services and their Spark workload as the notebook owner OS user.

      For notebooks enabled with Kerberos user authentication and user impersonation, you must also specify your principal and the location of your keytab file, which you can do through environment variables for your notebook. Refer to Adding environment variables to notebooks for details on how to configure this.

    • Anaconda required (optional): Select the check box to make an Anaconda or Miniconda distribution and environment mandatory for the notebook. Instance groups that use this notebook must specify an Anaconda or Miniconda distribution and environment.
    • Prestart command (optional): Specify the command to prestart the notebook. The path to the script in the command must be relative to the notebook deployment directory. The prestart script is called when you start the notebook service and performs preconfiguration of the notebook running environment, such as retrieving an available port on the host for the notebook web service.
    • Start command: Specify the command to start the notebook. The path to the script in the command must be relative to the notebook deployment directory.
    • Stop command: Specify the command to stop the notebook. The path to the script in the command must be relative to the notebook deployment directory. The stop script is called when you perform a service stop to stop the notebook server.
    • Job Monitor command: Specify the command to monitor the notebook activity. The path to the script in the command must be relative to the notebook deployment directory. The job monitor script is started automatically by the resource orchestrator when you start the notebook service. This script monitors the notebook process and retrieves the notebook web service port.
    • Base port (optional): Specify the base port from which to find an available port for the notebook service instance. For Zeppelin notebooks, the default value is 8380. For Jupyter notebooks, the default value is 8888.
    • Longest update interval for job monitor (optional): Enter the interval (in seconds) within which the resource orchestrator expects to receive the activity state from the job monitor.
    • Job control wait period (optional): Enter the interval (in seconds) that the resource orchestrator waits before it ends an activity.
  4. Set any environment variables for use by your scripts in the Environment Variables tab.
    Note:
    • For Jupyter 5.4.0 notebooks (or later), kernel culling environment variables are configured by default. For more information about kernel culling, see Kernel culling for Jupyter notebooks.
    • To deploy the notebook only with specific Spark versions, set the supported_spark_versions environment variable to a comma-separated list of Spark versions. For example, to deploy a notebook with Spark versions 2.1.0 and 1.5.2, specify supported_spark_versions as 2.1.0,1.5.2.
  5. Click Add.
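
The naming rules for the Name, Version, and Package fields in step 3 can be checked before you open the Add Notebook dialog. The following Python sketch encodes those rules as regular expressions. It is an illustration under stated assumptions, not the validation that the cluster management console performs: the function names are hypothetical, each required field is assumed to need at least one character, and the requirement that a package name be unique within a consumer cannot be checked locally.

```python
import re

# Name: at most 64 characters from a-z, A-Z, 0-9.
NAME_RE = re.compile(r"[A-Za-z0-9]{1,64}")

# Version: at most 12 characters, starts with a digit, then digits or periods.
# Starting with a digit also rules out a version made only of periods.
VERSION_RE = re.compile(r"[0-9][0-9.]{0,11}")

# Package name: at most 1024 characters from 0-9, A-Z, a-z, period,
# underscore, and hyphen.
PACKAGE_RE = re.compile(r"[0-9A-Za-z._-]{1,1024}")


def valid_name(name):
    """Return True if the notebook name follows the documented rules."""
    return NAME_RE.fullmatch(name) is not None


def valid_version(version):
    """Return True if the notebook version follows the documented rules."""
    return VERSION_RE.fullmatch(version) is not None


def valid_package_name(package_name):
    """Return True if the package file name follows the documented rules."""
    return PACKAGE_RE.fullmatch(package_name) is not None
```

For example, valid_name("Jupyter") and valid_version("5.4.0") both return True, while valid_name("my-notebook") returns False because the hyphen is not an allowed character in a notebook name.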

Results

The notebook is added to the cluster.

What to do next

Create an instance group and enable the notebook that you added. See Creating instance groups.

To enable an existing instance group to use this updated notebook package, see Updating instance groups to use updated components (Spark version, notebook packages, or Dask version).