Adding Dockerized notebooks

Add a Dockerized notebook, enabling the notebook services to run within a Docker container.

Before you begin

  • Docker must be installed on a subset of your compute hosts. For a list of supported Docker versions, see Supported Docker versions.

    When the Docker daemon is running on a host, the host is considered a Docker active host.

  • You must be a cluster administrator or have the Notebook Management Configure permission.
  • You must create the notebook package that contains the scripts and binaries that are required for the notebook to run. See Creating notebook packages.
  • A suitable Docker image for the instance group must be available.
    Note: IBM Spectrum Conductor does not provide Docker images. While you can use default Docker images (such as ubuntu), you use any Docker image at your own risk, and the image must meet the following requirements:
    • The Docker image must be compatible with the Docker version that is installed on your hosts to avoid unexpected Docker issues.
    • The Docker image must have OpenSSL 1.0.1 or higher installed.
    • The Docker image must have the net-tools package.
    • If you want to Dockerize a notebook, the Docker image must support the iproute package (which provides the ss utility). To Dockerize the Zeppelin or Jupyter notebook, the Docker image must also support cURL 7.28.0.
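
    As a quick check, you can run the candidate image once and confirm that the required tools are present. The image name below is only an example; any command that is reported as missing indicates a package that you still need to add to the image:

      docker run --rm ubuntu:latest sh -c "openssl version; curl --version; ss --version; netstat --version"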

    If you provide your own Docker image from a local directory, you must load the Docker image by using the docker load command. Ensure that you load the Docker image to all hosts on which you want the Docker container to run.

    You can upload your Docker image as an instance group package. In this case, the Docker image is deployed to Docker hosts when you deploy the instance group, rather than when you start it. You can place the Docker image in a package, which, depending on the Docker operation that you use, loads or imports the image from a .tar file. Alternatively, you can use a package install script to pull the image from a source. Find Docker images through a Docker registry (for example, the public Docker registry at https://hub.docker.com/).
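
    For example, to make a locally built image archive available or to pull an image from a registry, you might run commands like the following on each Docker host (the archive path and image name are placeholders):

      docker load -i /tmp/my-notebook-image.tar
      docker pull hostname:5555/path/ubuntu:latest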

About this task

You can configure your notebook services to run inside a Docker container, instead of directly on the host. Running notebook services inside a container can simplify managing the library dependencies for each type of notebook. As a result, you can use Docker's flexibility and portability to run your notebook services on any host, in any environment. For more information on Docker integration and configuration, see Docker integration for Linux.

You can Dockerize the built-in notebook packages when you are adding a new notebook, or you can add and configure your own notebook packages as Dockerized notebooks.
Note: After you add a notebook, you cannot modify its Docker configuration. Therefore, ensure that you Dockerize a notebook during initial configuration before you add it to the cluster.

Procedure

  1. From the cluster management console, click Resources > Frameworks > Notebook Management.
  2. Click Add.
  3. Select the Run notebook in a Docker container check box in the Deployment Settings tab to show the Docker-related fields.
  4. Provide information for the following fields in the Deployment Settings tab:
    • If a metadata.yml file exists for the notebook, you can automatically fill in all of the required fields and some of the optional fields by dragging and dropping the metadata.yml file and the notebook package into the Add Notebook dialog.
      Tip: The metadata.yml file exists for only the out-of-box notebooks in IBM Spectrum Conductor 2.5.0 and later. See Supported Spark, Miniconda, Jupyter, and Dask versions.
    • You can also manually enter the values as necessary. The Dockerized notebook requires a name, a version, a package, a Docker image name, a Docker command, and a ready check script. All other fields are optional:
      • Name: Enter a name for the notebook. The name must contain letters and numbers only, and must not exceed 64 characters.
      • Version: Enter the version of the notebook. The notebook version can contain only numbers and periods (.), and must not exceed 12 characters.
      • Package: Upload the package that contains the scripts and files that are required for the notebook to run. The package name must be unique within a consumer, must not exceed 1024 characters, and can contain any of the following characters: 0 to 9, a to z, or A to Z, period (.), underscore (_), and hyphen (-).
      • Enable monitoring for the notebook: Select the check box to enable monitoring for the notebook. Monitoring provides the number of cores used, the amount of memory used for the notebook (in MB), and the number of executors.
        Note: Monitoring works best for notebooks that launch in one browser window. It is not supported for Jupyter notebooks, where each notebook opens in a separate browser window.
      • Enable collaboration for the notebook: Select the check box to enable collaboration for the notebook service. Collaboration allows multiple assigned collaborators to create, edit, and delete notebook files at the same time, and to view the changes that other collaborators make to notebook files.
      • Supports SSL (optional): Select the check box to indicate that SSL is supported for the notebook, if SSL is configured for the cluster.
      • Supports user impersonation (optional): Select the check box to indicate that the notebook supports running notebook services and Spark workload as the notebook owner OS user.
      • Docker image name: Enter the name of the Docker image that the container starts with, followed by the Docker tag. If you do not specify a Docker tag, the default value is latest. For example, the Docker image name might be:
        • For local images: ubuntu:latest
        • For private registry images: hostname:5555/path/ubuntu:latest
        • For public registry images: publicserver/path/ubuntu:latest
      • Docker command: Specify the command to start the Dockerized notebook. The path to the script in the command must be relative to the notebook deployment directory. The Docker script is called when you start a Dockerized notebook service. This script executes inside the Docker container to start the Dockerized notebook process, much like the standard notebook prestart and start commands.
        Note: For Jupyter notebooks, you can disable the terminal from the cluster management console by setting the DISABLE_TERMINAL input parameter in the Docker command to true. For example:
        ./scripts/docker_jupyter.sh --disable_terminal true
      • Base port (optional): Specify the base port from which to find an available port for the notebook service instance. For Jupyter notebooks, the default value is 8888.
      • Memory limit (optional): Enter the memory limit, in megabytes, that the Docker container can occupy. The memory limit value must be a positive integer.
      • Ready check script: Specify the command to check whether the Docker container is ready. The path to the script in the command must be relative to the notebook deployment directory. The ready check script is short running, and is invoked automatically by the Docker controller to provide the notebook URL to the resource orchestrator and to decide whether the Dockerized notebook has started. The script is called periodically, and stops when the notebook service starts successfully or when the notebook service cannot start within the given startup period. For Jupyter notebooks, the default ready check script is ./scripts/readycheck.sh; the default start script is ./scripts/docker_jupyter.sh. A sketch of one possible ready check script follows the list of environment variables below.
        Note: During ready check script creation, you can use the built-in inputs to view the following environment variables that exist within the ready check script:
        • NOTEBOOK_DATA_DIR ($1): The directory where notebook data is stored. Example: /var/platformconductor/JupyterPython3-5.4.0/myjup5/129ee4b7-793e-4163-af97-1bee59f83cf6/JupyterPython3-5-0-0-1
        • SPARK_EGO_USER ($2): The EGO user that is running the notebook. Example: Admin
        • SPARK_HOME ($3): The directory where Spark is installed. Example: /var/platformconductor/spark-2.1.1-hadoop-2.7
        • NOTEBOOK_DATA_BASE_DIR ($4): The top-level directory of the notebook data directory. Example: /var/platformconductor/JupyterPython3-5.4.0
        • SPARKMS_HOST ($6): The host that the Spark notebook master runs on. Example: {hostname_of_Spark_notebook_master}
        • NOTEBOOK_DEPLOY_DIR ($7): The directory that the notebook is deployed to. Example: /var/platformconductor/JupyterPython3-5.4.0/
        • NOTEBOOK_BASE_PORT ($9): The base port from which the notebook looks for available ports. Example: 9000
        • EGO_MASTER_LIST_PEM ($10): The list of EGO primary hosts.
        • EGO_CONTAINER_ID ($11): The EGO activity ID for the notebook service. Example: 2134
        • EGOSC_SERVICE_NAME ($12): The notebook service name. Example: myjup5-JupyterPython3-5-0-0-1
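
        A minimal sketch of what a custom ready check script that uses some of these inputs might look like is shown here. The port probe, URL format, and exit-code convention are illustrative assumptions only, not the exact contract that the Docker controller expects:

          #!/bin/sh
          # Hypothetical ready check: succeed once the notebook answers on its base port
          NOTEBOOK_BASE_PORT=$9
          URL="http://$(hostname -f):${NOTEBOOK_BASE_PORT}"
          if curl -k -s -o /dev/null "${URL}"; then
              echo "${URL}"    # report the notebook URL
              exit 0
          fi
          exit 1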

  5. Set any environment variables for use by your scripts in the Environment Variables tab.
    Note:
    • For Jupyter 5.4.0 notebooks (or later), kernel culling environment variables are configured by default. For more information about kernel culling, see Kernel culling for Jupyter notebooks.
    • For Dockerized notebooks, if you do not use the default values for the NOTEBOOK_SPARK_PYSPARK_PYTHON and NOTEBOOK_SPARK_R_COMMAND Jupyter notebook environment variables (see Environment variables for Jupyter notebooks) and you want to run notebook applications in client mode, manually add a data volume to your notebook configuration to mount the non-default path.
  6. Click Add.

Results

The Dockerized notebook is added to the cluster.

What to do next

Create an instance group and enable the Dockerized notebook that you added.