Enabling notebooks for an instance group

Optionally, enable notebooks for the instance group. By default, an instance group does not include notebooks.

Before you begin

Ensure that you meet the prerequisites for creating an instance group. See Prerequisites for an instance group.

About this task

When you enable a notebook for an instance group, you can use the notebook's graphical user interface to manipulate and visualize data. IBM® Spectrum Conductor is installed with the built-in Jupyter notebook.

Procedure

  1. In the Basic Settings tab, select the notebook to enable for the instance group.
  2. Enter the base data directory to store notebook data. If a value is not specified, the notebook's deployment directory is used as the base data directory. By default, the notebook is deployed to {DEPLOY_DIR_OF_SIG}/{NOTEBOOK_NAME}-{NOTEBOOK_VERSION}, where SIG specifies the name of the instance group.

    For production environments, change the default base data directory to a shared directory. Without a shared directory, a notebook cannot access its data if it was previously started on a different host.

    Note: The instance group environment is periodically cleaned up by the SparkCleanup service. If you specify a non-default location for the notebook base data directory, you must manually clean up that directory when the instance group is removed.
  3. Specify an Anaconda distribution instance. Optionally, if available, use a linked conda environment.
    Note: When using an existing Anaconda distribution instance in an instance group, the notebook execution user must have read/write access to the Anaconda distribution instance's deployment directory. When using an existing conda environment in an instance group, the notebook execution user must have read/write access to the environment directory, located in the $DEPLOY_HOME/anaconda/envs/environment_name directory (where $DEPLOY_HOME is the deployment directory for the Anaconda distribution instance).
  4. Optional: Complete these steps to customize the default notebook configuration:
    1. Click the Configuration link for the notebook.
    2. Define the notebook configuration:
      Execution user
        Specify the execution user for the notebook. By default, the execution user specified for the instance group is used. Consider the following requirements when you specify an execution user:
        • If you specify different execution users for the notebook and the instance group, ensure that both users belong to the same primary OS user group.
        • The instance group execution user should be the same as the Anaconda execution user. Otherwise, you must ensure that the instance group execution user has write permission to the Anaconda deployment directory used for deploying notebooks.
        • If the SPARK_EGO_IMPERSONATION parameter is set to false, the notebook execution user must be the same as the executor consumer execution user; otherwise, a permission error occurs.

      Administrator user group
        Specify the administrator user group for the notebook. The administrator user group is assigned permission to all the directories and files within the notebook deployment directory tree.

        The notebook execution user must be a member of the specified administrator user group. If it is not, or if the specified administrator user group is not valid, the deployment fails.

        If you do not provide the administrator user group here, the system assigns the primary user group of the notebook execution OS user to all the directories and files within the notebook deployment directory tree.

      Deployment directory
        Specify the directory to which the notebook is deployed. By default, the deployment directory is created at {DEPLOY_DIR_OF_IG}/{NOTEBOOK_NAME}-{NOTEBOOK_VERSION}, where IG specifies the name of the instance group.

        Ensure that the notebook deployment directory has sufficient disk space. For Zeppelin notebooks, the recommended disk space is 2 GB.

        For Zeppelin notebooks, the path to the deployment directory can only contain letters (uppercase and lowercase) and numbers. Do not use special characters in the path, such as [ ( \

      Base port
        Specify the base port from which the system tries to find available ports for use by the notebook. For Zeppelin notebooks, the default base port is 8380. For Jupyter notebooks, this base port setting is ignored. Instead, port 8888 is always used as the default.

      Base data directory
        Specify the base data directory to store notebook data. By default, the deployment directory for the notebook is used as the base data directory.

      Extra configuration file
        Optional: Specify the fully qualified path to an extra configuration file for a notebook user (for example, /path_to_file/notebook-conf-${SPARK_EGO_USER}.sh). Use this file to define configuration specific to each notebook user; each user file is sourced during notebook startup.

      Anaconda distribution instance
        If Anaconda is required for the notebook, specify the Anaconda distribution instance for the notebook package. Optionally, if available, use a linked conda environment.
        Note: When using an existing Anaconda distribution instance in an instance group, the notebook execution user must have read/write access to the Anaconda distribution instance's deployment directory.

      Conda environment
        If Anaconda is required for the notebook, specify the conda environment to run the notebook.
        Note: When using an existing conda environment in an instance group, the notebook execution user must have read/write access to the environment directory, located in the $DEPLOY_HOME/anaconda/envs/environment_name directory (where $DEPLOY_HOME is the deployment directory for the Anaconda distribution instance).

      SSL
        Select to enable SSL for the notebook's web UI. By default, if the instance group is SSL enabled and the notebook supports SSL, the check box is already selected. Clear the check box if you want to disable SSL for the notebook. If Spark version 1.5.2, 2.0.1, or 2.1.0 is selected in the instance group, notebook SSL must be disabled.
        Restriction: If security settings are enforced at the cluster level, you cannot change these settings for the instance group. Talk to your cluster administrator for more information.

      Memory limit
        If the notebook is Dockerized, specify the memory limit (in MB) for the notebook. If a memory limit is not specified, the notebook uses the memory limit that is specified for the notebook type.

      Data Volumes
        If the notebook is Dockerized, optionally add data volumes that are mount points for the notebook's Docker container.

        If CONDUCTOR_JUPYTER_DATA_VOL_ENVS_ENABLED=ON is set in ascd.conf, you can use environment variables in your host path and container path definitions for Dockerized Jupyter notebooks (for example, /scratch/dev/${SPARK_EGO_USER}). Include dollar signs ($), open curly brackets ({), and closed curly brackets (}) in your host path and container path definitions. Ensure, however, that your definition does not start with these characters.

      Environment Variables
        Add new variables for use by your notebook's service activity scripts.

      Number of GPUs
        This field is displayed only if your cluster is enabled for GPUs. Specify a valid integer, 0 or greater, for the number of exclusive GPU slots required by each notebook service.

        By default, the number of GPUs is set to zero. If you set this value to greater than zero, ensure that the resource group selected for this notebook contains GPU hosts. Also, ensure the GPU mode on the GPU hosts is set to exclusive.

    3. Click Save.
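
The Base port field describes a search that starts at a base port and looks for available ports. The following Python sketch illustrates that general idea only; it is not IBM Spectrum Conductor's actual implementation. The function name find_available_port, the max_tries search window, and the loopback host are assumptions made for illustration. A check like this might help you pick a base port that is likely to be free on a host before you enter it.

```python
import socket


def find_available_port(base_port: int, max_tries: int = 100, host: str = "127.0.0.1") -> int:
    """Return the first port at or above base_port that can be bound on host.

    Hypothetical helper: the search window (max_tries) and the bind-based
    availability test are assumptions, not documented product behavior.
    """
    last_port = min(base_port + max_tries, 65536)  # valid TCP ports end at 65535
    for port in range(base_port, last_port):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))  # fails if another socket holds the port
            except OSError:
                continue  # port in use (or not permitted); try the next one
            return port
    raise RuntimeError(f"no available port in range {base_port}-{last_port - 1}")


if __name__ == "__main__":
    # 8380 is the documented default base port for Zeppelin notebooks.
    print(find_available_port(8380))
```

The probe socket is closed as soon as the check completes, so another process could still claim the port before the notebook starts; treat the result as a hint rather than a reservation.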

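For Dockerized Jupyter notebooks, the Data Volumes field allows environment variables such as ${SPARK_EGO_USER} in host path and container path definitions, provided that the definition does not start with $, {, or }. The sketch below shows one way to pre-check such a definition and preview how it might resolve for a given user. The function name expand_volume_path and the use of Python's string.Template for substitution are assumptions for illustration; how the product itself resolves the variables is not described in this topic.

```python
from string import Template

# Characters that a host path or container path definition must not start with.
FORBIDDEN_FIRST_CHARS = ("$", "{", "}")


def expand_volume_path(definition: str, env: dict) -> str:
    """Validate a data volume path definition and preview its expanded form.

    Hypothetical helper: raises ValueError if the definition is empty or
    starts with '$', '{', or '}', and KeyError if a referenced variable
    is missing from env.
    """
    if not definition:
        raise ValueError("definition must not be empty")
    if definition.startswith(FORBIDDEN_FIRST_CHARS):
        raise ValueError(f"definition must not start with one of {FORBIDDEN_FIRST_CHARS}")
    # Template.substitute replaces ${NAME} placeholders with values from env.
    return Template(definition).substitute(env)


if __name__ == "__main__":
    print(expand_volume_path("/scratch/dev/${SPARK_EGO_USER}", {"SPARK_EGO_USER": "alice"}))
    # prints "/scratch/dev/alice"
```

The user name alice is a placeholder; substitute the notebook user that you expect SPARK_EGO_USER to hold in your environment.
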
What to do next

  1. Finish configuring the instance group. See Defining basic settings for an instance group.
  2. Create and deploy the instance group.
    • Click Create and Deploy Instance Group to create the instance group and deploy its packages simultaneously. In this case, the new instance group appears on the Instance Groups page in the Ready state. Verify your deployment and then start the instance group.
    • Click Create Only to create the instance group but manually deploy its packages later. In this case, the new instance group appears on the Instance Groups page in the Registered state. When you are ready to deploy packages, deploy the instance group and verify the deployment. Then, start the instance group.