Enabling notebooks for an instance group

Notebooks are one of the available components for instance groups. By default, an instance group does not include notebooks; you can optionally enable them for the instance group. To add notebooks to your system, see Adding notebooks.

Before you begin

Ensure that you meet the requirements to create an instance group. See Prerequisites for an instance group.

About this task

When you enable a notebook for an instance group, you can use the notebook's graphical user interface to manipulate and visualize data. IBM Spectrum Conductor is installed with the built-in Jupyter notebook.

Procedure

  1. From the cluster management console, click Workload > Instance Groups to view a list of existing instance groups, select the instance group to work with, and click Configure.
  2. By default, the Notebooks tab does not show (unless you changed the INSTANCE_GROUP_DEFAULT_COMPONENTS parameter in the ascd.conf configuration file and specified the notebook versions that you want to see by default). If you do not see a Notebooks tab, add it: click Add and select Notebooks as the component name.
  3. In the Notebooks tab, select the notebook to use for the instance group. The tab lists all notebook types and versions that are enabled on your system.
  4. Enter the base data directory to store notebook data. If a value is not specified, the notebook's deployment directory is used as the base data directory. By default, the notebook is deployed to {DEPLOY_DIR_OF_IG}/{NOTEBOOK_NAME}-{NOTEBOOK_VERSION}, where IG specifies the name of the instance group.

    For production environments, ensure that you change the default base data directory to a shared directory. Without a shared directory, notebooks cannot access data if the notebook was previously started on a different host.

    Note: The instance group environment is periodically cleaned up by the SparkCleanup service. If you specify a non-default location for the notebook base data directory, you must manually clean up that directory when the instance group is removed.
  5. Specify an Anaconda or Miniconda distribution instance. Optionally, if available, use a linked conda environment.
    Note: When using an existing Anaconda or Miniconda distribution instance in an instance group, the notebook execution user must have read/write access to the Anaconda or Miniconda distribution instance's deployment directory. When using an existing conda environment in an instance group, the notebook execution user must have read/write access to the environment directory, located in the $DEPLOY_HOME/anaconda/envs/environment_name directory (where $DEPLOY_HOME is the deployment directory for the Anaconda or Miniconda distribution instance).
  6. Optional: Complete these steps to customize the default notebook configuration:
    1. Click the Configuration link for the notebook.
    2. Define notebook configuration:
      Field Description
      Execution user Specify the execution user for the notebook. By default, the execution user specified for the instance group is used. Consider the following requirements when you specify an execution user:
      • If you specify different execution users for the notebook and the instance group, ensure that both users belong to the same primary OS user group.
      • The instance group execution user should be the same as the Anaconda execution user. Otherwise, you must ensure that the instance group execution user has write permission to the Anaconda deployment directory used for deploying notebooks.
      • If the SPARK_EGO_IMPERSONATION parameter is set to false, the notebook execution user must be the same as the executor consumer execution user; otherwise, a permission error occurs.
      Administrator user group Specify the administrator user group for the notebook. The administrator user group is assigned permission to all the directories and files within the notebook deployment directory tree.

      The execution user of the notebook must be a member of the specified administrator user group. If it is not, or if the specified administrator user group is not valid, the deployment fails.

      If you do not provide the administrator user group here, the system assigns the primary user group of the notebook execution OS user to all the directories and files within the notebook deployment directory tree.

      Deployment directory Specify the directory to which the notebook is deployed. By default, the deployment directory is created at {DEPLOY_DIR_OF_IG}/{NOTEBOOK_NAME}-{NOTEBOOK_VERSION}, where IG specifies the name of the instance group.

      Ensure that the notebook deployment directory has sufficient disk space. For Zeppelin notebooks, the recommended disk space is 2 GB.

      For Zeppelin notebooks, the path to the deployment directory can only contain letters (uppercase and lowercase) and numbers. Do not use special characters in the path; for example:
      [ ( \
      Base port Specify the base port from which the system tries to find available ports for use by the notebook. For Zeppelin notebooks, the default base port is 8380. For Jupyter notebooks, this base port setting is ignored. Instead, port 8888 is always used as the default.
      Base data directory Specify the base data directory to store notebook data. By default, the deployment directory for the notebook is used as the base data directory.
      Extra configuration file Optional: Specify the fully qualified path to an extra configuration file for a notebook user (for example, /path_to_file/notebook-conf-${SPARK_EGO_USER}.sh). Use this file to define configuration specific to each notebook user; each user file is sourced during notebook startup.
      Anaconda distribution instance If Anaconda or Miniconda is required for the notebook, specify the Anaconda or Miniconda distribution instance for the notebook package. Optionally, if available, use a linked conda environment.
      Note: When using an existing Anaconda or Miniconda distribution instance in an instance group, the notebook execution user must have read/write access to the Anaconda or Miniconda distribution instance's deployment directory.
      Conda environment If Anaconda or Miniconda is required for the notebook, specify the conda environment to run the notebook.
      Note: When using an existing conda environment in an instance group, the notebook execution user must have read/write access to the environment directory, located in the $DEPLOY_HOME/anaconda/envs/environment_name directory (where $DEPLOY_HOME is the deployment directory for the Anaconda or Miniconda distribution instance).
      SSL Select to enable SSL, which turns on SSL for the web UI part of the notebook. By default, if the instance group is SSL enabled and the notebook supports SSL, the check box is already selected. Clear the check box if you want to disable SSL for the notebook. If Spark version 1.5.2, 2.0.1, or 2.1.0 is selected in the instance group, notebook SSL must be disabled.
      Restriction: If security settings are enforced at the cluster level, you cannot change these settings for the instance group. Talk to your cluster administrator for more information.
      Memory limit If the notebook is Dockerized, specify the memory limit (in MB) for the notebook. If a memory limit is not specified, the notebook uses the memory limit that is specified for the notebook type.
      Data Volumes If the notebook is Dockerized, optionally add data volumes that are mount points for the notebook's Docker container.

      If CONDUCTOR_JUPYTER_DATA_VOL_ENVS_ENABLED=ON in ascd.conf, you can define environment variables (such as /scratch/dev/${SPARK_EGO_USER}) in your host path and container path definitions for Dockerized Jupyter notebooks. Include dollar signs ($), open curly brackets ({), and closed curly brackets (}) in your host path and container path definitions. Ensure, however, that your definition does not start with these characters.

      Environment Variables Add new variables for use by your notebook's service activity scripts.
      Number of GPUs If your cluster is enabled for GPUs, this field shows. Specify a valid integer, 0 or greater, for the number of GPU slots required by each notebook service.

      By default, the number of GPUs is set to zero. If you set this value to greater than zero, ensure that the resource group selected for this notebook contains GPU hosts. Notebooks can use one of two GPU modes: exclusive or default (sharable) mode. Ensure that the GPU mode on the GPU hosts is set appropriately (to either exclusive or to default mode).

      GPU mode If your cluster is enabled for GPUs, this check box shows. Notebooks can use one of two GPU modes: exclusive or default mode. Select this check box to use GPUs in exclusive mode and ensure the GPU mode on the GPU hosts is set to exclusive. If not selected, GPUs will be used in default mode, which is sharable.
    3. Click Close.
  7. Set consumers, and resource groups and plans for the notebook:
    1. Use the Consumers section to select a consumer.
      When you are selecting a consumer for each of the services, you can select one of the following options:
      • To use an existing leaf consumer, select a consumer under the top-level consumer and click Select.
      • To create a new consumer under the instance group top consumer, enter a consumer name and click Create.
    2. Use the Resource Groups and Plans section to select the resource group (or multidimensional resource plan) whose hosts provide resources for services in the instance group. The resource groups that are available for selection are those that are associated with the service consumer.
  8. Click Modify Instance Group.
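
Several rules in this procedure can be checked before you enter values in the cluster management console. The following Python code is a minimal sketch of such checks, not part of IBM Spectrum Conductor: all function names are hypothetical, and the sample paths, notebook name, and version are placeholders. It covers the default deployment directory pattern, the fallback for the base data directory, the rule that Dockerized Jupyter data volume definitions must not start with $, {, or }, and the rule that the number of GPUs is an integer of 0 or greater.

```python
import re
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical helpers for illustration only; they do not call any
# IBM Spectrum Conductor API and only restate rules from this topic.


def default_deploy_dir(ig_deploy_dir: str, name: str, version: str) -> str:
    """Compose {DEPLOY_DIR_OF_IG}/{NOTEBOOK_NAME}-{NOTEBOOK_VERSION}."""
    return str(PurePosixPath(ig_deploy_dir) / f"{name}-{version}")


def base_data_dir(specified: Optional[str], deploy_dir: str) -> str:
    """Use the deployment directory when no base data directory is given."""
    return specified if specified else deploy_dir


def data_volume_path_ok(definition: str) -> bool:
    """A host or container path may contain $, { and }, but not start with them."""
    return bool(definition) and definition[0] not in "${}"


def gpu_count_ok(value: str) -> bool:
    """The number of GPUs must be a valid integer, 0 or greater."""
    return re.fullmatch(r"[0-9]+", value) is not None


if __name__ == "__main__":
    # Placeholder values, not real product defaults.
    deploy = default_deploy_dir("/opt/ig1", "Jupyter", "1.0")
    print(deploy)
    print(base_data_dir(None, deploy))
    print(data_volume_path_ok("/scratch/dev/${SPARK_EGO_USER}"))
    print(gpu_count_ok("2"))
```

These checks only mirror the wording of this topic; the product performs its own validation when you click Modify Instance Group, and that validation remains authoritative.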