Creating notebook packages

Create the notebook packages that contain the components that are required to run a notebook. This task is not required for the built-in notebooks. All the files that are required to run the built-in notebooks are installed with IBM® Spectrum Conductor.

Before you begin

If you have a local environment with a mixed cluster that uses both Linux and Linux on POWER, the Jupyter notebook packages for Linux must be in a different resource group than the packages for Linux on POWER, because the packages differ between the two platforms.

About this task

Create the package for a notebook, which you can add to IBM Spectrum Conductor and make available to an instance group.
Creating a package involves bundling all the files that the notebook requires to run. Ensure that the package size does not exceed 4 GB and that it uses only one of the following supported formats:
  • .zip
  • .tar
  • .taz
  • .tar.zip
  • .tar.Z
  • .tar.gz
  • .tgz
  • .jar
  • .gz
  • .exe
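
The format and size constraints can be checked before you upload a package. The following is a minimal sketch, not part of the product: the function names are hypothetical, and it assumes that the 4 GB limit means 4 x 1024^3 bytes.

```shell
# Hypothetical pre-upload checks for a notebook package.
# Assumption: "4 GB" is interpreted as 4 * 1024^3 bytes.
MAX_PACKAGE_BYTES=$((4 * 1024 * 1024 * 1024))

# Succeeds if the file name ends with one of the supported extensions.
has_supported_format() {
    case "$1" in
        *.zip|*.tar|*.taz|*.tar.zip|*.tar.Z|*.tar.gz|*.tgz|*.jar|*.gz|*.exe)
            return 0 ;;
        *)
            return 1 ;;
    esac
}

# Succeeds if the file exists and is no larger than the size limit.
within_size_limit() {
    [ -f "$1" ] || return 1
    local size
    size=$(( $(wc -c < "$1") ))
    [ "$size" -le "$MAX_PACKAGE_BYTES" ]
}
```

For example, `has_supported_format testconductor.tar.gz && within_size_limit testconductor.tar.gz` succeeds only when both constraints hold.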

Procedure

  1. Create scripts for the notebook, such as the scripts that are required to start or stop the notebook or for package deployment.
    1. When you are creating your scripts, ensure that all the scripts have execution permission. Also, reference the correct environment variables for your notebook to work. IBM Spectrum Conductor provides the following environment variables for use in your scripts. Note that the environment variables are available for deployment scripts, service scripts, or both. These environment variables are defined when you create the notebook type or overridden when you add a notebook to an instance group:
      Each entry gives the environment variable name and description, followed by whether the variable can be used in deployment scripts (for deploying or undeploying) and in service scripts (for starting, stopping, and job monitoring).
      ANACONDA_DEPLOY_DIR: Specifies the deployment directory for the Anaconda distribution instance to use. This environment variable is only available and applicable to notebooks using Anaconda. Deployment scripts: Yes. Service scripts: Yes.
      ANACONDA_RELATIVE_DIR: Specifies the Anaconda directory relative to the ANACONDA_DEPLOY_DIR. This environment variable is only available and applicable to notebooks using Anaconda. Deployment scripts: Yes. Service scripts: Yes.
      ASCD_REST_CACERT_PATH: Specifies the path to the ascd service's certificate authority (CA) certificate, for clusters with SSL enabled. Deployment scripts: No. Service scripts: Yes.
      CONDA_ENV_NAME: Specifies the conda environment for the notebook. This environment variable is only available and applicable to notebooks using Anaconda. Deployment scripts: Yes. Service scripts: Yes.
      DEPLOY_NB_ADMIN_USER_GROUP: Specifies the administrator user group for deploying the notebook. Takes the value of the Administrator user group field when creating a notebook using the cluster management console, if that field is set. If the field does not have a value, then this DEPLOY_NB_ADMIN_USER_GROUP environment variable is also not set. Deployment scripts: Yes. Service scripts: No.
      EGO_MASTER_LIST_PEM: Specifies a space-separated list of management hosts. Deployment scripts: No. Service scripts: Yes (only for service scripts for Dockerized notebooks).
      EGO_REST_URL and CONDUCTOR_REST_URL: Specifies the URLs on which the RESTful APIs are available. Deployment scripts: No. Service scripts: Yes.
      • EGO_REST_URL: Specifies the URL on which the resource management RESTful APIs are available. This URL is by default https://HOSTNAME:8543/platform/rest/ego/v1.
      • CONDUCTOR_REST_URL: Specifies the URL on which the instance group RESTful APIs are available. This URL is by default https://HOSTNAME:8643/platform/rest/platform/rest/conductor/v1.
        Note: Both these URLs are dynamically generated when notebook services are started. After the services start, take manual steps in the following cases:
        • If failover occurs for the REST and ascd services that manage the APIs, manually restart the notebook services to pick up the new URLs.
        • If you change the port for the REST or ascd services or switch between enabling and disabling SSL, unassign or assign the notebook users to ensure that the CONDUCTOR_REST_URL environment variable references the updated URL.
      IBM_PLATFORM_DEPLOY_HOOK_EXEC_USER: Specifies the notebook execution user who is deploying the notebook. Deployment scripts: Yes. Service scripts: No.
      NOTEBOOK_BASE_PORT: Specifies the base port from which the system tries to find available ports for use by the notebook. Deployment scripts: Yes. Service scripts: Yes.
      NOTEBOOK_DATA_BASE_DIR: Specifies the top-level directory to store notebook data. Each notebook service then gets a unique directory within this directory, which is the NOTEBOOK_DATA_DIR environment variable available to service scripts. Deployment scripts: Yes. Service scripts: No.
      NOTEBOOK_DATA_DIR: Specifies the directory to store notebook data. Deployment scripts: No. Service scripts: Yes.
      NOTEBOOK_DEPLOY_DIR: Specifies the directory to which the notebook is deployed. Deployment scripts: Yes. Service scripts: Yes.
      NOTEBOOK_EXTRA_CONF_FILE: Specifies a file that contains additional configuration variables or steps required by the notebook. Deployment scripts: Yes. Service scripts: Yes.
      NOTEBOOK_SSL_ENABLED: Specifies whether SSL is enabled (that is, set to true) for the notebook. Deployment scripts: No. Service scripts: Yes.
      SPARK_EGO_USER: Specifies the user who is assigned to the notebook. Deployment scripts: No. Service scripts: Yes.
      SPARK_HOME: Specifies the Spark installation directory. Deployment scripts: Yes. Service scripts: Yes.
      SPARK_INSTANCE_GROUP_UUID: Specifies the UUID of the instance group. Deployment scripts: No. Service scripts: Yes.
      SPARK_INSTANCE_GROUP_NAME: Specifies the name of the instance group. Deployment scripts: No. Service scripts: Yes.
      SPARKMS_HOST: Specifies the name of the host on which the Spark notebook master service is running. This host name is used to construct the Spark master URL in the format spark://HOST:PORT. Deployment scripts: No. Service scripts: Yes.

      In addition to these environment variables available to service scripts, when you add a notebook to your cluster or to the instance group, any environment variables that you add to that notebook will be available to service scripts. Therefore, if you have a custom notebook that requires additional environment variables, you can set them when adding a notebook to your cluster or to the instance group, or when configuring each individual assigned notebook.
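
      Because a script fails in hard-to-diagnose ways when an expected variable is missing, it can help to verify the variables at the top of each script. The following is a minimal sketch under that assumption; require_env is a hypothetical helper, not something that IBM Spectrum Conductor provides.

```shell
# Hypothetical helper: reports every named environment variable that is
# unset or empty, and fails if at least one is missing.
require_env() {
    local name value missing=0
    for name in "$@"; do
        # Look up the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "Missing required environment variable: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example use at the top of a service script (variables from the list above):
#   require_env NOTEBOOK_DEPLOY_DIR NOTEBOOK_DATA_DIR NOTEBOOK_BASE_PORT SPARK_HOME || exit 1
```

      In a deployment script, list only the variables that are marked as available to deployment scripts, such as NOTEBOOK_DEPLOY_DIR and NOTEBOOK_DATA_BASE_DIR.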

    2. To perform HTTP calls to the EGO and CONDUCTOR REST servers, use cURL to obtain a CSRF token, which acts as authentication for certain REST calls that require authentication permissions.
      For example, to perform a GET REST call to CONDUCTOR_REST_URL, from which you can parse the returned message to obtain the CSRF token, run:
      curl -XGET -H'Accept: application/json' ${CONDUCTOR_REST_URL}conductor/v1/auth/logon ${tlsVersion}

      The CSRF token can be used as authentication for subsequent POST, PUT, and DELETE REST calls to avoid logging on multiple times.
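
      Parsing the token out of the returned message might look like the following sketch. It assumes that the logon response is a JSON object with a string field named csrftoken; verify the actual response format in your cluster before relying on it. The extract_csrf_token function is a hypothetical helper.

```shell
# Hypothetical helper: reads a logon response on standard input and prints
# the CSRF token.
# Assumption: the response is JSON of the form {"csrftoken":"<value>"}.
extract_csrf_token() {
    sed -n 's/.*"csrftoken"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Example use with the logon call shown above:
#   response=$(curl -XGET -H'Accept: application/json' ${CONDUCTOR_REST_URL}conductor/v1/auth/logon ${tlsVersion})
#   token=$(printf '%s\n' "$response" | extract_csrf_token)
```

      If the field is absent, the function prints nothing, so a script can test for an empty token and stop before it makes further REST calls.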

  2. Save the scripts in a local directory.
  3. Download the binaries that the system requires. Which binaries you must download depends on the network accessibility of the target environment. Alternatively, if applicable, run the scripts to download the binaries.
  4. Collect all the files that are required for the notebook to work, which might include:
    • Third-party binaries. Whether you need to bundle them depends on the network accessibility of the target environment.
    • Scripts that describe how to manage the notebook service lifecycle, such as commands to start or stop the service.
    • The deployment configuration file (deployment.xml).
    • The deploy script (deploy.sh). This script is called when an instance group is deployed to help deploy the notebook package onto the client host.
    • The undeploy script (undeploy.sh). This script is called when an instance group is removed to remove the notebook package and all notebook-related files from the client host.
  5. Go to the directory where the files are located. For example:
    deployment_dir
      deployment.xml
      scripts/
      package/
  6. Copy the deployment.xml file that is provided at the root of the samples folder.
    Note: The $EGO_CONFDIR/../../ascd/conf/samples/deployment.xml file is only a sample and must be customized for each package.
  7. Create a package_scripts subfolder and copy the scripts to that folder.
  8. Create a package subfolder:
    1. Optional: Copy any sample packages to this folder.
    2. Optional: If hosts are connected to the Internet:
      • Create a server subfolder and copy the server dependency software there.
      • Create an agent subfolder and copy the agent dependency software there.
  9. Generate the package in one of the supported formats. For example:

    tar czvf testconductor.tar.gz deployment.xml package_scripts package
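
The packaging step can be wrapped in a small script that checks the expected layout before it creates the archive. The following is a minimal sketch; build_package is a hypothetical helper, and it assumes the layout from the preceding steps (deployment.xml, package_scripts, and package in one directory).

```shell
# Hypothetical helper: creates a .tar.gz notebook package from a directory
# that contains deployment.xml, package_scripts/, and package/.
#   $1: directory that holds the package contents
#   $2: path of the archive to create
build_package() {
    local dir="$1" out="$2"
    if [ ! -f "$dir/deployment.xml" ]; then
        echo "deployment.xml not found in $dir" >&2
        return 1
    fi
    if [ ! -d "$dir/package_scripts" ] || [ ! -d "$dir/package" ]; then
        echo "package_scripts or package subfolder not found in $dir" >&2
        return 1
    fi
    # -C keeps the archive paths relative to the package directory.
    tar czf "$out" -C "$dir" deployment.xml package_scripts package
}

# Example use:
#   build_package deployment_dir testconductor.tar.gz
#   tar tzf testconductor.tar.gz    # list the archive contents to verify
```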

Results

The notebook package that contains the binaries, scripts, and other files that are required by the notebook is created.

What to do next

Add the notebook to the cluster. See Adding notebooks.