Jupyter notebook environment variables

For built-in Jupyter notebooks, these environment variables are supported.

Precedence of environment variable values

Depending on your role and permissions, you can define notebook environment variables in the following ways:
  1. A notebook user can define environment variables for their own notebooks. An administrator can also do this by configuring the notebook for the user.
  2. An administrator can define environment variables when enabling notebooks for an instance group.
  3. A cluster administrator can define environment variables when defining a notebook package.
If the same environment variable name is defined in more than one of these places, the system uses the value from the earliest item in this list, and values defined at lower levels serve as defaults. For example, if an administrator creates an instance group with a notebook and sets environment variable ABC, and the notebook user also sets environment variable ABC, the notebook user's value is used.
Table 1. Environment variables for Jupyter notebooks
Environment variables Description
CONDA_ENV_NAME Allows notebook users to specify a different conda environment than the default value.

Valid values: the name of your conda environment.

JEG_LOG_LEVEL Specifies the log level for the Jupyter Enterprise Gateway server (string).

Valid values: 0, 10, 20, 30, 40, 50, DEBUG, INFO, WARN, ERROR, or CRITICAL.

Default: INFO

JEG_STARTUP_TIMEOUT Specifies the amount of time, in seconds, that the Jupyter start script waits after launching the Jupyter Enterprise Gateway process before determining whether startup succeeded or failed. This setting is useful because startup might be slower on some hosts. If this value is not specified, the default of 5 seconds is used.

Valid values: any positive integer.

Default: 5
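
For example, on hosts where the Jupyter Enterprise Gateway process is slow to start, you might increase the timeout (the value shown here is illustrative only):
JEG_STARTUP_TIMEOUT=30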

JUPYTER_IP_BLOCKLIST A comma-separated list of local IPv4 addresses (or regular expressions) that are not to be used when determining the response address that is used to convey connection information back to the Jupyter Enterprise Gateway from a remote kernel. In some cases, other network interfaces (for example Docker with 172.17.0.*) can interfere, which leads to connection failures during kernel startup.

Valid values: a single IPv4 address or a comma-separated list of IPv4 addresses. Entries can contain wildcard characters.

Example: 172.17.0.*,192.168.0.27, which eliminates the use of all addresses that start with 172.17.0, as well as the single IPv4 address 192.168.0.27.

JUPYTER_CULL_BUSY Specifies whether to cull a kernel even when it is busy (that is, currently running cells). Specify True to cull busy kernels. Generally, if you enable culling of busy kernels, you will want to use a long timeout before culling (for example, so that only kernels that have been running for too long are stopped).

Valid values: True or False.

Default: False

JUPYTER_CULL_CONNECTED Specifies whether to cull a kernel that still has a connection to it, such as an open browser window. Specify True to cull connected kernels.

By default, this is set to False, so that if you leave your Jupyter browser window open, the kernel is not culled (assuming that the machine with the open browser maintains an active connection). Set this to True to cull kernels even when a browser window is still connected to them.

Valid values: True or False.

Default: False

JUPYTER_CULL_IDLE_TIMEOUT Specifies the amount of time, in seconds, that a kernel can remain idle before it is culled. Specify a positive integer. To disable culling, set this environment variable to 0.

Valid values: 0 (to disable culling) or any positive integer.

Default: 3600

JUPYTER_CULL_INTERVAL Specifies the interval, in seconds, at which kernels are checked against the JUPYTER_CULL_IDLE_TIMEOUT value. Use this setting to determine how often to check whether a kernel should be culled. Specify a positive integer. By default, this interval is set to 600 seconds (10 minutes). Note that when a kernel is culled, the entire application is stopped. When you return to your notebook, you must relaunch the file and kernel; the kernel is new, so any saved cells must be rerun to restore their results.

Valid values: any positive integer.

Default: 600
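
As an illustration of how the culling variables work together (the values shown here are examples, not recommendations), the following settings check every 5 minutes for kernels that have been idle for 2 hours, while leaving busy and browser-connected kernels running:
JUPYTER_CULL_IDLE_TIMEOUT=7200
JUPYTER_CULL_INTERVAL=300
JUPYTER_CULL_BUSY=False
JUPYTER_CULL_CONNECTED=False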

JUPYTER_ENV_ALLOWLIST Specifies a list of environment variables to pass from the kernel launching process into the kernel itself. Specify this value by using a comma-delimited list of environment variable names wrapped in single quotation marks, such as 'VAR1','VAR2'. The variables can either be specified by the notebook user or be environment variables that are automatically set when the notebook user logs in to the operating system. When specified, the notebook user has access to these environment variables in Jupyter notebook cells, whereas they would otherwise only be available in Jupyter terminals.

Valid values: a comma-separated list of environment variables.

Example: 'VAR1','VAR2'
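
For instance, assuming a hypothetical variable name MY_PROXY that has been added to the allowlist (as 'MY_PROXY') and an IPython-based kernel, a notebook user could confirm from a cell that the variable reached the kernel by running a shell command:
!printenv MY_PROXY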

JUPYTER_KERNEL_START_TIMEOUT Specifies the amount of time, in seconds, to wait for the kernel to start before the startup times out. Specify a positive integer.

Valid values: any positive integer.

Default: 300

JUPYTER_REQUEST_TIMEOUT Specifies the amount of time, in seconds, to wait for a kernel request to complete before it fails with an error. Specify a positive integer.

Valid values: any positive integer.

Default: 400
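
For example, on a heavily loaded cluster where kernels are slow to start, you might raise both timeouts (the values shown here are illustrative only):
JUPYTER_KERNEL_START_TIMEOUT=600
JUPYTER_REQUEST_TIMEOUT=700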

JUPYTER_SPARK_OPTS Specifies additional Spark parameters (such as priority) to be used either in the Spark submit command if you enable a notebook for an instance group, or in the kernel startup if you configure a notebook package.

Example: JUPYTER_SPARK_OPTS = "--conf spark.ego.priority=3000", which specifies that after the kernel starts, the notebook application has a priority of 3000 instead of the default of 5000.
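
You can combine several Spark parameters in one value; for example (the executor memory setting here is an illustrative assumption, not a product default):
JUPYTER_SPARK_OPTS = "--conf spark.ego.priority=3000 --conf spark.executor.memory=2g"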

JUPYTER_USER_SPARK_OPTS Specifies additional notebook user parameters for your notebook, such as your principal and the location of your keytab file. The system can then use this information to start a Kerberos-authenticated notebook by using a service-level impersonation user.

Notebook users can add this environment variable to the notebooks that they own, and the notebook user's value takes precedence if the same environment variable name is defined elsewhere.

This environment variable uses the same format as the JUPYTER_SPARK_OPTS parameter:
JUPYTER_USER_SPARK_OPTS = "--conf spark.yarn.principal=user@realm --conf spark.yarn.keytab=/path_to_user.keytab"
For example:
JUPYTER_USER_SPARK_OPTS = "--conf spark.yarn.principal=jsmith@example.com --conf spark.yarn.keytab=/home/mykeytabdirectory/jsmith.keytab"
This example enables the system to start the Kerberos-authenticated notebook by using a service-level impersonation user.

JUPYTERLAB_ENABLED Specifies whether to use the JupyterLab web-based interface instead of the default Jupyter Notebook interface.

Valid values: true or false.
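
For example, to switch to the JupyterLab interface:
JUPYTERLAB_ENABLED=true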

NOTEBOOK_EXTRA_CONF_FILE Specifies the path to an extra configuration script that runs when the notebook starts. The script can define extra customizations to the environment in which the notebook starts.
Tip: If you want to customize environment variables with the script defined in the path of the NOTEBOOK_EXTRA_CONF_FILE environment variable, consider the environment variables that might already have values as part of the process environment in which the service runs. Where applicable, append to the existing environment variables rather than completely overwrite them. For example, if your instance group contains data connectors, the notebook service automatically has a value for the JUPYTER_SPARK_OPTS environment variable that contains configuration for the data connectors. A minimal script sketch follows the steps below.
To see the current environment variable list for your notebook services:
  1. Open the cluster management console in a web browser. For details, see the Locating the cluster management console topic.
  2. Select Workload > Instance Groups.
  3. Under the Name column, click an instance group.
  4. Select the Notebooks tab of the instance group.
  5. Select a notebook and click View Service Details.
  6. Select the Service Profile tab.
  7. Under the ego:ActivitySpecification section, for the list of service environment variables see the ego:EnvironmentVariable entries.
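The following is a minimal sketch of an extra configuration script (the file path, the extra Spark option, and the custom variable name are illustrative assumptions, not product defaults). It appends to JUPYTER_SPARK_OPTS rather than overwriting it, as recommended in the tip above:
# Hypothetical script referenced by NOTEBOOK_EXTRA_CONF_FILE, for example /shared/conf/notebook_extra_conf.sh
# Append an extra Spark option, preserving any options that the notebook service already set
export JUPYTER_SPARK_OPTS="${JUPYTER_SPARK_OPTS} --conf spark.ego.priority=3000"
# Define an additional custom environment variable for the notebook environment
export MY_CUSTOM_VAR=example_value
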
NOTEBOOK_SPARK_PYSPARK_PYTHON Specifies the path to the Python executable for the notebook.

The built-in Jupyter start script consumes this NOTEBOOK_SPARK_PYSPARK_PYTHON environment variable. For custom notebooks, use this environment variable name in your start scripts.

For Dockerized notebooks (Adding Dockerized notebooks), if the default value is not used and you want to run notebook applications in client mode, manually add a data volume to your notebook configuration to mount the non-default path.

For notebooks using Anaconda or Miniconda, this environment variable is automatically set to the path to your Python location in the conda environment bin directory for the notebook.

If you are using a Python executable that is not located inside the conda environment bin directory, set the NOTEBOOK_SPARK_PYSPARK_PYTHON environment variable in the Conda environment field when you configure the notebook for your instance group (Enabling notebooks for an instance group).
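
For example, if your Python executable is outside the conda environment bin directory (the path shown here is a hypothetical example), you might set the following in the Conda environment field:
NOTEBOOK_SPARK_PYSPARK_PYTHON=/opt/custom/python3/bin/python3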

NOTEBOOK_SPARK_R_COMMAND Specifies the path to the executable for running R scripts in the notebook.

The built-in Jupyter start script consumes this NOTEBOOK_SPARK_R_COMMAND environment variable. For custom notebooks, use this environment variable name in your start scripts.

For Dockerized notebooks (Adding Dockerized notebooks), if the default value is not used and you want to run notebook applications in client mode, manually add a data volume to your notebook configuration to mount the non-default path.

For notebooks using Anaconda or Miniconda, this environment variable is automatically set to the path to your R script location in the conda environment bin directory for the notebook.

If the executable for running R scripts is not located inside the conda environment bin directory, set the NOTEBOOK_SPARK_R_COMMAND environment variable in the Conda environment field when you configure the notebook for your instance group (Enabling notebooks for an instance group).
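
For example, if your Rscript executable is outside the conda environment bin directory (the path shown here is a hypothetical example), you might set the following in the Conda environment field:
NOTEBOOK_SPARK_R_COMMAND=/opt/custom/R/bin/Rscript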

SITE_PACKAGE_PATH Speeds up the Jupyter start script. To use this setting, specify the path to the notebook user's conda environment site-packages directory directly, rather than having the start script run a find command to locate it.

This environment variable is useful on file systems where the find command can take a very long time (such as in very large conda environments, or in slower file systems).

Use this setting when the path to the site-packages directory is known; otherwise, use the SITE_PACKAGE_FIND_DEPTH environment variable.
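
For example, if you already know the location of the site-packages directory (the path and Python version shown here are hypothetical), you might set:
SITE_PACKAGE_PATH=/home/username/.conda/envs/myenv/lib/python3.7/site-packages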

SITE_PACKAGE_FIND_DEPTH Speeds up the Jupyter start script. To use this setting, specify the maximum depth for the find command when it searches for the conda environment's site-packages directory.

This environment variable is useful on file systems where the find command can take a very long time (such as in very large conda environments, or in slower file systems).

Use this setting when the path to the site-packages directory is unknown; otherwise, use the SITE_PACKAGE_PATH environment variable.
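
For example, to limit how deep the find command searches for the site-packages directory (the depth shown here is illustrative only), you might set:
SITE_PACKAGE_FIND_DEPTH=4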