Spectrum environments

Execution Engine for Apache Hadoop environments are not available by default. An administrator must install the Execution Engine for Apache Hadoop service on the IBM Cloud Pak for Data platform. To determine whether the service is installed, open the Services catalog and check whether the service is enabled.

Use a remote system environment with a configured Spectrum Conductor system when you want to create a Jupyter notebook that uses the kernel and Spark that are on a Spectrum Conductor cluster.

Spectrum environment definitions

To create an environment definition for a remote system environment that uses a configured Spectrum Conductor system:

  1. From the Environments tab in your project, click New environment definition.
  2. Enter a name and a description.
  3. Select the Remote system environment configuration type.
  4. Select one of the configured systems.

    Note: The field label changes depending on the system that you select. If you select a Hadoop system, the label Hadoop configuration is displayed. If you select a Spectrum Conductor cluster, the label Systems configuration is displayed.

  5. Select an option from the Instance group list for the current Cloud Pak for Data user. If the list is not available, contact your Spectrum Conductor administrator to determine whether you have access to the configured cluster.
  6. Select the Software version. This option determines which Anaconda environment is used to instantiate your Jupyter Enterprise Gateway (JEG) session. The base environment has more than 90 packages available. The Cloud Pak for Data admin can add environments through the “push” operation from the Registration UI.
  7. Select the size of the environment in which your notebooks or notebook jobs will run.
  8. After you complete these steps, you can create a notebook using this environment, and run your JEG session on the Spectrum cluster remotely.
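
Once a notebook is running in the new environment, a first cell along the following lines can help confirm which Anaconda environment the kernel was started from. This is a minimal sketch that uses only the Python standard library; the function name describe_kernel_environment is hypothetical and is not part of Cloud Pak for Data or Spectrum Conductor, and the package count it reports depends entirely on the Software version that you selected.

```python
import sys
from importlib import metadata


def describe_kernel_environment():
    """Return basic facts about the Python environment backing this kernel.

    Hypothetical helper for illustration only.
    """
    names = set()
    for dist in metadata.distributions():
        # Skip distributions whose metadata is missing or has no name.
        name = dist.metadata.get("Name") if dist.metadata is not None else None
        if name:
            names.add(name)
    packages = sorted(names)
    return {
        "python_version": sys.version.split()[0],
        "environment_prefix": sys.prefix,
        "package_count": len(packages),
        "packages": packages,
    }


info = describe_kernel_environment()
print("Python version:     ", info["python_version"])
print("Environment prefix: ", info["environment_prefix"])
print("Installed packages: ", info["package_count"])
```

If the prefix or the package count is not what you expect, check the Software version selected in the environment definition.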

Adding user settings

When you need to fine-tune your Spark session, use the User defined session variables section on the environment definition page. The variables are parameters that define additional Spark options, which are applied when you launch a notebook or run a job.

Before you can use the variables, your Spectrum Conductor admin must define the list of available options and the value range for each option as part of configuring Spectrum Conductor. Contact your Spectrum Conductor admin to learn which options are available for you to configure. New options take effect the next time you launch a notebook or run a job.

To add new parameters to your Spectrum Conductor environment definition:

  1. In the User defined session variables section, click New session variable.
  2. Select the parameters and values.
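
The relationship between the admin-defined options and the values that you select can be pictured with a short sketch. Everything here is an assumption made for illustration: the option names are ordinary Spark properties, the ranges are invented, and validate_session_variables is a hypothetical helper, not a Cloud Pak for Data or Spectrum Conductor API. The real list of options and ranges comes from your Spectrum Conductor admin.

```python
# Hypothetical options and (min, max) ranges, standing in for what a
# Spectrum Conductor admin might make available. Not real defaults.
ALLOWED_OPTIONS = {
    "spark.executor.cores": (1, 4),
    "spark.executor.instances": (1, 8),
}


def validate_session_variables(chosen, allowed=ALLOWED_OPTIONS):
    """Return a list of problems; an empty list means every choice is allowed."""
    problems = []
    for name, value in chosen.items():
        if name not in allowed:
            problems.append(f"{name}: not in the admin-defined list")
            continue
        low, high = allowed[name]
        if not low <= value <= high:
            problems.append(f"{name}: {value} is outside {low}..{high}")
    return problems


chosen = {
    "spark.executor.cores": 2,        # within the assumed range
    "spark.executor.instances": 16,   # above the assumed maximum
    "spark.driver.cores": 1,          # not offered in this example
}
problems = validate_session_variables(chosen)
for problem in problems:
    print(problem)
```

In this example only spark.executor.cores would be accepted; the other two choices fall outside what the assumed admin configuration permits.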