Setting consumers and resource groups for an instance group

Optionally, modify the default consumer and resource group that the instance group uses.

Before you begin

  • Ensure that you meet the prerequisites for creating an instance group. See Prerequisites for an instance group.
  • If you have a local environment with a mixed cluster that uses both Linux and Linux on POWER, the Jupyter notebook packages for Linux must be in a different resource group than the ones for Linux on POWER, because the packages are platform specific.

About this task

Each instance group is assigned a top-level consumer. When the instance group is created, new consumers are created by default under this top-level consumer for the core components of the instance group: Spark drivers, executors, and batch master. Depending on your configuration, it might also include the shuffle, notebook, and history services.

The default top-level consumer is a consumer with the same name as the instance group (for example, if your instance group name is ABC, then the default top-level consumer is /ABC). The top-level consumer represents the entire cluster and all its resources. The default resource group for an instance group is the ComputeHosts resource group. You can change the default top-level consumer and resource group for the instance group. You can also change the consumer for each component in the instance group. For information on how to configure your own resource settings to avoid resource competition and unwanted resource reclaims, see Best practices for resource group configuration.
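As an illustration only (the child consumer names shown here are placeholders; the actual names are generated when the instance group is created), the default consumer hierarchy for an instance group named ABC might look like:

```
/ABC                          top-level consumer (same name as the instance group)
├── (drivers consumer)        Spark drivers
├── (executors consumer)      Spark executors
├── (batch master consumer)   Spark batch master
└── (optional consumers)      shuffle, notebook, and history services
```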

When your instances (instance groups, Anaconda distribution instances, and application instances) are deployed to a shared file system, the shuffle service is disabled by default. If you want to use the shuffle service, you must enable the service and set up other configurations. See Enabling and configuring the Spark shuffle service.

Procedure

  1. From the cluster management console, click Workload > Instance Groups to view a list of existing instance groups, select the instance group to work with, and click Configure.
  2. On the Spark tab, locate the Consumers section.
    Important: When changing the default consumer, take note of the following considerations:
    • The consumers for Spark drivers and Spark executors must not be the same as any other service consumer. If you use existing consumers, the execution user for the Spark drivers consumer and the Spark executors consumer must be the same.
    • If you plan to use exclusive slot allocation, the shuffle service consumer and the Spark executors consumer must be different.
  3. Click the consumer that is specified for each of the services (for example, Spark drivers). When you are selecting a consumer for each of the services, you can select one of the following options:
    • To use an existing leaf consumer, select a consumer under the top-level consumer and click Select.
    • To create a new consumer under the instance group top-level consumer, enter a consumer name and click Create.
  4. Repeat the steps as required for each service component.
  5. To specify a different consumer for executors:
    1. Select Use fair share scheduling for executors. Fair share scheduling enables executors to use a different consumer for each Spark master service to balance workloads across Spark masters.
    2. Specify whether fair share scheduling automatically creates new sub-consumers or if it uses previously created sub-consumers.
  6. Locate the Resource Groups and Plans section and select the resource group (or multidimensional resource plan) whose hosts provide resources for services in the instance group. The resource groups that are available for selection are those that are associated with the service consumer.
    Important: If your configuration prevents services other than the Spark master from running on management hosts (CONDUCTOR_SPARK_RESERVE_MGHOSTS=ON in ascd.conf), do not select a resource group that contains only management hosts for non-Spark master services such as Spark drivers, Spark executors, the Spark shuffle service, and notebooks (for example, Zeppelin 0.7.0). The resource group that you select for these services must include one or more compute hosts.
    1. Select the resource group for Spark drivers.
      Note: If you are using notebooks, the resource group for Spark drivers must be the same as or a subset of the notebook resource group.
    2. Select the resource group for Spark executors (CPU slots).
    3. If you want the shuffle service and Spark executors to use different resource groups, select a different resource group for the Spark shuffle service. Both resource groups must use the same set of hosts; otherwise, applications submitted to this instance group fail.
    4. Specify the resource group for the Spark batch master service.
    5. If the instance group includes notebooks, select the resource group for the Spark notebook master service and for each notebook type.

      When using a Jupyter notebook, if you set different resource groups for the Spark executors and the Jupyter notebook, you must ensure that all hosts in the notebook resource group have Python and all application job-related dependencies installed. Then, edit the Spark configuration to set the Python installation path (for example, $INSTALLDIR/bin/python) for the PYSPARK_PYTHON parameter within the Environment Variables section.
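      For example, if Python is installed under a directory referred to here as $INSTALLDIR (a placeholder path; substitute the actual Python location on the notebook resource group hosts), the entry in the Environment Variables section of the Spark configuration would look like:

```shell
# Placeholder path: replace $INSTALLDIR with the real Python
# installation directory on every host in the notebook resource group.
PYSPARK_PYTHON=$INSTALLDIR/bin/python
```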

  7. Click Save.
  8. Click Modify Instance Group.
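
The management-host restriction noted in step 6 is controlled by a parameter in ascd.conf. A minimal sketch of that setting (the file location and surrounding parameters vary by installation):

```shell
# In ascd.conf: when set to ON, services other than the Spark master
# cannot run on management hosts, so non-Spark master services must use
# resource groups that include at least one compute host.
CONDUCTOR_SPARK_RESERVE_MGHOSTS=ON
```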

What to do next