Best practices for resource group configuration

When you create an instance group, you can configure your own resource settings to avoid resource competition and unwanted resource reclaims.

The resource orchestrator in IBM® Spectrum Conductor manages the supply and distribution of resources, making them available to applications.

When you create an instance group, you can run Spark workloads on existing consumers or on the default consumers that are created automatically. The default resource settings let you get hands-on experience with the product quickly. Alternatively, you can configure your own resource settings to fit your use cases.

Why do unwanted reclaims occur?

Resource competition and unwanted resource reclaims happen when you run more tasks and services than the available resources can accommodate. To avoid this behavior, you can configure resource rules that restrict resource allocation more tightly. IBM Spectrum Conductor distributes resources among consumers based on the share ratio that is set for each consumer.

For example, you can create an instance group with default consumers that have a share ratio of 1:1, which means that they have an equal opportunity to obtain resources:
New Spark Instance Group EGO Settings
In this example, the default resource plan configuration has a share ratio of 1:1:
Resource plan settings
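To make the share ratio concrete, the following Python sketch splits a fixed number of slots among consumers in proportion to their share ratios. It is a simplified model that assumes a flat set of consumers and a largest-remainder rounding rule; it is not the algorithm that the IBM Spectrum Conductor resource orchestrator uses. The function name and the consumer names in the usage example are hypothetical.

```python
from fractions import Fraction


def split_by_share_ratio(total_slots: int, ratios: dict[str, int]) -> dict[str, int]:
    """Split total_slots among consumers in proportion to their share ratios.

    Simplified model: integer slots are assigned with the largest-remainder
    method, so the counts add up to total_slots whenever any ratio is nonzero.
    """
    total_ratio = sum(ratios.values())
    if total_ratio == 0:
        return {name: 0 for name in ratios}
    # Exact (fractional) entitlement of each consumer
    exact = {name: Fraction(total_slots * r, total_ratio) for name, r in ratios.items()}
    # Round every entitlement down first ...
    shares = {name: int(q) for name, q in exact.items()}
    # ... then hand the leftover slots to the largest fractional remainders
    leftover = total_slots - sum(shares.values())
    by_remainder = sorted(ratios, key=lambda n: exact[n] - shares[n], reverse=True)
    for name in by_remainder[:leftover]:
        shares[name] += 1
    return shares
```

For example, `split_by_share_ratio(10, {"spark-sparkapp": 1, "other-consumer": 1})` gives each consumer 5 slots, matching the equal opportunity that a 1:1 share ratio implies. In this model, a consumer with a share ratio of 0 receives no entitlement at all.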

If you overload the instance group with batch applications whose demand exceeds the total resources offered to its consumer, newly submitted batch applications that find no available resources might forcibly take a slot from a batch job that is already running.

Similar resource competition occurs between services. For example, shuffle services, notebook master services, and batch master services use the spark-sparkapp consumer, so all of their resource requests are resolved within that consumer. If all the resources in spark-sparkapp are in use and you create a new instance group that also uses spark-sparkapp, the services of the new instance group might forcibly take resource slots from the instance group that was already running.

How to avoid resource competition

You can avoid resource competition by using the resource group configuration in combination with the resource plan to refine resource allocation rules. Explore the following options to avoid resource competition:

  • Option 1 – During instance group creation, you can configure each instance group component to use a different resource group. This configuration can prevent resource competition between components because each component uses only the resources that belong to its specified resource group. This approach solves some issues; however, Spark executors and the Spark shuffle service might still use the same resource group, so slot competition can still occur between them.
    Resource group or plan selection for each instance group component
  • Option 2 – Option 1 is sufficient to solve some resource competition issues between services within the same instance group. To refine resource allocation rules further, you can use the resource plan to create a hierarchical consumer tree.
    Resource group settings

    By breaking the consumers down into a hierarchy, you can choose the appropriate consumer for each service component's workload during instance group creation. This configuration prevents resource competition between components that would otherwise use the same consumer, for example, Spark executors competing with the Spark shuffle service.

    After you create an instance group that uses a consumer hierarchy, you can further refine resource rules by enabling Borrow Only Consumer. A Borrow Only Consumer is a consumer without guaranteed slots (share ratio = 0): when it competes for resources with its normal sibling consumers, the siblings have higher priority. To enable Borrow Only Consumer, open the $EGO_CONFDIR/ego.conf file and append the line EGO_ENABLE_BORROW_ONLY_CONSUMER=Y, as in the following excerpt:
    # EGO resource allocation policy configuration
    EGO_DISTRIBUTION_INTERVAL=1
    EGO_ADJUST_SHARE_TO_WORKLOAD=Y
    EGO_RECLAIM_FROM_SIBLINGS=Y
    EGO_RBAC_ALLOW_SELFASSIGNMENT=Y
    EGO_VERSION=3.4
    EGO_ENABLE_BORROW_ONLY_CONSUMER=Y
    After you enable Borrow Only Consumer, open the resource plan and configure the share ratio of each consumer. Setting a consumer's share ratio to 0 makes that consumer a Borrow Only Consumer. This configuration makes resource allocation more flexible as workloads scale: Borrow Only Consumers run on resources that would otherwise sit idle, and consumers that need more resources to complete their tasks can take those resources from the Borrow Only Consumers.
    Consumer settings showing share ratio
  • Option 3 – Although Option 2 solves most resource competition issues, it might cause resource starvation. Resource starvation occurs when high-priority consumers borrow and use all of the available resources, leaving low-priority consumers without enough resources to run their own tasks. You can resolve resource starvation by assigning owned slots to specific consumers. Owned slots cannot be borrowed by any other consumer, so a consumer with owned slots always has those resources available to run its tasks. For example, suppose that the Spark executor consumer is a Borrow Only Consumer. If no owned slots are set, high-priority consumers can potentially borrow all of the Spark executor slots; no slots remain for the executors to run on, and batch applications hang. Assigning at least one owned slot to the Spark executor consumer prevents this resource starvation.
    Consumer settings showing owned slots
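The ego.conf change described in Option 2 can be scripted. The following Python sketch appends EGO_ENABLE_BORROW_ONLY_CONSUMER=Y to ego.conf only if the parameter is not already set, and refuses to modify a file in which the parameter already has a different value. The helper name is hypothetical, and the sketch assumes the simple KEY=VALUE line format, with # comments, shown in the ego.conf excerpt in Option 2.

```python
from pathlib import Path

PARAM = "EGO_ENABLE_BORROW_ONLY_CONSUMER"


def enable_borrow_only_consumer(conf_path: Path) -> bool:
    """Append EGO_ENABLE_BORROW_ONLY_CONSUMER=Y unless it is already set.

    Returns True if the file was modified, False if the setting was present.
    Raises ValueError if the parameter is set to a value other than Y.
    """
    text = conf_path.read_text()
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = stripped.partition("=")
        if sep and key.strip() == PARAM:
            if value.strip() == "Y":
                return False  # already enabled; nothing to do
            # A conflicting value is left for an administrator to review
            raise ValueError(f"{PARAM} is already set to {value.strip()!r}")
    # Make sure the new setting starts on its own line
    if text and not text.endswith("\n"):
        text += "\n"
    conf_path.write_text(text + f"{PARAM}=Y\n")
    return True
```

You might call it as `enable_borrow_only_consumer(Path(os.environ["EGO_CONFDIR"]) / "ego.conf")`, assuming that the EGO_CONFDIR environment variable is set in your shell and that `os` is imported. Running it twice leaves a single copy of the line in the file.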

Which of these best practices to apply depends on how the instance group is used and on the tasks that are being submitted. You can combine these options when configuring your resource settings.
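The combined effect of share ratios, Borrow Only Consumers, and owned slots can be illustrated with a toy allocator. The following Python sketch is a simplified model of sibling consumers under one parent, not the actual policy of the EGO resource orchestrator: it first reserves owned slots, then hands out the remaining slots one at a time to consumers with a nonzero share ratio in proportion to that ratio, and gives Borrow Only Consumers (share ratio 0) only the slots that are still idle. The class, function, and consumer names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Consumer:
    name: str
    share_ratio: int  # 0 marks a Borrow Only Consumer
    owned: int        # owned slots: reserved for this consumer, never lent out here
    demand: int       # slots the consumer's workload is requesting


def allocate(total_slots: int, consumers: list[Consumer]) -> dict[str, int]:
    """Toy model of slot allocation among sibling consumers (not EGO's algorithm)."""
    # Step 1: reserve owned slots; no other consumer can use them.
    alloc = {c.name: c.owned for c in consumers}
    free = total_slots - sum(alloc.values())
    if free < 0:
        raise ValueError("owned slots exceed the total number of slots")

    def unmet(pool: list[Consumer]) -> list[Consumer]:
        # Consumers that still want more slots than they currently hold
        return [c for c in pool if alloc[c.name] < c.demand]

    sharing = [c for c in consumers if c.share_ratio > 0]
    borrow_only = [c for c in consumers if c.share_ratio == 0]

    # Step 2: consumers with a share ratio get free slots first, one at a time,
    # always to the one holding the fewest shared slots relative to its ratio.
    while free > 0 and (candidates := unmet(sharing)):
        pick = min(candidates, key=lambda c: (alloc[c.name] - c.owned) / c.share_ratio)
        alloc[pick.name] += 1
        free -= 1

    # Step 3: Borrow Only Consumers receive only the slots that are still idle.
    while free > 0 and (candidates := unmet(borrow_only)):
        pick = min(candidates, key=lambda c: alloc[c.name])
        alloc[pick.name] += 1
        free -= 1

    return alloc
```

With 8 slots, two sibling consumers (`sparkapp` and `shuffle`) with a 1:1 share ratio that each demand 8 slots, and a Borrow Only `executors` consumer that demands 4, `allocate` gives `executors` 0 slots, which is the starvation case described in Option 3. Giving `executors` 1 owned slot changes the result to 4, 3, and 1 slots. If the siblings demand only 2 slots each, `executors` borrows the 4 idle slots instead.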