Best practices for resource group configuration

Technical Blog Post

IBM Spectrum Conductor with Spark uses the resource orchestrator, also known as the enterprise grid orchestrator (EGO), for resource allocation and management. The resource orchestrator manages the supply and distribution of resources, making them available to applications.

When you create a Spark instance group, you can use existing consumers or the default consumers that are created automatically during Spark instance group creation to run services and Spark workloads. The default resource settings provide a quick start for getting hands-on experience with the product. Alternatively, you can configure your own resource settings to fit your use cases.

Why are unwanted reclaims happening?

Resource competition and unwanted resource reclaims happen when you run more tasks and services than the available resources can accommodate. To avoid this behavior, you can configure resource rules that place tighter restrictions on resource allocation. IBM Spectrum Conductor with Spark distributes resources based on the share ratio set for each consumer. In the image below, the default consumers are created with a share ratio of 1:1, which means they have an equal opportunity to obtain resources.
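To make the share ratio idea concrete, here is a minimal Python sketch that divides a pool of slots among sibling consumers in proportion to their share ratios. It is a simplified model, not the resource orchestrator's actual algorithm: the function name split_by_share_ratio, the largest-remainder rounding, and any consumer name other than spark-sparkapp are assumptions made for illustration.

```python
from fractions import Fraction


def split_by_share_ratio(total_slots, share_ratios):
    """Divide total_slots among sibling consumers in proportion to share ratios.

    Simplified model, not EGO's actual algorithm: each consumer first gets the
    floor of its exact proportional share, then any leftover slots go one at a
    time to the consumers with the largest fractional remainders (ties broken
    by consumer name so the result is deterministic).
    """
    ratio_sum = sum(share_ratios.values())
    if ratio_sum == 0:
        return {name: 0 for name in share_ratios}
    exact = {name: Fraction(total_slots * ratio, ratio_sum)
             for name, ratio in share_ratios.items()}
    alloc = {name: int(share) for name, share in exact.items()}  # floor, since share >= 0
    leftover = total_slots - sum(alloc.values())
    by_remainder = sorted(exact, key=lambda n: (-(exact[n] - alloc[n]), n))
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc
```

With a 1:1 ratio, an 8-slot pool splits evenly: split_by_share_ratio(8, {"spark-sparkapp": 1, "other-consumer": 1}) gives each consumer 4 slots.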

The following image shows the default resource plan configuration:

If you overload the Spark instance group with batch applications that need more resources than its consumer is offered, and no free resources remain, newly submitted batch applications can forcibly take slots from batch jobs that are already running.

Similar resource competition occurs between services. For example, the shuffle service, notebook master service, and batch master service all use the spark-sparkapp consumer, so all of their resource requests are resolved within spark-sparkapp. If all of the resources in spark-sparkapp are consumed and a new Spark instance group is created that also uses spark-sparkapp, the new Spark instance group’s services could forcibly take resource slots from the Spark instance group that was already running.
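The competition described above comes down to the total demand inside one consumer exceeding what that consumer can obtain. The following sketch sums the requests of every service that resolves to the same consumer, regardless of which Spark instance group submitted it. The helper oversubscribed(), the instance group names, and the slot counts are hypothetical and exist only to illustrate the arithmetic.

```python
from collections import Counter


def oversubscribed(requests, capacity):
    """Return {consumer: total demand} for consumers whose demand exceeds capacity.

    requests: iterable of (instance_group, service, consumer, slots) tuples.
    capacity: {consumer: slots the consumer can obtain}.
    """
    demand = Counter()
    for _group, _service, consumer, slots in requests:
        demand[consumer] += slots  # requests from every group pool in one consumer
    return {c: d for c, d in demand.items() if d > capacity.get(c, 0)}


# Hypothetical requests: two Spark instance groups whose services all use spark-sparkapp.
REQUESTS = [
    ("sig-1", "shuffle", "spark-sparkapp", 2),
    ("sig-1", "notebook-master", "spark-sparkapp", 1),
    ("sig-1", "batch-master", "spark-sparkapp", 1),
    ("sig-2", "shuffle", "spark-sparkapp", 2),
]
```

With oversubscribed(REQUESTS, {"spark-sparkapp": 4}), the result is {"spark-sparkapp": 6}: the first instance group alone fits in four slots, but once the second group's shuffle service arrives, the shared consumer is oversubscribed.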

How can this be avoided?

There are several ways to avoid resource competition. You can utilize resource group configuration in combination with resource plan configuration to refine the resource allocation rules. Refer to the three options below:

Option 1 – During Spark instance group creation, you can assign each field to a different resource group. This prevents resource competition between fields, because each field uses only the resources in its specified resource group. This solution is sufficient for some issues; however, in the image below, Spark executors and the Spark shuffle service still use the same resource group, so slot competition can still occur between them.
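A quick way to reason about Option 1 is to list which fields share a resource group, because those are the fields that can still compete for slots. The sketch below assumes a hypothetical field-to-resource-group mapping; the field labels and resource group names are illustrative and are not the product's actual identifiers.

```python
from collections import defaultdict

# Hypothetical field-to-resource-group mapping for one Spark instance group.
FIELD_RESOURCE_GROUPS = {
    "spark-driver": "rg-driver",
    "spark-executor": "rg-compute",
    "spark-shuffle-service": "rg-compute",
    "notebook": "rg-notebook",
}


def shared_resource_groups(mapping):
    """Return {resource_group: sorted fields} for groups used by more than one field."""
    users = defaultdict(list)
    for field, group in mapping.items():
        users[group].append(field)
    return {group: sorted(fields) for group, fields in users.items() if len(fields) > 1}
```

For the mapping above, the only shared group is rg-compute, used by spark-executor and spark-shuffle-service, which matches the remaining competition that Option 1 leaves in place.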

Option 2 – While the solution above is sufficient to solve some resource competition issues between services within the same Spark instance group, you can utilize the resource plan to refine the resource allocation rules by creating a hierarchical consumer tree.

By splitting the consumers into a hierarchy, you can choose an appropriate consumer for each service and workload during Spark instance group creation. This prevents resource competition between fields that would otherwise share a consumer, such as the competition between the Spark executor and the Spark shuffle service mentioned above.
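The consumer hierarchy can be pictured as a tree whose leaves are the consumers you pick for each field. The sketch below models a hypothetical tree as nested dicts and checks that every field is assigned to a distinct leaf consumer. The tree layout, consumer paths, and both helper functions are assumptions for illustration, and restricting assignments to leaf consumers is a choice made in this sketch.

```python
# Hypothetical consumer hierarchy for one Spark instance group, written as
# nested dicts. A consumer with no children (an empty dict) is a leaf.
CONSUMER_TREE = {
    "SparkSIG": {
        "services": {"shuffle": {}, "notebook-master": {}, "batch-master": {}},
        "executor": {},
    }
}

# Hypothetical choice of consumer for each field of the instance group.
ASSIGNMENT = {
    "spark-executor": "/SparkSIG/executor",
    "spark-shuffle-service": "/SparkSIG/services/shuffle",
    "notebook-master": "/SparkSIG/services/notebook-master",
    "batch-master": "/SparkSIG/services/batch-master",
}


def leaf_paths(tree, prefix=""):
    """Yield the full path of every leaf consumer, for example '/SparkSIG/executor'."""
    for name, children in tree.items():
        path = f"{prefix}/{name}"
        if children:
            yield from leaf_paths(children, path)
        else:
            yield path


def validate_assignment(tree, assignment):
    """Return (non-leaf consumers used, consumers shared by more than one field)."""
    leaves = set(leaf_paths(tree))
    used = list(assignment.values())
    not_leaf = sorted({c for c in used if c not in leaves})
    duplicated = sorted({c for c in used if used.count(c) > 1})
    return not_leaf, duplicated
```

validate_assignment(CONSUMER_TREE, ASSIGNMENT) returns two empty lists, meaning the executor and the shuffle service no longer share a consumer.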

After the Spark instance group is created with the consumer hierarchy, you can further refine the resource rules by enabling Borrow Only Consumer. A Borrow Only Consumer is a consumer without guaranteed slots (share ratio = 0), which gives it lower priority than its normal sibling consumers when competing for resources. To enable the Borrow Only Consumer configuration, open $EGO_CONFDIR/ego.conf and append the following line to the file.

After enabling Borrow Only Consumer, go to the resource plan and configure the share ratio among the consumers. Setting a consumer's share ratio to ‘0’ makes that specific consumer a Borrow Only Consumer. This flexible configuration improves resource allocation at scale: resources held by Borrow Only Consumers can be given to consumers that need more resources to complete their current tasks.
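The effect of a zero share ratio can be illustrated with a toy allocator for sibling consumers. This is a minimal sketch under stated assumptions, not EGO's scheduler: the function allocate(), the consumer names, and the policy of lending idle slots to normal consumers before borrow-only ones are all hypothetical. Reclaiming is modeled simply by recomputing the allocation when demand changes.

```python
def allocate(total_slots, consumers):
    """Toy allocator for sibling consumers sharing one pool of slots.

    consumers maps name -> (share_ratio, demand). Simplified model:
      1. A consumer with share_ratio > 0 is guaranteed
         floor(total * ratio / ratio_sum) slots, capped at its demand.
      2. Idle slots are lent to consumers with unmet demand: normal consumers
         first, then borrow-only consumers (share_ratio == 0), ties by name.
    """
    ratio_sum = sum(ratio for ratio, _ in consumers.values())
    alloc = {}
    for name, (ratio, demand) in consumers.items():
        guaranteed = total_slots * ratio // ratio_sum if ratio_sum else 0
        alloc[name] = min(guaranteed, demand)
    idle = total_slots - sum(alloc.values())
    for name in sorted(consumers, key=lambda n: (consumers[n][0] == 0, n)):
        lent = min(consumers[name][1] - alloc[name], idle)
        alloc[name] += lent
        idle -= lent
    return alloc
```

With allocate(8, {"spark-sparkapp": (1, 2), "executor": (0, 10)}), the six idle slots go to the borrow-only executor consumer. Raising spark-sparkapp's demand to 8 and recomputing leaves executor with no slots at all.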

Option 3 – Option 2 addresses most resource competition concerns; however, it can cause resource starvation, where high-priority consumers borrow and use all of the available resources and low-priority consumers are left without enough resources to run their own tasks. You can resolve resource starvation by assigning owned slots to specific consumers. Owned slots cannot be borrowed by any other consumer, so a consumer with owned slots is guaranteed to always have those resources available to run its tasks. For example, in the image below, Executor is a Borrow Only Consumer. If no owned slots are set, all of the Executor slots can potentially be borrowed by high-priority consumers, leaving no slots for the executors to run and causing batch applications to hang. Assigning at least one owned slot to the Executor consumer prevents resource starvation for Spark executors.
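The starvation fix can be sketched by adding owned slots to a toy allocator. This is a simplified model under stated assumptions, not EGO's scheduler: the function allocate_with_owned() and the consumer names are hypothetical, and it applies the rule as this post states it, so owned slots are reserved for their consumer and never lent, even when idle. The remaining slots are split by share ratio, and idle shared slots are lent to normal consumers before borrow-only ones.

```python
def allocate_with_owned(total_slots, consumers):
    """consumers maps name -> (share_ratio, owned_slots, demand); returns {name: slots}."""
    reserved = sum(owned for _, owned, _ in consumers.values())
    if reserved > total_slots:
        raise ValueError("owned slots exceed the size of the pool")
    shared = total_slots - reserved  # slots that can be shared or lent
    ratio_sum = sum(ratio for ratio, _, _ in consumers.values())

    alloc, idle = {}, shared
    for name, (ratio, owned, demand) in consumers.items():
        from_owned = min(owned, demand)  # owned slots are never lent, even when unused
        guaranteed = shared * ratio // ratio_sum if ratio_sum else 0
        from_shared = min(guaranteed, demand - from_owned)
        alloc[name] = from_owned + from_shared
        idle -= from_shared

    # Lend idle shared slots: normal consumers first, borrow-only last, ties by name.
    for name in sorted(consumers, key=lambda n: (consumers[n][0] == 0, n)):
        lent = min(consumers[name][2] - alloc[name], idle)
        alloc[name] += lent
        idle -= lent
    return alloc
```

With one owned slot, allocate_with_owned(8, {"spark-sparkapp": (1, 0, 8), "executor": (0, 1, 10)}) gives spark-sparkapp 7 slots and executor 1, so the executors keep running even when the high-priority consumer wants the whole pool.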

How you apply these best practices depends on how the Spark instance group is used and on the tasks being submitted. You can combine the best practices in this blog for a better user experience.

If you have questions or feedback about this feature or IBM Spectrum Conductor with Spark, you can contact us through our forum! For more information on resource plan configuration, see Resources in the IBM Spectrum Conductor with Spark Knowledge Center.

Download an evaluation version of IBM Spectrum Conductor with Spark today from our Service Management Connect page!

UID

ibm16163791
