Understanding resource groups

Resource groups are logical groups of hosts that provide a simple way to organize a heterogeneous resource pool. Instead of creating policies for individual resources, you can create policies and apply them to an entire group. Groups can be made up of resources that are explicitly listed by name, or of resources that satisfy a specific static requirement in terms of operating system, memory, swap space, CPU factor, and so on.

The cluster administrator can define multiple resource groups, assign them to consumers, and configure a distinct resource plan for each group.
Defining multiple resource groups
A major benefit of defining resource groups is the flexibility to group your resources based on attributes that you specify. For example, if you run workload units or applications that require a Linux® operating system with at least 1000 MB of maximum memory, you can create a resource group that includes only resources meeting those requirements, as illustrated in the sketch that follows.
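For illustration, such a group's dynamic membership might be expressed as a resource requirement similar to the following sketch (LINUX86 stands in for a host type defined in your ego.shared file; the exact expression syntax can vary by version):

  select(LINUX86 && maxmem >= 1000)

Any host that matches the host type and reports at least 1000 MB of maximum memory would then belong to the group automatically.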
Note: Ensure that your resource groups do not overlap; that is, no host should belong to more than one resource group. Overlapping groups cause hosts to be counted twice (or more) in the resource plan, resulting in recurring under-allocation for some consumers.
Configuring a resource plan for individual resource groups
Tailoring the resource plan for each resource group requires several steps: add the resource group to each desired first-level consumer (thereby making the resource group available to its sub-consumers), configure ownership, enable lending or borrowing, specify share limits and the share ratio, and assign a consumer rank within the resource plan.
Resource groups generally fall into one of three categories:
  • Resource groups that include compute hosts with certain identifiable attributes that a consumer may require in a requested resource; for example, resources with large amounts of memory. Such groups are dynamic: new hosts that join the cluster and meet the requirements are automatically added to the resource group.
  • Resource groups that include only certain named compute hosts; for example, so that specified resources are accessed only by approved consumers. Such groups are static: new hosts added to the cluster must be manually added to the resource group.
  • Resource groups that encompass only management hosts (reserved for running services, not a distributed workload); for example, the predefined ManagementHosts group.
A resource group's host list is either static or dynamic. With a static list, you explicitly choose which hosts to include. With a dynamic list, you specify either all hosts or a resource requirement, so that new hosts joining the cluster are automatically added to the resource group. The two membership styles are contrasted in the sketch that follows.
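As a sketch of the two styles (host names are hypothetical; actual configuration is done through the cluster management console):

  Static member list:   hostA hostB hostC
                        (only these hosts belong; new hosts are never added automatically)
  Dynamic requirement:  select(!mg)
                        (every host without the management-host tag belongs; matching new hosts join automatically)

Here, mg is assumed to be the attribute that tags management hosts; the predefined ComputeHosts group typically uses a requirement of this form.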
Note: Requirements for resource-aware allocation policies (where only resources that meet specified requirements are allocated to a consumer) can be met by grouping resources with common features and configuring them as special resource groups with their own resource plans.
Tip: Use the egoconfig addresourceattr command to add a custom resource attribute tag to a host and then specify that tag when creating a resource group. See the reference for more information.
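For example, to tag a host with a custom attribute named bigmem (a hypothetical tag name; the bracketed argument format shown here is an assumption, so verify it against the egoconfig reference for your version), you might run the following on that host and then reference the tag when creating the resource group:

  egoconfig addresourceattr "[resource bigmem]"

  select(bigmem)    (requirement expression for the resource group)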

CPU slots

When you create a resource group in the cluster management console, you must decide how many slots are to be assigned to each host. The assignment of slots to hosts is a critical function that serves to match host resources with the expected workload. If a host is too heavily loaded, performance suffers; if it is underutilized, resources are wasted. Generally, as a starting point, one slot per CPU is allocated for each service instance.

Once you configure the number of slots per host, monitor host load using the cluster management console. If an adjustment is required, reconfigure the number of slots per host. If hosts are overburdened, decrease the number of slots that are assigned to them. If hosts are underutilized, add more slots to them.

The cluster management console offers flexibility in slot configuration at the resource group level based on host resources, including the number of processors and cores, and the maximum amount of available memory.

Notes:
  • A 1-to-1 mapping exists between small workload units (for example, a session or a task) and slots.
  • If an individual host's slots-per-host value differs from the value that you set for the resource group, the host-level setting overrides the group-level setting.

    For example, if there are 10 hosts in a resource group and you set 5 slots per host in the cluster management console, you would normally expect to see 50 slots listed in the summary section of the resource group's properties. If you see a different number (for example, 45), an administrator has manually overridden the setting for one or more hosts, and those individual values take precedence over the group setting.

    In some cases, even if an administrator has not manually changed the slots-per-host value, you may still see an unexpected number in the Member Hosts Summary. This can mean that certain hosts in the resource group are double-allocated, that is, allocated to more than one resource group. In cases of double-allocation, the Member Hosts Summary displays the sum of the allocated slots, not the number of slots for this resource group alone. Avoid double-allocating hosts.
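    For example, if a host configured with 5 slots is allocated to two resource groups, the Member Hosts Summary counts 10 slots for that host (5 from each allocation) rather than 5.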

  • The value for the number of CPUs per host is automatically detected during installation.
  • To change the number of slots per CPU, specify it on the workload management side (outside of EGO).
  • The number of slots per host can be defined in the cluster management console as an expression (see the examples after this list). The expression guidelines are as follows:
    1. All valid resource requirement expressions are supported, for example, (a*b), (a/b), (a+b), where a and b are resource names, integers, or decimal values.
    2. The resource can be one of the following types of static resources:
      • a host type defined in the HostType section of the ego.shared file; the type evaluates to 1 if it matches the host, and 0 otherwise
      • nprocs
      • ncores
      • maxmem
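    For illustration, the following expressions would all be valid slot definitions under these guidelines (LINUX86 stands in for a host type defined in your ego.shared file):
      • ncores: one slot per core
      • nprocs*2: two slots per processor
      • maxmem/1024: one slot per 1024 MB of maximum memory
      • LINUX86*ncores: one slot per core on hosts of the matching type, and 0 slots on all other hosts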

Default resource groups

By default, EGO is configured with three resource groups: InternalResourceGroup, ManagementHosts, and ComputeHosts.

InternalResourceGroup and ManagementHosts must remain untouched, but ComputeHosts can be kept, modified, or deleted as required.

LOST_AND_FOUND resource group

When host slots are allocated to a client, the resource manager (VEMKD) detects the resource group to which each host belongs. When the VEMKD process restarts, however, there is a brief interval (while host information is updated) during which it may not immediately detect a host's resource group. During this interval, EGO creates a temporary resource group called LOST_AND_FOUND, and VEMKD adds to it any host with a current allocation whose assigned resource group it cannot immediately detect. Once VEMKD completes its update of host information and detects the host's assigned resource group, the host automatically rejoins that group.

Note: This happens only if the host is already allocated and VEMKD must trace its resource group. If the host does not currently belong to an allocation, VEMKD does not search for a resource group.

Similarly, if a host with allocated slots is permanently removed from its resource group (and therefore never rejoins its original resource group when VEMKD restarts), VEMKD adds the host to the LOST_AND_FOUND group. The host remains in this group until the cluster administrator frees the allocation on the host.
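To check whether any hosts have landed in LOST_AND_FOUND, you can list the resource groups and current allocations from the command line. A minimal sketch, assuming the standard egosh client (exact subcommands and output vary by version):

  egosh rg            (lists resource groups; LOST_AND_FOUND appears here if it exists)
  egosh alloc list    (identifies the allocations still holding slots on the affected host)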