Understanding resource groups
Resource groups are logical groups of hosts that provide a simple way of organizing and grouping resources (hosts) in a heterogeneous resource pool. Instead of creating policies for individual resources, you can create and apply them to an entire group. Groups can be made up of resources that are explicitly listed by name or resources that satisfy a specific static requirement in terms of operating system, memory, swap space, CPU factor, and so on.
- Defining multiple resource groups
- A major benefit of defining resource groups is the flexibility to group your resources based on attributes that you specify. For example, if you run workload units or use applications that need a Linux® operating system with at least 1000 MB of maximum memory, you can create a resource group that includes only resources meeting those requirements (see the sketch after this list). Note: Ensure that no host belongs to more than one resource group. Overlapping groups cause hosts to be counted two or more times in the resource plan, resulting in recurring underallocation of some consumers.
- Configuring a resource plan for individual resource groups
- Tailoring the resource plan for each resource group requires several steps: add the resource group to each desired first-level consumer (thereby making the resource group available to sub-consumers), configure ownership, enable lending or borrowing, specify share limits and the share ratio, and assign a consumer rank within the resource plan.
- Resource groups that include compute hosts with certain identifiable attributes that a consumer may require; for example, resources with large amounts of memory. These groups are dynamic: any new host added to the cluster that meets the requirements is automatically added to the resource group.
- Resource groups that include only certain compute hosts; for example, so that specified resources are accessed only by approved consumers. These groups are static: any new host added to the cluster must be manually added to the resource group.
- Resource groups that encompass only management hosts (reserved for running services, not a distributed workload); for example, the predefined ManagementHosts group.
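For illustration, the following minimal Python sketch models the dynamic and static membership behavior described above. The host names, attribute fields, and the requirement function are hypothetical examples, not EGO data structures or APIs.

```python
# Illustrative only: models dynamic vs. static resource-group membership.
# Host attributes ("os", "maxmem" in MB) are hypothetical stand-ins for the
# attributes that EGO detects on each host.

hosts = {
    "hostA": {"os": "Linux", "maxmem": 2048},
    "hostB": {"os": "Linux", "maxmem": 512},
    "hostC": {"os": "Windows", "maxmem": 4096},
    "hostD": {"os": "Linux", "maxmem": 1024},   # newly added to the cluster
}

def meets_requirement(attrs):
    """The example requirement: Linux with at least 1000 MB of maximum memory."""
    return attrs["os"] == "Linux" and attrs["maxmem"] >= 1000

# Dynamic group: membership is recomputed from the requirement, so the newly
# added hostD joins automatically.
dynamic_group = [name for name, attrs in hosts.items() if meets_requirement(attrs)]
print(dynamic_group)            # ['hostA', 'hostD']

# Static group: membership is an explicit list of names, so a new host must
# be added manually by an administrator.
static_group = ["hostA", "hostB"]
```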
CPU slots
When you create a resource group in the cluster management console, you must decide how many slots to assign to each host. Assigning slots to hosts is critical to matching host resources with the expected workload. If a host is too heavily loaded, performance suffers; if it is underutilized, resources are wasted. Generally, as a starting point, one slot per CPU is allocated for each service instance.
After you configure the number of slots per host, monitor host load by using the cluster management console and reconfigure the number of slots per host if an adjustment is required: if hosts are overburdened, decrease the number of slots assigned to them; if hosts are underutilized, add more slots.
The cluster management console offers flexibility in slot configuration at the resource group level based on host resources, including the number of processors and cores, and the maximum amount of available memory.
- A 1-to-1 mapping exists between small workload units (for example, a session or a task) and slots.
- If the slots-per-host value set for an individual host differs from the x slots per host that you set for the resource group, the host-level setting overrides the group-level setting.
For example, if there are 10 hosts in a resource group and you set 5 slots per host in the cluster management console, you would normally expect to see 50 slots listed in the summary section of the resource group's properties. If you see a different number (for example, 45), an administrator has manually overridden the setting for one or more hosts; the individual host value overrides the group setting configured in the cluster management console.
In some cases, even if an administrator has not manually changed the slots-per-host value, you may still see an unexpected number in the Member Hosts Summary. This can mean that certain hosts in the resource group are double-allocated, that is, allocated to more than one resource group. In cases of double-allocation, the Member Hosts Summary displays the sum of the allocated slots, not the number of slots for this resource group alone. Avoid double-allocation.
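As a worked illustration of the example above, the following sketch reproduces the arithmetic behind the Member Hosts Summary figure under the assumption of one hypothetical host-level override; it is not how the console computes the value internally.

```python
# Illustrative arithmetic for the Member Hosts Summary example above.
# "overrides" is a hypothetical record of hosts whose slots-per-host value
# an administrator changed manually.

group_slots_per_host = 5
hosts = [f"host{i}" for i in range(1, 11)]   # 10 hosts in the resource group
overrides = {"host7": 0}                     # hypothetical host-level override

summary_total = sum(overrides.get(h, group_slots_per_host) for h in hosts)
print(summary_total)                         # 45 instead of the expected 10 * 5 = 50
```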
- The value for the number of CPUs per host is automatically detected during installation.
- If you want to change the value of the number of slots per CPU, specify it on the workload management side (outside of EGO).
- The number of slots per host can be defined in the cluster management console as an expression. Here are the expression guidelines:
- All valid resource requirement expressions are supported, for example, (a*b), (a/b), (a+b), where a and b are resource names, or integer or decimal values.
- The resource can be one of the following types of static resources:
- host type defined in the HostType section of the ego.shared file. The type is evaluated to 1 if it matches the host, otherwise 0.
- nprocs
- ncores
- maxmem
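As a rough sketch of how such an expression might resolve on a single host, the following example evaluates a few expressions built from the static resources listed above. The host attribute values and the LINUX86 host type are assumptions for illustration; EGO itself parses and evaluates the configured expression for each member host.

```python
# Illustrative only: shows how a slots-per-host expression built from the
# static resources listed above could resolve on one host. The values below
# are hypothetical.

host = {"LINUX86": 1, "nprocs": 2, "ncores": 8, "maxmem": 32768}

def slots_for(expression, resources):
    # eval() stands in for EGO's expression evaluation in this sketch.
    return int(eval(expression, {"__builtins__": {}}, resources))

print(slots_for("2*ncores", host))         # 16 slots per host
print(slots_for("maxmem/4096", host))      # 8 slots per host
print(slots_for("LINUX86*ncores", host))   # 8 on a matching host type, 0 otherwise
```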
Default resource groups
By default, EGO is configured with three resource groups: InternalResourceGroup, ManagementHosts, and ComputeHosts.
InternalResourceGroup and ManagementHosts must not be modified, but ComputeHosts can be kept, modified, or deleted if required.
Lost_and_found resource group
When host slots are allocated to a client, the resource manager (VEMKD) detects the resource group to which the host belongs. But when the VEMKD process restarts, there is a brief interval (while host information is updated) where it may not immediately detect the host's resource group. During this interval, EGO creates a temporary resource group called LOST_AND_FOUND. The VEMKD adds any host with a current allocation to this resource group if it cannot immediately detect an assigned group. Once VEMKD completes its update of host information and detects the host’s assigned resource group, the host automatically rejoins its resource group.
Similarly, if a host with allocated slots is permanently removed from its resource group (so that it never rejoins its original resource group when VEMKD restarts), VEMKD adds this host to the LOST_AND_FOUND group. The host remains in this group until the cluster administrator frees the allocation on the host.
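The following conceptual sketch only restates the behavior described above; the function and field names are hypothetical, and the actual logic is internal to VEMKD.

```python
# Conceptual sketch of the LOST_AND_FOUND behavior described above. Names
# (place_host, find_resource_group, allocated_slots) are hypothetical.

LOST_AND_FOUND = "LOST_AND_FOUND"

def place_host(host, find_resource_group):
    """Decide where a host with allocated slots belongs while VEMKD restarts."""
    group = find_resource_group(host)      # may return None while host info updates
    if group is None and host["allocated_slots"] > 0:
        return LOST_AND_FOUND              # temporary home until a group is detected
    return group

host = {"name": "hostA", "allocated_slots": 4}
print(place_host(host, lambda h: None))             # 'LOST_AND_FOUND'
print(place_host(host, lambda h: "ComputeHosts"))   # rejoins 'ComputeHosts'
```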