Share policies for multidimensional scheduling

There are two sharing models for multidimensional scheduling: priority sharing and proportional sharing. A consumer participates in both sharing models at the same time.

  • Priority sharing: Each consumer can be configured with sharing priority. The priority is only comparable among sibling consumers under the same parent. Consumers with high share priority get all the available resources from the share pool before low priority consumers can get any resource. Consumers with the same priority share resources proportionally according to their share ratio.
  • Proportional sharing: Each consumer can be configured with share ratio. Consumers with the same share priority will share resources proportionally according to the share ratio.

Fair sharing

EGO guarantees sharing fairness among consumer siblings at each level of the consumer tree based on the configured share priority and share ratio of the consumers. The private share pool maintains fairness separately from the public share pool and each private pool maintains fairness separately from other private share pools. Reclaim is enforced as required to maintained fairness. You can define a grace period for a public share pool at the plan level. You can also define a grace period for a private share pool at the consumer level, which is configured as PrivatePoolGracePeriod in the <OwnershipPolicy> element of $EGO_CONFDIR/MDPlan.xml.

Share quota

In the share ratio policy for slot scheduling, the static share quota defines how many slots the consumer deserves if it has enough workload, regardless of the workload of other consumers. With the multidimensional scheduling policy, because of the diversity of the demand's dimensions and the run time of packing results, it is not a guaranteed number, but rather only a reference number. For each resource dimension that can be scheduled by the resource plan, the static share quota is calculated in a similar way to the slot policy for the share quota of the consumer. The static share quota of the consumer is the vector of quotas of all resource dimensions.

In the dynamic share ratio policy for slot scheduling, the share quota defines the deserved resource of the consumer under the current workload. For multidimensional policy, the deserved resources are packing dependent due to fragmentation and non-uniform resource consumption from different consumers and allocations, so share quota is a number that is only approximate.

Share usage quota

Share usage quota is defined as how much share usage each consumer deserves according to the configured share ratio.

Share usage for each consumer is defined as:

(allocated units)/(capacity of share pool)

where:

The capacity of the share pool is based on the size of the requested unit. Capacity is the maximum units of the given size that can be placed in the share pool if the whole share pool is free.

Characteristics of share usage:
  • Each sub-allocation has a different unit size and has its own share usage.
  • Each allocation's share usage is the sum of share usage of its sub-allocations.
  • For leaf consumers, share usage is the sum of share usage of all of its allocations.
  • For non-leaf consumers, share usage is the sum of share usage of its children.
Share usage quota is calculated as follows:
  • Each sub-allocation has a different unit size and has its own share usage.
  • Each allocation's share usage is the sum of share usage of its sub-allocations.
  • For leaf consumers, share usage is the sum of share usage of all of its allocations.
  • For non-leaf consumers, share usage is the sum of share usage of its children.
  • If there is no outstanding demand, share usage of the consumer is equivalent to the share usage quota. An exception to this rule is when the consumer is using more than its share limit. In this case, resources are reclaimed from this consumer first and the share usage becomes the share usage quota.
  • If there is outstanding demand:
    1. Calculate the total share usage of the share pool to be the share usage sum of all consumers.
    2. Calculate the usage demand of each sub-allocation as:

      number of demanded units/capacity of share pool

      If the result is greater than 1, set it to 1. For parent consumers, calculate the usage demand to be the sum of its children's demands.

    3. Calculate the share usage quota of each consumer. (The approach is similar to how share quota in the slot policy is calculated, only in this case the starting value is total share usage of the share pool instead of total slots in the share pool.) Starting from the root consumer, traverse all priorities from highest to lowest, and for each priority, traverse all consumers and distribute the share usage quota to each consumer proportionally to their share ratio for that priority level. If the distributed value is more than the share usage demand of the consumer, only distribute the share usage quota up to meet the demand and allow other consumers to share it again. For each middle consumer, repeat the assignment process until all leaf consumer get their share usage quota.
For example, assume the total usage (sum of usage of all consumers) in the following consumer tree is 1:
Example multidimensional scheduling share usage quota tree

If consumer A and consumer B both have enough demand, consumer A's usage quota will be 0.3333, and consumer B's usage quota will be 0.6666. Consumer C's quota is 0. Moving down one level in the tree, consumer A1's usage quota will be 0.11111 and consumer A2's usage quota will be 0.2222. Consumer B1's usage quota will be 0.6666, which is the same as the usage quota of its parent, and consumer B2's usage quota will be 0 because it has a low share priority compared to consumer B1.

How free resources are assigned

Unallocated resources in share pools are assigned to consumers as follows:
  1. Calculate the percentage of a parent's capacity that should be distributed to each child.
    1. Remove all nodes that are satisfied from the consumer tree.
    2. Starting from the root, apply the following recursive algorithm:
      • If children of the current node are high-priority nodes, set the percentage of the highest-priority child to 100% and all others to 0%.
      • If children of the current node are share nodes, set the percentage to [(child's configured share / SUM((sibling's configured share) + (child's configured share))]
  2. Assign resources to the most under-used node.
    1. If children of the current node are high priority nodes, go to the highest-priority child.
    2. If children of the current node are share nodes, go to the most under-used child. The most under-used child is the node with the smallest (share usage / share ratio).
    3. Once a leaf node is reached, try to assign a free resource to it.
      • If the free resource cannot be assigned to the node, remove the node from the consumer tree.
      • If the node is now satisfied, remove the node from the consumer tree.
    4. Update the share usage of the parent node as needed and repeat the resource distribution process until the consumer tree is empty.