Grace periods
Two grace periods can be configured for the multidimensional scheduling policy: the reclaim grace period and the Lendout grace period.
Reclaim grace period
The reclaim grace period of the borrowing consumer takes effect when a resource that the consumer is using is reclaimed. The reclaim grace period is defined globally for each consumer, independent of the resource plan.
- Configure the SPARK_EGO_RECLAIM_GRACE_PERIOD parameter during Spark instance group configuration. The value must be longer than the time needed to complete the workload.
- Configure the reclaim grace period in the consumer properties for the consumer that the Spark instance group belongs to. The value must be longer than the time needed to complete the workload.
Lendout grace period
Share pool grace period
Grace periods can be defined for both private (local) share pools and public share pools. Any consumer that uses the share pool must have a reclaim grace period that is equal to or less than the share pool grace period. A private share pool can be used by the descendants of the node and by other consumers. Other consumers can use the share pool only if it lends resources out; in that case, the Lendout grace period takes effect and the share pool grace period is not considered.
The grace period for private share pools is defined at the consumer level by PrivatePoolGracePeriod. The grace period for public share pools is defined at the resource plan level by DefaultGracePeriod. DefaultGracePeriod also applies to consumers that do not have their own grace period defined and cannot inherit one from their parent, because the parent does not have a grace period defined either.
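The rule that a consumer's reclaim grace period must not exceed the share pool grace period can be checked with a few lines of Python. This is only a sketch: the function name `consumers_exceeding_pool`, the consumer paths, and the values are hypothetical and are not part of the product.

```python
def consumers_exceeding_pool(pool_grace_period, consumer_reclaim_periods):
    """Return the names of consumers whose reclaim grace period (seconds)
    is longer than the share pool grace period (seconds)."""
    return sorted(
        name
        for name, reclaim in consumer_reclaim_periods.items()
        if reclaim > pool_grace_period
    )


# Hypothetical consumers and values, for illustration only.
violations = consumers_exceeding_pool(
    100, {"/Dev/TeamA": 60, "/Dev/TeamB": 100, "/Dev/TeamC": 120}
)
print(violations)  # ['/Dev/TeamC']
```

A reclaim grace period equal to the share pool grace period is accepted, which matches the "equal to or less than" wording above.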
Configure the grace period for private share pools
The grace period for private share pools is defined by PrivatePoolGracePeriod in the OwnershipPolicy element for each consumer in MDPlan.xml. To configure the grace period for private share pools, edit the MDPlan.xml file at $EGO_CONFDIR, as follows:
<OwnershipPolicy PrivatePoolGracePeriod="100">
    <ResourceGroup Name="MDSHosts" PreferenceLevel="5" GetFreeFromNextLevelBeforeReclaim="Y">
        <HostSelection Type="NumHosts">
            <NumHosts Type="absolute">0</NumHosts>
        </HostSelection>
        <Lendout GracePeriod="10"/>
    </ResourceGroup>
</OwnershipPolicy>
...
The value for the grace period is expressed in seconds. If the grace period is not configured, the consumer inherits the grace period of its parent. If no grace period can be inherited, the consumer falls back to the root node and uses the DefaultGracePeriod.
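The fallback order described above can be sketched in Python as follows. The sketch assumes that the lookup walks from the consumer up through each ancestor before falling back to DefaultGracePeriod; the function name `effective_grace_period`, the dictionaries, and the consumer paths are hypothetical and do not reflect how the scheduler is implemented.

```python
SYSTEM_DEFAULT = 0  # seconds; applies when DefaultGracePeriod is not configured


def effective_grace_period(consumer, parents, configured, default_grace_period=None):
    """Return the first grace period found on the consumer or its ancestors,
    otherwise the plan-level default, otherwise the system default."""
    node = consumer
    while node is not None:
        if node in configured:
            return configured[node]
        node = parents.get(node)  # None once the root has been passed
    if default_grace_period is not None:
        return default_grace_period
    return SYSTEM_DEFAULT


# Hypothetical consumer tree: child -> parent.
parents = {"/Root/Dev/TeamA": "/Root/Dev", "/Root/Dev": "/Root"}
# Only /Root/Dev sets PrivatePoolGracePeriod in this example.
configured = {"/Root/Dev": 100}

print(effective_grace_period("/Root/Dev/TeamA", parents, configured, 300))  # 100
print(effective_grace_period("/Root", parents, configured, 300))            # 300
print(effective_grace_period("/Root", parents, configured))                 # 0
```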
Configuring a default Lendout grace period
<DistributionTree DistributionTreeName="MD1" Type="MDPlan">
    <ResourceGroupName> RG1 </ResourceGroupName>
    <ResourceGroupName> RG2 </ResourceGroupName>
    <ConsumableResources>
        <ConsumableResource Optional="true" DefaultConsumption="0">cpu</ConsumableResource>
        <ConsumableResource>mem</ConsumableResource>
    </ConsumableResources>
    <PolicyParameter name="DefaultGracePeriod"> 300 </PolicyParameter>
...
The value for the grace period is expressed in seconds. If the default grace period is not configured, the system default is 0.
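To confirm which default is in effect, the DefaultGracePeriod value can be read from a plan fragment with the Python standard library. The sketch below is illustrative only: `FRAGMENT` is a shortened copy of the fragment above, closed with an end tag so that it parses on its own, and the helper name `default_grace_period` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened, closed copy of the fragment above; a real plan has more elements.
FRAGMENT = """
<DistributionTree DistributionTreeName="MD1" Type="MDPlan">
  <ResourceGroupName> RG1 </ResourceGroupName>
  <PolicyParameter name="DefaultGracePeriod"> 300 </PolicyParameter>
</DistributionTree>
"""


def default_grace_period(tree_xml):
    """Return DefaultGracePeriod in seconds, or 0 when it is not configured."""
    root = ET.fromstring(tree_xml)
    for param in root.iter("PolicyParameter"):
        if param.get("name") == "DefaultGracePeriod":
            text = (param.text or "").strip()  # tolerate surrounding spaces
            if text:
                return int(text)
    return 0  # system default stated above


print(default_grace_period(FRAGMENT))  # 300
```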