Controlling reclaim behavior

About this task

Two parameters control when LSF returns and terminates the AWS EC2 instances that it launched: LSB_RC_EXTERNAL_HOST_IDLE_TIME and LSB_RC_EXTERNAL_HOST_MAX_TTL. Both values are specified in minutes. For the default values, refer to the reference topics for LSB_RC_EXTERNAL_HOST_IDLE_TIME and LSB_RC_EXTERNAL_HOST_MAX_TTL.

The following example uses LSB_RC_EXTERNAL_HOST_IDLE_TIME=60 and LSB_RC_EXTERNAL_HOST_MAX_TTL=0. With these settings, an AWS EC2 host is terminated after it has had no workload for 60 minutes.

These parameters take effect for the whole LSF cluster.
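
The reclaim decision described above can be sketched in Python. This is an illustration only, not LSF code: the function name should_reclaim and its arguments are hypothetical, and the sketch assumes that a MAX_TTL value of 0 means that no time-to-live limit is applied, which is how the example settings behave.

```python
def should_reclaim(idle_minutes, age_minutes, idle_time=60, max_ttl=0):
    """Hypothetical check: should a cloud host be returned and terminated?

    idle_minutes: minutes the host has had no workload
    age_minutes:  minutes since the host joined the cluster
    idle_time:    stands in for LSB_RC_EXTERNAL_HOST_IDLE_TIME (minutes)
    max_ttl:      stands in for LSB_RC_EXTERNAL_HOST_MAX_TTL (minutes);
                  0 is ASSUMED to disable the time-to-live limit
    """
    idle_expired = idle_minutes >= idle_time
    ttl_expired = max_ttl > 0 and age_minutes >= max_ttl
    return idle_expired or ttl_expired


# With the example values (60 and 0), only idle time matters.
print(should_reclaim(idle_minutes=45, age_minutes=500))  # still in use recently
print(should_reclaim(idle_minutes=60, age_minutes=500))  # idle for the full hour
```

Under these assumptions, the first call reports that the host is kept and the second reports that it is reclaimed.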

Example

Combining the examples in this chapter, the following scenario describes and explains the behavior of LSF as jobs are submitted to the LSF cluster.

  1. Initially, at t0 (00:00), no AWS instances are requested because none of the THRESHOLD conditions are met for the priority queue. 20 jobs are submitted to the priority queue.
  2. At t6 (00:06), 11 jobs are submitted to the admin queue, which belongs to ProjectB. Because the 11 jobs have just arrived, they do not meet any of the THRESHOLD conditions for the admin queue [[5,10] [1,60] [100,0]].
  3. At t10 (00:10), the same 20 jobs from (1) and 11 jobs from (2) are still pending. The 11 jobs in the admin queue do not meet the THRESHOLD, so no bursting is done for the admin queue. The priority queue is not cloud enabled, so no bursting is done for its 20 jobs.
  4. At t16 (00:16), the same 11 jobs from (2) are still pending and have now been pending for 10 minutes. The scheduler allows bursting because the THRESHOLD of [5,10] is met for the admin queue. The number of AWS EC2 instances to launch is determined next.
  5. The resource connector policy receives the request for 11 hosts and evaluates the applicable policies. Both policies apply and must be evaluated.
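
The threshold checks in steps 2 to 4 can be sketched as follows. This is an illustration, not LSF code: the function threshold_met is hypothetical, and it assumes that each [jobs, minutes] pair is met when at least that many jobs have been pending for at least that many minutes, and that meeting any one pair is enough to allow bursting.

```python
# THRESHOLD pairs for the admin queue, taken from the scenario:
# (minimum pending jobs, minimum minutes pending)
ADMIN_THRESHOLD = [(5, 10), (1, 60), (100, 0)]


def threshold_met(pending_minutes, thresholds):
    """Hypothetical check: does the queue's pending workload allow bursting?

    pending_minutes: one entry per pending job, the minutes it has waited
    thresholds:      list of (min_jobs, min_minutes) pairs
    """
    for min_jobs, min_minutes in thresholds:
        # Count the jobs that have waited at least min_minutes.
        waiting = sum(1 for m in pending_minutes if m >= min_minutes)
        if waiting >= min_jobs:
            return True
    return False


# 11 admin-queue jobs submitted at t6, examined at t6, t10, and t16.
for now in (6, 10, 16):
    waited = [now - 6] * 11
    print(f"t{now}: bursting allowed = {threshold_met(waited, ADMIN_THRESHOLD)}")
```

Under these assumptions, the check fails at t6 and t10 and succeeds at t16, when the 11 jobs have been pending for 10 minutes and the [5,10] pair is satisfied.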
Example configuration for controlling reclaim behavior

The order of evaluation does not matter because both policies must be met. Starting with GlobalPolicyA1, the request for 11 hosts is first checked against MaxNumber, which is 100 and is not exceeded. Next, the StepValue is checked: it allows only 5 hosts to be launched every 20 minutes, so the request for 11 hosts is reduced to 5.

For the next policy in scope, ProjectPolicyA2, the MaxNumber is not exceeded as 10 < 50. Next, the StepValue is checked: it allows 10 hosts every 10 minutes, and since 5 < 10, the request is not reduced further. The final result is that 5 AWS EC2 instances are launched at time t16.
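
The policy evaluation can be sketched as taking the smallest number that every applicable policy allows. This is an illustration, not LSF code: the Policy class and hosts_to_launch function are hypothetical, the MaxNumber and StepValue figures are the ones quoted in this example, and the sketch assumes that no cloud hosts are running yet and that no hosts were launched earlier in the current step interval.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    name: str
    max_number: int    # MaxNumber: upper limit on hosts under this policy
    step_value: int    # StepValue: hosts allowed per step interval
    step_minutes: int  # length of the step interval, for reference only


def hosts_to_launch(requested, policies, running=0):
    """Hypothetical evaluation: reduce the request until every policy is met."""
    allowed = requested
    for policy in policies:
        # Do not exceed the remaining room under MaxNumber.
        allowed = min(allowed, policy.max_number - running)
        # Do not exceed the number of hosts allowed in one step.
        allowed = min(allowed, policy.step_value)
    return max(allowed, 0)


global_policy = Policy("GlobalPolicyA1", max_number=100, step_value=5, step_minutes=20)
project_policy = Policy("ProjectPolicyA2", max_number=50, step_value=10, step_minutes=10)

print(hosts_to_launch(11, [global_policy, project_policy]))
print(hosts_to_launch(11, [project_policy, global_policy]))
```

Both calls produce 5, which matches the statement that the order of evaluation does not matter: the result is limited by the most restrictive policy.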