Service classes for SLA scheduling
Configure service classes in LSB_CONFDIR/cluster_name/configdir/lsb.serviceclasses. Each service class is defined in a ServiceClass section, which begins with the line Begin ServiceClass and ends with the line End ServiceClass. In each section you must specify:
- A service class name (NAME)
- At least one goal (deadline, throughput, or velocity) and a time window when the goal is active (GOALS)
- A service class priority (PRIORITY)
All other parameters are optional. You can configure as many service class sections as you need.
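The required parameters correspond to the NAME, GOALS, and PRIORITY keywords used in the configuration examples later in this section. The following Python sketch is a hypothetical pre-flight check, not an LSF tool. It splits lsb.serviceclasses-style text into sections, joining lines continued with a backslash, and reports any required parameter that a section leaves unset. It assumes that `#` starts a comment line and that each parameter is written as KEY = value on its own line.

```python
from __future__ import annotations

# Hypothetical checker, NOT part of LSF: it only verifies that each
# ServiceClass section sets the three required parameters.
REQUIRED = ("NAME", "PRIORITY", "GOALS")


def parse_service_classes(text: str) -> list[dict[str, str]]:
    """Split lsb.serviceclasses-style text into dicts of KEY = value pairs."""
    # Join backslash-continued lines, as used by multi-goal GOALS entries.
    joined = text.replace("\\\n", " ")
    sections: list[dict[str, str]] = []
    current: dict[str, str] | None = None
    for raw in joined.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        if line == "Begin ServiceClass":
            current = {}
        elif line == "End ServiceClass":
            if current is not None:
                sections.append(current)
            current = None
        elif current is not None and "=" in line:
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip()
    return sections


def missing_parameters(section: dict[str, str]) -> list[str]:
    """Return the required parameters that a section leaves unset."""
    return [key for key in REQUIRED if not section.get(key)]


SAMPLE = """Begin ServiceClass
NAME = Uclulet
PRIORITY = 20
GOALS = [DEADLINE timeWindow (8:30-16:00)]
DESCRIPTION = "working hours"
End ServiceClass
"""

for section in parse_service_classes(SAMPLE):
    print(section["NAME"], missing_parameters(section))  # Uclulet []
```

A real deployment would rely on LSF's own validation when the configuration is reloaded; the sketch only shows which keywords each section needs.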
User groups for service classes
You can control access to the SLA by configuring a user group for the service class with the USER_GROUP parameter. If you specify LSF user groups defined in lsb.users, each user in those groups can submit jobs to this service class. If a group contains a subgroup, the service class policy applies to each member of the subgroup recursively. The group can define fair share among its members, and the SLA defined by the service class enforces the fair share policy among the users in the user group configured for the SLA.
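Recursive subgroup membership can be illustrated with a short Python sketch. The group table is a hypothetical stand-in for definitions in lsb.users, the user and group names are made up, and the functions are illustrations rather than an LSF API. Fair share among members is not modeled.

```python
from __future__ import annotations

# Hypothetical group definitions standing in for lsb.users; all names are made up.
GROUPS: dict[str, list[str]] = {
    "regression": ["alice", "qa_team"],
    "qa_team": ["bob", "carol", "nightly"],
    "nightly": ["dave"],
}


def expand_members(name: str, groups: dict[str, list[str]],
                   seen: set[str] | None = None) -> set[str]:
    """Return every individual user reachable from `name`, recursing into subgroups."""
    if name not in groups:
        return {name}  # not a group name, so treat it as a user
    seen = set() if seen is None else seen
    if name in seen:
        return set()  # guard against cyclic group definitions
    seen.add(name)
    users: set[str] = set()
    for member in groups[name]:
        users |= expand_members(member, groups, seen)
    return users


def can_submit(user: str, sla_group: str, groups: dict[str, list[str]]) -> bool:
    """True if `user` belongs to the SLA's user group, directly or via a subgroup."""
    return user in expand_members(sla_group, groups)


print(sorted(expand_members("regression", GROUPS)))  # ['alice', 'bob', 'carol', 'dave']
```

The `seen` set is a defensive choice for the sketch, so that a misconfigured group that includes itself cannot cause infinite recursion.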
Service class priority
The PRIORITY parameter sets the priority of a service class relative to other service classes; a higher value indicates a higher priority. Similar to queue priority, service classes access the cluster resources in priority order. LSF schedules jobs from one service class at a time, starting with the highest-priority service class. If multiple service classes have the same priority, LSF runs all the jobs from these service classes in first-come, first-served order.
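The ordering rule can be sketched in Python as a sort key. This is a simplification for illustration only: it ranks pending jobs by service class priority and, among equal priorities, by submission time. The PendingJob type and the timestamps are hypothetical, it assumes that jobs within one service class are also taken first-come, first-served, and real LSF scheduling also depends on queues, resources, and the SLA goals themselves. The priority values reuse those from the configuration examples in this document.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PendingJob:
    job_id: int
    service_class: str
    submit_time: int  # hypothetical submission timestamp, in seconds


def dispatch_order(jobs: list[PendingJob], priorities: dict[str, int]) -> list[int]:
    """Return job IDs ordered by service class priority (highest first).

    Jobs whose service classes share a priority are ordered first-come,
    first-served by submission time.
    """
    ordered = sorted(jobs, key=lambda job: (-priorities[job.service_class], job.submit_time))
    return [job.job_id for job in ordered]


priorities = {"Uclulet": 20, "Nanaimo": 20, "Kyuquot": 23}
jobs = [
    PendingJob(1, "Uclulet", 100),
    PendingJob(2, "Kyuquot", 300),
    PendingJob(3, "Nanaimo", 50),
    PendingJob(4, "Kyuquot", 200),
]
print(dispatch_order(jobs, priorities))  # [4, 2, 3, 1]
```

Both Kyuquot jobs (priority 23) come first; Uclulet and Nanaimo share priority 20, so their jobs are interleaved purely by submission time.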
Service class priority in LSF is completely independent of the UNIX scheduler's priority system for time-sharing processes. In LSF, the NICE parameter is used to set the UNIX time-sharing priority for batch jobs.
Any guaranteed resources remaining idle at the end of a scheduling session may be loaned to jobs if loaning is enabled in the guaranteed resource pool (lsb.resources).
Service class configuration examples
The service class Uclulet defines one deadline goal that is active during working hours between 8:30 AM and 4:00 PM. All jobs in the service class should complete by the end of the specified time window. Outside of this time window, the SLA is inactive and jobs are scheduled without any goal being enforced:
Begin ServiceClass
NAME = Uclulet
PRIORITY = 20
GOALS = [DEADLINE timeWindow (8:30-16:00)]
DESCRIPTION = "working hours"
End ServiceClass
The service class Nanaimo defines a deadline goal that is active on weekends and at night:
Begin ServiceClass
NAME = Nanaimo
PRIORITY = 20
GOALS = [DEADLINE timeWindow (5:18:00-1:8:30 20:00-8:30)]
DESCRIPTION = "weekend nighttime regression tests"
End ServiceClass
The service class Inuvik defines a throughput goal of 6 jobs per hour that is always active:
Begin ServiceClass
NAME = Inuvik
PRIORITY = 20
GOALS = [THROUGHPUT 6 timeWindow ()]
DESCRIPTION = "constant throughput"
End ServiceClass
To configure a time window that is always open, use the timeWindow keyword with empty parentheses.
The service class Tofino defines two velocity goals in a 24-hour period. The first goal is a maximum of 10 concurrently running jobs during business hours (9:00 AM to 5:00 PM). The second goal is a maximum of 30 concurrently running jobs during off-hours (5:30 PM to 8:30 AM):
Begin ServiceClass
NAME = Tofino
PRIORITY = 20
GOALS = [VELOCITY 10 timeWindow (9:00-17:00)] \
        [VELOCITY 30 timeWindow (17:30-8:30)]
DESCRIPTION = "day and night velocity"
End ServiceClass
The service class Kyuquot defines a velocity goal that is active during working hours (9:00 AM to 5:30 PM) and a deadline goal that is active during off-hours (5:30 PM to 9:00 AM). Only users user1 and user2 can submit jobs to this service class:
Begin ServiceClass
NAME = Kyuquot
PRIORITY = 23
USER_GROUP = user1 user2
GOALS = [VELOCITY 8 timeWindow (9:00-17:30)] \
        [DEADLINE timeWindow (17:30-9:00)]
DESCRIPTION = "Daytime/Nighttime SLA"
End ServiceClass
The service class Tevere defines a combination similar to Kyuquot, but with a deadline goal that takes effect overnight and on weekends. During weekday working hours, the velocity goal favors a mix of short and medium jobs:
Begin ServiceClass
NAME = Tevere
PRIORITY = 20
GOALS = [VELOCITY 100 timeWindow (9:00-17:00)] \
        [DEADLINE timeWindow (17:30-8:30 5:17:30-1:8:30)]
DESCRIPTION = "nine to five"
End ServiceClass
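To make the timeWindow syntax in these examples concrete, here is a Python sketch that decides which goals of a service class are active at a given moment. It rests on stated assumptions rather than on LSF internals: each endpoint is [day:]hour[:minute], days are numbered 0 (Sunday) to 6 (Saturday), both endpoints of a window either carry a day or do not, a window whose end is earlier than its start wraps past midnight (or past the end of the week), each window includes its start and excludes its end, and empty parentheses mean always open. All function names are hypothetical.

```python
from __future__ import annotations

import re
from datetime import datetime

MINUTES_PER_DAY = 24 * 60

# Matches one goal such as "[VELOCITY 8 timeWindow (9:00-17:30)]".
GOAL_RE = re.compile(r"\[(\w+)[^\[\]]*?timeWindow\s*\(([^)]*)\)\]")


def parse_point(text: str) -> tuple[int | None, int]:
    """Parse [day:]hour[:minute] into (day or None, minutes since midnight)."""
    fields = [int(f) for f in text.split(":")]
    if len(fields) == 1:
        return None, fields[0] * 60
    if len(fields) == 2:
        return None, fields[0] * 60 + fields[1]
    day, hour, minute = fields
    return day, hour * 60 + minute


def in_range(now: int, start: int, end: int) -> bool:
    """Half-open [start, end) test; when end <= start the range wraps around."""
    if start < end:
        return start <= now < end
    return now >= start or now < end


def window_active(windows: str, when: datetime) -> bool:
    """True if any window in the timeWindow (...) text covers `when`; empty means always."""
    specs = windows.split()
    if not specs:
        return True
    day = (when.weekday() + 1) % 7  # assumed numbering: 0 = Sunday ... 6 = Saturday
    minute = when.hour * 60 + when.minute
    for spec in specs:
        begin_text, end_text = spec.split("-")
        begin_day, begin_min = parse_point(begin_text)
        end_day, end_min = parse_point(end_text)
        if begin_day is None and end_day is None:
            active = in_range(minute, begin_min, end_min)  # daily window
        elif begin_day is not None and end_day is not None:
            active = in_range(day * MINUTES_PER_DAY + minute,  # weekly window
                              begin_day * MINUTES_PER_DAY + begin_min,
                              end_day * MINUTES_PER_DAY + end_min)
        else:
            raise ValueError(f"cannot mix day and time-only endpoints: {spec}")
        if active:
            return True
    return False


def active_goals(goals: str, when: datetime) -> list[str]:
    """Return the goal types whose time window covers `when`."""
    return [kind for kind, windows in GOAL_RE.findall(goals) if window_active(windows, when)]


KYUQUOT = "[VELOCITY 8 timeWindow (9:00-17:30)] [DEADLINE timeWindow (17:30-9:00)]"
print(active_goals(KYUQUOT, datetime(2024, 1, 10, 10, 0)))  # ['VELOCITY'] (a Wednesday)
print(active_goals(KYUQUOT, datetime(2024, 1, 10, 20, 0)))  # ['DEADLINE']
```

With these assumptions, Kyuquot always has exactly one active goal, and Nanaimo's window 5:18:00-1:8:30 covers Friday 6:00 PM through Monday 8:30 AM.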
When an SLA is missing its goal
Use the CONTROL_ACTION parameter in your service class to configure an action to be run if the SLA goal is delayed for a specified number of minutes.
CONTROL_ACTION=VIOLATION_PERIOD[minutes] CMD [action]
If the SLA goal is delayed for longer than VIOLATION_PERIOD minutes, the action specified by CMD is invoked. The violation period is then reset, and if the SLA is still active when the violation period expires again, the action runs again. If the SLA has multiple active goals that are in violation, the action is run for each of them. For example:
CONTROL_ACTION=VIOLATION_PERIOD[10] CMD [echo `date`: SLA is in violation >> ! /tmp/sla_violation.log]
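The timing rule can be simulated with a short Python sketch. It is a simplified model built on assumptions, not LSF's implementation: goal state is sampled once per minute, each goal keeps its own count of consecutive minutes in violation, the action fires when that count reaches VIOLATION_PERIOD, the count then starts over, and any minute without a violation clears the count. The function name and input format are hypothetical.

```python
from __future__ import annotations


def control_action_minutes(violations: dict[str, list[bool]], period: int) -> list[tuple[int, str]]:
    """Return (minute, goal) pairs at which the CONTROL_ACTION command would run.

    `violations` maps a goal type to per-minute samples (True = in violation).
    Each goal is tracked separately, so the action can run once per goal.
    """
    fired: list[tuple[int, str]] = []
    for goal, samples in violations.items():
        streak = 0
        for minute, in_violation in enumerate(samples, start=1):
            streak = streak + 1 if in_violation else 0
            if streak == period:
                fired.append((minute, goal))
                streak = 0  # the violation period is reset after the action runs
    return sorted(fired)


violations = {
    "VELOCITY": [True] * 25,                         # in violation for 25 minutes
    "DEADLINE": [True] * 5 + [False] + [True] * 12,  # recovers briefly at minute 6
}
print(control_action_minutes(violations, period=10))
# [(10, 'VELOCITY'), (16, 'DEADLINE'), (20, 'VELOCITY')]
```

With VIOLATION_PERIOD[10], the velocity goal triggers the command twice, while the brief recovery of the deadline goal restarts its count, so it triggers only once.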
SLA policies: preemption, chunk jobs and statistics files
- SLA jobs cannot be preempted. Avoid running jobs that belong to an SLA in low-priority queues.
- SLA jobs are not chunked. Avoid submitting SLA jobs to a chunk job queue.
- Each active SLA goal generates a statistics file for monitoring and analyzing the system. When the goal becomes inactive, the file is no longer updated. The files are created in the LSB_SHAREDIR/cluster_name/logdir/SLA directory. Each file name consists of the name of the service class and the goal type, separated by a period.
For example, the file named Quadra.deadline is created for the deadline goal of the service class named Quadra. The following file, Tofino.velocity, refers to a velocity goal of the service class named Tofino:
% cat Tofino.velocity
# service class Tofino velocity, NJOBS, NPEND (NRUN + NSSUSP + NUSUSP), (NDONE + NEXIT)
17/9 15:7:34 1063782454 2 0 0 0 0
17/9 15:8:34 1063782514 2 0 0 0 0
17/9 15:9:34 1063782574 2 0 0 0 0
# service class Tofino velocity, NJOBS, NPEND (NRUN + NSSUSP + NUSUSP), (NDONE + NEXIT)
17/9 15:10:10 1063782610 2 0 0 0 0
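The naming rule and the record layout shown above can be handled with a small Python sketch. The helper names and the /shared/lsf directory are hypothetical. The parser assumes that each data line holds a day/month date, a local time, epoch seconds, and then integer counters; it deliberately returns the counters in file order rather than guessing how they map to the header's column names.

```python
from __future__ import annotations

import os
from datetime import datetime, timezone


def stats_file_path(sharedir: str, cluster: str, service_class: str, goal: str) -> str:
    """Build LSB_SHAREDIR/cluster_name/logdir/SLA/<service class>.<goal type>."""
    return os.path.join(sharedir, cluster, "logdir", "SLA", f"{service_class}.{goal.lower()}")


def parse_stats(text: str) -> list[tuple[datetime, list[int]]]:
    """Return (UTC time from the epoch column, counters) for each data line.

    Lines starting with '#' are headers and are skipped. The integer fields
    after the epoch column are returned in file order; mapping them to the
    header's column names is left to the caller.
    """
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        stamp = datetime.fromtimestamp(int(fields[2]), tz=timezone.utc)
        records.append((stamp, [int(f) for f in fields[3:]]))
    return records


SAMPLE = """# service class Tofino velocity, NJOBS, NPEND (NRUN + NSSUSP + NUSUSP), (NDONE + NEXIT)
17/9 15:7:34 1063782454 2 0 0 0 0
17/9 15:8:34 1063782514 2 0 0 0 0
"""

print(stats_file_path("/shared/lsf", "cluster1", "Tofino", "VELOCITY"))
for when, counters in parse_stats(SAMPLE):
    print(when.isoformat(), counters)  # first line: 2003-09-17T07:07:34+00:00 [2, 0, 0, 0, 0]
```

Using the epoch column rather than the day/month and time columns avoids ambiguity about the year and the local time zone.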