How LSF Session Scheduler runs tasks
Once an LSF Session Scheduler session job has been dispatched and starts running, LSF Session Scheduler parses the task definition file specified on the ssched command. Each line of the task definition file is one task. Tasks run on the hosts in the allocation in any order. Dependencies between tasks are not supported.
LSF Session Scheduler status is posted to the LSF Session Scheduler session job through the LSF bpost command. Use bread or bjobs -l to view LSF Session Scheduler status. The status includes the current number of pending, running and completed tasks. LSF administrators can configure how often the status is updated.
When all tasks are completed, the LSF Session Scheduler exits normally.
ssched runs under the submission user account. Any processes it creates, either locally or remotely, also run under the submission user account. LSF Session Scheduler does not require any privileges beyond those normally granted a user.
LSF Session Scheduler job sessions
The LSF Session Scheduler session job is compatible with all currently supported LSF job submission and execution parameters, including pre-execution, post-execution, job starters, I/O redirection, and queue and application profile configuration.
Run limits are interpreted and enforced in the same way as for normal LSF parallel jobs. Application-level checkpointing is also supported. Job chunking is not relevant to LSF Session Scheduler jobs, because a single LSF Session Scheduler session is generally long-running and should not be chunked.
If the LSF Session Scheduler session is killed (bkill) or re-queued (brequeue), the LSF Session Scheduler kills all running tasks, execution agents, and any other processes it has started, both local and remote. The session scheduler also cleans up any temporary files created and then exits. If the LSF Session Scheduler is then re-queued and restarted, all tasks are rerun.
If the LSF Session Scheduler session is suspended (bstop), the LSF Session Scheduler and all local and remote components will be stopped until the session is resumed (bresume).
LSF Session Scheduler tasks
The ssched command and the sservice and sschild execution agents ensure that the user's submission environment variables are set correctly for each task. To minimize the load on LSF, mbatchd does not have any knowledge of individual tasks.
Task definition file format
[task_options] command [arguments]
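Because each line of the file is one task, a task definition file can be generated by a short script. The following is a minimal sketch in Python, not part of LSF: the helper name `write_task_file` and the example commands are hypothetical, and it writes only the `command [arguments]` part of each line, leaving task options out.

```python
from pathlib import Path


def write_task_file(path, commands):
    """Write one task per line and return the number of tasks written.

    `commands` is a list of strings, each holding one command and its
    arguments. This helper is illustrative only; it does not add task options.
    """
    lines = []
    for command in commands:
        command = command.strip()
        # A task must occupy exactly one line of the file.
        if not command or "\n" in command:
            raise ValueError(f"not a single-line task: {command!r}")
        lines.append(command)
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)
```

For example, `write_task_file("my.tasks", ["./sim.sh 1", "./sim.sh 2"])` would produce a two-line file describing two independent tasks, which may run in either order.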
Session and task accounting
Jobs corresponding to the LSF Session Scheduler session have one record in lsb.acct. This record represents the aggregate resource usage of all tasks in the allocation.
If task accounting is enabled with SSCHED_ACCT_DIR in lsb.params, Session Scheduler creates a task accounting file for each LSF Session Scheduler session job and appends an accounting record to the end of the file. This record follows a format similar to that of the LSF accounting file lsb.acct, but with additional fields.
The accounting file is named jobID.ssched.acct. If no directory is specified, accounting records are not written.
The LSF Session Scheduler accounting directory must be accessible and writable from all hosts in the cluster. Each LSF Session Scheduler session (each ssched instance) creates one accounting file. Each file contains one line for each completed task index, and each line records the resource usage of that one task.
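Given the jobID.ssched.acct naming and the one-line-per-completed-task layout, the accounting file for a session can be located and its records counted with a few lines of Python. This is a sketch under stated assumptions: the function names are hypothetical, and the directory passed in is assumed to be the value configured for SSCHED_ACCT_DIR.

```python
from pathlib import Path


def task_acct_file(acct_dir, job_id):
    """Return the path of a session's accounting file (jobID.ssched.acct)."""
    return Path(acct_dir) / f"{job_id}.ssched.acct"


def count_task_records(acct_dir, job_id):
    """Count non-empty lines, that is, completed tasks recorded so far.

    Returns 0 when the file does not exist, for example when task
    accounting is not enabled or no task has finished yet.
    """
    path = task_acct_file(acct_dir, job_id)
    if not path.exists():
        return 0
    with path.open(encoding="utf-8", errors="replace") as fh:
        return sum(1 for line in fh if line.strip())
```

Counting lines is enough here because each completed task contributes exactly one record to the file.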
Task accounting file format
Field | Description |
---|---|
Event type (%s) | TASK_FINISH |
Version Number (%s) | 10.1.0 |
Event Time (%d) | Time the event was logged (in seconds since the epoch) |
jobId (%d) | ID for the job |
userId (%d) | UNIX user ID of the submitter |
options (%d) | Always 0 |
numProcessors (%d) | Always 1 |
submitTime (%d) | Task enqueue time |
beginTime (%d) | Always 0 |
termTime (%d) | Always 0 |
startTime (%d) | Task start time |
userName (%s) | User name of the submitter |
queue (%s) | Always empty |
resReq (%s) | Always empty |
dependCond (%s) | Always empty |
preExecCmd (%s) | Task pre-execution command |
fromHost (%s) | Submission host name |
cwd (%s) | Execution host current working directory (up to 4094 characters) |
inFile (%s) | Task input file name (up to 4094 characters) |
outFile (%s) | Task output file name (up to 4094 characters) |
errFile (%s) | Task error output file name (up to 4094 characters) |
jobFile (%s) | Task script file name |
numAskedHosts (%d) | Always 0 |
askedHosts (%s) | Name of the asked execution host for the task. When numAskedHosts is 0, this value can be ignored. |
numExHosts (%d) | Always 1 |
execHosts (%s) | Name of task execution host |
jStatus (%d) | 64 indicates task completed normally. 32 indicates task exited abnormally |
hostFactor (%f) | CPU factor of the task execution host |
jobName (%s) | Always empty |
command (%s) | Complete batch task command specified by the user (up to 4094 characters) |
lsfRusage (%f) | All rusage fields contain resource usage information for the task. The resource usage information is similar to lsfRusage logged in the lsf.acct file; the difference is that some fields use the %1.0f format. |
mailUser (%s) | Always empty |
projectName (%s) | Always empty |
exitStatus (%d) | UNIX exit status of the task |
maxNumProcessors (%d) | Always 1 |
loginShell (%s) | Always empty |
timeEvent (%s) | Always empty |
idx (%d) | Session job index |
maxRMem (%d) | Always 0 |
maxRSwap (%d) | Always 0 |
inFileSpool (%s) | Always empty |
commandSpool (%s) | Always empty |
rsvId (%s) | Always empty |
sla (%s) | Always empty |
exceptMask (%d) | Always 0 |
additionalInfo (%s) | Always empty |
exitInfo (%d) | Always 0 |
warningAction (%s) | Always empty |
warningTimePeriod (%d) | Always 0 |
chargedSAAP (%s) | Always empty |
licenseProject (%s) | Always empty |
app (%s) | Always empty |
taskID (%d) | Task ID |
taskIdx (%d) | Task index |
taskName (%s) | Task name |
taskOptions (%d) | Bit mask of task options |
taskExitReason (%d) | Task exit reason |
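The leading fields of a record, from the event type through userName, are fixed in number and can be read positionally. The sketch below is illustrative rather than an official parser. It assumes that fields are separated by spaces and that string fields are enclosed in double quotes, as in lsb.acct; the dictionary keys `eventType`, `version`, and `eventTime` are labels chosen here for the first three table rows. Later fields are left unparsed because their positions depend on list-valued and multi-value fields such as askedHosts and lsfRusage.

```python
import shlex

# First twelve fields, in the order listed in the table above.
# The first three names are labels chosen for this sketch.
LEADING_FIELDS = [
    ("eventType", str), ("version", str), ("eventTime", int),
    ("jobId", int), ("userId", int), ("options", int),
    ("numProcessors", int), ("submitTime", int), ("beginTime", int),
    ("termTime", int), ("startTime", int), ("userName", str),
]

# jStatus values documented in the table above.
JSTATUS = {64: "completed normally", 32: "exited abnormally"}


def parse_leading_fields(line):
    """Parse the fixed leading fields of one task accounting record."""
    # Assumption: space-separated tokens, strings in double quotes.
    tokens = shlex.split(line)
    if len(tokens) < len(LEADING_FIELDS):
        raise ValueError("record has fewer fields than expected")
    record = {
        name: convert(token)
        for (name, convert), token in zip(LEADING_FIELDS, tokens)
    }
    if record["eventType"] != "TASK_FINISH":
        raise ValueError(f"unexpected event type: {record['eventType']}")
    return record


def describe_jstatus(value):
    """Translate a jStatus value into the meaning given in the table."""
    return JSTATUS.get(value, f"unrecognized status {value}")


# Made-up sample line for illustration; not taken from a real cluster.
SAMPLE = '"TASK_FINISH" "10.1.0" 1700000000 1234 500 0 1 1699999990 0 0 1699999995 "alice"'
```

With the sample line, `parse_leading_fields(SAMPLE)["startTime"] - parse_leading_fields(SAMPLE)["submitTime"]` gives the time the task spent waiting between enqueue and start, in seconds.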