blaunch distributed application framework
Most MPI implementations and many distributed applications use rsh and ssh as their task launching mechanism. The blaunch command provides a drop-in replacement for rsh and ssh as a transparent method for launching parallel and distributed applications within LSF.
About the blaunch command
Similar to the LSF lsrun command, blaunch transparently connects directly to the RES and sbatchd on the remote host to create and track the remote tasks, and provides the connection back to LSF. You do not need to insert pam or taskstarter into the rsh or ssh calling sequence, or configure any wrapper scripts. The blaunch command supports the same core command-line syntax as rsh and ssh:
- rsh host_name command
- ssh host_name command
The host name value for rsh and ssh can only be a single host name, so you can use the -z option to specify a space-delimited list of hosts where tasks are started in parallel. All other rsh and ssh options are silently ignored.
On Windows, blaunch is supported with the following restrictions:
- Only the following signals are supported: SIGKILL, SIGSTOP, and SIGCONT.
- The -n option is not supported.
- CMD.EXE /C <user command line> is used as the intermediate command shell when the -no-shell option is not specified.
- CMD.EXE /C is not used when the -no-shell option is specified.
- Windows User Account Control must be configured correctly to run jobs.
LSF APIs for the blaunch distributed application framework
- lsb_launch(): A synchronous API call that allows source-level integration with vendor MPI implementations. This API launches the specified command (argv) on the remote nodes in parallel. LSF must be installed before you integrate your MPI implementation with lsb_launch(). The lsb_launch() API requires the full set of liblsf.so and libbat.so (or liblsf.a and libbat.a).
- lsb_getalloc(): Allocates memory for a host list to be used for launching parallel tasks through blaunch and the lsb_launch() API. The caller is responsible for freeing the host list when it is no longer needed. On success, the host list is a list of strings; before freeing the host list, free the individual elements. An application that uses the lsb_getalloc() API is assumed to be part of an LSF job, with LSB_MCPU_HOSTS set in its environment.
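The following minimal C sketch shows how the two calls fit together inside a running LSF job. It is a sketch under stated assumptions, not a definitive integration: it assumes that lsb_getalloc() returns the number of entries in a NULL-terminated host list parsed from LSB_MCPU_HOSTS, that lsb_launch() blocks until the remote tasks finish, and that the program is compiled and linked against LSBLIB and LSLIB (for example, cc -o launcher launcher.c -lbat -llsf). Error handling is simplified.
#include <stdio.h>
#include <stdlib.h>
#include <lsf/lsbatch.h>

int main(void)
{
    char **hosts = NULL;
    char *cmd[] = { "/bin/hostname", NULL };   /* command to run remotely */
    int nhosts, i;

    if (lsb_init("launch-example") < 0) {      /* initialize LSBLIB */
        lsb_perror("lsb_init");
        return 1;
    }

    nhosts = lsb_getalloc(&hosts);             /* expands LSB_MCPU_HOSTS */
    if (nhosts < 0) {
        lsb_perror("lsb_getalloc");
        return 1;
    }

    /* Launch cmd on every host in the allocation, in parallel.
     * Assumes the host list is NULL-terminated, as lsb_launch() expects. */
    if (lsb_launch(hosts, cmd, 0, NULL) < 0)
        lsb_perror("lsb_launch");

    for (i = 0; i < nhosts; i++)               /* caller frees the host list */
        free(hosts[i]);
    free(hosts);
    return 0;
}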
The blaunch job environment
blaunch determines from the job environment what job it is running under, and what the allocation for the job is. These can be determined by examining the environment variables LSB_JOBID, LSB_JOBINDEX, and LSB_MCPU_HOSTS. If any of these variables do not exist, blaunch exits with a non-zero value. Similarly, if blaunch is used to start a task on a host not listed in LSB_MCPU_HOSTS, the command exits with a non-zero value.
The job submission script contains the blaunch command in place of rsh or ssh. The blaunch command does sanity checking of the environment to check for LSB_JOBID and LSB_MCPU_HOSTS. The blaunch command contacts the job RES to validate the information determined from the job environment. When the job RES receives the validation request from blaunch, it registers with the root sbatchd to handle signals for the job.
The job RES periodically requests resource usage for the remote tasks. This message also acts as a heartbeat for the job. If a resource usage request is not made within a certain period of time, the job is assumed to be gone and the remote tasks are shut down. This timeout is configurable in an application profile in the lsb.applications file.
The blaunch command also honors the parameters LSB_CMD_LOG_MASK, LSB_DEBUG_CMD, and LSB_CMD_LOGDIR when defined in lsf.conf or as environment variables. The environment variables take precedence over the values in lsf.conf.
To ensure that no other users can run jobs on hosts allocated to tasks launched by blaunch, set the LSF_DISABLE_LSRUN=Y parameter in the lsf.conf file. When the LSF_DISABLE_LSRUN=Y parameter is defined, RES refuses remote connections from lsrun and lsgrun unless the user is either an LSF administrator or root. The LSF_ROOT_REX parameter must be defined for remote execution by root. Other remote execution commands, such as ch and lsmake, are not affected.
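For example, in the lsf.conf file:
LSF_DISABLE_LSRUN=Y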
Job control actions defined in the JOB_CONTROLS parameter in the lsb.queues file take effect only on the first execution host. Job control actions defined in the queue do not affect tasks running on other hosts. If the JOB_CONTROLS parameter is defined, the default LSF job control signals (SUSPEND, RESUME, TERMINATE) do not reach each task on each execution host.
Temporary directory for tasks launched by blaunch
By default, LSF creates a temporary directory for a job only on the first execution host. If the LSF_TMPDIR parameter is set in the lsf.conf file, the path of the job temporary directory on the first execution host is set to LSF_TMPDIR/job_ID.tmpdir.
If the LSB_SET_TMPDIR=Y parameter is set, the TMPDIR environment variable is set to the path specified by LSF_TMPDIR. This value for TMPDIR overrides any value that might be set in the submission environment. The blaunch framework handles the temporary directory as follows:
- When the environment variable TMPDIR is set on the first execution host, the blaunch framework propagates this environment variable to all execution hosts when launching remote tasks.
- The job RES or the task RES creates the directory specified by TMPDIR if it does not already exist before starting the job.
- The directory created by the job RES or task RES has permission 0700 and is owned by the execution user.
- If the TMPDIR directory was created by the task RES, LSF deletes the temporary directory and its contents when the task is complete.
- If the TMPDIR directory was created by the job RES, LSF deletes the temporary directory and its contents when the job is done.
- If the TMPDIR directory is on a shared file system, it is assumed to be shared by all the hosts allocated to the blaunch job, so LSF does not remove TMPDIR directories created by the job RES or task RES.
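For example, the following lsf.conf fragment (the path is illustrative) places each job's temporary directory under /scratch/lsftmp and propagates it as TMPDIR to all execution hosts:
LSF_TMPDIR=/scratch/lsftmp
LSB_SET_TMPDIR=Y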
Automatic generation of the job host file
LSF automatically places the allocated hosts for a job into the $LSB_HOSTS and $LSB_MCPU_HOSTS environment variables. Since most MPI implementations and parallel applications expect to read the allocated hosts from a file, LSF creates a host file in the default job output directory $HOME/.lsbatch on the execution host before the job runs, and deletes it after the job has finished running. The name of the host file created has the format:
.lsb.<jobid>.hostfile
The host file lists one host per line, and a host name is repeated once for each slot allocated on that host. For example, if LSB_MCPU_HOSTS="hostA 2 hostB 2 hostC 1", the host file contains:
hostA
hostA
hostB
hostB
hostC
LSF publishes the full path to the host file by setting the environment variable LSB_DJOB_HOSTFILE.
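For example, a job script can iterate over the host file to start one remote task per allocated slot (a sketch; the task path is illustrative):
#!/bin/sh
# Start one remote task for each line of the job host file, in parallel
while read host; do
    blaunch "$host" /path/to/mytask &
done < "$LSB_DJOB_HOSTFILE"
wait    # wait for all remote tasks to finish
Equivalently, blaunch -u "$LSB_DJOB_HOSTFILE" /path/to/mytask launches the tasks from the host file directly.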
Handle remote task exit
You can define how LSF responds when a remote task exits by configuring the RTASK_GONE_ACTION parameter in an application profile in the lsb.applications file. By default:
- When a task exits with a zero value, LSF does nothing.
- When a task exits with a non-zero value, LSF does nothing.
- When a task crashes, LSF shuts down the entire job.
The RTASK_GONE_ACTION parameter has the following syntax:
RTASK_GONE_ACTION="[KILLJOB_TASKDONE | KILLJOB_TASKEXIT]
[IGNORE_TASKCRASH]"
Where:
- IGNORE_TASKCRASH: If a remote task crashes, LSF does nothing. The job continues and launches the next task.
- KILLJOB_TASKDONE: If a remote task exits with a zero value, LSF terminates all tasks in the job.
- KILLJOB_TASKEXIT: If a remote task exits with a non-zero value, LSF terminates all tasks in the job.
For example:
RTASK_GONE_ACTION="IGNORE_TASKCRASH KILLJOB_TASKEXIT"
The RTASK_GONE_ACTION parameter only applies to the blaunch distributed application framework. When defined in an application profile, the LSB_DJOB_RTASK_GONE_ACTION variable is set when running bsub -app for the specified application. You can also use the environment variable LSB_DJOB_RTASK_GONE_ACTION to override the value set in the application profile.
The RTASK_GONE_ACTION=IGNORE_TASKCRASH parameter has no effect on PE jobs: When a user application is killed, POE triggers the job to quit.
Handling communication failure
By default, LSF shuts down the entire job if it loses the connection with the task RES, or if a validation timeout or heartbeat timeout occurs. You can configure an application profile in the lsb.applications file so that only the current tasks are shut down, not the entire job.
Use the DJOB_COMMFAIL_ACTION="KILL_TASKS" parameter to define the behavior of LSF when it detects a communication failure between itself and one or more tasks. If the parameter is not defined, LSF terminates all tasks and shuts down the job. If set to KILL_TASKS, LSF tries to kill all the current tasks of a parallel or distributed job that are associated with the communication failure.
The DJOB_COMMFAIL_ACTION parameter only applies to the blaunch distributed application framework. When defined in an application profile, the LSB_DJOB_COMMFAIL_ACTION environment variable is set when running bsub -app for the specified application.
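For example, a sketch of an application profile in the lsb.applications file that combines these task-failure settings (the profile name is illustrative):
Begin Application
NAME                 = djob
DESCRIPTION          = blaunch distributed application framework jobs
RTASK_GONE_ACTION    = "IGNORE_TASKCRASH KILLJOB_TASKEXIT"
DJOB_COMMFAIL_ACTION = KILL_TASKS
End Application
Jobs are submitted to the profile with bsub -app djob, as in the submission examples later in this section.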
Set up job launching environment
LSF can run an appropriate script that is responsible for setup and cleanup of the job launching environment. You can specify the name of the appropriate script in an application profile in lsb.applications.
Use the DJOB_ENV_SCRIPT parameter to define the path to a script that sets the environment for the parallel or distributed job launcher. The script runs as the user, and is part of the job. The DJOB_ENV_SCRIPT parameter only applies to the blaunch distributed application framework. If a full path is specified, LSF uses the path name for the execution. If a full path is not specified, LSF looks for it in LSF_BINDIR.
The specified script must support a setup argument and a cleanup argument. LSF invokes the script with the setup argument before launching the actual job to set up the environment, and with cleanup argument after the job is finished.
LSF assumes that if setup cannot be performed, the environment to run the job does not exist. If the script returns a non-zero value at setup, an error is printed to stderr of the job, and the job exits. Regardless of the return value of the script at cleanup, the real job exit value is used. If the return value of the script is non-zero, an error message is printed to stderr of the job.
When defined in an application profile, the LSB_DJOB_ENV_SCRIPT variable is set when running bsub -app for the specified application. For example, if DJOB_ENV_SCRIPT=mpich.script, LSF runs the $LSF_BINDIR/mpich.script script with the setup argument to set up the environment to run an MPICH job. After the job completes, LSF runs the same script with the cleanup argument.
On cleanup, the mpich.script file could, for example, remove any temporary files and release resources used by the job. Changes to the LSB_DJOB_ENV_SCRIPT environment variable made by the script are visible to the job.
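A minimal sketch of such a script (the directory and actions are illustrative):
#!/bin/sh
# LSF invokes this script as "script setup" before launching the job
# and as "script cleanup" after the job finishes.
case "$1" in
setup)
    # Prepare the launching environment; a non-zero exit aborts the job
    mkdir -p /tmp/djob.$LSB_JOBID || exit 1
    ;;
cleanup)
    # Remove temporary files; the real job exit value is kept regardless
    rm -rf /tmp/djob.$LSB_JOBID
    ;;
esac
exit 0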
Update job heartbeat and resource usage
Use the DJOB_HB_INTERVAL parameter in an application profile in lsb.applications to configure an interval in seconds used to update the heartbeat between LSF and the tasks of a parallel or distributed job. The DJOB_HB_INTERVAL parameter only applies to the blaunch distributed application framework. When the DJOB_HB_INTERVAL parameter is specified, the interval is scaled according to the number of tasks in the job:
max(DJOB_HB_INTERVAL, 10) + host_factor
where host_factor = 0.01 * number of hosts allocated for the job.
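For example, with DJOB_HB_INTERVAL=30 and 200 hosts allocated to the job, the effective interval is max(30, 10) + 0.01 * 200 = 32 seconds.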
When defined in an application profile, the LSB_DJOB_HB_INTERVAL variable is set in the parallel or distributed job environment. You should not manually change the value of LSB_DJOB_HB_INTERVAL.
By default, the interval is equal to the SBD_SLEEP_TIME parameter in the lsb.params file, where the default value of SBD_SLEEP_TIME is 30 seconds.
How blaunch supports task geometry and process group files
The current support for task geometry in LSF requires the user who submits a job to specify the desired task geometry by setting the LSB_TASK_GEOMETRY environment variable in the submission environment before job submission. LSF checks for LSB_TASK_GEOMETRY and modifies LSB_MCPU_HOSTS appropriately.
The LSB_TASK_GEOMETRY environment variable is checked for all parallel jobs. If LSB_TASK_GEOMETRY is set and a user submits a parallel job (a job that requests more than one slot), LSF attempts to shape LSB_MCPU_HOSTS accordingly.
The LSB_TASK_GEOMETRY variable was introduced to replace the LSB_PJL_TASK_GEOMETRY variable, which is kept for compatibility with earlier versions. However, task geometry does not work using blaunch alone; it works with the PE/blaunch integration.
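For example, assuming LSB_TASK_GEOMETRY uses the same node-grouping syntax as the earlier LSB_PJL_TASK_GEOMETRY variable, the following setting asks LSF to place tasks 2, 5, and 7 on the first node, tasks 0 and 6 on the second, tasks 1 and 3 on the third, and task 4 on the fourth:
export LSB_TASK_GEOMETRY="{(2,5,7)(0,6)(1,3)(4)}"
bsub -n 8 myjob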
Resource collection for all commands in a job script
Parallel and distributed jobs are typically launched with a job script. If your job script runs multiple commands, you can ensure that resource usage is collected correctly for all of them by configuring the LSF_HPC_EXTENSIONS=CUMULATIVE_RUSAGE parameter in the lsf.conf file. Resource usage is then accumulated across the commands in the job script, rather than being overwritten as each command executes.
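For example, in the lsf.conf file:
LSF_HPC_EXTENSIONS="CUMULATIVE_RUSAGE"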
Resizable jobs and blaunch
Because a resizable job can be resized at any time, the blaunch framework is aware of newly added or released resources (hosts). When a validation request arrives with additional resources, the blaunch framework accepts the request and launches the remote tasks accordingly. When part of an allocation is released, the blaunch framework ensures that no remote tasks are running on the released resources by terminating any remote tasks on the released hosts. Any further validation requests with those released resources are rejected.
- The blaunch command and the lsb_getalloc() API call access the up-to-date resource allocation through the LSB_DJOB_HOSTFILE environment variable.
- Validation request (to launch remote tasks) with the additional resources succeeds.
- Validation request (to launch remote tasks) with the released resources fails.
- Remote tasks on the released resources are terminated and the blaunch framework terminates tasks on a host when the host has been completely removed from the allocation.
- When releasing resources, LSF allows a configurable grace period (the DJOB_RESIZE_GRACE_PERIOD parameter in the lsb.applications file) for tasks to clean up and exit on their own; see the example after this list. By default, there is no grace period.
- When remote tasks are launched on new additional hosts but the notification command fails, those remote tasks are terminated.
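For example, to give tasks 30 seconds (an illustrative value) to exit on their own before the blaunch framework terminates them, set the following in an application profile in the lsb.applications file:
DJOB_RESIZE_GRACE_PERIOD = 30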
Submitting jobs with blaunch
Use bsub to call blaunch, or to invoke an execution script that calls blaunch. The blaunch command assumes that bsub -n implies one task per job slot.
- Submit a job:
bsub -n 4 blaunch myjob
- Submit a job to launch tasks on a specific host:
bsub -n 4 blaunch hostA myjob
- Submit a job with a host list:
bsub -n 4 blaunch -z "hostA hostB" myjob
- Submit a job with a host file:
bsub -n 4 blaunch -u ./hostfile myjob
- Submit a job to an application profile:
bsub -n 4 -app djob blaunch myjob
Launching ANSYS jobs
To launch an ANSYS job through LSF using the blaunch framework, substitute the path to rsh or ssh with the path to blaunch. For example:
#BSUB -o stdout.txt
#BSUB -e stderr.txt
# Note: This case statement should be used to set up any
# environment variables needed to run the different versions
# of Ansys. All versions in this case statement that have the
# string "version list entry" on the same line will appear as
# choices in the Ansys service submission page.
case $VERSION in
10.0) #version list entry
export ANSYS_DIR=/usr/share/app/ansys_inc/v100/Ansys
export ANSYSLMD_LICENSE_FILE=1051@licserver.company.com
export MPI_REMSH=/opt/lsf/bin/blaunch
program=${ANSYS_DIR}/bin/ansys100
;;
*)
echo "Invalid version ($VERSION) specified"
exit 1
;;
esac
if [ -z "$JOBNAME" ]; then
export JOBNAME=ANSYS-$$
fi
if [ $CPUS -eq 1 ]; then
${program} -p ansys -j $JOBNAME -s read -l en-us -b -i $INPUT $OPTS
else
if [ "$MEMORY_ARCH" = "Distributed" ]; then
HOSTLIST=`echo $LSB_HOSTS | sed s/" "/":1:"/g`
${program} -j $JOBNAME -p ansys -pp -dis -machines \
${HOSTLIST}:1 -i $INPUT $OPTS
else
${program} -j $JOBNAME -p ansys -pp -dis -np $CPUS \
-i $INPUT $OPTS
fi
fi
blaunch parameters
The blaunch application framework uses the following parameters:
- LSF_RES_ALIVE_TIMEOUT
- LSF_DJOB_TASK_REG_WAIT_TIME
- LSB_FANOUT_TIMEOUT_PER_LAYER
- LSB_TASK_GEOMETRY
This parameter replaces the LSB_PJL_TASK_GEOMETRY parameter.
For details on these parameters, see the IBM® Spectrum LSF Configuration Reference.