blaunch distributed application framework

Most MPI implementations and many distributed applications use rsh and ssh as their task launching mechanism. The blaunch command provides a drop-in replacement for rsh and ssh as a transparent method for launching parallel and distributed applications within LSF.

About the blaunch command

The following figure illustrates blaunch processing:

Similar to the LSF lsrun command, blaunch transparently connects directly to the RES and sbatchd on the remote host, then creates and tracks the remote tasks and provides the connection back to LSF. You do not need to insert pam or taskstarter into the rsh or ssh calling sequence, or configure any wrapper scripts.

blaunch supports the same core command line options as rsh and ssh:
  • rsh host_name command
  • ssh host_name command

Because the host name value for rsh and ssh can only be a single host name, use the -z option to specify a space-delimited list of hosts on which tasks are started in parallel. All other rsh and ssh options are silently ignored.

You cannot run the blaunch command directly from the command line as a standalone command. blaunch only works within an LSF job; it can only be used to launch tasks on remote hosts that are part of a job allocation. On success, blaunch exits with 0.
Restriction: You cannot run concurrent blaunch commands in background mode.
blaunch is supported on Windows 2000 or later with the following exceptions:
  • Only the following signals are supported: SIGKILL, SIGSTOP, SIGCONT.
  • The -n option is not supported.
  • CMD.EXE /C <user command line> is used as the intermediate command shell when -no-shell is not specified.
  • CMD.EXE /C is not used when -no-shell is specified.
  • Windows User Account Control must be configured correctly to run jobs.

LSF APIs for the blaunch distributed application framework

LSF provides the following APIs for programming your own applications to use the blaunch distributed application framework:
  • lsb_launch(): Synchronous API call to allow source level integration with vendor MPI implementations. This API launches the specified command (argv) on the remote nodes in parallel. LSF must be installed before integrating your MPI implementation with lsb_launch(). The lsb_launch() API requires the full set of liblsf.so, libbat.so (or liblsf.a, libbat.a).
  • lsb_getalloc(): Allocates memory for a host list to be used for launching parallel tasks through blaunch and the lsb_launch() API. It is the responsibility of the caller to free the host list when it is no longer needed. On success, the host list is a list of strings. Before freeing the host list, the individual elements must be freed. An application that uses the lsb_getalloc() API is assumed to be part of an LSF job, with LSB_MCPU_HOSTS set in the environment.

The blaunch job environment

blaunch determines from the job environment what job it is running under, and what the allocation for the job is. These can be determined by examining the environment variables LSB_JOBID, LSB_JOBINDEX, and LSB_MCPU_HOSTS. If any of these variables do not exist, blaunch exits with a non-zero value. Similarly, if blaunch is used to start a task on a host not listed in LSB_MCPU_HOSTS, the command exits with a non-zero value.
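These environment checks can be mirrored in a job script before invoking blaunch. The following is a minimal sketch, assuming a POSIX shell; the check itself is illustrative and is not part of blaunch:

```shell
# Sketch: fail fast, as blaunch does, if the LSF job environment is missing.
# The variable names (LSB_JOBID, LSB_JOBINDEX, LSB_MCPU_HOSTS) come from the
# text above; this check is illustrative, not LSF's implementation.
check_blaunch_env() {
    for var in LSB_JOBID LSB_JOBINDEX LSB_MCPU_HOSTS; do
        # eval is used for portability to shells without ${!var}
        eval "val=\$$var"
        if [ -z "$val" ]; then
            echo "error: $var is not set; not running inside an LSF job" >&2
            return 1
        fi
    done
    return 0
}
```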

The job submission script contains the blaunch command in place of rsh or ssh. The blaunch command does sanity checking of the environment to check for LSB_JOBID and LSB_MCPU_HOSTS. The blaunch command contacts the job RES to validate the information determined from the job environment. When the job RES receives the validation request from blaunch, it registers with the root sbatchd to handle signals for the job.

The job RES periodically requests resource usage for the remote tasks. This message also acts as a heartbeat for the job. If a resource usage request is not made within a certain period of time, LSF assumes that the job is gone and shuts down the remote tasks. This timeout is configurable in an application profile in lsb.applications.

The blaunch command also honors the parameters LSB_CMD_LOG_MASK, LSB_DEBUG_CMD, and LSB_CMD_LOGDIR when defined in lsf.conf or as environment variables. The environment variables take precedence over the values in lsf.conf.

To ensure that no other users can run jobs on hosts allocated to tasks launched by blaunch, set the LSF_DISABLE_LSRUN=Y parameter in the lsf.conf file. When the LSF_DISABLE_LSRUN=Y parameter is defined, RES refuses remote connections from lsrun and lsgrun unless the user is either an LSF administrator or root. The LSF_ROOT_REX parameter must be defined for remote execution by root. Other remote execution commands, such as ch and lsmake, are not affected.

Job control actions defined in the JOB_CONTROLS parameter in the lsb.queues file only take effect on the first execution host. Job control actions defined in the queue do not affect tasks running on other hosts. If the JOB_CONTROLS parameter is defined, the default job control signals of LSF (SUSPEND, RESUME, TERMINATE) do not reach each task on each execution host.

Temporary directory for tasks launched by blaunch

By default, LSF creates a temporary directory for a job only on the first execution host. If the LSF_TMPDIR parameter is set in the lsf.conf file, the path of the job temporary directory on the first execution host is set to LSF_TMPDIR/job_ID.tmpdir.

If the LSB_SET_TMPDIR=Y parameter is set, the environment variable TMPDIR is set to the path specified by LSF_TMPDIR. This value for TMPDIR overrides any value that might be set in the submission environment.

Tasks launched through the blaunch distributed application framework make use of the LSF temporary directory specified by the LSF_TMPDIR parameter:
  • When the environment variable TMPDIR is set on the first execution host, the blaunch framework propagates this environment variable to all execution hosts when launching remote tasks.
  • The job RES or the task RES creates the directory specified by TMPDIR before starting the job if it does not already exist.
  • The directory created by the job RES or task RES has permission 0700 and is owned by the execution user.
  • If the TMPDIR directory was created by the task RES, LSF deletes the temporary directory and its contents when the task is complete.
  • If the TMPDIR directory was created by the job RES, LSF deletes the temporary directory and its contents when the job is done.
  • If the TMPDIR directory is on a shared file system, it is assumed to be shared by all the hosts allocated to the blaunch job, so LSF does not remove TMPDIR directories created by the job RES or task RES.

Automatic generation of the job host file

LSF automatically places the allocated hosts for a job into the $LSB_HOSTS and $LSB_MCPU_HOSTS environment variables. Since most MPI implementations and parallel applications expect to read the allocated hosts from a file, LSF creates a host file in the default job output directory $HOME/.lsbatch on the execution host before the job runs, and deletes it after the job has finished running. The name of the host file created has the format:

.lsb.<jobid>.hostfile

The host file contains one host per line. For example, if LSB_MCPU_HOSTS="hostA 2 hostB 2 hostC 1", the host file contains the following host names:
  • hostA
  • hostA
  • hostB
  • hostB
  • hostC

LSF publishes the full path to the host file by setting the environment variable LSB_DJOB_HOSTFILE.
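The expansion from LSB_MCPU_HOSTS to the one-host-per-line host file format can be reproduced in shell. The following sketch mimics the transformation described above; it is illustrative only, since LSF generates the real file itself:

```shell
# Expand LSB_MCPU_HOSTS ("host slots host slots ...") into one host per
# line, repeating each host once per slot, as in the generated host file.
# Illustrative only; LSF writes the real .lsb.<jobid>.hostfile itself.
expand_mcpu_hosts() {
    echo "$1" | awk '{ for (i = 1; i < NF; i += 2)
                           for (j = 0; j < $(i+1); j++) print $i }'
}
expand_mcpu_hosts "hostA 2 hostB 2 hostC 1"
# → hostA hostA hostB hostB hostC, one per line
```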

Handle remote task exit

You can configure an application profile in lsb.applications to control the behavior of a parallel or distributed application when a remote task exits. Specify a value for the RTASK_GONE_ACTION parameter in the application profile to define what LSF does when a remote task exits. The default behavior is as follows:
  • When a task exits with a zero value, LSF does nothing.
  • When a task exits with a non-zero value, LSF does nothing.
  • When a task crashes, LSF shuts down the entire job.

The RTASK_GONE_ACTION parameter has the following syntax:

RTASK_GONE_ACTION="[KILLJOB_TASKDONE | KILLJOB_TASKEXIT] [IGNORE_TASKCRASH]"

Where:

  • IGNORE_TASKCRASH: A remote task crashes. LSF does nothing; the job continues and launches the next task.
  • KILLJOB_TASKDONE: A remote task exits with a zero value. LSF terminates all tasks in the job.
  • KILLJOB_TASKEXIT: A remote task exits with a non-zero value. LSF terminates all tasks in the job.

For example:

RTASK_GONE_ACTION="IGNORE_TASKCRASH KILLJOB_TASKEXIT"

The RTASK_GONE_ACTION parameter only applies to the blaunch distributed application framework. When defined in an application profile, the LSB_DJOB_RTASK_GONE_ACTION variable is set when running bsub -app for the specified application. You can also use the environment variable LSB_DJOB_RTASK_GONE_ACTION to override the value set in the application profile.

The RTASK_GONE_ACTION=IGNORE_TASKCRASH parameter has no effect on PE jobs: When a user application is killed, POE triggers the job to quit.

Handling communication failure

By default, LSF shuts down the entire job if it loses the connection with the task RES, or if a validation or heartbeat timeout occurs. You can configure an application profile in lsb.applications so that only the current tasks are shut down, not the entire job.

Use the DJOB_COMMFAIL_ACTION="KILL_TASKS" parameter to define the behavior of LSF when it detects a communication failure between itself and one or more tasks. If not defined, LSF terminates all tasks, and shuts down the job. If set to KILL_TASKS, LSF tries to kill all the current tasks of a parallel or distributed job associated with the communication failure.

The DJOB_COMMFAIL_ACTION parameter only applies to the blaunch distributed application framework. When defined in an application profile, the LSB_DJOB_COMMFAIL_ACTION environment variable is set when running bsub -app for the specified application.
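For example, an lsb.applications profile that kills only the affected tasks on a communication failure might look like the following (the profile name djob and the description are illustrative):

    Begin Application
    NAME                 = djob
    DJOB_COMMFAIL_ACTION = "KILL_TASKS"
    DESCRIPTION          = kill affected tasks on communication failure
    End Application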

Set up job launching environment

LSF can run an appropriate script that is responsible for setup and cleanup of the job launching environment. You can specify the name of the appropriate script in an application profile in lsb.applications.

Use the DJOB_ENV_SCRIPT parameter to define the path to a script that sets the environment for the parallel or distributed job launcher. The script runs as the user, and is part of the job. The DJOB_ENV_SCRIPT parameter only applies to the blaunch distributed application framework. If a full path is specified, LSF uses the path name for the execution. If a full path is not specified, LSF looks for it in LSF_BINDIR.

The specified script must support a setup argument and a cleanup argument. LSF invokes the script with the setup argument before launching the actual job to set up the environment, and with the cleanup argument after the job finishes.

LSF assumes that if setup cannot be performed, the environment to run the job does not exist. If the script returns a non-zero value at setup, an error is printed to stderr of the job, and the job exits. Regardless of the return value of the script at cleanup, the real job exit value is used. If the return value of the script is non-zero, an error message is printed to stderr of the job.
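A minimal script honoring the setup and cleanup contract might look like the following sketch; the scratch-directory layout is an illustrative assumption, not an LSF convention:

```shell
# Sketch of a DJOB_ENV_SCRIPT body: LSF invokes the script with "setup"
# before launching the job and with "cleanup" after it finishes.
# The djob.<jobid> scratch directory is an illustrative assumption.
djob_env() {
    case "$1" in
        setup)
            scratch="${TMPDIR:-/tmp}/djob.${LSB_JOBID:-0}"
            # A non-zero return at setup makes the job exit
            mkdir -p "$scratch" || return 1
            ;;
        cleanup)
            # Return value at cleanup does not change the job exit value
            rm -rf "${TMPDIR:-/tmp}/djob.${LSB_JOBID:-0}"
            ;;
        *)
            echo "usage: setup|cleanup" >&2
            return 1
            ;;
    esac
}
```

In practice, the body would be the entry point of a standalone script placed in LSF_BINDIR and invoked by LSF as script_name setup or script_name cleanup.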

When defined in an application profile, the LSB_DJOB_ENV_SCRIPT variable is set when running bsub -app for the specified application. For example, if DJOB_ENV_SCRIPT=mpich.script, LSF runs the $LSF_BINDIR/mpich.script setup script to set up the environment to run an MPICH job. After the job completes, LSF runs the $LSF_BINDIR/mpich.script script for cleanup.

On cleanup, the mpich.script file could, for example, remove any temporary files and release resources used by the job. Changes to the LSB_DJOB_ENV_SCRIPT environment variable made by the script are visible to the job.

Update job heartbeat and resource usage

Use the DJOB_HB_INTERVAL parameter in an application profile in lsb.applications to configure an interval in seconds used to update the heartbeat between LSF and the tasks of a parallel or distributed job. The DJOB_HB_INTERVAL parameter only applies to the blaunch distributed application framework. When the DJOB_HB_INTERVAL parameter is specified, the interval is scaled according to the number of tasks in the job:

max(DJOB_HB_INTERVAL, 10) + host_factor

where host_factor = 0.01 * number of hosts allocated for the job.
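For example, with DJOB_HB_INTERVAL=5 and 200 allocated hosts, the effective interval is max(5, 10) + 0.01 * 200 = 12 seconds. This arithmetic can be sketched in shell (the values 5 and 200 are illustrative):

```shell
# Compute the scaled heartbeat interval from the formula above.
# The configured value 5 and host count 200 are illustrative.
hb_interval() {
    awk -v cfg="$1" -v hosts="$2" 'BEGIN {
        base = (cfg > 10) ? cfg : 10      # max(DJOB_HB_INTERVAL, 10)
        print base + 0.01 * hosts         # + host_factor
    }'
}
hb_interval 5 200    # → 12
```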

When defined in an application profile, the LSB_DJOB_HB_INTERVAL variable is set in the parallel or distributed job environment. You should not manually change the value of LSB_DJOB_HB_INTERVAL.

By default, the interval is equal to the SBD_SLEEP_TIME parameter in the lsb.params file, where the default value of SBD_SLEEP_TIME is 30 seconds.

How blaunch supports task geometry and process group files

The current support for task geometry in LSF requires the user who submits a job to specify the desired task geometry by setting the environment variable LSB_TASK_GEOMETRY in the submission environment before job submission. LSF checks for LSB_TASK_GEOMETRY and modifies LSB_MCPU_HOSTS appropriately.

The environment variable LSB_TASK_GEOMETRY is checked for all parallel jobs. If LSB_TASK_GEOMETRY is set and a user submits a parallel job (a job that requests more than one slot), LSF attempts to shape LSB_MCPU_HOSTS accordingly.

The LSB_TASK_GEOMETRY variable was introduced to replace the LSB_PJL_TASK_GEOMETRY variable, which is kept for compatibility with earlier versions. However, task geometry does not work using blaunch alone; it works with the PE/blaunch integration.
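As a hedged illustration of setting the variable before submission (the exact grouping syntax is assumed from the LSB_PJL_TASK_GEOMETRY documentation and should be verified for your LSF version):

```shell
# Hypothetical example: place tasks 0 and 2 on the first node and
# tasks 1 and 3 on the second. Each parenthesized group is one node.
# Verify the syntax against the LSB_PJL_TASK_GEOMETRY documentation.
export LSB_TASK_GEOMETRY="{(0,2)(1,3)}"
echo "$LSB_TASK_GEOMETRY"
```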

Resource collection for all commands in a job script

Parallel and distributed jobs are typically launched with a job script. If your job script runs multiple commands, you can ensure that resource usage is collected correctly for all commands in a job script by configuring the LSF_HPC_EXTENSIONS=CUMULATIVE_RUSAGE parameter in the lsf.conf file. Resource usage is collected for jobs in the job script, rather than being overwritten when each command is executed.
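For example, in the lsf.conf file:

    LSF_HPC_EXTENSIONS=CUMULATIVE_RUSAGE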

Resizable jobs and blaunch

Because a resizable job can be resized at any time, the blaunch framework is aware of newly added or released resources (hosts). When a validation request arrives with additional resources, the blaunch framework accepts the request and launches the remote tasks accordingly. When part of an allocation is released, the blaunch framework ensures that no remote tasks are running on the released resources by terminating any remote tasks on the released hosts. Any further validation requests with those released resources are rejected.

The blaunch framework provides the following functionality for resizable jobs:
  • The blaunch command and the lsb_getalloc() API call access the up-to-date resource allocation through the LSB_DJOB_HOSTFILE environment variable.
  • Validation request (to launch remote tasks) with the additional resources succeeds.
  • Validation request (to launch remote tasks) with the released resources fails.
  • Remote tasks on the released resources are terminated and the blaunch framework terminates tasks on a host when the host has been completely removed from the allocation.
  • When releasing resources, LSF allows a configurable grace period (the DJOB_RESIZE_GRACE_PERIOD parameter in the lsb.applications file) for tasks to clean up and exit themselves. By default, there is no grace period.
  • When remote tasks are launched on new additional hosts but the notification command fails, those remote tasks are terminated.
Note: Automatic job resizing is a signaling mechanism only. It does not expand the extent of the original job launched with blaunch. The resize notification script is required along with a signal listening script. The signal listening script runs additional blaunch commands on notification to allocate the resized resources to make them available to the job tasks. For help creating signal listening and notification scripts, contact IBM Support.
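For example, an lsb.applications profile that gives tasks 60 seconds to clean up when resources are released might look like the following (the profile name and values are illustrative; RESIZABLE_JOBS enables resizing for jobs submitted to the profile):

    Begin Application
    NAME                     = resizable_djob
    RESIZABLE_JOBS           = Y
    DJOB_RESIZE_GRACE_PERIOD = 60
    End Application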

Submitting jobs with blaunch

Use bsub to call blaunch, or to invoke an execution script that calls blaunch. The blaunch command assumes that bsub -n implies one task per job slot.

  • Submit a job:

    bsub -n 4 blaunch myjob

  • Submit a job to launch tasks on a specific host:

    bsub -n 4 blaunch hostA myjob

  • Submit a job with a host list:

    bsub -n 4 blaunch -z "hostA hostB" myjob

  • Submit a job with a host file:

    bsub -n 4 blaunch -u ./hostfile myjob

  • Submit a job to an application profile:

    bsub -n 4 -app djob blaunch myjob

Launching ANSYS jobs

To launch an ANSYS job through LSF using the blaunch framework, substitute the path to rsh or ssh with the path to blaunch. For example:

#BSUB -o stdout.txt
#BSUB -e stderr.txt
# Note: This case statement should be used to set up any
# environment variables needed to run the different versions
# of Ansys. All versions in this case statement that have the
# string "version list entry" on the same line will appear as
# choices in the Ansys service submission page.
 
case $VERSION in
 10.0)  #version list entry
        export ANSYS_DIR=/usr/share/app/ansys_inc/v100/Ansys
        export ANSYSLMD_LICENSE_FILE=1051@licserver.company.com
        export MPI_REMSH=/opt/lsf/bin/blaunch
        program=${ANSYS_DIR}/bin/ansys100
        ;;
  *)
        echo "Invalid version ($VERSION) specified"
        exit 1
        ;;
esac
 
if [ -z "$JOBNAME" ]; then
    export JOBNAME=ANSYS-$$
fi
 
if [ $CPUS -eq 1 ]; then
    ${program} -p ansys -j $JOBNAME -s read -l en-us -b -i $INPUT $OPTS
else
    if [ "$MEMORY_ARCH" = "Distributed" ]; then
        HOSTLIST=`echo $LSB_HOSTS | sed s/" "/":1:"/g`
        ${program} -j $JOBNAME -p ansys -pp -dis -machines \
            ${HOSTLIST}:1 -i $INPUT $OPTS
    else
        ${program} -j $JOBNAME -p ansys -pp -dis -np $CPUS \
            -i $INPUT $OPTS
    fi
fi

blaunch parameters

The blaunch application framework uses the following parameters:

  • LSF_RES_ALIVE_TIMEOUT
  • LSF_DJOB_TASK_REG_WAIT_TIME
  • LSB_FANOUT_TIMEOUT_PER_LAYER
  • LSB_TASK_GEOMETRY

    This parameter replaces the LSB_PJL_TASK_GEOMETRY parameter.

For details on these parameters, see the IBM® Spectrum LSF Configuration Reference.