pam

Parallel Application Manager – job starter for MPI applications

HP-UX vendor MPI syntax

bsub pam -mpi mpirun [mpirun_options] mpi_app [argument ...]

Generic PJL framework syntax

bsub pam [-t] [-v] [-n num_tasks] -g [num_args] pjl_wrapper [pjl_options] mpi_app [argument ...]
pam [-h]
pam [-V]

Description

The Parallel Application Manager (PAM) is fully integrated with LSF. PAM acts as the supervisor of a parallel LSF job.

MPI jobs started by the pam command can be submitted only through batch jobs; PAM cannot be used interactively to start parallel jobs. The sbatchd daemon starts PAM on the first execution host.

PAM has the following functionality for all parallel application processes (tasks):
  • Uses a vendor MPI library or an MPI Parallel Job Launcher (PJL), for example, mpirun or poe, to start a parallel job on a specified set of hosts in an LSF cluster.
  • Contacts RES on each execution host that is allocated to the parallel job.
  • Queries RES periodically to collect resource usage for each parallel task, passes control signals through RES to all process groups and individual running tasks, and cleans up tasks as needed.
  • Passes job-level resource usage and process IDs (PIDs and PGIDs) to sbatchd for enforcement.
  • Collects resource usage information and exit status upon termination.

Task startup for vendor MPI jobs

The pam command starts a vendor MPI job on a specified set of hosts in an LSF cluster. Starting an MPI job with the pam command requires the underlying MPI system to be LSF-aware, that is, a vendor MPI implementation that supports LSF (for example, HP-UX vendor MPI).

PAM uses the vendor MPI library to create the child processes needed for the parallel tasks that make up your MPI application. It starts these tasks on the systems that are allocated by LSF. The allocation includes the number of execution hosts needed, and the number of child processes needed on each host.

Task startup for generic PJL jobs

For parallel jobs submitted with bsub:
  • PAM starts the PJL, which in turn starts the TaskStarter (TS).
  • TS starts the tasks on each execution host, reports the process ID to PAM, and waits for the task to finish.
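The start, report, and wait pattern that TS follows can be sketched in plain shell. This is a toy model only, not the LSF TaskStarter: the function name task_starter is hypothetical, and the PID is written to stderr here, whereas the real TS reports it to PAM.

```shell
#!/bin/sh
# Toy stand-in for a task starter (hypothetical, not LSF code):
# start a command, report its process ID, and wait for it to finish.
task_starter() {
    "$@" &                      # start the task in the background
    task_pid=$!                 # process ID of the task just started
    # Assumption: report to stderr; the real TS reports the PID to PAM.
    printf 'started task pid=%s\n' "$task_pid" >&2
    wait "$task_pid"            # returns the exit status of the task
}

# Run a task that exits with status 3 and capture that status.
status=0
task_starter sh -c 'exit 3' || status=$?
echo "task exit status: $status"
```

The return value of task_starter is the exit status of the task, which is the kind of information PAM collects when a task terminates.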
Two environment variables enable PAM to run scripts or binary files before or after PAM is started. These variables are useful if you customize the mpirun.lsf script and have job scripts that call the mpirun.lsf script more than once.
$MPIRUN_LSF_PRE_EXEC
Runs before PAM is started.
$MPIRUN_LSF_POST_EXEC
Runs after PAM is started.

Options for vendor MPI jobs

-auto_place
The -auto_place option on the pam command line tells the IRIX mpirun library to start the MPI application according to the resources allocated by LSF.
-mpi
On HP-UX, you can have LSF manage the allocation of hosts to achieve better resource usage by coordinating the start-up phase with the mpirun command. Precede the regular MPI mpirun command with the following command:
bsub pam -mpi

For HP-UX vendor MPI jobs, the -mpi option must be the first option of the pam command.

For example, the following mpirun command runs a single-host job:
mpirun -np 14 a.out
To have LSF select the host, include the mpirun command in the bsub job submission command:
bsub pam -mpi mpirun -np 14 a.out
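As a small illustration, the following shell sketch prefixes an existing mpirun command line with bsub pam -mpi. The function name lsf_submit_line is hypothetical, and the function only prints the submission line rather than running it, so the line can be inspected first.

```shell
#!/bin/sh
# Hypothetical helper: print the LSF submission line for an mpirun command.
# It prints the line instead of submitting the job.
lsf_submit_line() {
    # -mpi directly follows pam, because it must be the first pam option
    # for HP-UX vendor MPI jobs.
    printf 'bsub pam -mpi %s\n' "$*"
}

lsf_submit_line mpirun -np 14 a.out
```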
-n num_tasks
The number of processors that are required to run the parallel application, typically the same as the number of parallel tasks in the job. If the host is a multiprocessor, one host can start several tasks.

You can use both the bsub -n and pam -n options in the same job submission. The number that is specified in the pam -n option must be less than or equal to the number specified by the bsub -n option. If the number of tasks that are specified with the pam -n option is greater than the number that is specified by the bsub -n option, the pam -n option is ignored.

For example, you can specify the following command:
bsub -n 5 pam -n 2 -mpi -auto_place a.out

The job requests five processors, but PAM starts only two parallel tasks.
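A submission script might check the two values before calling bsub, so that an ignored pam -n option does not go unnoticed. The following is a minimal sketch; the function name check_pam_n is hypothetical and is not part of LSF.

```shell
#!/bin/sh
# Hypothetical pre-submission check (not part of LSF).
# Usage: check_pam_n BSUB_N PAM_N
# Succeeds when PAM_N is less than or equal to BSUB_N.
check_pam_n() {
    bsub_n=$1
    pam_n=$2
    if [ "$pam_n" -le "$bsub_n" ]; then
        return 0
    fi
    # In this case LSF would ignore the pam -n option.
    printf 'pam -n %s exceeds bsub -n %s; pam -n would be ignored\n' \
        "$pam_n" "$bsub_n" >&2
    return 1
}

# Print the submission line only when the values are consistent.
if check_pam_n 5 2; then
    echo "bsub -n 5 pam -n 2 -mpi -auto_place a.out"
fi
```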

mpi_app [argument ...]
The name of the MPI application to be run on the listed hosts. This name must be the last argument on the command line.
-h
Prints command usage to stderr and exits.
-V
Prints LSF release version to stderr and exits.

Options for generic PJL jobs

-t
Tells the pam command not to print the MPI job task summary report to standard output. By default, the summary report prints the task ID, the host that it ran on, the command that was run, the exit status, and the termination time.
-v
Verbose mode. Displays the name of the execution host or hosts.
-g [num_args] pjl_wrapper [pjl_options]
The -g option is required to use the generic PJL framework. You must specify all the other pam options before -g.
num_args
Specifies how many space-separated arguments on the command line belong to the PJL, counting the PJL wrapper name itself. The rest of the command line is assumed to belong to the binary application that starts the parallel tasks.
pjl_wrapper
The name of the PJL.
pjl_options
Optional arguments to the PJL.
For example:
  • A PJL named no_arg_pjl takes no options, so num_args=1. The syntax is:
    pam [pam_options] -g 1 no_arg_pjl job [job_options]
    
  • A PJL named 3_arg_pjl takes the options -a, -b, and group_name, so num_args=4. The syntax is:
    pam [pam_options] -g 4 3_arg_pjl -a -b group_name job [job_options]
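The num_args value in these examples is the PJL wrapper name plus each of its options. The following shell sketch derives it by counting positional parameters; the function name pam_generic_cmd is hypothetical, and it only prints the resulting pam command line.

```shell
#!/bin/sh
# Hypothetical helper: print a pam command line for the generic PJL framework.
# Usage: pam_generic_cmd APP PJL_WRAPPER [PJL_OPTION ...]
pam_generic_cmd() {
    app=$1
    shift
    # "$#" now counts the PJL wrapper name plus each of its options,
    # which is the value that num_args takes in the examples.
    num_args=$#
    printf 'pam -g %s %s %s\n' "$num_args" "$*" "$app"
}

pam_generic_cmd job no_arg_pjl                    # pam -g 1 no_arg_pjl job
pam_generic_cmd job 3_arg_pjl -a -b group_name    # pam -g 4 3_arg_pjl -a -b group_name job
```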
    
-n num_tasks
The number of processors that are required to run the MPI application, typically the number of parallel tasks in the job. If the host is a multiprocessor, one host can start several tasks.

You can use both the bsub -n and pam -n commands in the same job submission. The number that is specified in the pam -n option must be less than or equal to the number specified by the bsub -n option. If the number of tasks that are specified with the pam -n option is greater than the number specified by the bsub -n option, the pam -n option is ignored.

mpi_app [argument ...]
The name of the MPI application to be run on the listed hosts. This name must be the last argument on the command line.
-h
Prints command usage to stderr and exits.
-V
Prints LSF release version to stderr and exits.

Exit status

The pam command exits with the exit status of the mpirun command or the PJL wrapper.

See also

bsub