Question & Answer
Question
How to integrate IntelMPI with LSF?
Answer
1. To submit an Intel MPI job, create a submission script:
$ cat intelmpi_job_script
#!/bin/sh
#BSUB -n 4
#BSUB -e intelmpi_%J.err
#BSUB -o intelmpi_%J.out
#BSUB -R "span[ptile=2]"
# Path to your Intel MPI installation
export INTELMPI_TOP=/../impi/5.1.0.079/intel64
export PATH=$INTELMPI_TOP/bin:$PATH
# Bootstrap server for the Hydra process manager; "lsf" means use LSF blaunch
export I_MPI_HYDRA_BOOTSTRAP=lsf
# Number of execution hosts (4 slots / ptile of 2 = 2 hosts)
export I_MPI_HYDRA_BRANCH_COUNT=2
# Set to 1 so Intel MPI launches all tasks with a single "blaunch -z" call
export I_MPI_LSF_USE_COLLECTIVE_LAUNCH=1
mpiexec.hydra sleep 9999
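In a real workload, replace the sleep placeholder with your MPI application. A minimal sketch, assuming a hypothetical binary ./my_mpi_app built with Intel MPI (LSB_DJOB_NUMPROC is set by LSF to the number of slots allocated to the job):
# Launch one rank per allocated slot (hypothetical application name)
mpiexec.hydra -n $LSB_DJOB_NUMPROC ./my_mpi_app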
2. Submit the script and check the job status:
$ bsub < ./intelmpi_job_script
$ bjobs
JOBID   USER   STAT  QUEUE   FROM_HOST  EXEC_HOST  JOB_NAME    SUBMIT_TIME
724741  user1  RUN   normal  host1      host1      *leep 9999  Feb 24 15:26
                                        host1
                                        host2
                                        host2
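To confirm how the four slots were distributed across the two hosts, you can also view the long listing of the job (a generic LSF command, not taken from the original output); it includes the list of execution hosts and slots assigned to the job:
$ bjobs -l 724741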
When checking with the ps command, you will find the following processes running:
(On the head node) $ ps -ef --forest | grep user1
root 9403 1 0 Feb23 ? 00:00:00 /../lsf9.1.1/9.1/linux2.6-glibc2.3-x86_64/etc/res
root 12675 9403 0 15:22 ? 00:00:00 \_ ../lsf9.1.1/9.1/linux2.6-glibc2.3-x86_64/etc/eauth -s
user1 30840 9403 0 19:28 ? 00:00:00 \_ /../lsf9.1.1/9.1/linux2.6-glibc2.3-x86_64/etc/res
user1 30841 30840 0 19:28 ? 00:00:00 \_ /../impi/5.1.0.079/intel64/bin/pmi_proxy --control-port host1:34142 --pmi-connect lazy-cache --pmi-aggregate --branch-count 2 -s 0 --rmk lsf --launcher lsf --demux poll --pgid 0 --enable-stdin 1 --retries 10 --control-code 1126637480 --usize -2 --proxy-id -1
user1 30845 30841 0 19:28 ? 00:00:00 \_ sleep 9999
user1 30846 30841 0 19:28 ? 00:00:00 \_ sleep 9999
root 9421 1 0 Feb23 ? 00:00:03 /../lsf9.1.1/9.1/linux2.6-glibc2.3-x86_64/etc/sbatchd
root 2923 9421 0 15:16 ? 00:00:05 \_ /../lsf9.1.1/9.1/linux2.6-glibc2.3-x86_64/etc/mbatchd -d
user1 2962 2923 0 15:16 ? 00:00:11 | \_ ../lsf9.1.1/9.1/linux2.6-glibc2.3-x86_64/etc/mbschd
root 8665 2923 0 15:19 ? 00:00:00 | \_ /../lsf9.1.1/9.1/linux2.6-glibc2.3-x86_64/etc/eauth -s
user1 30826 9421 0 19:28 ? 00:00:00 \_ /../lsf9.1.1/9.1/linux2.6-glibc2.3-x86_64/etc/res -d /../lsf9.1.1/conf -m host1 /home/user1/.lsbatch/1456313305.724751
user1 30830 30826 0 19:28 ? 00:00:00 \_ /bin/sh /home/user1/.lsbatch/1456313305.724751
user1 30834 30830 0 19:28 ? 00:00:00 \_ /bin/sh /home/user1/.lsbatch/1456313305.724751.shell
user1 30835 30834 0 19:28 ? 00:00:00 \_ mpiexec.hydra sleep 9999
user1 30836 30835 0 19:28 ? 00:00:00 \_ /../lsf9.1.1/9.1/linux2.6-glibc2.3-x86_64/bin/blaunch -z host1 host2 /../impi/5.1.0.079/intel64/bin/pmi_proxy --control-port host1:34142 --pmi-connect lazy-cache --pmi-aggregate --branch-count 2 -s 0 --rmk lsf --launcher lsf --demux poll --pgid 0 --enable-stdin 1 --retries 10 --control-code 1126637480 --usize -2 --proxy-id -1
user1 30838 1 99 19:28 ? 00:01:23 /../lsf9.1.1/9.1/linux2.6-glibc2.3-x86_64/etc/nios 12 6
If I_MPI_LSF_USE_COLLECTIVE_LAUNCH is set to a value other than 1, the process tree will look different.
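For comparison, a minimal sketch of the non-collective variant (only the environment setting changes; with collective launch disabled, the Hydra process manager typically starts a separate blaunch per execution host rather than a single "blaunch -z" call, though the exact behavior depends on the Intel MPI version):
# Disable collective launch (sketch); the rest of the script stays the same
export I_MPI_LSF_USE_COLLECTIVE_LAUNCH=0
mpiexec.hydra sleep 9999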