
Using Intel MPI under LSF: quick guide

Question & Answer


Question

How do I integrate Intel MPI with LSF?

Answer

1. To submit an Intel MPI job, create a submission script:
 
$ cat intelmpi_job_script
#!/bin/sh
#BSUB -n 4
#BSUB -e intelmpi_%J.err
#BSUB -o intelmpi_%J.out
#BSUB -R "span[ptile=2]"
# Path to where you installed Intel MPI
export INTELMPI_TOP=/../impi/5.1.0.079/intel64
export PATH=$INTELMPI_TOP/bin:$PATH
# Bootstrap server for Intel MPI; "lsf" means launch through LSF blaunch
export I_MPI_HYDRA_BOOTSTRAP=lsf
# Number of hosts: 4 tasks / 2 tasks per host (ptile) = 2
export I_MPI_HYDRA_BRANCH_COUNT=2
# Set to 1 to let Intel MPI launch all tasks with a single "blaunch -z"
export I_MPI_LSF_USE_COLLECTIVE_LAUNCH=1
mpiexec.hydra sleep 9999
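
The sleep command above is only a placeholder that keeps the job alive so the process tree can be inspected. In a real job the last line runs your MPI binary instead; as a minimal sketch, assuming a hello.c source file and the mpiicc compiler wrapper shipped in the same Intel MPI bin directory:

$ mpiicc -o hello hello.c

and then replace the last line of the script with:

mpiexec.hydra ./hello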
 
2. Submit the script and check the job status:

$ bsub < ./intelmpi_job_script
$ bjobs
JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
724741  user1   RUN   normal     host1       host1       *leep 9999 Feb 24 15:26
                                             host1
                                             host2
                                             host2
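
To see the complete allocation in one place, run bjobs -l with the job ID. Inside the job, LSF also publishes the allocation through environment variables such as LSB_HOSTS and LSB_MCPU_HOSTS, which Hydra reads when its resource-management kernel is LSF (visible as --rmk lsf in the process tree below):

$ bjobs -l 724741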
 
Checking with the ps command shows the following processes running:
 

On the head node:

$ ps -ef --forest | grep user1


root      9403     1  0 Feb23 ?        00:00:00 /../lsf9.1.1/9.1/linux2.6-glibc2.3-x86_64/etc/res
root     12675  9403  0 15:22 ?        00:00:00  \_ ../lsf9.1.1/9.1/linux2.6-glibc2.3-x86_64/etc/eauth -s
user1 30840  9403  0 19:28 ?        00:00:00  \_ /../lsf9.1.1/9.1/linux2.6-glibc2.3-x86_64/etc/res
user1   30841 30840  0 19:28 ?        00:00:00      \_ /../impi/5.1.0.079/intel64/bin/pmi_proxy --control-port host1:34142 --pmi-connect lazy-cache --pmi-aggregate --branch-count 2 -s 0 --rmk lsf --launcher lsf --demux poll --pgid 0 --enable-stdin 1 --retries 10 --control-code 1126637480 --usize -2 --proxy-id -1
user1   30845 30841  0 19:28 ?        00:00:00          \_ sleep 9999
user1   30846 30841  0 19:28 ?        00:00:00          \_ sleep 9999
root      9421     1  0 Feb23 ?        00:00:03 /../lsf9.1.1/9.1/linux2.6-glibc2.3-x86_64/etc/sbatchd
root      2923  9421  0 15:16 ?        00:00:05  \_ /../lsf9.1.1/9.1/linux2.6-glibc2.3-x86_64/etc/mbatchd -d
user1    2962  2923  0 15:16 ?        00:00:11  |   \_ ../lsf9.1.1/9.1/linux2.6-glibc2.3-x86_64/etc/mbschd
root      8665  2923  0 15:19 ?        00:00:00  |   \_ /../lsf9.1.1/9.1/linux2.6-glibc2.3-x86_64/etc/eauth -s
user1   30826  9421  0 19:28 ?        00:00:00  \_ /../lsf9.1.1/9.1/linux2.6-glibc2.3-x86_64/etc/res -d /../lsf9.1.1/conf -m host1 /home/user1/.lsbatch/1456313305.724751
user1   30830 30826  0 19:28 ?        00:00:00      \_ /bin/sh /home/user1/.lsbatch/1456313305.724751
user1   30834 30830  0 19:28 ?        00:00:00          \_ /bin/sh /home/user1/.lsbatch/1456313305.724751.shell
user1   30835 30834  0 19:28 ?        00:00:00              \_ mpiexec.hydra sleep 9999
user1   30836 30835  0 19:28 ?        00:00:00                  \_ /../lsf9.1.1/9.1/linux2.6-glibc2.3-x86_64/bin/blaunch -z host1 host2 /../impi/5.1.0.079/intel64/bin/pmi_proxy --control-port host1:34142 --pmi-connect lazy-cache --pmi-aggregate --branch-count 2 -s 0 --rmk lsf --launcher lsf --demux poll --pgid 0 --enable-stdin 1 --retries 10 --control-code 1126637480 --usize -2 --proxy-id -1
user1   30838     1 99 19:28 ?        00:01:23 /../lsf9.1.1/9.1/linux2.6-glibc2.3-x86_64/etc/nios 12 6
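
The second host runs a matching subtree: its res daemon spawns a pmi_proxy, which in turn spawns the remaining two sleep tasks. A quick way to confirm this is to run the same ps command on the other host (host2 is the second host from the allocation above, and this assumes you can log in to it directly):

$ ssh host2 ps -ef --forest | grep pmi_proxy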

 


If I_MPI_LSF_USE_COLLECTIVE_LAUNCH is set to any value other than 1, the process tree will be different.
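
As a minimal sketch of the alternative, change only that one export in the script (0 is assumed here as the disabling value, since any value other than 1 turns collective launch off):

export I_MPI_LSF_USE_COLLECTIVE_LAUNCH=0

With this setting, Hydra still bootstraps through LSF, but it is expected to start the remote pmi_proxy processes with individual blaunch invocations instead of one "blaunch -z" covering all hosts, so ps shows one blaunch child per remote host.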

[{"Product":{"code":"SSETD4","label":"Platform LSF"},"Business Unit":{"code":"BU010","label":"Systems - Storage"},"Component":"--","Platform":[{"code":"PF016","label":"Linux"}],"Version":"9.1.1;9.1.2;9.1.3","Edition":"Standard"},{"Product":{"code":"SSWRJV","label":"IBM Spectrum LSF"},"Business Unit":{"code":"BU010","label":"Systems - Storage"},"Component":null,"Platform":[{"code":"","label":""}],"Version":"","Edition":""}]

Document Information

Modified date:
17 June 2018

UID

isg3T1023404