Troubleshooting
Problem
When running an OpenMPI job in LSF with affinity requirements specified through LSF options, LSF makes the affinity decision and sends it to OpenMPI. For OpenMPI applications to actually follow that decision, the procedure below must be followed.
Resolving The Problem
If the procedure is not followed, two different OpenMPI jobs can end up using the same cores on a host even though LSF scheduled them onto different cores. Below is an example.
bash-4.1$ cat job.sh
#!/bin/bash
#BSUB -n 4
#BSUB -R "affinity[core(1)]"
#BSUB -m "gordonc-3"
#BSUB -e %J.e
#BSUB -o %J.o
/usr/local/bin/mpirun --report-bindings ./a.out
bash-4.1$ bsub < job.sh ;bsub < job.sh
Job <4106> is submitted to default queue <normal>.
Job <4107> is submitted to default queue <normal>.
bash-4.1$ bjobs -aff -l # bjobs shows different cores are used for the 2 jobs
Job <4106>, User <gordonc>, Project <default>, Status <RUN>, Queue <normal>, Co
...
AFFINITY:
CPU BINDING MEMORY BINDING
------------------------ --------------------
HOST TYPE LEVEL EXCL IDS POL NUMA SIZE
gordonc-3.gss.platf core - - /0/0/0 - - -
gordonc-3.gss.platf core - - /0/0/1 - - -
gordonc-3.gss.platf core - - /0/1/0 - - -
gordonc-3.gss.platf core - - /0/1/1 - - -
------------------------------------------------------------------------------
Job <4107>, User <gordonc>, Project <default>, Status <RUN>, Queue <normal>, Co
...
AFFINITY:
CPU BINDING MEMORY BINDING
------------------------ --------------------
HOST TYPE LEVEL EXCL IDS POL NUMA SIZE
gordonc-3.gss.platf core - - /0/2/0 - - -
gordonc-3.gss.platf core - - /0/2/1 - - -
gordonc-3.gss.platf core - - /0/3/0 - - -
gordonc-3.gss.platf core - - /0/3/1 - - -
bash-4.1$ cat 4106.e # mpirun --report-bindings shows the same cores are used for the 2 jobs
[gordonc-3:22446] MCW rank 2 bound to socket 1[core 2[hwt 0]]: [./.][B/.][./.][./.]
[gordonc-3:22446] MCW rank 3 bound to socket 1[core 3[hwt 0]]: [./.][./B][./.][./.]
[gordonc-3:22446] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/.][./.][./.][./.]
[gordonc-3:22446] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B][./.][./.][./.]
bash-4.1$ cat 4107.e
[gordonc-3:22499] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B][./.][./.][./.]
[gordonc-3:22499] MCW rank 2 bound to socket 1[core 2[hwt 0]]: [./.][B/.][./.][./.]
[gordonc-3:22499] MCW rank 3 bound to socket 1[core 3[hwt 0]]: [./.][./B][./.][./.]
[gordonc-3:22499] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/.][./.][./.][./.]
Following the sections "Configuring LSF to load the affinity scheduling plugin" and "Example 5: Automatically binding OpenMPI tasks" in "Best Practices: Using Affinity Scheduling in IBM Platform LSF" (https://www.ibm.com/developerworks/community/wikis/form/anonymous/api/wiki/99245193-fced-40e5-90df-a0e9f50a0fb0/page/359ab0d9-7849-4c6a-8cb8-7a62050b5222/attachment/7ba985b5-006f-4f06-bcfb-dafa92c4a713/media/Platform_BPG_Affinity.pdf), LSF can generate a rank file for OpenMPI and thus enforce the affinity decision.
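With this approach, mpirun is handed a rank file via -rf instead of choosing bindings itself. For reference, an OpenMPI rank file maps each MPI rank to a host and slot, one line per rank; a minimal two-rank example (host name and slot values are illustrative):

```
rank 0=gordonc-3 slot=0
rank 1=gordonc-3 slot=1
```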
$ cat $LSF_ENVDIR/lsbatch/gc101/configdir/lsb.applications
...
Begin Application
NAME = openmpi
DESCRIPTION = OpenMPI 1.8.3
DJOB_ENV_SCRIPT = openmpi_rankfile.sh
End Application
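The DJOB_ENV_SCRIPT (openmpi_rankfile.sh, shipped with LSF) translates the per-task CPU placement that LSF decided into an OpenMPI rank file. A minimal sketch of that translation, assuming the affinity hostfile contains one "host cpu_list" line per task; the helper name make_rankfile and the slot syntax shown are illustrative, not the shipped script:

```shell
#!/bin/sh
# Illustrative sketch only: convert an LSF-style affinity hostfile
# (one "host cpu1,cpu2,..." line per task, as in $LSB_AFFINITY_HOSTFILE)
# into an OpenMPI rank file consumed by "mpirun -rf".
# The real openmpi_rankfile.sh shipped with LSF does the equivalent work.
make_rankfile() {
    rank=0
    while read -r host cpus; do
        [ -n "$host" ] || continue          # skip blank lines
        echo "rank $rank=$host slot=$cpus"  # one rank file line per task
        rank=$((rank + 1))
    done < "$1" > "$2"
}

# Example: two single-core tasks on gordonc-3
printf 'gordonc-3 0\ngordonc-3 1\n' > affinity_hostfile
make_rankfile affinity_hostfile rankfile
cat rankfile
# rank 0=gordonc-3 slot=0
# rank 1=gordonc-3 slot=1
```

The resulting file is what $LSB_RANK_HOSTFILE points at in the job script below.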
$ cat job.sh
#!/bin/bash
#BSUB -n 4
#BSUB -R "affinity[core(1)]"
#BSUB -m "gordonc-3"
#BSUB -app openmpi
/usr/local/bin/mpirun -rf $LSB_RANK_HOSTFILE ./a.out
$ bsub < job.sh ;bsub < job.sh
Job <4091> is submitted to default queue <normal>.
Job <4092> is submitted to default queue <normal>.
$ bjobs
JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
4091 gordonc RUN normal gordonc-3.g gordonc-3.g *E ./a.out Sep 26 16:07
gordonc-3.gss.platformlab.ibm.com
gordonc-3.gss.platformlab.ibm.com
gordonc-3.gss.platformlab.ibm.com
4092 gordonc RUN normal gordonc-3.g gordonc-3.g *E ./a.out Sep 26 16:07
gordonc-3.gss.platformlab.ibm.com
gordonc-3.gss.platformlab.ibm.com
gordonc-3.gss.platformlab.ibm.com
$ ps -ef|grep a.out
gordonc 1164 1159 1 16:07 ? 00:00:00 /usr/local/bin/mpirun -rf /home/gordonc/.lsbatch/1506456435.4091.hostRankFile ./a.out
gordonc 1165 1158 0 16:07 ? 00:00:00 /usr/local/bin/mpirun -rf /home/gordonc/.lsbatch/1506456435.4092.hostRankFile ./a.out
gordonc 1169 1165 99 16:07 ? 00:00:05 ./a.out
gordonc 1170 1164 99 16:07 ? 00:00:05 ./a.out
gordonc 1171 1165 3 16:07 ? 00:00:00 ./a.out
gordonc 1172 1164 4 16:07 ? 00:00:00 ./a.out
gordonc 1173 1165 99 16:07 ? 00:00:05 ./a.out
gordonc 1176 1164 99 16:07 ? 00:00:05 ./a.out
gordonc 1177 1165 99 16:07 ? 00:00:05 ./a.out
gordonc 1184 1164 99 16:07 ? 00:00:05 ./a.out
gordonc 1209 32315 0 16:07 pts/0 00:00:00 grep a.out
$ for i in {1169,1170,1171,1172,1173,1176,1177,1184}; do taskset -p $i; done # use taskset to check real CPU affinity
pid 1169's current affinity mask: 10
pid 1170's current affinity mask: 1
pid 1171's current affinity mask: 20
pid 1172's current affinity mask: 2
pid 1173's current affinity mask: 40
pid 1176's current affinity mask: 4
pid 1177's current affinity mask: 80
pid 1184's current affinity mask: 8 # masks 1,2,4,8 (cores 0-3) belong to job 4091; masks 10,20,40,80 (cores 4-7) belong to job 4092
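Note that taskset prints hexadecimal CPU bit masks, not core numbers: mask 8 is bit 3 (core 3), mask 80 is bit 7 (core 7). A small helper to decode a mask into CPU IDs (mask_to_cpus is a hypothetical name, not an LSF or taskset command):

```shell
#!/bin/sh
# Decode a hexadecimal CPU affinity mask (as printed by "taskset -p")
# into the list of CPU IDs whose bits are set. Illustrative helper only.
mask_to_cpus() {
    mask=$(printf '%d' "0x$1")   # hex string -> decimal
    cpu=0
    cpus=""
    while [ "$mask" -ne 0 ]; do
        if [ $((mask & 1)) -eq 1 ]; then
            cpus="$cpus $cpu"    # this bit is set: record the CPU ID
        fi
        mask=$((mask >> 1))
        cpu=$((cpu + 1))
    done
    echo "${cpus# }"
}

mask_to_cpus 8    # prints "3" (a job 4091 rank)
mask_to_cpus 80   # prints "7" (a job 4092 rank)
```

This confirms the two jobs now run on disjoint cores 0-3 and 4-7, matching what bjobs -aff reports.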
Document Information
Modified date:
17 June 2018
UID
isg3T1025826