Troubleshooting
Problem
Why doesn't mpirun execute after being interrupted?
Resolving The Problem
If you launch an MPI job as a normal user and interrupt it with a single “^c” (CTRL-C), the process terminates cleanly and the job can be launched again. If you launch the same MPI job and interrupt it with several “^c” in quick succession, subsequent launches of the same job fail.
For example:
$ /opt/mpich/gnu/bin/mpirun -np 4 -machinefile machines /opt/hpl/gnu/bin/xhpl
^c
$ /opt/mpich/gnu/bin/mpirun -np 4 -machinefile machines /opt/hpl/gnu/bin/xhpl
^c^c^c
$ /opt/mpich/gnu/bin/mpirun -np 4 -machinefile machines /opt/hpl/gnu/bin/xhpl
rm_4384: p4_error: semget failed for setnum: 0
p0_27477: (0.439618) net_recv failed for fd = 6
p0_27477: p4_error: net_recv read, errno = : 104
Killed by signal 2.
Killed by signal 2.
/opt/mpich/gnu/bin/mpirun: line 1: 27477 Broken pipe /opt/hpl/gnu/bin/xhpl -p4pg /home/user001/PI27339 -p4wd /home/user001
Solution: Reboot the affected compute nodes (the nodes listed in the machinefile).
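Before rebooting, it can help to confirm which nodes are actually affected. The "semget failed" message suggests, though this document does not confirm it, that the interrupted runs left System V semaphore sets allocated on the node. The sketch below only inspects; it removes nothing. The helper name list_sem_ids is hypothetical, and the column order it relies on (key, semid, owner) is an assumption based on the Linux ipcs output format.

```shell
# Hypothetical helper: print the IDs of System V semaphore sets owned by a user.
# Reads `ipcs -s` style output on stdin. Assumes the Linux column order:
# key, semid, owner, perms, nsems.
list_sem_ids() {
    owner="$1"
    # Keep only rows whose third column is the owner and whose second
    # column is numeric, which skips the banner and header lines.
    awk -v owner="$owner" '$3 == owner && $2 ~ /^[0-9]+$/ { print $2 }'
}

# Example (run on a compute node as the user who launched the job):
#   ipcs -s | list_sem_ids "$(id -un)"
```

If the command prints semaphore IDs while no MPI job of yours is running on that node, the node is a likely candidate for the reboot described above.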
Document Information
Modified date:
05 September 2018
UID
isg3T1014303