IBM Support

Why doesn't mpirun execute after being interrupted?

Troubleshooting


Problem

Why doesn't mpirun execute after being interrupted?

Resolving The Problem


If you launch an MPI job as a normal user and interrupt it once with CTRL-C ("^c"), the process terminates normally. If you launch the same MPI job and interrupt it by pressing CTRL-C several times ("^c^c^c"), subsequent launches of the same job fail.

For example:

$ /opt/mpich/gnu/bin/mpirun -np 4 -machinefile machines /opt/hpl/gnu/bin/xhpl

^c

$ /opt/mpich/gnu/bin/mpirun -np 4 -machinefile machines /opt/hpl/gnu/bin/xhpl

^c^c^c

$ /opt/mpich/gnu/bin/mpirun -np 4 -machinefile machines /opt/hpl/gnu/bin/xhpl

rm_4384: p4_error: semget failed for setnum: 0

p0_27477: (0.439618) net_recv failed for fd = 6

p0_27477: p4_error: net_recv read, errno = : 104

Killed by signal 2.

Killed by signal 2.

/opt/mpich/gnu/bin/mpirun: line 1: 27477 Broken pipe /opt/hpl/gnu/bin/xhpl -p4pg /home/user001/PI27339 -p4wd /home/user001
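The first error line, "rm_4384: p4_error: semget failed for setnum: 0", means that the MPICH p4 device could not create a System V semaphore set. A likely explanation, not confirmed by this technote, is that the repeatedly interrupted runs left their semaphore sets allocated on the node, so new ones cannot be created. The following minimal sketch counts the semaphore sets a given user owns in the output of "ipcs -s" on one node. The function name count_user_semaphores is hypothetical, and the sketch assumes the Linux (util-linux) ipcs column layout: key, semid, owner, perms, nsems.

```shell
#!/bin/sh
# Hypothetical helper: count the semaphore sets owned by user "$1" in
# `ipcs -s` output read from standard input.
# Assumes the Linux (util-linux) layout: key semid owner perms nsems.
# Data rows start with a hexadecimal key ("0x..."), which skips the
# banner and column-header lines.
count_user_semaphores() {
    awk -v u="$1" '$1 ~ /^0x/ && $3 == u { n++ } END { print n + 0 }'
}
```

On a compute node, run: ipcs -s | count_user_semaphores "$(id -un)". A count that grows after each interrupted run would support the leftover-semaphore explanation.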


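Solution: Reboot the affected compute nodes (the hosts listed in the file passed to -machinefile, "machines" in the example above). The semget failure suggests that the interrupted processes left System V semaphore sets allocated on those nodes; a reboot releases them.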
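The solution applies to every host named in the machine file. The following minimal sketch prints the unique host names from an MPICH-style machine file so the nodes to reboot can be listed. The function name machinefile_hosts is hypothetical; it assumes each line holds one host name, optionally followed by ":ncpus", and that "#" starts a comment.

```shell
#!/bin/sh
# Hypothetical helper: print each host named in machine file "$1" once,
# in the order of first appearance.
#   - strip "#" comments
#   - strip an optional ":ncpus" suffix
#   - strip all whitespace
#   - skip empty lines and duplicate host names
machinefile_hosts() {
    sed -e 's/#.*//' -e 's/:.*//' -e 's/[[:space:]]//g' "$1" |
        awk 'NF && !seen[$0]++'
}
```

For example, machinefile_hosts machines prints one host per line for the machine file used in the example.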


Document Information

Modified date:
05 September 2018

UID

isg3T1014303