Troubleshooting
Problem
The following error message appears when running MPI jobs using MVAPICH:

[user@hpcinstaller ~]$ mpirun -np 1 -machinefile machinefile ./hellof
Abort signaled by rank 0: No ACTIVE ports found
MPI process terminated unexpectedly
Exit code -5 signaled from compute-00-00
Killing remote processes...DONE
[user@hpcinstaller ~]$ Signal 15 received.
Resolving The Problem
MPI jobs compiled with MVAPICH from the Platform HPC kit can only use InfiniBand as the interconnect.
Verify your InfiniBand network to ensure that there are no issues with it before rerunning your job.
To diagnose your InfiniBand network quickly, use the following two commands: ibchecknet and ibcheckstate.

ibchecknet
The output should list all your IB devices, including nodes and switches. You should not see any bad nodes or bad ports, and no ports should have errors beyond thresholds.
Example:
[root@compute-00-00 ~]# ibchecknet
# Checking Ca: nodeguid 0x0002c902002789ac
# Checking Ca: nodeguid 0x0002c9030002847c
## Summary: 3 nodes checked, 0 bad nodes found
## 4 ports checked, 0 bad ports found
## 0 ports have errors beyond threshold
ibcheckstate
The output is similar to that of the ibchecknet command, but more concise. You should not see any bad nodes or ports.
Example:
[root@compute-00-00 ~]# ibcheckstate
## Summary: 3 nodes checked, 0 bad nodes found
## 4 ports checked, 0 ports with bad state found
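Since the error message complains that no ACTIVE ports were found, it can also help to confirm the port state on the compute node itself before rerunning the job. The sketch below is a minimal example of such a check; it embeds a sample of ibstat-style output for illustration, so it can run anywhere. On a real node you would replace the embedded sample with the actual output of the ibstat command (from the infiniband-diags package).

```shell
#!/bin/sh
# Check that the node reports at least one ACTIVE InfiniBand port.
# Sample ibstat-style output embedded for illustration only; on a real
# node, capture live output instead with: ibstat_out=$(ibstat)
ibstat_out="CA 'mlx4_0'
        Port 1:
                State: Active
                Physical state: LinkUp"

# MVAPICH aborts with "No ACTIVE ports found" when no port is Active.
if printf '%s\n' "$ibstat_out" | grep -q 'State: Active'; then
    status=ok
    echo "ACTIVE port found - safe to rerun the MPI job"
else
    status=bad
    echo "No ACTIVE ports found - fix the IB fabric before rerunning"
fi
```

If the port state is not Active (for example Down or Initializing), resolve the fabric issue first, then rerun the mpirun job.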
Document Information
Modified date:
16 September 2018
UID
isg3T1016282