Troubleshooting
Problem
Why does an MPI application still use ssh from the compute node after rsh is enabled?
Resolving The Problem
Why do MPI applications still use ssh from the compute node after enabling rsh?
Symptom: After rsh is enabled for mpich, a job launched with the -nolocal option of mpirun still uses ssh to connect from the first compute node in the machines file to the other nodes. If the sshd service is turned off on the compute nodes, the job fails with a "connection refused" error even though rsh is enabled.
Explanation: When mpirun is run with the -nolocal option, it uses rsh (or ssh) to connect from the node where it is launched to the first node in the machines file. From that first node, however, it always uses ssh to connect to the other hosts. This is the default behaviour of mpich and does not depend on the contents of the mpirun and mpirun.ch_p4.args files on the node.
If the -nolocal option is not used, mpirun does an rsh to localhost and from there it again connects through rsh to the other hosts.
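To make the two launch paths easier to compare, here is a small Python model of the remote-shell hops described above. It does not call mpich; it only encodes the behaviour this note describes. The function name launch_hops, the launch_shell parameter and the host names are hypothetical, chosen for illustration.

```python
from typing import List, Tuple

Hop = Tuple[str, str, str]  # (source host, target host, remote shell)


def launch_hops(launch_host: str, machines: List[str], nolocal: bool,
                launch_shell: str = "rsh") -> List[Hop]:
    """Hypothetical model of the hops this note attributes to mpich's mpirun.

    launch_shell is the shell configured for the first hop when -nolocal is
    used (rsh or ssh). Hops made from the first compute node are always ssh.
    """
    hops: List[Hop] = []
    if nolocal:
        first = machines[0]
        # First hop: launch node -> first node in the machines file,
        # using the configured remote shell.
        hops.append((launch_host, first, launch_shell))
        # From the first node, mpich always uses ssh, regardless of rsh settings.
        for host in machines[1:]:
            hops.append((first, host, "ssh"))
    else:
        # Without -nolocal, mpirun first does an rsh to localhost ...
        hops.append((launch_host, "localhost", "rsh"))
        # ... and from there uses rsh again to reach the other hosts.
        for host in machines:
            if host != launch_host:
                hops.append(("localhost", host, "rsh"))
    return hops


if __name__ == "__main__":
    nodes = ["node01", "node02", "node03"]
    for hop in launch_hops("head", nodes, nolocal=True):
        print(hop)
```

With nolocal=True, every hop after the first is ssh, which is why a compute node with sshd turned off refuses the connection.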
Solution: If rsh has to be used to connect from the first compute node to the others, the -nolocal option should not be used with mpirun.
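If jobs are submitted from a script, the solution amounts to building the mpirun command line without -nolocal. The Python sketch below shows one way to do that. It assumes that the installed mpich mpirun accepts the -np and -machinefile options, so check the options your mpirun actually supports. The function name mpirun_command, the machines file name and the program path are hypothetical.

```python
import shlex
from typing import List, Sequence


def mpirun_command(np: int, machinefile: str, program: str,
                   args: Sequence[str] = (), nolocal: bool = False) -> List[str]:
    """Build an mpirun argument vector (hypothetical helper).

    Assumes mpirun accepts -np and -machinefile. nolocal defaults to False,
    so rsh is used for every hop, as this note recommends.
    """
    if np < 1:
        raise ValueError("np must be at least 1")
    cmd = ["mpirun", "-np", str(np), "-machinefile", machinefile]
    if nolocal:
        # With -nolocal, hops from the first compute node use ssh.
        cmd.append("-nolocal")
    cmd.append(program)
    cmd.extend(args)
    return cmd


if __name__ == "__main__":
    # Prints a shell-quoted command line; pass the list to subprocess.run to launch it.
    print(shlex.join(mpirun_command(4, "machines", "./my_mpi_app")))
```

Keeping -nolocal behind an explicit opt-in flag makes it harder to reintroduce the ssh hops by accident.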
Document Information
More support for: IBM Spectrum Cluster Foundation
Software version: 4.4.0
Document number: 702023
Modified date: 09 September 2018
UID: isg3T1014121