IBM Support

Why does an MPI application still use ssh from the compute node after rsh is enabled?

Troubleshooting


Problem

Why does an MPI application still use ssh from the compute node after rsh is enabled?

Resolving The Problem

Symptom: After rsh is enabled for MPICH, a job launched with mpirun and the -nolocal option still tries to use ssh to connect from the first compute node listed in the machines file to the other nodes. If the sshd service is turned off on the compute nodes, the job fails with a "connection refused" error even though rsh is enabled.
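
To confirm the symptom, it can help to check which remote-shell service each compute node actually accepts connections on. The following is a minimal diagnostic sketch, not part of MPICH or of the product. It assumes the services listen on their usual default ports (22 for sshd, 514 for rshd), and the names port_state and probe are hypothetical helpers introduced here for illustration.

```python
import socket

# Assumption: default well-known ports; a site may run these services elsewhere.
SSH_PORT = 22   # sshd
RSH_PORT = 514  # rshd ("shell" service)


def port_state(host, port, timeout=3.0):
    """Return 'open', 'refused', or 'unreachable' for one TCP connect attempt."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered but nothing is listening on the port.
        return "refused"
    except OSError:
        # Timeouts, name-resolution failures, and routing errors land here.
        return "unreachable"


def probe(hosts):
    """Map each host name to the state of its ssh and rsh ports."""
    return {
        host: {
            "ssh": port_state(host, SSH_PORT),
            "rsh": port_state(host, RSH_PORT),
        }
        for host in hosts
    }
```

A node that reports "refused" for ssh and "open" for rsh matches the situation described above: rsh is available, but any connection attempt over ssh is rejected.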

Explanation: When the -nolocal option is used, mpirun uses rsh (or ssh) to connect from the node where it is launched to the first node in the machines file. From that first node, however, it always uses ssh to connect to the other hosts. This is the default behaviour of MPICH and does not depend on the contents of the mpirun and mpirun.ch_p4.args files on the node.

If the -nolocal option is not used, mpirun first connects to localhost through rsh and, from there, connects to the other hosts through rsh as well.
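
The two cases can be summarised as a small model. The sketch below only restates the behaviour described in this note for a setup where rsh has been enabled; it does not inspect or reproduce MPICH internals, and the function name launch_hops and the host names in the usage note are hypothetical.

```python
def launch_hops(launch_host, machines, nolocal):
    """List (source, destination, remote_shell) hops as described in this note.

    Assumes rsh has been enabled. This is a simplified model of the
    documented behaviour, not a trace of what MPICH executes.
    """
    if not machines:
        return []
    if nolocal:
        first, rest = machines[0], machines[1:]
        # Launch node to the first machine: the enabled remote shell (rsh).
        hops = [(launch_host, first, "rsh")]
        # From the first machine onward: always ssh.
        hops += [(first, host, "ssh") for host in rest]
        return hops
    # Without -nolocal: rsh to localhost, then rsh to every listed host.
    hops = [(launch_host, "localhost", "rsh")]
    hops += [("localhost", host, "rsh") for host in machines]
    return hops
```

For example, launch_hops("head", ["node1", "node2"], nolocal=True) contains the hop ("node1", "node2", "ssh"), which is the connection that fails when sshd is stopped on the compute nodes.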

Solution: To use rsh for connections from the first compute node to the other nodes, do not use the -nolocal option with mpirun.
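
Where jobs are launched from a wrapper script, the option can be filtered out before mpirun is called. The following is a minimal sketch under stated assumptions: it assumes the classic MPICH mpirun options -np and -machinefile, and the function name mpirun_command and the example file names are hypothetical.

```python
def mpirun_command(program, np, machinefile, extra_args=()):
    """Build an mpirun argument list, dropping -nolocal if it was requested."""
    # Remove -nolocal so that the connections from the first node use rsh.
    kept = [arg for arg in extra_args if arg != "-nolocal"]
    return ["mpirun", "-np", str(np), "-machinefile", machinefile, *kept, program]
```

For example, mpirun_command("./my_mpi_app", 4, "machines", ["-nolocal"]) returns a list without -nolocal that can be passed to subprocess.run.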

 

Document Information

More support for:
IBM Spectrum Cluster Foundation

Software version:
4.4.0

Document number:
702023

Modified date:
09 September 2018

UID

isg3T1014121