Troubleshooting
Problem
Why is my ssh connection from one compute node to another slow?
Resolving The Problem
Symptom: Establishing an ssh connection from one compute node to another is slow.
To test whether name resolution is the cause, perform the following steps:
1. Log in to a compute node and establish an ssh connection with another compute node using its host name:
# ssh another_compute_node_hostname
Take note of the time required.
2. Log out of the other compute node and try to establish a connection using its IP address:
# ssh another_compute_node_ipaddress
If the connection by IP address is established dramatically faster, the problem is with name resolution.
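The comparison in steps 1 and 2 can also be scripted. The following is a minimal sketch, not part of the product: elapsed_seconds is a hypothetical helper, and NODE_NAME and NODE_IP are assumed environment variables that hold the other compute node's host name and IP address. It also assumes that key-based ssh login between compute nodes is already set up, because BatchMode=yes makes ssh fail instead of prompting for a password.

```shell
#!/bin/sh
# elapsed_seconds (hypothetical helper): run a command, discard its output,
# and print how many whole seconds it took.
elapsed_seconds() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo $(( end - start ))
}

# NODE_NAME and NODE_IP are assumed to be set by the caller.
# BatchMode=yes prevents password prompts; the remote command "true" exits
# at once, so the measured time is mostly connection setup.
if [ -n "$NODE_NAME" ] && [ -n "$NODE_IP" ]; then
    echo "by host name:  $(elapsed_seconds ssh -o BatchMode=yes "$NODE_NAME" true) s"
    echo "by IP address: $(elapsed_seconds ssh -o BatchMode=yes "$NODE_IP" true) s"
fi
```

Whole-second resolution is coarse, but it should be enough here, since a lookup that waits on an unreachable DNS server usually costs seconds rather than milliseconds.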
Explanation: The cluster may be trying to use a non-existent domain or DNS server.
To verify that the problem is with name resolution, edit the /etc/resolv.conf file on a compute node and change the search line so that it includes only the private domain.
Then establish an ssh connection to another compute node by host name. It should respond faster.
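The edit to the search line can be sketched as a small shell function. This is an illustration under stated assumptions, not a supported tool: set_search_domain is a hypothetical helper, and private.example stands in for the cluster's actual private domain. The function keeps a backup copy with a .bak suffix so that the change can be undone.

```shell
#!/bin/sh
# set_search_domain (hypothetical helper): rewrite the "search" line of a
# resolv.conf-style file so that it lists only the given domain.
# Usage: set_search_domain FILE DOMAIN
set_search_domain() {
    conf=$1
    domain=$2
    # Keep the original file as FILE.bak.
    cp "$conf" "$conf.bak"
    # Replace the whole search line; all other lines pass through unchanged.
    sed "s/^search[[:space:]].*/search $domain/" "$conf.bak" > "$conf"
}
```

On a compute node, this might be run as root as `set_search_domain /etc/resolv.conf private.example`, with the placeholder domain replaced by the real private domain.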
Solution: Fix this problem using any one of the following:
- Update the front end’s /etc/resolv.conf file to use a real DNS server.
- In the database, set PublicDNSDomain in the app_globals table to blank. Remove the Kickstart cache by deleting the files that match /home/install/sbin/cache/ks.cache.*, and then reinstall the compute nodes.
- As a temporary fix, change the search parameter in the /etc/resolv.conf file of all the compute nodes so that it includes only the private domain.
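For the temporary fix, the same change has to be made on every compute node. The following is a minimal sketch of one way to do that over ssh, assuming a plain-text file that lists the compute nodes one per line. The helper name for_each_node, the REMOTE_SHELL override, and the file name nodes.txt are all hypothetical and do not come from the product.

```shell
#!/bin/sh
# for_each_node (hypothetical helper): run one command on every node listed
# in a file, one host name or IP address per line.
# REMOTE_SHELL is an assumed override, mainly for dry runs (REMOTE_SHELL=echo).
for_each_node() {
    nodes_file=$1
    shift
    # The "|| [ -n ... ]" part handles a last line without a trailing newline.
    while read -r node || [ -n "$node" ]; do
        # Skip blank lines.
        [ -n "$node" ] || continue
        # ssh -n stops ssh from reading stdin, which would otherwise
        # swallow the rest of the node list.
        ${REMOTE_SHELL:-ssh -n} "$node" "$@"
    done < "$nodes_file"
}
```

A call might look like `for_each_node nodes.txt "sed -i.bak 's/^search[[:space:]].*/search private.example/' /etc/resolv.conf"`, where private.example is a placeholder for the private domain and the in-place option assumes GNU sed on the compute nodes. Listing the nodes by IP address avoids the slow lookups while the fix is being applied.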
Document Information
More support for: IBM Spectrum Cluster Foundation
Software version: 4.4.0
Document number: 702015
Modified date: 09 September 2018
UID: isg3T1014117