IBM Support

Diagnosing the error message "ls_rsetenv: A connect sys call failed: Connection refused"

Troubleshooting


Problem

Interactive jobs fail with "ls_rsetenv: A connect sys call failed: Connection refused"

Symptom

Interactive jobs (lsrun, lsgrun, bsub -I) fail with the following error for specific hosts:

ls_rsetenv: A connect sys call failed: Connection refused

Cause

As the error message indicates, the interactive command (for example, lsrun) most likely failed because the submission host could not connect to the RES daemon on the execution host. Possible causes are a communication failure between the two hosts, the RES daemon not running on the execution host, or RES not listening on the designated port.
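These causes usually look different at the socket level, which can help tell them apart. The following is a minimal Python sketch, not part of LSF. The function name probe, the host name servername, and the port 6878 in the usage comment are illustrative assumptions; substitute the execution host and the RES port that your cluster actually uses.

```python
import errno
import socket


def probe(host, port, timeout=3.0):
    """Classify a single TCP connection attempt to host:port."""
    try:
        # create_connection() resolves the host name, then connects.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"                    # something is listening on the port
    except socket.gaierror:
        return "name-resolution-failed"      # host name could not be resolved
    except ConnectionRefusedError:
        return "refused"                     # host answered, but nothing accepts on this port
    except socket.timeout:
        return "timeout"                     # no answer, e.g. filtered or unreachable
    except OSError as exc:
        # Any other socket error, reported by its symbolic errno name if known.
        return "error: " + errno.errorcode.get(exc.errno, str(exc))


# Hypothetical usage from the submission host:
#   print(probe("servername", 6878))
```

A result of "refused" generally points to RES not running or listening on a different port, while "timeout" or "name-resolution-failed" points to a network or name service problem between the two hosts.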

Diagnosing The Problem

Use the following checklist to diagnose the possible causes of the issue.
1. Verify the following:
a. The submission host can resolve and connect to the execution host by host name (for example, with nslookup).
b. The RES daemon is up and listening on the port defined in lsf.conf (ps -ef | grep res shows whether the process is running).
2. If /etc/services or the NIS or NIS+ database is used to define the daemon port numbers, isolate the issue by defining the RES daemon port in lsf.conf (LSF_RES_PORT). A definition in lsf.conf automatically overrides definitions in other places.
3. Run strace lsrun -m servername csh to obtain a detailed trace of the connection attempt, and check whether any error is logged.
a. If an error is logged, examine the trace to find the call that fails and raises the error.
b. If no error is logged, the RES port supplied by NIS or /etc/services is probably wrong and is triggering the connection failure. Review the relevant settings. To avoid this misconfiguration, always define the LSF daemon ports in lsf.conf.
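The port comparison behind steps 2 and 3b can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not an LSF tool: the function names are hypothetical, the service name "res" is assumed to be the entry used in /etc/services or NIS at your site, and the port numbers in the sample text are example values only. The services lookup goes through the system resolver, which typically consults whatever sources the host is configured to use.

```python
import socket


def res_port_from_conf(conf_text):
    """Return the LSF_RES_PORT value found in lsf.conf text, or None if unset."""
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()      # drop trailing comments
        key, sep, value = line.partition("=")
        if sep and key.strip() == "LSF_RES_PORT" and value.strip():
            return int(value.strip().strip('"'))
    return None


def res_port_from_services(name="res"):
    """Return the TCP port the services database gives for `name`, or None."""
    try:
        return socket.getservbyname(name, "tcp")
    except OSError:
        return None                              # no such service entry


def effective_res_port(conf_text):
    """Prefer lsf.conf; fall back to the services database otherwise."""
    port = res_port_from_conf(conf_text)
    return port if port is not None else res_port_from_services()


# Hypothetical lsf.conf fragment with example port values.
sample = "LSF_LIM_PORT=7869\nLSF_RES_PORT=6878  # RES daemon port\n"
print("lsf.conf:", res_port_from_conf(sample))
print("services:", res_port_from_services())
```

If the two values differ and lsf.conf does not define the port, the services value is the one worth checking against the port on which RES is actually listening on the execution host.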

Product: Platform LSF
Platform: Linux
Version: 7.0.5; 7.0.6; 8.0; 8.3; 9.1.0; 9.1.1; 9.1.2; 9.1.3

Document Information

Modified date:
17 June 2018

UID

isg3T1020458