Troubleshooting
Problem
The ulimit settings are OK, but the bjobs command fails and returns the following messages (MAX_CONCURRENT_QUERY is set to 100):
LSF is processing your request. Please wait ...
LSF is processing your request. Please wait ...
Cannot connect to LSF. Please wait ...
LSF is down. Please wait ...
Batch system concurrent query limit exceeded ... retrying in 1 second(s).
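As a quick way to confirm the "ulimit settings are OK" statement on a given host, the resource limits can be printed from Python. This is only a minimal sketch: the helper name describe_limit is hypothetical, and it reports the limits of the process that runs the script, which are not necessarily the limits the LSF daemons were started with.

```python
import resource


def describe_limit(name, rlimit_id):
    """Return a one-line description of the soft and hard value of a limit."""
    soft, hard = resource.getrlimit(rlimit_id)

    def fmt(value):
        # RLIM_INFINITY is what "ulimit" displays as "unlimited".
        return "unlimited" if value == resource.RLIM_INFINITY else str(value)

    return f"{name}: soft={fmt(soft)} hard={fmt(hard)}"


if __name__ == "__main__":
    # Equivalent in spirit to "ulimit -u" and "ulimit -n" in the shell.
    print(describe_limit("max user processes", resource.RLIMIT_NPROC))
    print(describe_limit("open files", resource.RLIMIT_NOFILE))
```

Run it as the same user, and ideally from the same kind of session, that starts the LSF daemons, so that the reported values are comparable.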
Symptom
In sbatchd.log:
Dec 17 15:49:53 2018 10422 3 10.1 job_exec: Job <nnn> failed in fork(), Resource temporarily unavailable.
Dec 17 15:49:53 2018 10422 3 10.1 do_newjob: Job <nnn> failed in job_exec().
Dec 17 15:49:53 2018 10422 3 10.1 deallocJobCard(): Job <nnn> failed in fork(), Resource temporarily unavailable.
Dec 17 15:49:53 2018 10422 3 10.1 job_exec: Job <nnn> failed in fork(), Resource temporarily unavailable.
Dec 17 15:49:53 2018 10422 3 10.1 do_newjob: Job <nnn> failed in job_exec().
Dec 17 15:49:53 2018 10422 3 10.1 deallocJobCard(): Job <nnn> failed in fork(), Resource temporarily unavailable.
In mbatchd.log:
Dec 17 15:49:53 2018 18427 3 10.1 updateThreadHandler: pthread_create() failed, Resource temporarily unavailable.
Dec 17 15:49:53 2018 15291:15291 3 10.1 doSelectLoop(): Multithreaded child(nnn) died abnormally, restart a child
Dec 17 15:49:53 2018 15291:15291 3 10.1 createThreadedChild: myFork() failed, Resource temporarily unavailable.
Dec 17 15:49:53 2018 15291:15291 3 10.1 jobStartError: sbatchd on host <xxx> was unable to fork; job <nnn> rejected
In lim.log:
Dec 17 15:40:22 2018 10414 3 3.4.0 clientReq: fork() failed.
Dec 17 15:50:15 2018 10414 Last message repeated nnn time(s).
In /var/log/messages, the "Resource temporarily unavailable" message also appears for the sbatchd process when it tries to fork.
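The symptoms above can be checked across several log files with a small script. The following is a sketch under stated assumptions, not an LSF tool: the regular expressions are derived only from the sample lines shown in this document, and the names LINE_RE, FAILURE_RE, and count_fork_failures are hypothetical. It counts, per reporting function, the log lines that mention a failed fork or thread creation.

```python
import re
import sys
from collections import Counter

# Layout assumed from the samples above:
#   <Mon> <day> <time> <year> <pid[:tid]> <level> <version> <function>: <message>
LINE_RE = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]+\s+\d{4}\s+\S+\s+\d+\s+\S+\s+"
    r"(?P<func>\w+)(?:\(\))?:\s+(?P<msg>.*)$"
)

# Message fragments that indicate a process or thread could not be created.
FAILURE_RE = re.compile(
    r"Resource temporarily unavailable|fork\(\) failed|unable to fork",
    re.IGNORECASE,
)


def count_fork_failures(lines):
    """Count fork/thread-creation failures per reporting function."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and FAILURE_RE.search(match.group("msg")):
            counts[match.group("func")] += 1
    return counts


if __name__ == "__main__":
    # Usage: pass one or more log file paths on the command line.
    totals = Counter()
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            totals.update(count_fork_failures(handle))
    for func, count in totals.most_common():
        print(f"{func}: {count}")
```

Lines such as "Last message repeated nnn time(s)." do not match the assumed layout and are skipped, so the counts are a lower bound.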
Applies to: IBM Spectrum LSF 10.1 on SuSE Linux 12 SP2 and higher.
Document Information
Modified date: 18 January 2019
UID: ibm10795654