IBM Support

mbatchd error: jobStartError sbatchd on host <xxx> was unable to fork and job <nnn> rejected

Troubleshooting


Problem

The ulimit settings are OK, yet the bjobs command fails and returns the following messages (MAX_CONCURRENT_QUERY is set to 100):

LSF is processing your request. Please wait ...
LSF is processing your request. Please wait ...
Cannot connect to LSF. Please wait ...
LSF is down. Please wait ...
Batch system concurrent query limit exceeded ... retrying in 1 second(s).
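
The statement that the ulimit settings are OK can be double-checked from the account that runs the LSF daemons. The following is a minimal Python sketch, not an IBM-provided tool. It reads the per-user process limit (RLIMIT_NPROC) together with two system-wide Linux limits that can also make fork() or pthread_create() fail with "Resource temporarily unavailable" (EAGAIN). The function names `read_int` and `spawn_limits` are hypothetical, and the sketch does not read any cgroup PID limit, which is another possible source of EAGAIN on Linux. This document does not state which limit, if any, caused the failure.

```python
import resource
from pathlib import Path


def read_int(path):
    """Return the first integer stored in a /proc file, or None if unavailable."""
    try:
        return int(Path(path).read_text().split()[0])
    except (OSError, ValueError, IndexError):
        return None


def spawn_limits():
    """Collect limits that bound process and thread creation for this process.

    Hypothetical helper for illustration; run it as the user that owns
    the LSF daemons to see the limits that apply to them.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return {
        # Per-user limit on processes/threads (what `ulimit -u` reports).
        "rlimit_nproc": (soft, hard),
        # System-wide ceiling on the number of threads.
        "threads_max": read_int("/proc/sys/kernel/threads-max"),
        # Largest PID the kernel will hand out.
        "pid_max": read_int("/proc/sys/kernel/pid_max"),
    }


if __name__ == "__main__":
    for name, value in spawn_limits().items():
        print(f"{name}: {value}")
```

A soft RLIMIT_NPROC value equal to `resource.RLIM_INFINITY` means the per-user limit is unlimited, in which case the remaining values are the more likely places to look.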

 

Symptom

In sbatchd.log:
Dec 17 15:49:53 2018 10422 3 10.1 job_exec: Job <nnn> failed in fork(), Resource temporarily unavailable.
Dec 17 15:49:53 2018 10422 3 10.1 do_newjob: Job <nnn> failed in job_exec().
Dec 17 15:49:53 2018 10422 3 10.1 deallocJobCard(): Job <nnn> failed in fork(), Resource temporarily unavailable.

In mbatchd.log:
Dec 17 15:49:53 2018 18427 3 10.1 updateThreadHandler: pthread_create() failed, Resource temporarily unavailable.
Dec 17 15:49:53 2018 15291:15291 3 10.1 doSelectLoop(): Multithreaded child(nnn) died abnormally, restart a child
Dec 17 15:49:53 2018 15291:15291 3 10.1 createThreadedChild: myFork() failed, Resource temporarily unavailable.
Dec 17 15:49:53 2018 15291:15291 3 10.1 jobStartError: sbatchd on host <xxx> was unable to fork; job <nnn> rejected

In lim.log:
Dec 17 15:40:22 2018 10414 3 3.4.0 clientReq: fork() failed.
Dec 17 15:50:15 2018 10414 Last message repeated nnn time(s).

In /var/log/messages, "Resource temporarily unavailable" messages appear for the sbatchd process when it attempts to fork.
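
To confirm that a host shows the same symptom, the daemon logs can be scanned for the failure messages quoted above. The following is a rough Python sketch. The patterns are assumptions derived only from the log excerpts in this document, and the function name `find_spawn_failures` is hypothetical; other LSF versions may word these messages differently.

```python
import re

# errno EAGAIN as it is rendered in the log excerpts above.
EAGAIN_TEXT = "Resource temporarily unavailable"

# Calls named in the sample sbatchd.log, mbatchd.log, and lim.log lines.
CALL_RE = re.compile(r"\b(fork|myFork|pthread_create)\(\)")


def find_spawn_failures(lines):
    """Return (call, is_eagain, line) for each line reporting a failed spawn.

    A line counts as a hit when it names one of the calls above and
    contains the word "failed". is_eagain tells whether the line also
    carries the "Resource temporarily unavailable" text.
    """
    hits = []
    for line in lines:
        match = CALL_RE.search(line)
        if match and "failed" in line:
            hits.append((match.group(1), EAGAIN_TEXT in line, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    import sys

    # Usage: python scan_lsf_logs.py sbatchd.log mbatchd.log lim.log
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            for call, is_eagain, text in find_spawn_failures(handle):
                tag = "EAGAIN" if is_eagain else "other"
                print(f"{path}: {call} [{tag}] {text}")
```

Lines such as the lim.log entry, which report a failed fork() without an error string, are still returned but flagged as not carrying the EAGAIN text.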

Product: IBM Spectrum LSF
Platform: Linux
Version: LSF 10.1 on SuSE Linux 12 SP2 and higher



Document Information

Modified date:
18 January 2019

UID

ibm10795654