Control LSF system daemons
Commands for starting, shutting down, restarting, and reconfiguring LSF system daemons.
Permissions required
- You must be logged on as root or as a user listed in the /etc/lsf.sudoers file.
- You must be able to run the rsh or ssh commands across all LSF hosts without having to enter a password. See your operating system documentation for information about configuring these commands. If the LSF_RSH parameter is set in the lsf.conf file, the shell command that it specifies is tried first, before the rsh command.
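One way to confirm the passwordless requirement before you run cluster-wide commands is to attempt a non-interactive ssh connection to each host. The following is a minimal sketch that assumes OpenSSH is the remote shell in use. The function name, the SSH_CMD override variable, and the host names in the usage note are illustrative and are not part of LSF.

```shell
# check_passwordless_ssh HOST...
# Prints "ok HOST" or "FAIL HOST" for each host and returns non-zero
# if any host cannot be reached without a password prompt.
# SSH_CMD is a hypothetical override (not an LSF parameter), useful for testing.
check_passwordless_ssh() {
    rc=0
    for host in "$@"; do
        # BatchMode=yes makes ssh fail instead of prompting for a password.
        if ${SSH_CMD:-ssh} -o BatchMode=yes -o ConnectTimeout=5 "$host" true \
            </dev/null >/dev/null 2>&1; then
            echo "ok $host"
        else
            echo "FAIL $host"
            rc=1
        fi
    done
    return $rc
}
```

For example, `check_passwordless_ssh hostA hostB` reports each host on its own line. A host that is unreachable is reported the same way as a host that asks for a password, so a FAIL result might need a manual follow-up.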
LSF system daemon commands
Note: After you start using systemctl commands, continue to use them instead of other control commands.

| Daemon | Action | Command | Permissions |
|---|---|---|---|
| All daemons in the cluster | Start | | Must be root or a user who is listed in the lsf.sudoers file |
| All daemons in the cluster | Shut down | | Must be root or a user who is listed in the lsf.sudoers file |
| All daemons in the cluster | Restart | systemctl restart lsfd (available starting in Fix Pack 14) | Must be root or a user who is listed in the lsf.sudoers file |
| sbatchd | Start | | Must be root or a user who is listed in the lsf.sudoers file |
| sbatchd | Shut down | | Must be root or the LSF administrator |
| sbatchd | Restart | | Must be root or the LSF administrator |
| mbatchd | Shut down | | Must be root or the LSF administrator |
| mbatchd | Restart | badmin mbdrestart | Must be root or the LSF administrator |
| mbatchd | Reconfigure | badmin reconfig | Must be root or the LSF administrator |
| RES | Start | | Must be root or a user who is listed in the lsf.sudoers file |
| RES | Shut down | | Must be the LSF administrator |
| RES | Restart | | Must be the LSF administrator |
| LIM | Start | | Must be root or a user who is listed in the lsf.sudoers file |
| LIM | Shut down | | Must be the LSF administrator |
| LIM | Restart | | Must be the LSF administrator |
| LIM | Restart all hosts in the cluster | lsadmin reconfig | Must be the LSF administrator |
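When these commands are scripted, it can help to keep the daemon-to-command mapping in one place. The following is a minimal sketch that covers only the rows whose command appears in the table. The function name and the action keywords are hypothetical, and the function prints the command rather than running it.

```shell
# lsf_daemon_cmd DAEMON ACTION
# Prints the control command that the table lists for a daemon and action.
# Only rows with a listed command are covered; the function name and
# action keywords are illustrative, not part of LSF.
lsf_daemon_cmd() {
    case "$1:$2" in
        cluster:restart)        echo "systemctl restart lsfd" ;;  # Fix Pack 14 or later
        mbatchd:restart)        echo "badmin mbdrestart" ;;
        mbatchd:reconfigure)    echo "badmin reconfig" ;;
        lim:restart-all-hosts)  echo "lsadmin reconfig" ;;
        *)
            echo "no command listed for: $1 $2" >&2
            return 1
            ;;
    esac
}
```

For example, `lsf_daemon_cmd mbatchd restart` prints `badmin mbdrestart`, which can be reviewed before it is run with the required permissions.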
sbatchd daemon
Restarting the sbatchd daemon on a host does not affect jobs that are running on that host.
If the sbatchd daemon is shut down, the host is not available to run new jobs. Any existing jobs that are running on that host continue, but the results are not sent to the user until the sbatchd daemon is restarted.
LIM and RES daemons
Jobs running on the host are not affected by restarting the daemons.
If a daemon is not responding to network connections, the lsadmin command displays an error message with the host name. In this case, you must stop and restart the daemon manually.
If the load information manager (LIM) and the other daemons on the current management host are shut down, another host automatically takes over as the management host.
If the resource execution server (RES) is shut down while remote interactive tasks are running on the host, the running tasks continue but no new tasks are accepted.
LSF daemons or binary files protected from operating system out-of-memory (OS OOM) killer
On systems that support the out-of-memory (OOM) killer, the following LSF daemons are protected from being killed by it:
- root RES
- root LIM
- root sbatchd
- pim
- melim
- mbatchd
- rla
- mbschd
- krbrenewd
- elim
- lim -2 (root)
- mbatchd -2 (root)
For the preceding daemons, the oom_adj parameter is automatically set to -17 or the oom_score_adj parameter is set to -1000 when the daemons are started or restarted. This setting ensures that the LSF daemons survive the OOM killer. User jobs are not protected.
When the oom_adj or oom_score_adj parameter is set, the following messages are logged at the DEBUG level: "Set oom_adj to -17." and "Set oom_score_adj to -1000."
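To check whether a running daemon has the protection described above, you can read the values that the Linux kernel exposes under /proc for the process. The following is a minimal sketch. The function name and the optional second argument (an alternative /proc root, present only so that the check can be tested) are illustrative and are not part of LSF.

```shell
# oom_protected PID [PROC_ROOT]
# Succeeds if the process has oom_score_adj set to -1000 or,
# on older kernels, oom_adj set to -17. PROC_ROOT defaults to /proc.
oom_protected() {
    dir="${2:-/proc}/$1"
    if [ -r "$dir/oom_score_adj" ] &&
        [ "$(cat "$dir/oom_score_adj")" = "-1000" ]; then
        return 0
    fi
    if [ -r "$dir/oom_adj" ] &&
        [ "$(cat "$dir/oom_adj")" = "-17" ]; then
        return 0
    fi
    return 1
}
```

For example, assuming that the LIM process is named lim on the host, `for pid in $(pgrep -x lim); do oom_protected "$pid" && echo "$pid is protected"; done` lists the protected LIM processes.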
The root RES, root LIM, root sbatchd, pim, melim, and mbatchd daemons protect themselves actively and log messages.
To log these messages, set the LSF_LOG_MASK parameter to LOG_DEBUG, and set the debug parameter for each daemon:
- The RES daemon must be configured as LSF_DEBUG_RES="LC_TRACE"
- The LIM daemon must be configured as LSF_DEBUG_LIM="LC_TRACE"
- When the enterprise grid orchestrator (EGO) is enabled, the EGO_LOG_MASK=LOG_DEBUG parameter must be set in the ego.conf file
- The sbatchd daemon must be configured as LSB_DEBUG_SBD="LC_TRACE"
- The pim daemon must be configured as LSF_DEBUG_PIM="LC_TRACE"
- The mbatchd daemon must be configured as LSB_DEBUG_MBD="LC_TRACE"
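Taken together, the logging settings listed above might look like the following fragment. This is a sketch, not a recommended production configuration: it assumes that the LSF parameters are set in the lsf.conf file, and DEBUG-level logging with LC_TRACE can produce large log files.

```shell
# lsf.conf (assumed location for the LSF parameters)
LSF_LOG_MASK=LOG_DEBUG
LSF_DEBUG_RES="LC_TRACE"
LSF_DEBUG_LIM="LC_TRACE"
LSB_DEBUG_SBD="LC_TRACE"
LSF_DEBUG_PIM="LC_TRACE"
LSB_DEBUG_MBD="LC_TRACE"

# ego.conf (only when EGO is enabled)
EGO_LOG_MASK=LOG_DEBUG
```

The daemons must be restarted or reconfigured with the commands described earlier before the new settings are expected to take effect.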