Control LSF system daemons

Commands for starting, shutting down, restarting, and reconfiguring LSF system daemons.

Permissions required

To control all daemons in the cluster, the following permissions are required:
  • You must be logged on as root or as a user listed in the /etc/lsf.sudoers file.
  • You must be able to run the rsh or ssh commands across all LSF hosts without having to enter a password. See your operating system documentation for information about configuring these commands. LSF uses the remote shell command that is specified by the LSF_RSH parameter in the lsf.conf file before it attempts to use the rsh command.
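To check the password-free requirement before you control the cluster, you can run a no-op remote command (true) on each host and note which hosts fail. The following Python sketch is illustrative only: the function names are hypothetical, it uses the LSF_RSH command when one is supplied and otherwise plain rsh (it does not retry with rsh after an LSF_RSH failure), and it treats a timeout, such as a blocking password prompt, as a failure. When LSF_RSH is ssh, the ssh option -o BatchMode=yes makes ssh fail instead of prompting for a password.

```python
import shlex
import subprocess
from typing import Callable, List, Optional, Sequence


def remote_shell_argv(host: str, lsf_rsh: Optional[str] = None) -> List[str]:
    """Build the argv for a no-op command ("true") on a remote host.

    Assumption: use the LSF_RSH command when one is given, else plain rsh.
    """
    base = shlex.split(lsf_rsh) if lsf_rsh else ["rsh"]
    return base + [host, "true"]


def hosts_failing_remote_shell(
    hosts: Sequence[str],
    lsf_rsh: Optional[str] = None,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
    timeout: float = 10.0,
) -> List[str]:
    """Return the hosts where the no-op remote command did not succeed."""
    failures: List[str] = []
    for host in hosts:
        argv = remote_shell_argv(host, lsf_rsh)
        try:
            # stdin is closed so the remote command cannot read from this script;
            # a prompt that blocks on the terminal is caught by the timeout.
            result = run(argv, stdin=subprocess.DEVNULL,
                         capture_output=True, timeout=timeout)
            ok = result.returncode == 0
        except (subprocess.TimeoutExpired, OSError):
            # Timed out (for example, waiting for a password) or command not found.
            ok = False
        if not ok:
            failures.append(host)
    return failures
```

For example, hosts_failing_remote_shell(["hostA", "hostB"], lsf_rsh="ssh -o BatchMode=yes") returns the hosts (hostA and hostB are placeholder names) that still need password-free access configured. The run parameter exists only so that the check can be exercised without real hosts.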

LSF system daemon commands

The following table summarizes the commands that you use to control LSF daemons.

Table 1. Commands to control LSF daemons

The systemctl commands are available starting in Fix Pack 14. Once you use systemctl commands to control the daemons, continue to use them instead of the other control commands.

All daemons in the cluster
  Permissions: Must be root or a user who is listed in the lsf.sudoers file for all of these commands.
  • Start: lsfstartup, or systemctl start lsfd
  • Shut down: lsfshutdown, or systemctl stop lsfd
  • Restart: systemctl restart lsfd

sbatchd
  Permissions: Must be root or a user who is listed in the lsf.sudoers file for the startup command. Must be root or the LSF administrator for the other commands.
  • Start: bctrld start sbd [host_name ...|all], or systemctl start lsfd-sbd
  • Shut down: bctrld stop sbd [host_name ...|all], or systemctl stop lsfd-sbd
  • Restart: bctrld restart sbd [host_name ...|all], or systemctl restart lsfd-sbd

mbatchd
  Permissions: Must be root or the LSF administrator for these commands.
  • Shut down: bctrld stop sbd, then badmin mbdrestart
  • Restart: badmin mbdrestart
  • Reconfigure: badmin reconfig

RES
  Permissions: Must be root or a user who is listed in the lsf.sudoers file for the startup command. Must be the LSF administrator for the other commands.
  • Start: bctrld start res [host_name ...|all], or systemctl start lsfd-res
  • Shut down: bctrld stop res [host_name ...|all], or systemctl stop lsfd-res
  • Restart: bctrld restart res [host_name ...|all], or systemctl restart lsfd-res

LIM
  Permissions: Must be root or a user who is listed in the lsf.sudoers file for the startup command. Must be the LSF administrator for the other commands.
  • Start: bctrld start lim [host_name ...|all], or systemctl start lsfd-lim
  • Shut down: bctrld stop lim [host_name ...|all], or systemctl stop lsfd-lim
  • Restart: bctrld restart lim [host_name ...|all], or systemctl restart lsfd-lim
  • Restart LIM on all hosts in the cluster: lsadmin reconfig
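If you script daemon control, the per-host daemons in Table 1 (sbatchd, RES, and LIM) share one command pattern. The following Python sketch encodes that pattern in a hypothetical helper that only builds the command line and does not run it. The use_systemctl flag reflects the advice to keep using systemctl once you adopt it. As simplifying assumptions, the sketch builds systemctl commands for the local host only, and it leaves out mbatchd and the cluster-wide commands because their forms differ.

```python
from typing import List, Optional, Sequence

# Names taken from Table 1: the bctrld daemon argument and the systemd unit.
BCTRLD_NAMES = {"sbatchd": "sbd", "res": "res", "lim": "lim"}
SYSTEMD_UNITS = {"sbatchd": "lsfd-sbd", "res": "lsfd-res", "lim": "lsfd-lim"}
ACTIONS = ("start", "stop", "restart")


def control_argv(
    daemon: str,
    action: str,
    hosts: Optional[Sequence[str]] = None,
    use_systemctl: bool = False,
) -> List[str]:
    """Build (but do not run) the command that controls one per-host daemon."""
    if daemon not in BCTRLD_NAMES:
        raise ValueError(f"unsupported daemon: {daemon!r}")
    if action not in ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    if use_systemctl:
        # This sketch only builds systemctl commands for the local host.
        if hosts:
            raise ValueError("host names are not supported with systemctl in this sketch")
        return ["systemctl", action, SYSTEMD_UNITS[daemon]]
    # bctrld takes host names or the keyword "all"; with no hosts given,
    # the command is assumed to act on the local host.
    return ["bctrld", action, BCTRLD_NAMES[daemon], *(hosts or [])]
```

For example, subprocess.run(control_argv("sbatchd", "restart", ["all"]), check=True) would run bctrld restart sbd all, subject to the permissions listed in the table.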

sbatchd daemon

Restarting the sbatchd daemon on a host does not affect jobs that are running on that host.

If the sbatchd daemon is shut down, the host is not available to run new jobs. Any existing jobs that are running on that host continue, but the results are not sent to the user until the sbatchd daemon is restarted.

LIM and RES daemons

Restarting the LIM or RES daemon does not affect jobs that are running on the host.

If a daemon is not responding to network connections, the lsadmin command displays an error message with the host name. In this case, you must stop and restart the daemon manually.

If the load information manager (LIM) and the other daemons on the current management host are shut down, another host automatically takes over as the management host.

If the remote execution server (RES) is shut down while remote interactive tasks are running on the host, the running tasks continue, but no new tasks are accepted.

LSF daemons and binary files protected from the operating system out-of-memory (OOM) killer

On systems that support the out-of-memory (OOM) killer, the following LSF daemons are protected from being killed by it:

  • root RES
  • root LIM
  • root sbatchd
  • pim
  • melim
  • mbatchd
  • rla
  • mbschd
  • krbrenewd
  • elim
  • lim -2 (root)
  • mbatchd -2 (root)

When the preceding daemons are started or restarted, LSF automatically sets their oom_adj value to -17 or their oom_score_adj value to -1000. These values exempt the daemons from the OOM killer, so the LSF daemons keep running under memory pressure, while user jobs remain subject to the OOM killer.
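To confirm the protection on a running daemon, you can read these values from the Linux /proc file system. The following Python sketch is an illustration, not an LSF tool: the function names are hypothetical, it assumes a Linux host that exposes /proc/<pid>/oom_adj and /proc/<pid>/oom_score_adj, and it reports any file that cannot be read as None.

```python
import os
from typing import Dict, Mapping, Optional

OOM_ADJ_DISABLE = -17       # oom_adj value that this section says LSF sets
OOM_SCORE_ADJ_MIN = -1000   # oom_score_adj value that this section says LSF sets


def read_oom_settings(pid: int) -> Dict[str, Optional[int]]:
    """Read oom_adj and oom_score_adj for a PID; None if a file is unreadable."""
    settings: Dict[str, Optional[int]] = {}
    for name in ("oom_adj", "oom_score_adj"):
        try:
            with open(f"/proc/{pid}/{name}") as handle:
                settings[name] = int(handle.read().strip())
        except (OSError, ValueError):
            settings[name] = None
    return settings


def is_oom_protected(settings: Mapping[str, Optional[int]]) -> bool:
    """True if either value matches the protected setting."""
    return (settings.get("oom_score_adj") == OOM_SCORE_ADJ_MIN
            or settings.get("oom_adj") == OOM_ADJ_DISABLE)


if __name__ == "__main__":
    # Example: inspect this script's own process, which is normally not protected.
    mine = read_oom_settings(os.getpid())
    print(mine, is_oom_protected(mine))
```

To check an LSF daemon instead, pass the PID that a command such as pgrep -x sbatchd prints to read_oom_settings, then test the result with is_oom_protected.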

When the oom_adj or oom_score_adj value is set, the daemon logs one of the following messages at DEBUG level: "Set oom_adj to -17." or "Set oom_score_adj to -1000."

The root RES, root LIM, root sbatchd, pim, melim, and mbatchd daemons actively protect themselves and log these messages.

To see these messages in the logs, set LSF_LOG_MASK=LOG_DEBUG in the lsf.conf file.

In addition, the following parameters must be set:
  • For RES, set LSF_DEBUG_RES="LC_TRACE".
  • For LIM, set LSF_DEBUG_LIM="LC_TRACE".

    When the enterprise grid orchestrator (EGO) is enabled, also set EGO_LOG_MASK=LOG_DEBUG in the ego.conf file.

  • For the sbatchd daemon, set LSB_DEBUG_SBD="LC_TRACE".
  • For the pim daemon, set LSF_DEBUG_PIM="LC_TRACE".
  • For the mbatchd daemon, set LSB_DEBUG_MBD="LC_TRACE".
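Because these settings are spread across several parameters, a small check can catch a missing one before you search the logs for the DEBUG messages. The following Python sketch parses simple KEY=value lines, such as those in lsf.conf, and lists the parameters from this section that are missing or lack the required value. It is a simplification under stated assumptions: the helper names are hypothetical, it skips blank lines and # comments, it does not handle line continuations or variable expansion, and it accepts a value that lists the required class alongside other space-separated classes.

```python
from typing import Dict, List, Mapping

# Parameters and values listed in this section (lsf.conf).
REQUIRED_LSF_CONF = {
    "LSF_LOG_MASK": "LOG_DEBUG",
    "LSF_DEBUG_RES": "LC_TRACE",
    "LSF_DEBUG_LIM": "LC_TRACE",
    "LSB_DEBUG_SBD": "LC_TRACE",
    "LSF_DEBUG_PIM": "LC_TRACE",
    "LSB_DEBUG_MBD": "LC_TRACE",
}


def parse_conf(text: str) -> Dict[str, str]:
    """Parse KEY=value lines, ignoring blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Remove surrounding quotes, as in LSF_DEBUG_RES="LC_TRACE".
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_debug_settings(
    text: str, required: Mapping[str, str] = REQUIRED_LSF_CONF
) -> List[str]:
    """Return required parameters that are absent or lack the required value."""
    values = parse_conf(text)
    return [key for key, needed in required.items()
            if needed not in values.get(key, "").split()]
```

The same function can check the ego.conf file when EGO is enabled, for example with required={"EGO_LOG_MASK": "LOG_DEBUG"}.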