Offloading the mbatchd daemon using the LSF rate limiter (lsfproxyd daemon)
By default, all LSF batch commands contact the mbatchd daemon (or the mbatchd query child, if configured). When there are excessive requests, such as scripts that run bjobs commands in a tight loop, mbatchd can become overloaded, negatively affecting cluster performance. Starting in Fix Pack 14, to protect mbatchd from heavy loads, you can enable the LSF rate limiter (controlled by the lsfproxyd daemon), which acts as a gatekeeper between the commands and the mbatchd daemon.
The rate limiter is supported on Linux.
The rate limiter and the lsfproxyd daemon
The rate limiter is managed by the lsfproxyd daemon, which monitors and controls the number of requests and connections that can reach the mbatchd daemon, protecting it from excess requests.
For a request to contact mbatchd, it must first obtain a request token from lsfproxyd. After the request completes, the token returns to lsfproxyd. The lsfproxyd daemon distributes tokens in a round-robin fashion, ensuring that each user connection has a fair chance to be served and processed, even under heavy loads.
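The token flow described above can be pictured with a small toy model. This is only an illustrative sketch, not the lsfproxyd implementation: the TokenDispenser class, its method names, and the fixed token pool size are all hypothetical, chosen to show how round-robin granting keeps one busy user connection from starving the others.

```python
from collections import deque


class TokenDispenser:
    """Toy gatekeeper (hypothetical): grants a fixed pool of request tokens round-robin."""

    def __init__(self, total_tokens):
        self.free_tokens = total_tokens
        self.rotation = deque()  # users that currently have pending requests
        self.pending = {}        # user -> deque of that user's pending request ids

    def submit(self, user, request_id):
        """Queue a request; the user joins the rotation if not already waiting."""
        queue = self.pending.setdefault(user, deque())
        if not queue:
            self.rotation.append(user)
        queue.append(request_id)

    def grant(self):
        """Hand out free tokens one user at a time; return the granted (user, request) pairs."""
        granted = []
        while self.free_tokens > 0 and self.rotation:
            user = self.rotation.popleft()
            granted.append((user, self.pending[user].popleft()))
            self.free_tokens -= 1
            if self.pending[user]:
                # The user still has work queued, so it goes to the back of the rotation.
                self.rotation.append(user)
        return granted

    def release(self):
        """A request has completed, so its token returns to the pool."""
        self.free_tokens += 1


dispenser = TokenDispenser(total_tokens=2)
for i in range(1, 6):
    dispenser.submit("alice", f"a{i}")  # alice floods the gatekeeper
dispenser.submit("bob", "b1")           # bob submits a single request

first_round = dispenser.grant()         # alice and bob each get one token
dispenser.release()                     # one request finishes
second_round = dispenser.grant()        # the freed token goes to the next user in rotation
print(first_round, second_round)
```

In this model, bob's single request is served in the first round even though alice queued five requests ahead of it, which is the fairness property the round-robin policy is meant to provide.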
You can configure multiple lsfproxyd daemons to run within a single cluster; use the LSF_PROXY_HOSTS parameter to list the hosts on which you want lsfproxyd daemons to run. When multiple lsfproxyd daemons are defined for a cluster, they work together to balance workload and provide high availability: the client command first randomly picks one to use, and if that lsfproxyd daemon is unavailable, the command locates another one to use.
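The client-side selection described above can be sketched as follows. This is a conceptual illustration only: the pick_proxy function, the is_available callback, and the host names are hypothetical, and the sketch assumes that the remaining daemons are tried in random order, which the documentation does not specify.

```python
import random


def pick_proxy(hosts, is_available, rng=random):
    """Return the first reachable host from a randomly ordered list, or None.

    hosts        -- host names, as they might be listed in LSF_PROXY_HOSTS
    is_available -- hypothetical callback that reports whether a host's daemon responds
    """
    candidates = list(hosts)
    rng.shuffle(candidates)  # a random first choice spreads load across the daemons
    for host in candidates:
        if is_available(host):
            return host      # first reachable daemon wins
    return None              # no daemon could be reached


# Hypothetical example: only hostB currently has a responsive daemon.
chosen = pick_proxy(["hostA", "hostB", "hostC"], lambda host: host == "hostB")
print(chosen)
```

Because each command shuffles the list independently, load spreads across the daemons over many invocations, and a single unavailable daemon does not block the command.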
LIM controls starting and restarting the lsfproxyd daemon on the LSF hosts specified in the LSF_PROXY_HOSTS parameter in the lsf.conf file. When the lsfproxyd daemon starts, it binds to the listening port specified by the LSF_PROXY_PORT parameter in the lsf.conf file.
LIM restarts the lsfproxyd daemon if it dies.
To control the number of connections to the mbatchd daemon, the lsfproxyd policy is governed by three attributes set in the PROXYD_POLICIES parameter of the lsb.params configuration file: max, nominal, and throttle.
With the PROXYD_POLICIES configuration, the lsfproxyd policy ensures that users don't monopolize the rate limiter system.
For details on setting up your system for the rate limiter, and using it, see Enabling and configuring the LSF rate limiter.
Daemon log files for diagnosing jobs
To troubleshoot the rate limiter and its interactions with the lsfproxyd daemon, see Diagnostics for the LSF rate limiter and lsfproxyd daemon.
Temporarily block users and hosts for performance
To let an administrator temporarily block non-administrator and non-root users, hosts, or both, from performing mbatchd daemon operations when using the rate limiter, the badmin command has been extended to support badmin lsfproxyd block. Administrators can run this command to temporarily stop abusive or misbehaving users from interacting with the LSF cluster, and to avoid performance impact on other users.