Offloading the mbatchd daemon using the LSF rate limiter (lsfproxyd daemon)

By default, all LSF batch commands contact the mbatchd daemon (or the mbatchd query child, if configured). Excessive requests, such as those from scripts that run bjobs commands in a tight loop, can overload mbatchd and degrade cluster performance. Starting in Fix Pack 14, you can protect mbatchd from heavy loads by enabling the LSF rate limiter (controlled by the lsfproxyd daemon). The rate limiter acts as a gatekeeper between the commands and the mbatchd daemon. The rate limiter is supported on Linux.

The rate limiter and the lsfproxyd daemon

The rate limiter is managed by the lsfproxyd daemon, which monitors and controls the number of requests and connections that can reach the mbatchd daemon, protecting it from excess requests. Before a request can contact mbatchd, it must obtain a request token from lsfproxyd. After the request completes, the token is returned to lsfproxyd. The lsfproxyd daemon distributes tokens in a round-robin fashion, ensuring that each user connection has a fair chance to be served and processed, even under heavy loads.
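To illustrate the token scheme described above, the following minimal sketch models a fixed pool of request tokens that are handed out round-robin across users with pending requests. It is a conceptual model only, not the lsfproxyd implementation. The class `TokenGate` and its `capacity`, `submit`, `grant`, and `release` names are hypothetical and exist only for this illustration.

```python
from collections import OrderedDict, deque


class TokenGate:
    """Hypothetical round-robin token gate (a conceptual model, not lsfproxyd code)."""

    def __init__(self, capacity):
        self.available = capacity      # free request tokens
        self.pending = OrderedDict()   # user -> queue of waiting request IDs

    def submit(self, user, request_id):
        # Queue a request; it cannot proceed until it is granted a token.
        self.pending.setdefault(user, deque()).append(request_id)

    def grant(self):
        # Hand out free tokens one request per user per turn, cycling through users,
        # so a user with many queued requests cannot starve the others.
        granted = []
        while self.available > 0 and self.pending:
            user, queue = self.pending.popitem(last=False)  # user at the front of the rotation
            granted.append((user, queue.popleft()))
            self.available -= 1
            if queue:
                self.pending[user] = queue  # still waiting: move to the back of the rotation
        return granted

    def release(self):
        # A completed request returns its token to the pool.
        self.available += 1
```

For example, with three tokens, five queued requests from `alice`, and one from `bob`, a call to `grant()` returns `[("alice", "a0"), ("bob", "b0"), ("alice", "a1")]`. `bob` is served in the first round even though `alice` submitted more requests.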

You can configure multiple lsfproxyd daemons to run within a single cluster. Use the LSF_PROXY_HOSTS parameter in the lsf.conf file to list the hosts on which the lsfproxyd daemons run. When multiple lsfproxyd daemons are defined for a cluster, they work together to balance workload and provide high availability. A client command first picks one of them at random. If that lsfproxyd daemon is unavailable, the command locates another one to use.
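The selection behavior described above can be sketched as a random first choice followed by failover to the remaining hosts. This is a minimal sketch under stated assumptions. The function `connect_to_proxy` and the `try_connect` callback are hypothetical. The source does not say in what order the remaining daemons are tried, so trying them in a shuffled order is an assumption.

```python
import random


def connect_to_proxy(hosts, try_connect, rng=random):
    """Hypothetical client-side selection among the hosts listed in LSF_PROXY_HOSTS.

    The first candidate is chosen at random to spread load. If it is unavailable,
    the remaining hosts are tried (in shuffled order, which is an assumption).
    """
    candidates = list(hosts)
    rng.shuffle(candidates)            # random first pick, random fallback order
    for host in candidates:
        if try_connect(host):          # hypothetical availability check
            return host
    raise ConnectionError("no lsfproxyd daemon is available")
```

For example, if `hostA` is down, `connect_to_proxy(["hostA", "hostB"], lambda h: h != "hostA")` always returns `"hostB"`, whichever host is picked first.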

LIM controls starting and restarting the lsfproxyd daemon on the LSF hosts specified in the LSF_PROXY_HOSTS parameter in the lsf.conf file. When the lsfproxyd daemon starts, it binds to the listening port specified by the LSF_PROXY_PORT parameter in the lsf.conf file. LIM restarts the lsfproxyd daemon if it dies.

The lsfproxyd policy that controls the number of connections to the mbatchd daemon is governed by three attributes, max, nominal, and throttle. You set these attributes in the PROXYD_POLICIES parameter of the lsb.params configuration file. With this configuration, the policy ensures that no user can monopolize the rate limiter system.

For details on setting up your system for the rate limiter, and using it, see Enabling and configuring the LSF rate limiter.

Daemon log files for diagnosing jobs

To troubleshoot the rate limiter and its interactions with the lsfproxyd daemon, see Diagnostics for the LSF rate limiter and lsfproxyd daemon.

Temporarily block users and hosts for performance

When the rate limiter is in use, an administrator can temporarily block non-administrator and non-root users, hosts, or both from performing mbatchd daemon operations with the badmin lsfproxyd block command. Administrators can run this command to temporarily stop abusive or misbehaving users from interacting with the LSF cluster, so that they do not degrade performance for other users.