badmin
The badmin command is the administrative tool for LSF.
Synopsis
badmin subcommand options
Description
The badmin command provides a set of subcommands to control and monitor LSF. If you do not include subcommands, the badmin command prompts for subcommands from the standard input.
Information about each subcommand is available through the -h option.
The badmin subcommands include privileged and non-privileged subcommands. Only root or LSF administrators can run privileged subcommands. The following subcommands are privileged:
- diagnose
- gpddebug
- gpdrestart
- gpdtime
- hclose
- hghostadd
- hghostdel
- hopen
- hpower
- lsfproxyd
- mbddebug
- mbdrestart
- perflog
- perfmon
- qact
- qclose
- qinact
- qopen
- rc
- reconfig
- security
To run the privileged hstartup subcommand as a non-root user, you must configure the lsf.sudoers file.
All other subcommands are non-privileged and can be run by any LSF user. If the LSF_AUTH parameter is not defined in the lsf.conf file, privileged ports are used. Because the badmin command must then send its requests through a privileged port, the badmin executable file is installed with the setuid flag turned on.
When you use subcommands for which multiple host names can be specified, do not enclose the host names in quotation marks.
Subcommand synopsis
ckconfig [-v]
Options
- subcommand
- Runs the specified subcommand. See the Usage section.
- -h
- Prints command usage to stderr and exits.
- -V
- Prints LSF release version to stderr and exits.
Usage
- ckconfig [-v]
- Checks LSF configuration files that are located in the LSB_CONFDIR/cluster_name/configdir directory, and checks the LSF_ENVDIR/lsf.licensescheduler file.
The LSB_CONFDIR variable is defined in the lsf.conf file, which is located in LSF_ENVDIR, or in /etc if LSF_ENVDIR is not defined.
By default, the badmin ckconfig command displays only the result of the configuration file check. If warning errors are found, the badmin command prompts you to display detailed messages.
- -v
- Verbose mode. Displays detailed messages about configuration file checking to stderr.
- diagnose pend jobid ...
- Displays the full pending reason list if CONDENSE_PENDING_REASONS=Y is set in the lsb.params file. For example:
badmin diagnose 1057
- diagnose -c jobreq [-f snapshot_file_name] [-t xml | -t json]
- UNIX only. Saves the current contents of the scheduler job bucket information into an XML or JSON snapshot file as raw data.
Jobs are put into scheduling buckets based on resource requirements and different scheduling policies. Saving the contents into a snapshot file is useful for data analysis by parsing the file or by running a simple text search on its contents.
This feature is helpful if you want to examine a sudden large performance impact on the scheduler. Use the snapshot file to identify any users with many buckets or large attribute values.
You can use the following options:
- -c jobreq
- Required.
- -f snapshot_file_name
- Specifies a snapshot file in which to save the information. It is either a file name, which is located in the DIAGNOSE_LOGDIR directory, or a full path file name. If the specified snapshot file exists, it is overwritten with the current information.
The default name for the snapshot file is jobreq_<hostname>_<dateandtime>.<format>, where <format> is xml or json, depending on the specified format of the snapshot file.
The owner of the snapshot file is the user who is specified in the LSF_ADMIN parameter. The snapshot file permissions are the same as the mbatchd daemon log permissions. Everyone has read and execute access, but the LSF_ADMIN owner has write, read, and execute access.
- -t xml | -t json
- Specifies the format of the snapshot file. Specify -t xml for the snapshot file to be in XML format, or specify -t json for the snapshot file to be in JSON format.
The default format for the snapshot file is XML, and the extension of the snapshot file is .xml. If the snapshot file is in JSON format, the extension of the snapshot file is .json.
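The -f and -t rules above can be summarized as a small shell helper that predicts where the snapshot is written. This is a sketch, not part of LSF: the function names jobreq_ext and resolve_jobreq_snapshot are hypothetical, a name is assumed to be a full path only when it starts with /, and the caller must supply the <dateandtime> string because its exact format is not documented here.

```sh
# Hypothetical helpers (not part of LSF) that predict the snapshot path used by
# "badmin diagnose -c jobreq", based on the -f and -t rules described above.
# Assumptions: a name starting with "/" is a full path, any other name is placed
# in DIAGNOSE_LOGDIR, and DATETIME is supplied by the caller.

# Map a -t value to the snapshot file extension; XML is the default.
jobreq_ext() {
    case "${1:-xml}" in
        xml)  echo xml ;;
        json) echo json ;;
        *)    echo "unsupported format: $1" >&2; return 1 ;;
    esac
}

# Usage: resolve_jobreq_snapshot LOGDIR FILE FORMAT HOST DATETIME
# Pass an empty FILE to get the default snapshot name.
resolve_jobreq_snapshot() {
    logdir=$1 file=$2 format=$3 host=$4 datetime=$5
    ext=$(jobreq_ext "$format") || return 1
    if [ -z "$file" ]; then
        file="jobreq_${host}_${datetime}.${ext}"
    fi
    case "$file" in
        /*) echo "$file" ;;            # full path name: used as given
        *)  echo "$logdir/$file" ;;    # plain file name: placed in DIAGNOSE_LOGDIR
    esac
}
```

For example, under these assumptions resolve_jobreq_snapshot /shared/diag "" json hostA 20240101 prints /shared/diag/jobreq_hostA_20240101.json. The directory and date string are placeholders.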
- diagnose -c lsfproxyd [[-f logfile_name] [-d minutes] | [-o]]
- This feature is helpful if an unexpected lsfproxyd load (for the LSF rate limiter, starting in Fix Pack 14) causes the cluster to slow or fail to respond to requests. For example, many bjobs command queries might cause a high network load and prevent the lsfproxyd daemon from responding. Running this command with its options enables the lsfproxyd daemon to dump the query source information into a log file.
The log file shows information about the source of queries for easier troubleshooting. The log file shows who made these requests, where the requests came from, and the data size of the query.
You can also configure this feature by enabling the ENABLE_DIAGNOSE parameter in the lsb.params file to log the entire query information as soon as the cluster starts. However, the dynamic settings from the command override the static parameter settings. Also, after the duration you specify to track the query information expires, the static diagnosis settings take effect.
You can use the following options to dynamically set the time, specify a log file, and allow the lsfproxyd daemon to collect information:
- -c lsfproxyd
- Required.
- -f logfile_name
- Specifies a log file in which to save the information. It is either a file name or a full path file name.
The default name for the log file is query_info.queryproxylog.hostname.
The owner of the log file is the user who is specified in the LSF_ADMIN parameter. The log file permissions are the same as lsfproxyd daemon log permissions. Everyone has read and execute access, but the LSF_ADMIN user has write, read, and execute access.
If you specify the log file in the lsb.params file and then later specify a different log file in the command line, the one in the command line takes precedence. Logging continues until the specified duration is over, or until you stop dynamic logging. It then switches back to the static log file location.
- -d minutes
- The duration in minutes to track the query information. The lsfproxyd daemon reverts to the static settings when the duration is over, or earlier if you stop dynamic logging manually, restart the daemon (with the lsfproxyd command), or reconfigure (with the badmin reconfig command). The default value for this duration is infinite. By default, query information is always logged.
- -o
- Turns off dynamic diagnosis (stops logging). If the ENABLE_DIAGNOSE=lsfproxyd parameter is configured, the daemon returns to the static configuration.
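The two forms in the synopsis (-f and -d to start or adjust dynamic logging, or -o on its own to stop it) can be enforced by a small wrapper before badmin is called. This is a sketch, not an LSF tool: run_diagnose is a hypothetical name, it checks only the argument shape described in this section, and it assumes that -d takes a whole number of minutes. It works for both the lsfproxyd and query classes.

```sh
# Hypothetical wrapper (not part of LSF) around "badmin diagnose -c lsfproxyd|query".
# It rejects -o combined with -f or -d, requires -d to be whole minutes,
# and then calls badmin with the validated options.
# Usage: run_diagnose CLASS [-f logfile_name] [-d minutes] [-o]
run_diagnose() {
    if [ "$#" -lt 1 ]; then
        echo "usage: run_diagnose lsfproxyd|query [-f file] [-d minutes] [-o]" >&2
        return 2
    fi
    class=$1
    shift
    case "$class" in
        lsfproxyd|query) ;;
        *) echo "CLASS must be lsfproxyd or query" >&2; return 2 ;;
    esac
    file='' minutes='' off=''
    while [ "$#" -gt 0 ]; do
        case "$1" in
            -f|-d)
                if [ "$#" -lt 2 ]; then
                    echo "$1 needs a value" >&2; return 2
                fi
                if [ "$1" = -f ]; then file=$2; else minutes=$2; fi
                shift 2 ;;
            -o) off=1; shift ;;
            *)  echo "unknown option: $1" >&2; return 2 ;;
        esac
    done
    if [ -n "$off" ]; then
        # -o stops dynamic logging, so it must stand alone.
        if [ -n "$file$minutes" ]; then
            echo "-o cannot be combined with -f or -d" >&2; return 2
        fi
        badmin diagnose -c "$class" -o
        return
    fi
    case "$minutes" in
        *[!0-9]*) echo "-d expects a whole number of minutes" >&2; return 2 ;;
    esac
    # Rebuild the argument list with only the options that were given.
    set -- diagnose -c "$class"
    if [ -n "$file" ]; then set -- "$@" -f "$file"; fi
    if [ -n "$minutes" ]; then set -- "$@" -d "$minutes"; fi
    badmin "$@"
}
```

For example, run_diagnose query -f query_debug.log -d 30 runs badmin diagnose -c query -f query_debug.log -d 30, while run_diagnose query -o -d 30 is rejected before anything is sent to the daemon.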
- diagnose -c query [[-f logfile_name] [-d minutes] | [-o]]
- This feature is helpful if an unexpected mbatchd query load causes the cluster to slow or fail to respond to requests. For example, many bjobs command queries might cause a high network load and prevent the mbatchd daemon from responding. Running this command with its options enables the mbatchd daemon to dump the query source information into a log file.
The log file shows information about the source of queries for easier troubleshooting. The log file shows who made these requests, where the requests came from, and the data size of the query.
You can also configure this feature by enabling the DIAGNOSE_LOGDIR and ENABLE_DIAGNOSE parameters in the lsb.params file to log the entire query information as soon as the cluster starts. However, the dynamic settings from the command override the static parameter settings. Also, after the duration you specify to track the query information expires, the static diagnosis settings take effect.
You can use the following options to dynamically set the time, specify a log file, and allow the mbatchd daemon to collect information:
- -c query
- Required.
- -f logfile_name
- Specifies a log file in which to save the information. It is either a file name, which is located in the DIAGNOSE_LOGDIR directory, or a full path file name.
The default name for the log file is query_info.querylog.<host_name>.
The owner of the log file is the user who is specified in the LSF_ADMIN parameter. The log file permissions are the same as mbatchd daemon log permissions. Everyone has read and execute access, but the LSF_ADMIN user has write, read, and execute access.
If you specify the log file in the lsb.params file and then later specify a different log file in the command line, the one in the command line takes precedence. Logging continues until the specified duration is over, or until you stop dynamic logging. It then switches back to the static log file location.
- -d minutes
- The duration in minutes to track the query information. The mbatchd daemon reverts to the static settings when the duration is over, or earlier if you stop dynamic logging manually, restart the daemon (with the badmin mbdrestart command), or reconfigure (with the badmin reconfig command). The default value for this duration is infinite. By default, query information is always logged.
- -o
- Turns off dynamic diagnosis (stops logging). If the ENABLE_DIAGNOSE=query parameter is configured, the daemon returns to the static configuration.
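The precedence rules in this section (command-line settings override the lsb.params settings until the -d duration expires or dynamic logging is stopped) can be modeled as a small decision function. This is an illustrative model, not LSF source code: effective_query_log and its arguments are hypothetical, and a dynamic session started without -f is represented by passing its default file name, query_info.querylog.<host_name>.

```sh
# Illustrative model (not LSF source code) of which query log settings apply.
# Usage: effective_query_log STATIC_FILE DYNAMIC_FILE ELAPSED_MIN DURATION_MIN STOPPED
#   STATIC_FILE   log file from lsb.params ("" if ENABLE_DIAGNOSE is not set)
#   DYNAMIC_FILE  file used by "badmin diagnose -c query" ("" if never started)
#   ELAPSED_MIN   minutes since dynamic logging started
#   DURATION_MIN  value given to -d ("" means no -d, that is, infinite)
#   STOPPED       1 after -o, badmin mbdrestart, or badmin reconfig; else 0
effective_query_log() {
    static=$1 dynamic=$2 elapsed=$3 duration=$4 stopped=$5
    if [ -n "$dynamic" ] && [ "$stopped" -eq 0 ] &&
       { [ -z "$duration" ] || [ "$elapsed" -lt "$duration" ]; }; then
        echo "$dynamic"    # command-line settings take precedence
    elif [ -n "$static" ]; then
        echo "$static"     # fall back to the static lsb.params configuration
    else
        echo "none"        # no diagnosis logging is configured
    fi
}
```

In this model a 30-minute session logs to the dynamic file at minute 10 and falls back to the static file at minute 30; without -d, the dynamic file stays in effect until dynamic logging is stopped.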
- gpdckconfig [-v]
- Checks the global policy configuration file lsb.globalpolicies located in the LSB_CONFDIR/cluster_name/configdir directory.
The LSB_CONFDIR variable is defined in the lsf.conf file, which is located in LSF_ENVDIR, or in /etc if LSF_ENVDIR is not defined.
By default, the badmin gpdckconfig command displays only the result of the configuration file check. If warning errors are found, the badmin command prompts you to display detailed messages.
You can run the badmin gpdckconfig command only on the management host or management candidate hosts in the Global Policy Daemon Cluster (GPD Cluster).
- -v
- Verbose mode. Displays detailed messages about configuration file checking to stderr.
- gpddebug [-c class_name ...] [-l debug_level] [-f logfile_name] [-o]
- Sets the message log level for the gpolicyd daemon to include additional information in log files. You must be root or the LSF administrator to use this command.
If the command is used without any options, the following default values are used:
- class_name
- Not defined (no additional classes are logged).
- debug_level=0
- As specified by the LOG_DEBUG level in the LSF_LOG_MASK parameter.
- logfile_name
- Not defined (LSF system log file in the LSF system log file directory, in the format gpolicyd.log.host_name).
- -c class_name ...
- Specifies software classes for which debug messages are to be logged.
By default, class_name is not defined and no additional classes are logged.
The format of class_name is the name of a class, or a list of class names separated by spaces and enclosed in quotation marks. Classes are also listed in the lsf.h header file.
The following log classes are supported:
- LC_AUTH
- Log authentication messages.
- LC_COMM
- Log communication messages.
- LC_SYS
- Log system call messages.
- LC_TRACE
- Log significant program walk steps.
- LC_XDR
- Log everything that is transferred by XDR.
- LC_XDRVERSION
- Log messages for XDR version.
- LC2_G_FAIR
- Log global fair share messages.
- -l debug_level
- Specifies the level of detail in debug messages. The higher the number, the more detail that is logged. Higher levels include all lower levels.
debug_level has the following values:
- Default: 0
- LOG_DEBUG level in parameter LSF_LOG_MASK.
- 0
- LOG_DEBUG level for parameter LSF_LOG_MASK in the lsf.conf file.
- 1
- LOG_DEBUG1 level for extended logging. A higher level includes lower logging levels. For example, the LOG_DEBUG1 level includes the LOG_DEBUG level.
- 2
- LOG_DEBUG2 level for extended logging. A higher level includes lower logging levels. For example, the LOG_DEBUG2 level includes LOG_DEBUG1 and LOG_DEBUG levels.
- 3
- LOG_DEBUG3 level for extended logging. A higher level includes lower logging levels. For example, the LOG_DEBUG3 level includes LOG_DEBUG2, LOG_DEBUG1, and LOG_DEBUG levels.
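Because higher levels include all lower levels, the set of levels that a given -l value enables can be listed mechanically. The helper below is a hypothetical illustration of that rule, not an LSF utility.

```sh
# Hypothetical helper: lists the log levels covered by a -l debug_level value,
# following the rule that each level includes all lower levels.
debug_levels_included() {
    case "$1" in
        0|1|2|3) ;;
        *) echo "debug_level must be 0, 1, 2, or 3" >&2; return 1 ;;
    esac
    levels="LOG_DEBUG"
    i=1
    while [ "$i" -le "$1" ]; do
        levels="$levels LOG_DEBUG$i"
        i=$((i + 1))
    done
    echo "$levels"
}
```

For example, debug_levels_included 2 prints LOG_DEBUG LOG_DEBUG1 LOG_DEBUG2, matching what the list above says badmin gpddebug -l 2 logs.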
- -f logfile_name
- Specifies the name of the file into which debugging messages are to be logged. A file name with or without a full path can be specified.
If a file name without a path is specified, the file is saved in the LSF system log directory.
The name of the file that is created has the following format:
logfile_name.gpolicyd.log.host_name
On UNIX, if the specified path is not valid, the log file is created in the /tmp directory.
On Windows, if the specified path is not valid, no log file is created.
By default, logfile_name is the current LSF system log file in the LSF system log file directory.
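The -f naming rules above can be sketched as a helper that predicts the resulting file on UNIX. gpd_log_path is a hypothetical name, and its first argument stands in for the LSF system log directory. Two behaviors are assumptions rather than documented facts: a name containing / is treated as having a path, and the /tmp fallback keeps the same base name.

```sh
# Hypothetical helper: predicts where "badmin gpddebug -f LOGFILE_NAME" writes on UNIX.
# Usage: gpd_log_path LOGDIR LOGFILE_NAME HOST
# Assumptions: a name containing "/" has a path; the /tmp fallback keeps the base name.
gpd_log_path() {
    logdir=$1 name=$2 host=$3
    case "$name" in
        */*) dir=${name%/*} base=${name##*/} ;;   # name with a path
        *)   dir=$logdir base=$name ;;            # bare name: LSF system log directory
    esac
    [ -n "$dir" ] || dir=/            # a name such as "/dbg" is in the root directory
    [ -d "$dir" ] || dir=/tmp         # UNIX fallback when the path is not valid
    echo "${dir%/}/$base.gpolicyd.log.$host"
}
```

For example, gpd_log_path /tmp gpd_dbg hostA prints /tmp/gpd_dbg.gpolicyd.log.hostA, and a name under a directory that does not exist also resolves to /tmp under these assumptions.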
- -o
- Turns off temporary debug settings and resets them to the daemon start state. The message log level is reset back to the value of LSF_LOG_MASK and classes are reset to the value of LSB_DEBUG_GPD.
The log file is also reset back to the default log file.
- gpdrestart [-v] [-f]
- Dynamically reconfigures LSF global policies and restarts the gpolicyd daemon.
The global policy configuration file lsb.globalpolicies is checked for errors and the results are printed to stderr. If no errors are found, the lsb.globalpolicies file is reloaded and the gpolicyd daemon is restarted.
If warning errors are found, the badmin command prompts you to display detailed messages. If unrecoverable errors are found, the gpolicyd daemon is not restarted, and the badmin command exits.
You can run the badmin gpdrestart command only on the management host or management candidate hosts in the Global Policy Daemon Cluster (GPD Cluster).
- -v
- Verbose mode. Displays detailed messages about the status of configuration files. All messages from configuration checking are printed to stderr.
- -f
- Disables interaction and proceeds with the gpolicyd daemon restart if configuration files contain no unrecoverable errors.
- gpdtime [-l timing_level] [-f logfile_name] [-o]
- Sets the timing level for the gpolicyd daemon to include extra timing information in log files. You must be root or the LSF administrator to use this command.
If the command is used without any options, the following default values are used:
- timing_level
- Not defined (no timing information is recorded).
- logfile_name
- Not defined (current LSF system log file in the LSF system log file directory, in the format gpolicyd.log.host_name).
- -l timing_level
- Specifies the detail of timing information that is included in log files. Timing messages indicate the execution time of functions in the software and are logged in milliseconds.
The following values are supported: 1|2|3|4|5
The higher the number, the more functions in the software that are timed and whose execution time is logged. The lower numbers include more common software functions. Higher levels include all lower levels.
By default no timing information is logged.
- -f logfile_name
- Specify the name of the file into which timing messages are to be logged. A file name with or without a full path can be specified.
If a file name without a path is specified, the file is saved in the LSF system log file directory.
The name of the file that is created has the following format:
logfile_name.gpolicyd.log.host_name
On UNIX, if the specified path is not valid, the log file is created in the /tmp directory.
On Windows, if the specified path is not valid, no log file is created.
Note: Both timing and debug messages are logged in the same files.
The default is the current LSF system log file in the LSF system log file directory, in the format gpolicyd.log.host_name.
- -o
- Optional. Turns off temporary timing settings and resets them to the daemon start state. The timing level is reset back to the value of the parameter for the corresponding daemon (LSB_TIME_GPD).
The log file is also reset back to the default log file.
- hclose [-C comment] [-i "lock_id"] [host_name ... | host_group ... | compute_unit ... | all]
- Closes batch server hosts. Specify the names of any server hosts, host groups, or compute units. All batch server hosts are closed if the reserved word all is specified. If no argument is specified, the local host is assumed. A closed host does not accept any new jobs, but jobs that are already dispatched to the host are not affected. This behavior differs from that of a host closed by a window: when a time window closes on a host, all jobs on the host are suspended.
If the host is already closed, this command option has no effect unless you specify the -i option to attach a lock ID to the host.
- -C comment
- Logs the text as an administrator comment record to the lsb.events file.
The maximum length of the comment string is 512 characters.
If you close a host group or compute unit, each member is displayed with the same comment string.
You cannot use the badmin hopen command to open a host that was borrowed through LSF resource connector and is in closed_RC status.
- -i "lock_id"
- Closes the host and attaches the specified lock ID to the closed host. Each lock ID is a string that can contain up to 128 alphanumeric and underscore (_) characters. The keyword all is reserved and cannot be used as the lock ID. A closed host can have multiple lock IDs, and the host remains closed until there are no more lock IDs attached to the host.
Use -i together with the -C option to attach an administrator message to the lock ID.
If you try to attach a lock ID that is already attached to the host (even with a different comment), the command fails for that host.
Use the badmin hopen -i command option to remove one or more lock IDs from a host.
This allows multiple users to keep a host closed for different reasons. For example, userA might be updating an application while userB is configuring the operating system. The host remains closed until both users complete their updates and open the host using their specific lock IDs.
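The lock ID and comment limits above can be checked before the request is sent. hclose_with_lock is a hypothetical wrapper, not an LSF command; it applies only the rules stated in this section (lock IDs of up to 128 alphanumeric or underscore characters, the reserved keyword all, comments of at most 512 characters) and then calls badmin hclose with -C and -i.

```sh
# Hypothetical pre-flight wrapper (not part of LSF) for "badmin hclose -C ... -i ...".
# Usage: hclose_with_lock LOCK_ID COMMENT [HOST ...]
# With no HOST, badmin hclose acts on the local host.
hclose_with_lock() {
    if [ "$#" -lt 2 ]; then
        echo "usage: hclose_with_lock LOCK_ID COMMENT [HOST ...]" >&2
        return 2
    fi
    lock_id=$1 comment=$2
    shift 2
    # Lock IDs: alphanumeric and underscore only; "all" is reserved.
    case "$lock_id" in
        ''|all|*[!A-Za-z0-9_]*)
            echo "invalid lock ID: '$lock_id'" >&2; return 2 ;;
    esac
    if [ "${#lock_id}" -gt 128 ]; then
        echo "lock ID is longer than 128 characters" >&2; return 2
    fi
    if [ "${#comment}" -gt 512 ]; then
        echo "comment is longer than 512 characters" >&2; return 2
    fi
    badmin hclose -C "$comment" -i "$lock_id" "$@"
}
```

For example, hclose_with_lock os_patch "Kernel update" hostA hostB runs badmin hclose -C "Kernel update" -i os_patch hostA hostB, while a lock ID such as os-patch is rejected locally. The wrapper cannot detect that a lock ID is already attached; badmin reports that case.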
- help [command ...] | ? [command ...]
- Displays the syntax and functions of the specified commands.
- hghostadd [-C comment] host_group | compute_unit | host_name [host_name ...]
- If dynamic host configuration is enabled, dynamically adds hosts to a host group or compute unit. After the mbatchd daemon receives the host information from the LIM on the management host, it dynamically adds the host without triggering reconfiguration.
After the host is added to the host group or compute unit, it is considered part of that group for scheduling decisions for newly submitted jobs and for existing pending jobs.
This command fails if any of the specified host groups, compute units, or host names are not valid.
Restriction: If EGO-enabled SLA scheduling is configured through the ENABLE_DEFAULT_EGO_SLA parameter in the lsb.params file, you cannot use the hghostadd subcommand because all host allocation is under control of enterprise grid orchestrator (EGO).
- -C comment
- Logs the text as an administrator comment record to the lsb.events file. The maximum length of the comment string is 512 characters.
- hghostdel [-f] [-C comment] host_group | compute_unit | host_name [host_name ...]
- Dynamically deletes hosts from a host group or compute unit by triggering reconfiguration of the mbatchd daemon.
This command fails if any of the specified host groups, compute units, or host names are not valid.
CAUTION: To change a dynamic host to a static host, first use the command badmin hghostdel to remove the dynamic host from any host group or compute unit that it belongs to. Then, configure the host as a static host in the lsf.cluster.cluster_name file.
Restriction: If EGO-enabled SLA scheduling is configured through the ENABLE_DEFAULT_EGO_SLA parameter in the lsb.params file, you cannot use the hghostdel subcommand because all host allocation is under control of enterprise grid orchestrator (EGO).
- -f
- Disables interaction and does not ask for confirmation when reconfiguring mbatchd.
- -C comment
- Logs the text as an administrator comment record to the lsb.events file. The maximum length of the comment string is 512 characters.
- hhist [-t time0,time1] [-f logfile_name] [host_name ...]
- Displays historical events for specified hosts, or for all hosts if no host is specified. Host events are host opening and closing. Power-related events (suspend, resume, reset) are also displayed, whether they were triggered by the badmin command or by a policy or job.
- -t time0,time1
- Displays only those events that occurred during the period from time0 to time1. See the bhist command for the time format. The default is to display all host events in the event log file.
- -f logfile_name
- Specify the file name of the event log file. Either an absolute or a relative path name can be specified. The default is to use the current event log file in the LSF system: LSB_SHAREDIR/cluster_name/logdir/lsb.events. Option -f is useful for offline analysis.
If you specified an administrator comment with the -C option of the host control commands hclose or hopen, hhist displays the comment text.
- hist [-t time0,time1] [-f logfile_name]
- Displays historical events for all the queues, hosts, and mbatchd. Power-related events (suspend, resume, reset) are also displayed, whether they were triggered by the badmin command or by a policy or job.
- -t time0,time1
- Displays only those events that occurred during the period from time0 to time1. See the bhist command for the time format. The default is to display all events in the event log file.
- -f logfile_name
- Specify the file name of the event log file. Either an absolute or a relative path name can be specified. The default is to use the current event log file in the LSF system: LSB_SHAREDIR/cluster_name/logdir/lsb.events file. Option -f is useful for offline analysis.
If you specified an administrator comment with the -C option of the queue, host, and mbatchd daemon commands, the hist option displays the comment text.
- hopen [-C comment] [-i "lock_id ... | all"] [host_name ... | host_group ... | compute_unit ... | all]
- Opens batch server hosts. Specify the names of any server hosts, host groups, or compute units. All batch server hosts are opened if the reserved word all is specified. If no host, host group, or compute unit is specified, the local host is assumed. A host accepts batch jobs if it is open.
Important: If EGO-enabled SLA scheduling is configured through the ENABLE_DEFAULT_EGO_SLA parameter in the lsb.params file, and a host is closed by EGO, it cannot be reopened by the badmin hopen command. Hosts closed by EGO have status closed_EGO in the bhosts -l command output.
- -C comment
- Logs the text as an administrator comment record to the lsb.events file.
The maximum length of the comment string is 512 characters.
If you open a host group or compute unit, each member is displayed with the same comment string.
- -i "lock_id ... | all"
- Removes the specified lock IDs from the closed host. Also opens the host if no lock IDs remain on the host.
Use a space to separate multiple lock IDs. Use the all keyword to remove all lock IDs and to open the host.
This allows multiple users to keep a host closed for different reasons. For example, userA might be updating an application while userB is configuring the operating system. The host remains closed until both users complete their tasks and open the host using their specific lock IDs.
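The way several lock IDs keep one host closed can be illustrated with a toy model of a single host. This is not LSF code: LOCKS, lock_add, lock_remove, and host_state are hypothetical names, and the model tracks only lock IDs (a plain hclose without -i is not represented).

```sh
# Toy model (not LSF code) of lock IDs on one host.
# LOCKS holds the space-separated lock IDs currently attached.
LOCKS=""

# Like "badmin hclose -i ID": fails if ID is already attached.
lock_add() {
    case " $LOCKS " in
        *" $1 "*) echo "lock ID $1 is already attached" >&2; return 1 ;;
    esac
    LOCKS="${LOCKS:+$LOCKS }$1"
}

# Like "badmin hopen -i 'ID ...'" or "badmin hopen -i all".
lock_remove() {
    if [ "$1" = all ]; then
        LOCKS=""
        return 0
    fi
    for id in $1; do
        kept=""
        for cur in $LOCKS; do
            [ "$cur" = "$id" ] || kept="${kept:+$kept }$cur"
        done
        LOCKS=$kept
    done
}

# The host stays closed while any lock ID remains.
host_state() {
    if [ -n "$LOCKS" ]; then echo closed; else echo open; fi
}
```

Replaying the userA and userB scenario: after lock_add userA_app and lock_add userB_os, removing userA_app leaves the host closed, and only removing userB_os (or running lock_remove all) opens it.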
- hpower [suspend | resume] [-C comment] [hostname...]
- Manually switches hosts between a power-saving state and a working state.
- suspend | resume
- The state that you want to switch the host to.
- -C comment
- Logs the text as an administrator comment record to the lsb.events file. The maximum length of the comment string is 512 characters.
- hrestart
- This subcommand is obsolete in LSF Version 10.1 Fix Pack 11. Use the bctrld restart sbd command instead to restart the sbatchd daemon.
- hshutdown
- This subcommand is obsolete in LSF Version 10.1 Fix Pack 11. Use the bctrld stop sbd command instead to shut down the sbatchd daemon.
- hstartup
- This subcommand is obsolete in LSF Version 10.1 Fix Pack 11. Use the bctrld start sbd command instead to start the sbatchd daemon.
- lsfproxyd [[enable | disable] [all | query | sub | other]] | status
- Starting in Fix Pack 14, enables or disables the LSF rate limiter while the lsfproxyd daemon runs. Fix Pack 14 introduces the rate limiter. The rate limiter is managed by the lsfproxyd daemon, which monitors and controls the number of requests and connections that can reach the mbatchd daemon, protecting it from excess requests. For a request to contact the mbatchd daemon, it must first obtain a request token from the lsfproxyd daemon. After the request completes, the token returns to the lsfproxyd daemon. The lsfproxyd daemon distributes tokens in a round-robin fashion, ensuring that each user connection has a fair chance to be served and processed, even under heavy loads.
If a request type is disabled, the lsfproxyd daemon does not distribute tokens for that request type, and requests of that type go directly to the mbatchd daemon instead. If the rate limiter is disabled (or if all lsfproxyd daemons are down), the mbatchd daemon accepts requests without tokens. For example:
$ badmin lsfproxyd disable query
lsfproxyd service status: QUERY:DISABLED SUBMISSION:ENABLED OTHER:ENABLED
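Round-robin token distribution can be pictured with a short simulation: each pass gives at most one token to every connection that is still waiting, so a connection with many queued requests cannot starve the others. This is a conceptual illustration only; it is not how lsfproxyd is implemented, and round_robin_grant, the connection names, and the request counts are hypothetical.

```sh
# Conceptual simulation (not lsfproxyd code) of round-robin token grants.
# Usage: round_robin_grant TOKENS NAME:WAITING_REQUESTS ...
# Prints the order in which connections receive tokens.
round_robin_grant() {
    [ "$#" -ge 1 ] || return 2
    tokens=$1
    shift
    printf '%s\n' "$@" | awk -F: -v tokens="$tokens" '
        { name[NR] = $1; want[NR] = $2 + 0; n = NR }
        END {
            limit = tokens + 0; granted = 0; out = ""
            while (granted < limit) {
                progress = 0
                # One pass is one round: each waiting connection gets at most one token.
                for (i = 1; i <= n && granted < limit; i++) {
                    if (want[i] > 0) {
                        out = out (out == "" ? "" : " ") name[i]
                        want[i]--; granted++; progress = 1
                    }
                }
                if (!progress) break    # no connection is waiting any more
            }
            print out
        }'
}
```

With 5 tokens and connections connA:3, connB:1, connC:2, the grant order is connA connB connC connA connC; connA waits for the next round instead of taking all of its tokens at once.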
Running badmin lsfproxyd status communicates directly with the lsfproxyd daemon and displays its status information, including the following data:
- Whether the different request types are enabled
- Its share of the token limit, and how many tokens are in use
- Simple metrics gathered over a sampling period:
  - Number of requests
  - Number of rejections, broken down by category
  - Number of blocked requests
  - Number of errors
- For each metric:
  - The count for the current sampling period
  - The count for the last completed sampling period
  - The maximum count seen over all completed sampling periods
  - The average count calculated over all completed sampling periods
Here is an example output for running badmin lsfproxyd status if request types are enabled:
$ badmin lsfproxyd status
lsfproxyd service status: QUERY:ENABLED SUBMISSION:ENABLED OTHER:ENABLED
lsfproxyd host status:
HOSTNAME: host1 STATUS: CONNECTED PID: 1592462
            TOKEN_LIMIT  TOKENS_IN_USE_TOTAL  TOKENS_IN_USE_PRIVILEGED
QUERY       1            242                  242
SUBMISSION  1            263                  263
OTHER       1            0                    0
lsfproxyd started: Thu Feb 16 14:03:15
End time of last sample period: Fri Feb 17 06:41:15
Sample period: 60 Seconds
------------------------------------------------------------------------------
Metrics                  Current  Last  Max  Avg  Total
------------------------------------------------------------------------------
Requests  Query          0        300   300  300  7
          Submission     0        900   900  900  1
          Other          0        0     0    0    0
Rejected  Query          0        0     0    0    0
          Submission     0        0     0    0    0
          Other          0        0     0    0    0
Blocked                  0        0     0    0    0
Error                    0        0     0    0    0
Here is an example output for running badmin lsfproxyd status if request types are disabled:
lsfproxyd service status: QUERY: DISABLED SUBMISSION: DISABLED OTHER: DISABLED
lsfproxyd host status:
HOSTNAME: host1 STATUS: DISCONNECTED PID: -
            TOKEN_LIMIT  TOKENS_IN_USE_TOTAL  TOKENS_IN_USE_PRIVILEGED
QUERY       -            -                    -
SUBMISSION  -            -                    -
OTHER       -            -                    -
No metric data available.
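The Current, Last, Max, and Avg columns follow from the definitions in the metrics list for this subcommand. The hypothetical helper below, which is not part of LSF, computes them from per-period counts; it assumes the counts for completed periods are given oldest first and that Avg is the plain arithmetic mean. The Total column is not modeled because this section does not define it.

```sh
# Hypothetical helper (not part of LSF): derives Current, Last, Max, and Avg
# for one metric from per-period counts.
# Usage: sample_stats CURRENT_COUNT [COMPLETED_COUNT ...]   (oldest first)
sample_stats() {
    current=$1
    shift
    if [ "$#" -eq 0 ]; then
        # No completed sampling period yet.
        printf '%s - - -\n' "$current"
        return 0
    fi
    printf '%s\n' "$@" | awk -v cur="$current" '
        { sum += $1; if (NR == 1 || $1 > max) max = $1; last = $1 }
        END { printf "%s %s %s %g\n", cur, last, max, sum / NR }'
}
```

For example, sample_stats 5 100 300 200 prints 5 200 300 200: the current period has 5 requests so far, the last completed period had 200, the busiest had 300, and the mean of the three completed periods is 200.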
- lsfproxyd [[block | unblock] [all | [-u "user1 user2 ..."] [-m "host1 host2 ..."]]] | blocklist
- Starting in Fix Pack 14, allows an administrator to temporarily block non-administrator and non-root users, hosts, or both, from performing mbatchd daemon operations when using the rate limiter. Blocked users and hosts are unblocked either manually, with the unblock option, or automatically, when the lsfproxyd daemon for the rate limiter restarts.
Note that if the lsfproxyd daemon is down when the administrator sends a block update and is later restarted, or if the daemon restarts after an administrator blocks a user, a blocked user might be able to contact the mbatchd daemon. In this situation, the administrator should run the block command again to refresh the blocklists on the affected daemons.
The intended usage for badmin lsfproxyd block is to allow administrators to temporarily stop users from interacting with the LSF cluster. It is meant for extenuating circumstances where the administrator deems that abusive users or hosts are impacting LSF performance and degrading the quality of service for others.
To block users or hosts, run badmin lsfproxyd block. To unblock them, run badmin lsfproxyd unblock. To show all currently blocked users and hosts, run badmin lsfproxyd blocklist. See the following details for example command usage.
Example usage and output message of blocking all users and hosts:
$ badmin lsfproxyd block all
<all> added to the blocklist on lsfproxyd host <lsfproxydhost1>
Example usage and output message of unblocking all users and hosts:
$ badmin lsfproxyd unblock all
<all> removed from the blocklist on lsfproxyd host <lsfproxydhost1>
Example usage and output message of blocking user1 and user2:
$ badmin lsfproxyd block -u "user1 user2"
Users <user1 user2> added to the blocklist on lsfproxyd host <lsfproxydhost1>
Example usage and output message of blocking hostA and hostB:
$ badmin lsfproxyd block -m "hostA hostB"
Hosts <hostA hostB> added to the blocklist on lsfproxyd host <lsfproxydhost1>
Example usage and output messages of blocking user1 at hostA:
$ badmin lsfproxyd block -u "user1" -m "hostA"
<user1@hostA> added to the blocklist on lsfproxyd host <lsfproxydhost1>
<user1@hostA> added to the blocklist on lsfproxyd host <lsfproxydhost2>
Example usage and output messages of blocking user1 and user2 at hostA and hostB:
$ badmin lsfproxyd block -u "user1 user2" -m "hostA hostB"
<user1@hostA user1@hostB user2@hostA user2@hostB> added to the blocklist on lsfproxyd host <lsfproxydhost1>
<user1@hostA user1@hostB user2@hostA user2@hostB> added to the blocklist on lsfproxyd host <lsfproxydhost2>
Example usage and output messages of unblocking user1:
$ badmin lsfproxyd unblock -u user1
Users <user1> removed from the blocklist on lsfproxyd host <lsfproxydhost1>
Users <user1> removed from the blocklist on lsfproxyd host <lsfproxydhost2>
Example usage and output messages of unblocking user1 at hostA and hostB:
$ badmin lsfproxyd unblock -u "user1" -m "hostA hostB"
<user1@hostA user1@hostB> removed from the blocklist on lsfproxyd host <lsfproxydhost1>
<user1@hostA user1@hostB> removed from the blocklist on lsfproxyd host <lsfproxydhost2>
Example usage to see a summary of which users and hosts are currently blocked:
$ badmin lsfproxyd blocklist
lsfproxyd host - host1
All blocked: No
Blocked users: user1 user2
Blocked hosts: -
Blocked users@hosts: user4@exechost1 user3@exechost2
lsfproxyd host - host2
Unable to contact <host2>
lsfproxyd host - host3
All blocked: No
Blocked users: user1 user2
Blocked hosts: -
Blocked users@hosts: user4@exechost1 user3@exechost2
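The block examples show that -u and -m combine as a cross product: every listed user is paired with every listed host as user@host. A hypothetical helper that reproduces those entries, for checking an intended block before running it, might look like this; it mirrors the documented output messages and is not lsfproxyd code.

```sh
# Hypothetical helper (not lsfproxyd code): expands -u and -m lists the way
# the example output messages show.
# Usage: blocklist_entries "USER ..." "HOST ..."
blocklist_entries() {
    users=$1 hosts=$2 out=""
    if [ -n "$users" ] && [ -n "$hosts" ]; then
        # Both lists given: pair every user with every host (user-major order).
        for u in $users; do
            for h in $hosts; do
                out="${out:+$out }$u@$h"
            done
        done
    else
        # Only one list given: the users or hosts are listed on their own.
        for item in $users $hosts; do
            out="${out:+$out }$item"
        done
    fi
    echo "$out"
}
```

For example, blocklist_entries "user1 user2" "hostA hostB" prints user1@hostA user1@hostB user2@hostA user2@hostB, the same entries as the corresponding badmin lsfproxyd block example.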
- mbddebug [-c class_name ...] [-l debug_level] [-f logfile_name] [-o] [-s log_queue_size]
- Sets the message log level for the mbatchd daemon to include additional information in log files. You must be root or the LSF administrator to use this command.
- -s log_queue_size
- Specifies the maximum number of entries in the logging queue that the mbatchd logging thread uses. Specify an integer from 100 to 500000. This value temporarily overrides the value of the LSF_LOG_QUEUE_SIZE parameter in the lsf.conf file. The logging queue contains the messages to be written to the log files.
If the LSF_LOG_THREAD=N parameter is defined in the lsf.conf file, the -s option is ignored.
See the sbddebug subcommand for an explanation of the other options.
For the -c option, the mbddebug subcommand has the following valid log classes in addition to the valid log classes for the sbddebug subcommand:
- LC2_EST
- Log messages for the simulation-based estimator. You cannot use the mbddebug subcommand to change this log class.
- LC2_G_FAIR
- Log messages for global fair share.
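A wrapper can check the documented -s range before calling badmin mbddebug. set_mbd_log_queue is a hypothetical name, and the caller passes the LSF_LOG_THREAD value explicitly because this sketch does not read lsf.conf.

```sh
# Hypothetical wrapper (not part of LSF): validates -s before calling
# "badmin mbddebug -s". Usage: set_mbd_log_queue SIZE [LSF_LOG_THREAD_VALUE]
set_mbd_log_queue() {
    size=$1 log_thread=${2:-}
    case "$size" in
        ''|*[!0-9]*) echo "size must be an integer" >&2; return 2 ;;
    esac
    # The length check guards against values too large for test(1) to compare.
    if [ "${#size}" -gt 6 ] || [ "$size" -lt 100 ] || [ "$size" -gt 500000 ]; then
        echo "size must be between 100 and 500000" >&2; return 2
    fi
    if [ "$log_thread" = N ]; then
        echo "warning: LSF_LOG_THREAD=N, so -s is ignored" >&2
    fi
    badmin mbddebug -s "$size"
}
```

For example, set_mbd_log_queue 10000 runs badmin mbddebug -s 10000, while set_mbd_log_queue 99 is rejected locally.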
- mbdhist [-t time0,time1] [-f logfile_name]
- Displays historical events for the mbatchd daemon. Events describe the starting and exiting of the mbatchd daemon.
- -t time0,time1
- Displays only those events that occurred during the period from time0 to time1. See the bhist command for the time format. The default is to display all mbatchd events in the event log file.
- -f logfile_name
- Specify the file name of the event log file. Specify either an absolute or a relative path name. The default is to use the current event log file that is in the LSF system: LSB_SHAREDIR/cluster_name/logdir/lsb.events. Option -f is useful for offline analysis.
If you specified an administrator comment with the -C option of the mbdrestart subcommand, the mbdhist subcommand displays the comment text.
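The sketch below narrows the mbdhist output to events since midnight today. It is a minimal example that assumes the year/month/day/hour:minute form of the bhist time format; mbd_events_today and BADMIN are hypothetical names, and extra arguments such as -f logfile_name are passed through unchanged.

```shell
BADMIN="${BADMIN:-badmin}"   # set BADMIN=echo to print commands instead of running them

# Hypothetical helper: show mbatchd start and exit events (and any
# mbdrestart -C comments) recorded between midnight today and now.
mbd_events_today() {
    start=$(date +%Y/%m/%d/00:00)   # assumed year/month/day/hour:minute form
    now=$(date +%Y/%m/%d/%H:%M)
    "$BADMIN" mbdhist -t "$start,$now" "$@"
}
```

For offline analysis of a rotated event file, pass its path, for example mbd_events_today -f lsb.events.1.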
- mbdrestart [-C comment] [-v] [-f] [-p | -s]
- Dynamically reconfigures LSF and
restarts the mbatchd and mbschd daemons. When live
configuration with the bconf command is enabled (the
LSF_LIVE_CONFDIR parameter is defined in the
lsf.conf file), the badmin mbdrestart command uses the
configuration files that are generated by the bconf command.
Configuration files are checked for errors and the results are printed to stderr. If no errors are found, configuration files are reloaded, the mbatchd and mbschd daemons are restarted, and events in the lsb.events file are replayed to recover the running state of the last mbatchd daemon. While the mbatchd daemon restarts, it is unavailable to service requests.
If warning errors are found, the badmin command prompts you to display detailed messages. If unrecoverable errors are found, the mbatchd and mbschd daemons do not restart, and the badmin command exits.
Important: If the lsb.events file is large, or many jobs are running, restarting the mbatchd daemon can take several minutes. If you need to reload only the configuration files, use the badmin reconfig command.
- -C comment
- Logs the text of comment as an administrator comment record to the lsb.events file. The maximum length of the comment string is 512 characters.
- -v
- Verbose mode. Displays detailed messages about the status of configuration files. All messages from configuration checking are printed to stderr.
- -f
- Disables interaction and forces reconfiguration and mbatchd daemon restart to proceed if configuration files contain no unrecoverable errors.
- -p
- Allows parallel mbatchd daemon restarts, which help minimize LSF downtime.
LSF starts a new (child) mbatchd daemon process to read the configuration files
and replay the event file. Meanwhile, the old mbatchd daemon continues to respond to client commands,
handle job scheduling, status updates, and dispatching, and write new events to the event files. When
the restart is complete, the child takes over as the mbatchd daemon, and the old
mbatchd daemon exits.
This option is the default behavior for mbatchd daemon restarts. Use the -s option to use serial mbatchd daemon restarts.
- -s
- Allows serial mbatchd daemon restarts. Use this option to override the default mbatchd daemon behavior, which is to restart in parallel.
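In scripts, the -f option avoids the interactive prompt, and the -C comment leaves an audit trail that the mbdhist subcommand can display later. The following minimal sketch uses a hypothetical mbd_restart function that enforces the 512-character comment limit and selects parallel (-p) or serial (-s) restart; BADMIN is a hypothetical variable for dry runs.

```shell
BADMIN="${BADMIN:-badmin}"   # set BADMIN=echo to print commands instead of running them

# Hypothetical helper: non-interactive mbatchd restart with an audit comment.
# Usage: mbd_restart COMMENT [serial]
mbd_restart() {
    comment=${1:-}
    mode=-p                          # parallel restart, the default behavior
    if [ "${2:-}" = serial ]; then
        mode=-s
    fi
    if [ -z "$comment" ] || [ "${#comment}" -gt 512 ]; then
        echo "comment must be 1 to 512 characters" >&2
        return 2
    fi
    "$BADMIN" mbdrestart -f -C "$comment" "$mode"
}
```

Because -f proceeds whenever there are no unrecoverable errors, running badmin ckconfig -v first is a reasonable way to review warnings before the restart.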
- mbdtime [-l timing_level] [-f logfile_name] [-o]
- Sets the timing level for the mbatchd daemon to include extra timing information in log files. You must be root or the LSF administrator to use this command.
- perflog [-t sample_period] [-f logfile_name] [-d duration] | [-o]
- Dynamically enables performance metric logging for the mbatchd daemon. This feature is useful for troubleshooting large clusters that might not be
responding because of mbatchd daemon performance problems. In such cases, the
mbatchd daemon might be slow in handling high volumes of requests, such as
job submission, job status, and job rusage requests.
- -t
- Specifies the sampling period in minutes for performance metric collection. The default value is 5 minutes.
- -f
- Specifies a log file in which to save the information. Specify either a file name or a full path
name. If you do not specify a path, the log file is created in the default path. The
default name for the log file is
mbatchd.perflog.<host_name>.
The owner of the log file is the user who is specified in the LSF_ADMIN parameter. The log file permissions are the same as mbatchd daemon log permissions: all users have read and execute access, and the LSF_ADMIN user also has write access.
- -d
- Specifies the duration in minutes to keep logging performance metric data. The mbatchd daemon logs performance metric data until the duration expires, you stop logging manually, you restart the mbatchd daemon, or you reconfigure with the badmin reconfig command. The default duration is infinite, so by default performance metric data is logged until one of these events stops it.
- -o
- Turns off dynamic performance metric logging (stop logging). If the LSB_ENABLE_PERF_METRICS_LOG parameter is enabled, logging returns to the static configuration.
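A bounded logging window keeps the log file from growing indefinitely, because the default duration is infinite. The sketch below uses hypothetical helper names (perflog_window, perflog_stop, BADMIN); the -t and -d values are in minutes, as described above.

```shell
BADMIN="${BADMIN:-badmin}"   # set BADMIN=echo to print commands instead of running them

# Hypothetical helper: log mbatchd performance metrics every PERIOD minutes
# for DURATION minutes, optionally into LOGFILE (file name or full path).
perflog_window() {
    period=$1 duration=$2 logfile=${3:-}
    if [ -n "$logfile" ]; then
        "$BADMIN" perflog -t "$period" -f "$logfile" -d "$duration"
    else
        "$BADMIN" perflog -t "$period" -d "$duration"
    fi
}

# Stop dynamic logging early; with LSB_ENABLE_PERF_METRICS_LOG enabled,
# logging returns to the static configuration.
perflog_stop() {
    "$BADMIN" perflog -o
}
```

For example, perflog_window 1 30 samples every minute for half an hour.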
- perfmon start [sample_period] | setperiod sample_period | stop | view [-json]
- Dynamically enables and controls scheduler performance metric collection.
Collecting and recording performance metric data might affect the performance of LSF. Smaller sampling periods can cause the lsb.streams file to grow faster.
The following metrics are collected and recorded in each sample period:
- The number of queries that are handled by the mbatchd daemon
- The number of queries for each of jobs, queues, and hosts (from the bjobs, bqueues, and bhosts commands, and other daemon requests)
- The number of jobs submitted (divided into job submission requests and jobs submitted)
- The number of jobs dispatched
- The number of jobs reordered; that is, the number of jobs that reused the resource allocation of a finished job (the RELAX_JOB_DISPATCH_ORDER parameter in the lsb.params or lsb.queues file)
- The number of jobs completed
- The number of jobs that are sent to remote clusters
- The number of jobs that are accepted from remote clusters
- The file descriptors that are used by the mbatchd daemon
- The following scheduler performance metrics are collected:
- Scheduling interval: a shorter scheduling interval means that jobs are processed more quickly.
- Number of different resource requirement patterns for jobs in use, which might lead to different candidate host groups. The more matching hosts that are required, the longer it takes to find them, which means a longer scheduling session.
- Number of buckets (groups) in which jobs are put based on resource requirements and different scheduling policies. More buckets means a longer scheduling session.
- start [sample_period]
- Starts performance metric collection dynamically, with an optional sampling period in seconds.
If no sampling period is specified, the default period set in the SCHED_METRIC_SAMPLE_PERIOD parameter in the lsb.params file is used.
- stop
- Stops performance metric collection dynamically.
- view
- Displays performance metric information for the current sampling period. Specify the -json option to display the information in JSON format.
- setperiod sample_period
- Sets a new sampling period in seconds.
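The following sketch collects scheduler metrics for one sampling period and saves the JSON view to a file, assuming that perfmon view -json writes its output to standard output. perfmon_snapshot and BADMIN are hypothetical names. The helper always calls perfmon stop at the end, even if collection was already running before it was called.

```shell
BADMIN="${BADMIN:-badmin}"   # set BADMIN=echo to print commands instead of running them

# Hypothetical helper: collect metrics for PERIOD seconds, save the JSON
# view into OUTFILE, then stop collection. Returns the status of the view.
perfmon_snapshot() {
    period=$1 outfile=$2
    "$BADMIN" perfmon start "$period" || return
    sleep "$period"
    "$BADMIN" perfmon view -json > "$outfile"
    status=$?
    "$BADMIN" perfmon stop
    return "$status"
}
```

Keep in mind that smaller sampling periods make the lsb.streams file grow faster.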
- qact [-C comment] [queue_name ... | all]
- Activates a deactivated queue so that submitted jobs are dispatched from the queue. If the
reserved word all is specified, the qact subcommand activates
all queues. If no queue name is specified, the system default queue is
activated. Jobs in a queue can be dispatched only if the queue is activated. A queue that is inactivated by its run windows cannot be reactivated by this command.
- -C comment
- Logs the text of the comment as an administrator comment record to the lsb.events file. The maximum length of the comment string is 512 characters.
- qclose [-C comment] [queue_name ... | all]
- Closes a queue to prevent jobs from being submitted to the queue. If the reserved word
all is specified, the qclose subcommand closes all queues. If no
queue name is specified, the system default queue is closed. A queue
does not accept submitted LSF jobs
if it is closed.
- -C comment
- Logs the text as an administrator comment record to lsb.events. The maximum length of the comment string is 512 characters.
- qhist [-t time0,time1] [-f logfile_name] [queue_name ...]
- Displays historical events for specified queues, or for all queues if no queue is specified.
Queue events are queue opening, closing, activating, and inactivating.
- -t time0,time1
- Displays only those events that occurred during the period from time0 to time1. See the bhist command for the time format. The default is to display all queue events in the event log file.
- -f logfile_name
- Specifies the file name of the event log file. Either an absolute or a relative path name can be specified. The default is to use the current event log file in the LSF system: LSB_SHAREDIR/cluster_name/logdir/lsb.events. Option -f is useful for offline analysis.
If you specified an administrator comment with the -C option of the queue control subcommands qclose, qopen, qact, and qinact, the qhist subcommand displays the comment text.
- qinact [-C comment] [queue_name ... | all]
- Deactivates a queue to stop submitted jobs from being dispatched from the queue. If the reserved
word all is specified, all queues are deactivated. If no queue name is specified,
the system default queue is deactivated. Jobs in a queue cannot be
dispatched if the queue is inactivated.
- -C comment
- Logs the text as an administrator comment record to the lsb.events file. The maximum length of the comment string is 512 characters.
- qopen [-C comment] [queue_name ... | all]
- Opens a closed queue so users can submit jobs to it. If the reserved word all
is specified, the qopen subcommand opens all queues. If no queue name is
specified, the system default queue is opened. A queue accepts
submitted LSF jobs
only if it is open.
- -C comment
- Logs the text of comment as an administrator comment record to lsb.events. The maximum length of the comment string is 512 characters.
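For a maintenance window, closing a queue stops new submissions and inactivating it stops dispatch of the jobs that are already pending. The hypothetical queue_pause and queue_resume functions below apply both steps with the same comment, so that qhist can show why the queue changed state. They refuse to run without a queue name, because an empty list would silently act on the system default queue. BADMIN is a hypothetical variable for dry runs.

```shell
BADMIN="${BADMIN:-badmin}"   # set BADMIN=echo to print commands instead of running them

# Usage: queue_pause COMMENT QUEUE...   (QUEUE can be the reserved word all)
queue_pause() {
    if [ $# -lt 2 ]; then
        echo "usage: queue_pause COMMENT QUEUE..." >&2
        return 2
    fi
    comment=$1
    shift
    # Close first so no new jobs arrive, then stop dispatching pending jobs.
    "$BADMIN" qclose -C "$comment" "$@" &&
        "$BADMIN" qinact -C "$comment" "$@"
}

# Usage: queue_resume COMMENT QUEUE...
queue_resume() {
    if [ $# -lt 2 ]; then
        echo "usage: queue_resume COMMENT QUEUE..." >&2
        return 2
    fi
    comment=$1
    shift
    "$BADMIN" qact -C "$comment" "$@" &&
        "$BADMIN" qopen -C "$comment" "$@"
}
```

A queue that is inactivated by its run windows stays inactive after queue_resume, because qact cannot override run windows.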
- quit
- Exits the badmin command session.
- rc error [-t <days>d | <hours>h | <minutes>m] [-p "provider ..."]
- Shows LSF
resource connector error messages from the host providers. These errors are provided by the
third-party mosquitto message queue application, which must be running on the host.
- -t <days>d | <hours>h | <minutes>m
- Specifies the earliest time from which to retrieve the error messages.
Note: When you specify days, the badmin command retrieves messages starting at midnight of the corresponding day. For example, the badmin rc error -t 1d command retrieves messages starting from midnight today, and the badmin rc error -t 2d command retrieves messages starting from midnight yesterday.
- -p "provider ..."
- Specifies the host providers from which to retrieve the error messages. Use a space to separate multiple host providers.
- rc view [-c "instances | policies | templates ..."] [-p "provider ..."]
- Shows LSF
resource connector information from the host providers.
- -c "instances | policies | templates ..."
- Specifies whether to view information on instances, policies, or templates. Use a space to separate multiple types of information. By default, this command shows information on instances only. If policies is selected with the -c option, the -p option is ignored because all policies are displayed, not just for the specified providers.
- -p "provider ..."
- Specifies the host providers from which to view information. Use a space to separate multiple host providers. If policies is selected with the -c option, the -p option is ignored because all policies are displayed, not just for the specified providers.
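The -t value of the rc error subcommand must be a whole number followed by d, h, or m. The sketch below validates that form before calling the command; rc_errors_since and BADMIN are hypothetical names, and the provider names used in the usage example are placeholders.

```shell
BADMIN="${BADMIN:-badmin}"   # set BADMIN=echo to print commands instead of running them

# Hypothetical helper: fetch resource connector errors since SINCE (for
# example 2d, 6h, or 30m), optionally only from PROVIDERS, a space-separated
# list that is passed to -p as one quoted argument.
rc_errors_since() {
    since=${1:-} providers=${2:-}
    num=${since%?}          # everything except the last character
    unit=${since#"$num"}    # the last character
    case $num in
        ''|*[!0-9]*) unit=invalid ;;
    esac
    case $unit in
        d|h|m) ;;
        *) echo "time must be a whole number followed by d, h, or m" >&2
           return 2 ;;
    esac
    if [ -n "$providers" ]; then
        "$BADMIN" rc error -t "$since" -p "$providers"
    else
        "$BADMIN" rc error -t "$since"
    fi
}
```

For example, rc_errors_since 1d "aws" retrieves messages since midnight today from a provider named aws (a placeholder name).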
- reconfig [-v] [-f]
- Dynamically reconfigures LSF.
Configuration files are checked for errors and the results are displayed to stderr. If no errors are found in the configuration files, a reconfiguration request is sent to the mbatchd daemon and configuration files are reloaded. When live configuration with the bconf command is enabled (the LSF_LIVE_CONFDIR parameter is defined in the lsf.conf file), the badmin reconfig command uses the configuration files that are generated by the bconf command.
Important: The reconfig subcommand does not restart the mbatchd daemon and does not replay the lsb.events file. To restart the mbatchd daemon and replay the lsb.events file, use the badmin mbdrestart command.
When you use this command, the mbatchd daemon is available to service requests while configuration files are reloaded. Configuration changes made since system boot or the last reconfiguration take effect.
If warning errors are found, the badmin command prompts you to display detailed messages. If unrecoverable errors are found, reconfiguration fails, and the badmin command exits.
If you add a host to a queue or to a host group or compute unit, the new host is not recognized by jobs that were submitted before you reconfigured. If you want the new host to be recognized, you must use the badmin mbdrestart command.
Resource requirements that are determined by the queue no longer apply to a running job after you use the badmin reconfig command. For example, if you change the RES_REQ parameter in a queue and reconfigure the cluster, the previous queue-level resource requirements for running jobs are lost.
- -v
- Verbose mode. Displays detailed messages about the status of the configuration files. Without this option, the default is to display the results of configuration file checking. All messages from the configuration file check are printed to stderr.
- -f
- Disables interaction and proceeds with reconfiguration if configuration files contain no unrecoverable errors.
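The choice between the two ways of applying configuration changes can be scripted. The following minimal sketch (apply_config and BADMIN are hypothetical names) uses badmin reconfig by default and uses badmin mbdrestart only when hosts were added to a queue, host group, or compute unit, which is the case described above that needs an mbatchd restart.

```shell
BADMIN="${BADMIN:-badmin}"   # set BADMIN=echo to print commands instead of running them

# Hypothetical helper: apply configuration changes without prompting.
# Pass "hosts-changed" when hosts were added to a queue, host group, or
# compute unit, so that previously submitted jobs recognize the new hosts.
apply_config() {
    if [ "${1:-}" = hosts-changed ]; then
        "$BADMIN" mbdrestart -f
    else
        "$BADMIN" reconfig -f
    fi
}
```

The reconfig path keeps the mbatchd daemon available during the reload, which is why it is the default in this sketch.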
- sbddebug [-c class_name ...] [-l debug_level] [-f logfile_name] [-o] [host_name ...]
- Sets the message log level for the sbatchd daemon to include additional
information in log files. You must be root or the LSF
administrator to use this command.
In LSF multicluster capability, debug levels can be set only for hosts within the same cluster. For example, you cannot set debug or timing levels from a host in clusterA for a host in clusterB. You need to be on a host in clusterB to set up debug or timing levels for clusterB hosts.
If the command is used without any options, the following default values are used:
- class_name=0
- No additional classes are logged.
- debug_level=0
- LOG_DEBUG level in parameter LSF_LOG_MASK.
- logfile_name=daemon_name.log.host_name
- LSF system log file in the LSF system log file directory, in the format daemon_name.log.host_name.
- host_name=local_host
- Host from which the command was submitted.
- -c class_name ...
- Specifies software classes for which debug messages are to be logged.
Note: Classes are also listed in the lsf.h header file.
By default, no additional classes are logged (class name 0).
The following log classes are supported:
- LC_ADVRSV and LC2_ADVRSV
- Log advance reservation modifications.
- LC2_AFFINITY
- Log messages that are related to affinity.
- LC_AFS and LC2_AFS
- Log AFS messages.
- LC_AUTH and LC2_AUTH
- Log authentication messages.
- LC_CHKPNT and LC2_CHKPNT
- Log checkpointing messages.
- LC_COMM and LC2_COMM
- Log communication messages.
- LC_DCE and LC2_DCE
- Log messages that pertain to DCE support.
- LC_EEVENTD and LC2_EEVENTD
- Log eeventd daemon messages.
- LC_ELIM and LC2_ELIM
- Log ELIM messages.
- LC_EXEC and LC2_EXEC
- Log significant steps for job execution.
- LC_FAIR
- Log fair share policy messages.
- LC_FILE and LC2_FILE
- Log file transfer messages.
- LC2_GUARANTEE
- Log messages that are related to guaranteed SLAs.
- LC_HANG and LC2_HANG
- Mark where a program might hang.
- LC_JARRAY and LC2_JARRAY
- Log job array messages.
- LC_JLIMIT and LC2_JLIMIT
- Log job slot limit messages.
- LC_LOADINDX and LC2_LOADINDX
- Log load index messages.
- LC_M_LOG and LC2_M_LOG
- Log multievent log messages.
- LC_MEMORY and LC2_MEMORY
- Log messages that are related to memory allocation.
- LC_MPI and LC2_MPI
- Log MPI messages.
- LC_MULTI and LC2_MULTI
- Log messages that pertain to LSF multicluster capability.
- LC_PEND and LC2_PEND
- Log messages that are related to job pending reasons.
- LC_PERFM and LC2_PERFM
- Log performance messages.
- LC_PIM and LC2_PIM
- Log PIM messages.
- LC_PREEMPT and LC2_PREEMPT
- Log preemption policy messages.
- LC2_RC
- Log resource connector messages.
- LC_RESOURCE and LC2_RESOURCE
- Log messages that are related to resource broker.
- LC_RESREQ and LC2_RESREQ
- Log resource requirement messages.
- LC_SCHED and LC2_SCHED
- Log messages that pertain to the batch scheduler.
- LC_SIGNAL and LC2_SIGNAL
- Log messages that pertain to signals.
- LC_SYS and LC2_SYS
- Log system call messages.
- LC_TRACE and LC2_TRACE
- Log significant program walk steps.
- LC_XDR and LC2_XDR
- Log everything that is transferred by XDR.
- LC_XDRVERSION and LC2_XDRVERSION
- Log messages for XDR version.
- -l debug_level
- Specifies the level of detail in debug messages. The higher the number, the more detail that is
logged. Higher levels include all lower levels.
The default is 0 (LOG_DEBUG level in parameter LSF_LOG_MASK).
The following values are supported:
- 0
- LOG_DEBUG level for parameter LSF_LOG_MASK in the lsf.conf file. 0 is the default.
- 1
- LOG_DEBUG1 level for extended logging. A higher level includes lower logging levels. For example, LOG_DEBUG1 includes LOG_DEBUG levels.
- 2
- LOG_DEBUG2 level for extended logging. A higher level includes lower logging levels. For example, LOG_DEBUG2 includes LOG_DEBUG1 and LOG_DEBUG levels.
- 3
- LOG_DEBUG3 level for extended logging. A higher level includes lower logging levels. For example, LOG_DEBUG3 includes LOG_DEBUG2, LOG_DEBUG1, and LOG_DEBUG levels.
- -f logfile_name
- Specifies the name of the file into which debug messages are to be logged. You can specify a file name with or
without a full path.
If a file name without a path is specified, the file is saved in the LSF system log directory.
The name of the file that is created has the following format: logfile_name.daemon_name.log.host_name
On UNIX and Linux, if the specified path is not valid, the log file is created in the /tmp directory.
On Windows, if the specified path is not valid, no log file is created.
By default, the current LSF system log file in the LSF system log file directory is used.
- -o
- Turns off temporary debug settings and resets them to the daemon start state. The message log
level is reset back to the value of LSF_LOG_MASK and classes are reset to the
value of LSB_DEBUG_MBD, LSB_DEBUG_SBD.
The log file is also reset back to the default log file.
- host_name ...
- Optional. Sets debug settings on the specified host or hosts.
The default is the local host (the host from which the command was submitted).
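The sketch below shows a hypothetical sbd_debug_on function that sets classes, level, and log file for the sbatchd daemon on selected hosts, plus a debug_log_name helper that predicts the resulting file name from the logfile_name.daemon_name.log.host_name format. It assumes that the class list is passed to -c as one quoted, space-separated argument and that the daemon name part is sbatchd; BADMIN is a hypothetical variable for dry runs.

```shell
BADMIN="${BADMIN:-badmin}"   # set BADMIN=echo to print commands instead of running them

# Predict the file name that -f LOGFILE produces for DAEMON on HOST,
# following the logfile_name.daemon_name.log.host_name format.
debug_log_name() {
    printf '%s.%s.log.%s\n' "$1" "$2" "$3"
}

# Hypothetical helper: log CLASSES (one space-separated argument) at LEVEL
# into LOGFILE for sbatchd on the listed hosts (default: the local host).
sbd_debug_on() {
    if [ $# -lt 3 ]; then
        echo "usage: sbd_debug_on CLASSES LEVEL LOGFILE [HOST...]" >&2
        return 2
    fi
    classes=$1 level=$2 logfile=$3
    shift 3
    "$BADMIN" sbddebug -c "$classes" -l "$level" -f "$logfile" "$@"
}
```

For example, sbd_debug_on "LC_EXEC LC_SIGNAL" 2 sbdtrace hostA hostB should write sbdtrace.sbatchd.log.hostA in the LSF system log directory; afterwards, badmin sbddebug -o hostA hostB restores the start-up settings.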
- sbdtime [-l timing_level] [-f logfile_name] [-o] [host_name ...]
- Sets the timing level for the sbatchd daemon to include extra timing
information in log files. You must be root or the LSF
administrator to use this command.
In LSF multicluster capability, timing levels can be set only for hosts within the same cluster. For example, you cannot set debug or timing levels from a host in clusterA for a host in clusterB. You need to be on a host in clusterB to set up debug or timing levels for clusterB hosts.
If the command is used without any options, the following default values are used:
- timing_level=no
- No timing information is recorded.
- logfile_name=current
- LSF system log file in the LSF system log file directory, in the format daemon_name.log.host_name.
- host_name=local
- The host from which the command was submitted.
- -l timing_level
- Specifies the level of detail of the timing information that is included in log files. Timing messages indicate
the execution time of functions in the software and are logged in milliseconds.
The following values are supported: 1|2|3|4|5
The higher the number, the more functions in the software that are timed and whose execution time is logged. The lower numbers include more common software functions. Higher levels include all lower levels.
By default, no timing information is logged.
- -f logfile_name
- Specifies the name of the file into which timing messages are to be logged. You can specify a file name with or
without a full path.
If a file name without a path is specified, the file is saved in the LSF system log file directory.
The name of the file that is created has the following format: logfile_name.daemon_name.log.host_name
On UNIX and Linux, if the specified path is not valid, the log file is created in the /tmp directory.
On Windows, if the specified path is not valid, no log file is created.
Note: Both timing and debug messages are logged in the same files.
The default is the current LSF system log file in the LSF system log file directory, in the format daemon_name.log.host_name.
- -o
- Turns off temporary timing settings and resets them to the daemon start state. The
timing level is reset back to the value of the parameter for the corresponding daemon
(LSB_TIME_MBD, LSB_TIME_SBD).
The log file is also reset back to the default log file.
- host_name ...
- Sets the timing level on the specified host or hosts.
The default is the local host (the host from which the command was submitted).
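The following sketch wraps the sbdtime subcommand in hypothetical sbd_timing and sbd_timing_off functions that accept only the documented timing levels 1 - 5. As noted above, the hosts must belong to the local cluster. BADMIN is a hypothetical variable for dry runs.

```shell
BADMIN="${BADMIN:-badmin}"   # set BADMIN=echo to print commands instead of running them

# Hypothetical helper: set sbatchd timing LEVEL (1-5) on HOST... (default:
# the local host). Hosts must be in the local cluster.
sbd_timing() {
    if [ $# -lt 1 ]; then
        echo "usage: sbd_timing LEVEL [HOST...]" >&2
        return 2
    fi
    level=$1
    shift
    case $level in
        1|2|3|4|5) ;;
        *) echo "timing level must be 1 to 5" >&2; return 2 ;;
    esac
    "$BADMIN" sbdtime -l "$level" "$@"
}

# Reset timing to the daemon start state (the LSB_TIME_SBD value).
sbd_timing_off() {
    "$BADMIN" sbdtime -o "$@"
}
```

Timing messages go to the same files as debug messages, so a debug session and a timing session on the same host share one log.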
- schddebug [-c class_name ...] [-l debug_level] [-f logfile_name] [-o] [-s log_queue_size]
- Sets the message log level for the mbschd daemon to include additional information
in log files. You must be root or the LSF
administrator to use this command.
- -s log_queue_size
- Specifies the maximum number of entries in the logging queue that is used by the
mbschd daemon logging thread. Specify an integer from 100 to 500000. The logging queue
contains the messages to be written to the log files.
This option is ignored if the LSF_LOG_THREAD=N parameter is defined in the lsf.conf file.
See the sbddebug subcommand for an explanation of the other options.
- schdtime [-l timing_level] [-f logfile_name] [-o]
- Sets the timing level for the mbschd daemon to include extra timing information
in log files. You must be root or the LSF
administrator to use this command.
See the description of the sbdtime subcommand for an explanation of options.
- security view [-v]
- Shows the configuration of the components for the LSF
security mechanism.
- -v
- Verbose mode. Displays a detailed description of the current configuration of the LSF security components. Also displays the optimal configuration if the current setting is not secure.
- showconf mbd | [sbd [host_name ... | all] | gpd]
- Displays all configured parameters and their values set in the
lsf.conf or ego.conf file that affect the
mbatchd, sbatchd, and gpolicyd daemons.
In LSF multicluster capability, the badmin showconf command displays only the parameters of daemons on the local cluster.
Running the badmin showconf command from a management candidate host reaches all server hosts in the cluster. Running the badmin showconf command from a server-only host might not be able to reach other server-only hosts.
The badmin showconf command displays only the values that are used by LSF.
The badmin showconf command displays the value of the EGO_MASTER_LIST parameter from wherever it is defined. You can define either the LSF_MASTER_LIST parameter or the EGO_MASTER_LIST parameter in the lsf.conf file. If EGO is enabled in the LSF cluster, LIM reads the lsf.conf file first, then the ego.conf file. The value of the LSF_MASTER_LIST parameter is displayed only if the EGO_MASTER_LIST parameter is not defined at all in the ego.conf file.
For example, if EGO is enabled in the LSF cluster, and you define the LSF_MASTER_LIST parameter in the lsf.conf file and the EGO_MASTER_LIST parameter in the ego.conf file, the badmin showconf command displays the value of the EGO_MASTER_LIST parameter from the ego.conf file.
If EGO is disabled, the ego.conf file is not loaded, so parameters that are defined in the lsf.conf file are displayed.
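The precedence rules above can be modeled in a few lines of shell. The sketch below is a simplified model, not the LIM implementation: it reads key=value lines, treats the presence of an ego.conf path as "EGO enabled", and ignores the case where EGO_MASTER_LIST is defined in the lsf.conf file. conf_value and displayed_master_list are hypothetical names; badmin showconf remains the authoritative view.

```shell
# Print the value of NAME from a key=value file and return 0, or return 1
# if NAME is not defined. The last definition wins; a defined but empty
# value still counts as defined. Surrounding double quotes are stripped.
conf_value() {
    awk -F= -v key="$2" '
        $1 == key { sub(/^[^=]*=/, ""); gsub(/^"|"$/, ""); val = $0; found = 1 }
        END { if (found) print val; exit !found }' "$1"
}

# Simplified model: with EGO enabled (an ego.conf path given) and
# EGO_MASTER_LIST defined there, that value is shown; otherwise the
# LSF_MASTER_LIST value from lsf.conf is shown.
displayed_master_list() {
    lsf_conf=$1 ego_conf=${2:-}
    if [ -n "$ego_conf" ] && conf_value "$ego_conf" EGO_MASTER_LIST; then
        return 0
    fi
    conf_value "$lsf_conf" LSF_MASTER_LIST
}
```

Note that in this model a commented-out EGO_MASTER_LIST line does not count as a definition, so the LSF_MASTER_LIST value is shown.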
- showstatus
- Displays current LSF runtime information about the whole cluster, including information about hosts, jobs, users, user groups, simulation-based estimation, and mbatchd daemon startup and reconfiguration.
See also
bhosts, bqueues, lsb.hosts, lsb.params, lsb.queues, lsf.cluster, lsf.conf, sbatchd, mbatchd, mbschd