badmin

The badmin command is the administrative tool for LSF.

Synopsis

badmin subcommand options
badmin [-h | -V]

Description

The badmin command provides a set of subcommands to control and monitor LSF. If you do not include subcommands, the badmin command prompts for subcommands from the standard input.

Information about each subcommand is available through the -h option.

The badmin subcommands include privileged and non-privileged subcommands. Only root or LSF administrators can run privileged subcommands. The following subcommands are privileged:

  • diagnose
  • gpddebug
  • gpdrestart
  • gpdtime
  • hclose
  • hghostadd
  • hghostdel
  • hopen
  • hpower
  • mbddebug
  • mbdrestart
  • perflog
  • perfmon
  • qact
  • qclose
  • qinact
  • qopen
  • rc
  • reconfig
  • security

For a non-root user to run the privileged hstartup subcommand, the lsf.sudoers configuration file must be set up.

All other commands are non-privileged commands and can be used by any LSF user. If the LSF_AUTH parameter is not defined in the lsf.conf file, privileged ports are used, and the badmin command must send its requests through a privileged port. For this reason, the badmin executable file is installed with the setuid flag turned on.

When you use subcommands for which multiple host names can be specified, do not enclose the host names in quotation marks.
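As a minimal sketch of the quoting rule above, the following wrapper passes each host name as its own argument. The function name close_hosts and the BADMIN_CMD variable are hypothetical; BADMIN_CMD exists only so that the command can be replaced during a dry run.

```shell
# Hypothetical wrapper, not part of LSF.
# BADMIN_CMD is an assumed override; it defaults to the real badmin command.
close_hosts() {
    # "$@" expands to one argument per host name, so the host list is
    # never passed as a single quoted string.
    "${BADMIN_CMD:-badmin}" hclose "$@"
}

# Usage: close_hosts hostA hostB
# Not:   close_hosts "hostA hostB"
```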

Subcommand synopsis

ckconfig [-v]
diagnose pending_jobID ...
diagnose -c jobreq [-f logfile_name] [-t xml | -t json]
diagnose -c query [[-f logfile_name] [-d duration] | [-o]]
gpdckconfig [-v]
gpddebug [-c class_name ...] [-l debug_level] [-f logfile_name] [-o]
gpdrestart [-v] [-f]
gpdtime [-l timing_level] [-f logfile_name] [-o]
hclose [-C comment] [-i "lock_id"] [host_name ... | host_group ... | compute_unit ... | all]
help [command ...] | ? [command ...]
hghostadd [-C comment] host_group | compute_unit | host_name [host_name ...]
hghostdel [-f] [-C comment] host_group | compute_unit | host_name [host_name ...]
hhist [-t time0,time1] [-f logfile_name] [host_name ...]
hist [-t time0,time1] [-f logfile_name]
hopen [-C comment] [-i "lock_id ... | all"] [host_name ... | host_group ... | compute_unit ... | all]
hpower [suspend | resume] [-C comment] [host_name ...]
mbddebug [-c class_name ...] [-l debug_level] [-f logfile_name] [-o] [-s log_queue_size]
mbdhist [-t time0,time1] [-f logfile_name]
mbdrestart [-C comment] [-v] [-f] [-p | -s]
mbdtime [-l timing_level] [-f logfile_name] [-o]
perflog [-t sample_period] [-d duration] [-f logfile_name] [-o]
perfmon start [sample_period] | stop | view [-json] | setperiod sample_period
qact [-C comment] [queue_name ... | all]
qclose [-C comment] [queue_name ... | all]
qhist [-t time0,time1] [-f logfile_name] [queue_name ...]
qinact [-C comment] [queue_name ... | all]
qopen [-C comment] [queue_name ... | all]
quit
rc error [-t <days>d | <hours>h | <minutes>m] [-p "provider ..."]
rc view [-c "instances | policies | templates ..."] [-p "provider ..."]
reconfig [-v] [-f]
sbddebug [-c class_name ...] [-l debug_level] [-f logfile_name] [-o] [host_name ...]
sbdtime [-l timing_level] [-f logfile_name] [-o] [host_name ...]
schddebug [-c class_name ...] [-l debug_level] [-f logfile_name] [-o] [-s log_queue_size]
schdtime [-l timing_level] [-f logfile_name] [-o]
security view [-v]
showconf mbd | [sbd [host_name ... | all] | gpd]
showstatus
-h
-V

Options

subcommand
Runs the specified subcommand. See the Usage section.
-h
Prints command usage to stderr and exits.
-V
Prints LSF release version to stderr and exits.

Usage

ckconfig [-v]
Checks LSF configuration files that are located in the LSB_CONFDIR/cluster_name/configdir directory, and checks the LSF_ENVDIR/lsf.licensescheduler file.

The LSB_CONFDIR variable is defined in the lsf.conf file, in LSF_ENVDIR or /etc (if LSF_ENVDIR is not defined).

By default, the badmin ckconfig command displays only the result of the configuration file check. If warning errors are found, the badmin command prompts you to display detailed messages.
-v
Verbose mode. Displays detailed messages about configuration file checking to stderr.
diagnose pending_jobID ...
Displays the full pending reason list for the specified pending jobs if CONDENSE_PENDING_REASONS=Y is set in the lsb.params file. For example:
badmin diagnose 1057
diagnose -c jobreq [-f snapshot_file_name] [-t xml | -t json]
UNIX only. Saves the current contents of the scheduler job bucket information into an XML or JSON snapshot file as raw data.

Jobs are put into scheduling buckets based on resource requirements and different scheduling policies. Saving the contents into a snapshot file is useful for data analysis by parsing the file or by running a simple text search on its contents.

This feature is helpful if you want to examine a sudden large performance impact on the scheduler. Use the snapshot file to identify any users with many buckets or large attribute values.

You can use the following options:

-c jobreq
Required.
-f snapshot_file_name
Specifies a snapshot file in which to save the information. It is either a file name, which is located in the DIAGNOSE_LOGDIR directory, or a full path file name. If the specified snapshot file exists, it is overwritten with the current information.

The default name for the snapshot file is jobreq_<hostname>_<dateandtime>.<format>, where <format> is xml or json, depending on the specified format of the snapshot file.

The owner of the log file is the user who is specified in the LSF_ADMIN parameter. The log file permissions are the same as the mbatchd daemon log permissions. Everyone has read and execute access but the LSF_ADMIN owner has write, read, and execute access.

-t xml | -t json
Specifies the format of the snapshot file. Specify -t xml for the snapshot file to be in XML format, or specify -t json for the snapshot file to be in JSON format.

The default format for the snapshot file is XML, and the extension of the snapshot file is .xml. If the snapshot file is in JSON format, the extension of the snapshot file is .json.
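The options above can be combined as in the following sketch. The function name save_jobreq_snapshot and the BADMIN_CMD variable are hypothetical helpers; only the diagnose -c jobreq, -f, and -t options come from this reference.

```shell
# Hypothetical wrapper, not part of LSF.
# BADMIN_CMD is an assumed override; it defaults to the real badmin command.
# Usage: save_jobreq_snapshot xml|json [snapshot_file_name]
save_jobreq_snapshot() {
    jr_format=${1:-}
    jr_file=${2:-}
    case $jr_format in
        xml|json) ;;
        *) echo "format must be xml or json" >&2; return 2 ;;
    esac
    if [ -n "$jr_file" ]; then
        # An existing snapshot file with this name is overwritten.
        "${BADMIN_CMD:-badmin}" diagnose -c jobreq -f "$jr_file" -t "$jr_format"
    else
        # Without -f, the default jobreq_<hostname>_<dateandtime>.<format> name is used.
        "${BADMIN_CMD:-badmin}" diagnose -c jobreq -t "$jr_format"
    fi
}
```

For example, save_jobreq_snapshot json buckets.json would request a JSON snapshot named buckets.json in the DIAGNOSE_LOGDIR directory.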

diagnose -c query [[-f logfile_name] [-d minutes] | [-o]]
This feature is helpful if an unexpected mbatchd query load causes the cluster to slow or fail to respond to requests. For example, many bjobs command queries might cause a high network load and prevent the mbatchd daemon from responding. Running this command with its options enables the mbatchd daemon to dump the query source information into a log file.

The log file shows information about the source of queries for easier troubleshooting. The log file shows who made these requests, where the requests came from, and the data size of the query.

You can also configure this feature by enabling the DIAGNOSE_LOGDIR and ENABLE_DIAGNOSE parameters in the lsb.params file to log the entire query information as soon as the cluster starts. However, the dynamic settings from the command override the static parameter settings. Also, after the duration you specify to track the query information expires, the static diagnosis settings take effect.

You can use the following options to dynamically set the time, specify a log file, and allow the mbatchd daemon to collect information:

-c query
Required.
-f logfile_name
Specifies a log file in which to save the information. It is either a file name, which is located in the DIAGNOSE_LOGDIR directory, or a full path file name.

The default name for the log file is query_info.querylog.<host_name>.

The owner of the log file is the user who is specified in the LSF_ADMIN parameter. The log file permissions are the same as mbatchd daemon log permissions. Everyone has read and execute access but the LSF_ADMIN user has write, read, and execute access.

If you specify the log file in the lsb.params file and then later specify a different log file in the command line, the one in the command line takes precedence. Logging continues until the specified duration is over, or until you stop dynamic logging. It then switches back to the static log file location.

-d minutes
The duration, in minutes, for which to track the query information. The mbatchd daemon reverts to the static settings when the duration is over, or when you stop dynamic diagnosis manually, restart the mbatchd daemon (with the badmin mbdrestart command), or reconfigure (with the badmin reconfig command). The default value for this duration is infinite. By default, query information is always logged.
-o
Turns off dynamic diagnosis (stop logging). If the ENABLE_DIAGNOSE=query parameter is configured, it returns to the static configuration.
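A minimal sketch of starting and stopping dynamic query diagnosis with the options above. The function names and the BADMIN_CMD variable are hypothetical; the duration check is an added safeguard, not LSF behavior.

```shell
# Hypothetical wrappers, not part of LSF.
# BADMIN_CMD is an assumed override; it defaults to the real badmin command.
# Usage: start_query_diagnosis minutes [logfile_name]
start_query_diagnosis() {
    qd_minutes=${1:-}
    qd_log=${2:-}
    case $qd_minutes in
        ''|*[!0-9]*) echo "duration must be a whole number of minutes" >&2; return 2 ;;
    esac
    if [ -n "$qd_log" ]; then
        "${BADMIN_CMD:-badmin}" diagnose -c query -f "$qd_log" -d "$qd_minutes"
    else
        "${BADMIN_CMD:-badmin}" diagnose -c query -d "$qd_minutes"
    fi
}

# Stops dynamic logging; static settings from lsb.params apply again.
stop_query_diagnosis() {
    "${BADMIN_CMD:-badmin}" diagnose -c query -o
}
```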
gpdckconfig [-v]
Checks the global policy configuration file lsb.globalpolicies located in the LSB_CONFDIR/cluster_name/configdir directory.

The LSB_CONFDIR variable is defined in the lsf.conf file, in LSF_ENVDIR or /etc (if LSF_ENVDIR is not defined).

By default, the badmin gpdckconfig command displays only the result of the configuration file check. If warning errors are found, the badmin command prompts you to display detailed messages.

You can run the badmin gpdckconfig command only on the management host or management candidate hosts in the Global Policy Daemon Cluster (GPD Cluster).

-v

Verbose mode. Displays detailed messages about configuration file checking to stderr.

gpddebug [-c class_name ...] [-l debug_level] [-f logfile_name] [-o]
Sets the message log level for the gpolicyd daemon to include additional information in log files. You must be root or the LSF administrator to use this command.
If the command is used without any options, the following default values are used:
class_name
Not defined (no additional classes are logged).
debug_level=0
As specified by the LOG_DEBUG level in the LSF_LOG_MASK parameter.
logfile_name
Not defined (LSF system log file in the LSF system log file directory, in the format gpolicyd.log.host_name).
-c class_name ...
Specifies software classes for which debug messages are to be logged.

By default, class_name is not defined and no additional classes are logged.

The format of class_name is the name of a class, or a list of class names separated by spaces and enclosed in quotation marks. Classes are also listed in the lsf.h header file.

The following log classes are supported:
LC_AUTH
Log authentication messages.
LC_COMM
Log communication messages.
LC_SYS
Log system call messages.
LC_TRACE
Log significant program walk steps.
LC_XDR
Log everything that is transferred by XDR.
LC_XDRVERSION
Log messages for XDR version.
LC2_G_FAIR
Log global fairshare messages.
-l debug_level
Specifies level of detail in debug messages. The higher the number, the more detail that is logged. Higher levels include all lower levels.
debug_level has the following values:
Default: 0
LOG_DEBUG level in parameter LSF_LOG_MASK.
0
LOG_DEBUG level for parameter LSF_LOG_MASK in the lsf.conf file.
1
LOG_DEBUG1 level for extended logging. A higher level includes lower logging levels. For example, the LOG_DEBUG1 level includes the LOG_DEBUG level.
2
LOG_DEBUG2 level for extended logging. A higher level includes lower logging levels. For example, the LOG_DEBUG2 level includes LOG_DEBUG1 and LOG_DEBUG levels.
3
LOG_DEBUG3 level for extended logging. A higher level includes lower logging levels. For example, the LOG_DEBUG3 level includes LOG_DEBUG2, LOG_DEBUG1, and LOG_DEBUG levels.
-f logfile_name
Specifies the name of the file into which debugging messages are to be logged. A file name with or without a full path can be specified.

If a file name without a path is specified, the file is saved in the LSF system log directory.

The name of the file that is created has the following format:

logfile_name.gpolicyd.log.host_name

On UNIX, if the specified path is not valid, the log file is created in the /tmp directory.

On Windows, if the specified path is not valid, no log file is created.

By default, logfile_name is the current LSF system log file in the LSF system log file directory.

-o
Turns off temporary debug settings and resets them to the daemon start state. The message log level is reset back to the value of LSF_LOG_MASK and classes are reset to the value of LSB_DEBUG_GPD.

The log file is also reset back to the default log file.
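The gpddebug options above can be sketched as follows. All function names and the BADMIN_CMD variable are hypothetical; the level names and the log file name format are taken from the descriptions of the -l and -f options.

```shell
# Hypothetical helpers, not part of LSF.
# BADMIN_CMD is an assumed override; it defaults to the real badmin command.

# Maps a -l debug_level value (0-3) to the log level name it selects.
debug_level_name() {
    case ${1:-} in
        0) echo LOG_DEBUG ;;
        1|2|3) echo "LOG_DEBUG$1" ;;
        *) echo "debug_level must be 0, 1, 2, or 3" >&2; return 2 ;;
    esac
}

# Builds the file name that results from -f logfile_name on a given host.
gpd_debug_logfile() {
    echo "$1.gpolicyd.log.$2"
}

# Usage: gpd_debug_on debug_level "class_name ..."
gpd_debug_on() {
    [ -n "${2:-}" ] || { echo "usage: gpd_debug_on LEVEL \"CLASS ...\"" >&2; return 2; }
    debug_level_name "$1" >/dev/null || return 2
    # The class list is passed as one quoted, space-separated argument.
    "${BADMIN_CMD:-badmin}" gpddebug -c "$2" -l "$1"
}

# Resets the temporary debug settings to the daemon start state.
gpd_debug_off() {
    "${BADMIN_CMD:-badmin}" gpddebug -o
}
```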

gpdrestart [-v] [-f]
Dynamically reconfigures LSF global policies and restarts the gpolicyd daemon.

The global policy configuration file lsb.globalpolicies is checked for errors and the results are printed to stderr. If no errors are found, the lsb.globalpolicies file is reloaded and the gpolicyd daemon is restarted.

If warning errors are found, the badmin command prompts you to display detailed messages. If unrecoverable errors are found, the gpolicyd daemon is not restarted, and the badmin command exits.

You can run the badmin gpdrestart command only on the management host or management candidate hosts in the Global Policy Daemon Cluster (GPD Cluster).

-v
Verbose mode. Displays detailed messages about the status of configuration files. All messages from configuration checking are printed to stderr.
-f
Disables interaction and proceeds with the gpolicyd daemon restart if configuration files contain no unrecoverable errors.
gpdtime [-l timing_level] [-f logfile_name] [-o]
Sets the timing level for the gpolicyd daemon to include extra timing information in log files. You must be root or the LSF administrator to use this command.
If the command is used without any options, the following default values are used:
timing_level
Not defined (timing information is not recorded).
logfile_name
Not defined (current LSF system log file in the LSF system log file directory, in the format gpolicyd.log.host_name).
-l timing_level
Specifies the detail of timing information that is included in log files. Timing messages indicate the execution time of functions in the software and are logged in milliseconds.

The following values are supported: 1|2|3|4|5

The higher the number, the more functions in the software are timed and have their execution time logged. The lower numbers include more common software functions. Higher levels include all lower levels.

By default no timing information is logged.

-f logfile_name
Specify the name of the file into which timing messages are to be logged. A file name with or without a full path can be specified.

If a file name without a path is specified, the file is saved in the LSF system log file directory.

The name of the file that is created has the following format:
logfile_name.gpolicyd.log.host_name

On UNIX, if the specified path is not valid, the log file is created in the /tmp directory.

On Windows, if the specified path is not valid, no log file is created.

Note: Both timing and debug messages are logged in the same files.

The default is the current LSF system log file in the LSF system log file directory, in the format gpolicyd.log.host_name.

-o
Optional. Turns off temporary timing settings and resets them to the daemon start state. The timing level is reset back to the value of the parameter for the corresponding daemon (LSB_TIME_GPD).

The log file is also reset back to the default log file.

hclose [-C comment] [-i "lock_id"] [host_name ... | host_group ... | compute_unit ... | all]
Closes batch server hosts. Specify the names of any server hosts, host groups, or compute units. All batch server hosts are closed if the reserved word all is specified. If no argument is specified, the local host is assumed. A closed host does not accept any new jobs, but jobs that are already dispatched to the host are not affected. This behavior is different from a host closed by a window; all jobs on a host are suspended when a time window closes on the host.

If the host is already closed, this command option has no effect unless you specify the -i option to attach a lock ID to the host.

-C comment
Logs the text as an administrator comment record to the lsb.events file. The maximum length of the comment string is 512 characters.

If you close a host group or compute unit, each member is displayed with the same comment string.

You cannot use the badmin hopen command to open a host that was borrowed through the LSF resource connector and is in closed_RC status.

-i "lock_id"
Closes the host and attaches the specified lock ID to the closed host. Each lock ID is a string that can contain up to 128 alphanumeric and underscore (_) characters. The keyword all is reserved and cannot be used as the lock ID. A closed host can have multiple lock IDs, and the host remains closed until there are no more lock IDs attached to the host.

Use -i together with the -C option to attach an administrator message to the lock ID.

If you try to attach a lock ID that is already attached to the host (even with a different comment), the command fails for that host.

Use the badmin hopen -i command option to remove one or more lock IDs from a host.

This allows multiple users to keep a host closed for different reasons. For example, userA might be updating an application while userB is configuring the operating system. The host remains closed until both users complete their updates and open the host using their specific lock IDs.
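The lock ID rules above can be checked before calling hclose, as in this sketch. The function names and the BADMIN_CMD variable are hypothetical. Rejecting an empty lock ID and matching the keyword all exactly as lowercase are assumptions, not documented behavior.

```shell
# Hypothetical helpers, not part of LSF.
# BADMIN_CMD is an assumed override; it defaults to the real badmin command.

# Succeeds if the argument is 1-128 alphanumeric or underscore characters
# and is not the reserved keyword "all".
valid_lock_id() {
    case ${1:-} in
        ''|all|*[!A-Za-z0-9_]*) return 1 ;;
    esac
    [ "${#1}" -le 128 ]
}

# Usage: close_with_lock lock_id comment host_name ...
close_with_lock() {
    [ "$#" -ge 2 ] || { echo "usage: close_with_lock LOCK_ID COMMENT [HOST ...]" >&2; return 2; }
    cl_id=$1
    cl_comment=$2
    valid_lock_id "$cl_id" || { echo "invalid lock ID: $cl_id" >&2; return 2; }
    [ "${#cl_comment}" -le 512 ] || { echo "comment exceeds 512 characters" >&2; return 2; }
    shift 2
    "${BADMIN_CMD:-badmin}" hclose -C "$cl_comment" -i "$cl_id" "$@"
}
```

For example, close_with_lock app_update "Updating application" hostA closes hostA and attaches the app_update lock ID with the comment.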

help [command ...] | ? [command ...]
Displays the syntax and functions of the specified commands.
hghostadd [-C comment] host_group | compute_unit | host_name [host_name ...]
If dynamic host configuration is enabled, dynamically adds hosts to a host group or compute unit. After the mbatchd daemon receives the host information from the LIM on the management host, it dynamically adds the host without triggering reconfiguration.

After the host is added to the host group or compute unit, it is considered part of that group for scheduling decisions for newly submitted jobs and for existing pending jobs.

This command fails if any of the specified host groups, compute units, or host names are not valid.

Restriction: If EGO-enabled SLA scheduling is configured through the ENABLE_DEFAULT_EGO_SLA parameter in the lsb.params file, you cannot use the hghostadd subcommand because all host allocation is under control of enterprise grid orchestrator (EGO).
-C comment
Logs the text as an administrator comment record to the lsb.events file. The maximum length of the comment string is 512 characters.
hghostdel [-f] [-C comment] host_group | compute_unit | host_name [host_name ...]
Dynamically deletes hosts from a host group or compute unit by triggering reconfiguration of the mbatchd daemon.

This command fails if any of the specified host groups, compute units, or host names are not valid.

CAUTION:

To change a dynamic host to a static host, first use the command badmin hghostdel to remove the dynamic host from any host group or compute unit that it belongs to. Then, configure the host as a static host in the lsf.cluster.cluster_name file.

Restriction: If EGO-enabled SLA scheduling is configured through the ENABLE_DEFAULT_EGO_SLA parameter in the lsb.params file, you cannot use the hghostdel subcommand because all host allocation is under control of enterprise grid orchestrator (EGO).
-f
Disables interaction and does not ask for confirmation when reconfiguring mbatchd.
-C comment
Logs the text as an administrator comment record to the lsb.events file. The maximum length of the comment string is 512 characters.
hhist [-t time0,time1] [-f logfile_name] [host_name ...]
Displays historical events for specified hosts, or for all hosts if no host is specified. Host events are host opening and closing. Also, both badmin command and policy- or job-triggered power-related events (suspend, resume, reset) are displayed.
-t time0,time1

Displays only those events that occurred during the period from time0 to time1. See the bhist command for the time format. The default is to display all host events in the event log file.

-f logfile_name
Specify the file name of the event log file. Either an absolute or a relative path name can be specified. The default is to use the current event log file in the LSF system: LSB_SHAREDIR/cluster_name/logdir/lsb.events. Option -f is useful for offline analysis.

If you specified an administrator comment with the -C option of the host control commands hclose or hopen, hhist displays the comment text.

hist [-t time0,time1] [-f logfile_name]
Displays historical events for all the queues, hosts, and mbatchd. Both badmin command and policy- or job-triggered power-related events (suspend, resume, reset) are displayed.
-t time0,time1
Displays only those events that occurred during the period from time0 to time1. See bhist for the time format. The default is to display all queue events in the event log file.
-f logfile_name
Specify the file name of the event log file. Either an absolute or a relative path name can be specified. The default is to use the current event log file in the LSF system: LSB_SHAREDIR/cluster_name/logdir/lsb.events file. Option -f is useful for offline analysis.

If you specified an administrator comment with the -C option of the queue, host, and mbatchd daemon commands, the hist subcommand displays the comment text.

hopen [-C comment] [-i "lock_id ... | all"] [host_name ... | host_group ... | compute_unit ... | all]
Opens batch server hosts. Specify the names of any server hosts, host groups, or compute units. All batch server hosts are opened if the reserved word all is specified. If no host, host group, or compute unit is specified, the local host is assumed. A host accepts batch jobs if it is open.
Important: If EGO-enabled SLA scheduling is configured through the ENABLE_DEFAULT_EGO_SLA parameter in the lsb.params file, and a host is closed by EGO, it cannot be reopened by the badmin hopen command. Hosts closed by EGO have status closed_EGO in the bhosts -l command output.

-C comment
Logs the text as an administrator comment record to the lsb.events file. The maximum length of the comment string is 512 characters.

If you open a host group or compute unit, each member is displayed with the same comment string.

-i "lock_id ... | all"
Removes the specified lock IDs from the closed host. Also opens the host if there are no more lock IDs remaining on the host.

Use a space to separate multiple lock IDs. Use the all keyword to remove all lock IDs and to open the host.

This allows multiple users to keep a host closed for different reasons. For example, userA might be updating an application while userB is configuring the operating system. The host remains closed until both users complete their tasks and open the host using their specific lock IDs.
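A minimal sketch of removing lock IDs with hopen -i. The function name release_locks and the BADMIN_CMD variable are hypothetical. The lock IDs are passed as one quoted, space-separated argument, while host names remain separate arguments.

```shell
# Hypothetical wrapper, not part of LSF.
# BADMIN_CMD is an assumed override; it defaults to the real badmin command.
# Usage: release_locks "lock_id ..." host_name ...
#        release_locks all host_name ...
release_locks() {
    [ "$#" -ge 1 ] || { echo "usage: release_locks \"LOCK_ID ...\" [HOST ...]" >&2; return 2; }
    rl_ids=$1
    shift
    # The host opens only when no lock IDs remain attached to it.
    "${BADMIN_CMD:-badmin}" hopen -i "$rl_ids" "$@"
}
```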

hpower [suspend | resume] [-C comment] [host_name ...]
Manually switches hosts between a power-saving state and a working state.
suspend | resume
The state that you want to switch the host to.
-C comment
Logs the text as an administrator comment record to the lsb.events file. The maximum length of the comment string is 512 characters.
hrestart
This subcommand is obsolete in LSF Version 10.1 Fix Pack 11. Use the bctrld restart sbd command instead to restart the sbatchd daemon.
hshutdown
This subcommand is obsolete in LSF Version 10.1 Fix Pack 11. Use the bctrld stop sbd command instead to shut down the sbatchd daemon.
hstartup
This subcommand is obsolete in LSF Version 10.1 Fix Pack 11. Use the bctrld start sbd command instead to start the sbatchd daemon.
mbddebug [-c class_name ...] [-l debug_level] [-f logfile_name] [-o] [-s log_queue_size]
Sets message log level for the mbatchd daemon to include additional information in log files. You must be root or the LSF administrator to use this command.
-s log_queue_size
Specifies the maximum number of entries in the logging queue that the mbatchd logging thread uses. Specify an integer 100 - 500000. This value temporarily overrides the value of the LSF_LOG_QUEUE_SIZE parameter in the lsf.conf file. The logging queue contains the messages to be written to the log files.

If the LSF_LOG_THREAD=N parameter is defined in the lsf.conf file, the -s option is ignored.

See the sbddebug subcommand for an explanation of the other options.

For the -c option, the mbddebug subcommand has the following valid log classes in addition to the valid log classes for the sbddebug subcommand:
LC2_EST
Log messages for the simulation-based estimator. You cannot use the mbddebug subcommand to change this log class.
LC2_G_FAIR
Log messages for global fairshare.
mbdhist [-t time0,time1] [-f logfile_name]
Displays historical events for the mbatchd daemon. Events describe the starting and exiting of the mbatchd daemon.
-t time0,time1

Displays only those events that occurred during the period from time0 to time1. See the bhist command for the time format. The default is to display all queue events in the event log file.

-f logfile_name
Specify the file name of the event log file. Specify either an absolute or a relative path name. The default is to use the current event log file that is in the LSF system: LSB_SHAREDIR/cluster_name/logdir/lsb.events. Option -f is useful for offline analysis.

If you specified an administrator comment with the -C option of the mbdrestart subcommand, the mbdhist subcommand displays the comment text.

mbdrestart [-C comment] [-v] [-f] [-p | -s]
Dynamically reconfigures LSF and restarts the mbatchd and mbschd daemons. When live configuration with the bconf command is enabled (the LSF_LIVE_CONFDIR parameter is defined in the lsf.conf file), the badmin mbdrestart command uses the configuration files that are generated by the bconf command.

Configuration files are checked for errors and the results are printed to stderr. If no errors are found, configuration files are reloaded, the mbatchd and mbschd daemons are restarted, and events in the lsb.events file are replayed to recover the running state of the last mbatchd daemon. While the mbatchd daemon restarts, it is unavailable to service requests.

If warning errors are found, the badmin command prompts you to display detailed messages. If unrecoverable errors are found, the mbatchd and mbschd daemons do not restart, and the badmin command exits.

Important: If the lsb.events file is large, or many jobs are running, restarting the mbatchd daemon can take several minutes. If you need to reload only the configuration files, use the badmin reconfig command.
-C comment
Logs the text of comment as an administrator comment record to the lsb.events file. The maximum length of the comment string is 512 characters.
-v
Verbose mode. Displays detailed messages about the status of configuration files. All messages from configuration checking are printed to stderr.
-f
Disables interaction and forces reconfiguration and mbatchd daemon restart to proceed if configuration files contain no unrecoverable errors.
-p
Allows parallel mbatchd daemon restarts. Restart forks a child mbatchd daemon process to help minimize downtime for LSF. LSF starts a new or child mbatchd daemon process to read the configuration files and replay the event file. The old mbatchd daemon can respond to client commands, handle job scheduling and status updates, dispatching, and updating new events to event files. When restart is complete, the child takes over as mbatchd daemon, and the old mbatchd daemon dies.

This option is the default behavior for mbatchd daemon restarts. Use the -s option to use serial mbatchd daemon restarts.

-s
Allows serial mbatchd daemon restarts. Use this option to change the default mbatchd daemon behavior, which is to restart in parallel.
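Because -p and -s are alternatives, a wrapper can accept exactly one restart mode, as in this sketch. The function name restart_mbatchd and the BADMIN_CMD variable are hypothetical.

```shell
# Hypothetical wrapper, not part of LSF.
# BADMIN_CMD is an assumed override; it defaults to the real badmin command.
# Usage: restart_mbatchd parallel|serial [comment]
restart_mbatchd() {
    case ${1:-} in
        parallel) mbd_mode=-p ;;
        serial)   mbd_mode=-s ;;
        *) echo "mode must be parallel or serial" >&2; return 2 ;;
    esac
    if [ -n "${2:-}" ]; then
        # The comment is logged to the lsb.events file.
        "${BADMIN_CMD:-badmin}" mbdrestart -C "$2" "$mbd_mode"
    else
        "${BADMIN_CMD:-badmin}" mbdrestart "$mbd_mode"
    fi
}
```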
mbdtime [-l timing_level] [-f logfile_name] [-o]
Sets timing level for the mbatchd daemon to include extra timing information in log files. You must be root or the LSF administrator to use this command.
perflog [[-t sample_period] [-f logfile_name] [-d duration] | [-o]]
This feature is useful for troubleshooting large clusters where a cluster might not be responding due to mbatchd daemon performance problems. In such cases, the mbatchd daemon performance might be slow in handling high-volume requests, such as job submission, job status requests, and job rusage requests.
-t
Specifies the sampling period in minutes for performance metric collection. The default value is 5 minutes.
-f
Specifies a log file in which to save the information. It is either a file name or a full path file name. If you do not specify the path for the log file, then its default path is used. The default name for the log file is mbatchd.perflog.<host_name>.

The owner of the log file is the user who is specified in the LSF_ADMIN parameter. The log file permissions are the same as mbatchd daemon log permissions. Everyone has read and execute access, but the LSF_ADMIN user has write, read and execute access.

-d
The duration in minutes to keep logging performance metric data. The mbatchd daemon stops logging when the duration expires, or when you stop it manually, restart the mbatchd daemon, or reconfigure with the badmin reconfig command. The default value for the duration is infinite. By default, performance metric data is always logged.
-o
Turns off dynamic performance metric logging (stop logging). If the LSB_ENABLE_PERF_METRICS_LOG parameter is enabled, logging returns to the static configuration.
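A minimal sketch of a time-limited perflog session. The function names and the BADMIN_CMD variable are hypothetical, and the numeric checks are added safeguards rather than LSF behavior.

```shell
# Hypothetical wrappers, not part of LSF.
# BADMIN_CMD is an assumed override; it defaults to the real badmin command.
# Usage: start_perflog sample_period_minutes duration_minutes
start_perflog() {
    for pl_value in "${1:-}" "${2:-}"; do
        case $pl_value in
            ''|*[!0-9]*) echo "sample period and duration must be whole minutes" >&2; return 2 ;;
        esac
    done
    "${BADMIN_CMD:-badmin}" perflog -t "$1" -d "$2"
}

# Stops dynamic performance metric logging.
stop_perflog() {
    "${BADMIN_CMD:-badmin}" perflog -o
}
```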
perfmon start [sample_period] | setperiod sample_period | stop | view [-json]
Dynamically enables and controls scheduler performance metric collection.

Collecting and recording performance metric data might affect the performance of LSF. Smaller sampling periods can cause the lsb.streams file to grow faster.

The following metrics are collected and recorded in each sample period:
  • The number of queries that are handled by mbatchd
  • The number of queries for each of jobs, queues, and hosts (bjobs, bqueues, and bhosts commands, and other daemon requests)
  • The number of jobs submitted (divided into job submission requests and jobs submitted)
  • The number of jobs dispatched
  • The number of jobs reordered; that is, the number of jobs that reused the resource allocation of a finished job (the RELAX_JOB_DISPATCH_ORDER parameter in the lsb.params or lsb.queues file)
  • The number of jobs completed
  • The number of jobs that are sent to remote clusters
  • The number of jobs that are accepted from remote clusters
  • The file descriptors that are used by the mbatchd daemon
  • The following scheduler performance metrics are collected:
    • Scheduling interval. A shorter scheduling interval means that jobs are processed more quickly.
    • Number of different resource requirement patterns for jobs in use, which might lead to different candidate host groups. The more matching hosts that are required, the longer it takes to find them, which means a longer scheduling session.
    • Number of buckets (groups) in which jobs are put based on resource requirements and different scheduling policies. More buckets means a longer scheduling session.
start [sample_period]
Start performance metric collection dynamically and specify an optional sampling period in seconds for performance metric collection.

If no sampling period is specified, the default period set in the SCHED_METRIC_SAMPLE_PERIOD parameter in the lsb.params file is used.

stop
Stop performance metric collection dynamically.
view
Display performance metric information for the current sampling period.
When used with the -json option, displays the output in JSON format, which is easier to parse and to convert into graphs.
setperiod sample_period
Set a new sampling period in seconds.
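The perfmon actions above can be wrapped as in the following sketch. The function names and the BADMIN_CMD variable are hypothetical, and the check that the sampling period is a whole number of seconds is an added safeguard.

```shell
# Hypothetical wrappers, not part of LSF.
# BADMIN_CMD is an assumed override; it defaults to the real badmin command.
# Usage: perfmon_start [sample_period_seconds]
perfmon_start() {
    if [ -z "${1:-}" ]; then
        # No period given: SCHED_METRIC_SAMPLE_PERIOD from lsb.params applies.
        "${BADMIN_CMD:-badmin}" perfmon start
        return
    fi
    case $1 in
        *[!0-9]*) echo "sample period must be a whole number of seconds" >&2; return 2 ;;
    esac
    "${BADMIN_CMD:-badmin}" perfmon start "$1"
}

# Prints the metrics for the current sampling period in JSON format.
perfmon_view_json() {
    "${BADMIN_CMD:-badmin}" perfmon view -json
}

# Stops performance metric collection.
perfmon_stop() {
    "${BADMIN_CMD:-badmin}" perfmon stop
}
```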
qact [-C comment] [queue_name ... | all]
Activates a deactivated queue so that submitted jobs are dispatched from the queue. If the reserved word all is specified, the qact subcommand activates all queues. If no queue name is specified, the system default queue is activated. Jobs in a queue can be dispatched only if the queue is activated.
A queue that is inactivated by its run windows cannot be reactivated by this command.
-C comment
Logs the text of the comment as an administrator comment record to the lsb.events file. The maximum length of the comment string is 512 characters.
qclose [-C comment] [queue_name ... | all]
Closes a queue to prevent jobs from being submitted to the queue. If the reserved word all is specified, the qclose subcommand closes all queues. If no queue name is specified, the system default queue is closed. A queue does not accept submitted LSF jobs if it is closed.
-C comment
Logs the text as an administrator comment record to lsb.events. The maximum length of the comment string is 512 characters.
qhist [-t time0,time1] [-f logfile_name] [queue_name ...]
Displays historical events for specified queues, or for all queues if no queue is specified. Queue events are queue opening, closing, activating, and inactivating.
-t time0,time1

Displays only those events that occurred during the period from time0 to time1. See the bhist command for the time format. The default is to display all queue events in the event log file.

-f logfile_name

Specifies the file name of the event log file. Either an absolute or a relative path name can be specified. The default is to use the current event log file in the LSF system: LSB_SHAREDIR/cluster_name/logdir/lsb.events. Option -f is useful for offline analysis.

If you specified an administrator comment with the -C option of the queue control subcommands qclose, qopen, qact, and qinact, the qhist subcommand displays the comment text.
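
As an illustration of the default event log location named under the -f option, the following Python sketch assembles the path LSB_SHAREDIR/cluster_name/logdir/lsb.events. The function name default_events_path and the sample directory and cluster name are hypothetical and are not part of LSF.

```python
import posixpath


def default_events_path(lsb_sharedir, cluster_name):
    """Return LSB_SHAREDIR/cluster_name/logdir/lsb.events as a POSIX path."""
    return posixpath.join(lsb_sharedir, cluster_name, "logdir", "lsb.events")


# Hypothetical values; substitute your own LSB_SHAREDIR and cluster name.
print(default_events_path("/usr/share/lsf/work", "cluster1"))
```

A path built this way, or the path of an archived copy of the file, can be passed to the -f option for offline analysis.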

qinact [-C comment] [queue_name ... | all]
Deactivates a queue to stop submitted jobs from being dispatched from the queue. If the reserved word all is specified, all queues are deactivated. If no queue name is specified, the system default queue is deactivated. Jobs in a queue cannot be dispatched if the queue is inactivated.
-C comment
Logs the text as an administrator comment record to the lsb.events file. The maximum length of the comment string is 512 characters.
qopen [-C comment] [queue_name ... | all]
Opens a closed queue so users can submit jobs to it. If the reserved word all is specified, the qopen subcommand opens all queues. If no queue name is specified, the system default queue is opened. A queue accepts submitted LSF jobs only if it is open.
-C comment
Logs the text of the comment as an administrator comment record to the lsb.events file. The maximum length of the comment string is 512 characters.
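
The four queue control subcommands (qopen, qclose, qact, and qinact) share the same option layout, so a script that drives them might build the argument list as in the following Python sketch. The helper name queue_control_argv and the queue name night are hypothetical. The sketch only builds the argument list and checks the 512-character comment limit; it does not run badmin.

```python
MAX_COMMENT_LENGTH = 512  # limit stated for the -C option

QUEUE_SUBCOMMANDS = {"qopen", "qclose", "qact", "qinact"}


def queue_control_argv(subcommand, queues=(), comment=None):
    """Build an argument list for a badmin queue control subcommand."""
    if subcommand not in QUEUE_SUBCOMMANDS:
        raise ValueError("not a queue control subcommand: " + subcommand)
    if comment is not None and len(comment) > MAX_COMMENT_LENGTH:
        raise ValueError("comment exceeds 512 characters")
    argv = ["badmin", subcommand]
    if comment is not None:
        argv += ["-C", comment]
    # No queue names: the subcommand acts on the system default queue.
    # Pass ["all"] to act on every queue.
    argv += list(queues)
    return argv


# "night" is a hypothetical queue name.
print(queue_control_argv("qclose", ["night"], comment="maintenance window"))
```

The resulting list can be handed to a process launcher such as subprocess.run by a user who is root or an LSF administrator.
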
quit
Exits the badmin command session.
rc error [-t <days>d | <hours>h | <minutes>m] [-p "provider ..."]
Shows LSF resource connector error messages from the host providers. These errors are provided by the third-party mosquitto message queue application, which must be running on the host.
-t <days>d | <hours>h | <minutes>m
Specifies the earliest time from which to retrieve the error messages.
Note: When specifying days, badmin retrieves messages from this time at midnight. For example, when running badmin rc error -t 1d, badmin retrieves messages from today at midnight, and when running badmin rc error -t 2d, badmin retrieves messages from yesterday at midnight.
-p "provider ..."
Specifies the host providers from which to retrieve the error messages. Use a space to separate multiple host providers.
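
The midnight rule in the note for the -t option can be worked through with a short Python sketch. It covers only the days form (-t <days>d) and assumes that the pattern given for 1d and 2d extends to larger values; the function name earliest_time_for_days is hypothetical.

```python
from datetime import datetime, time, timedelta


def earliest_time_for_days(days, now):
    """Return the earliest message time for 'badmin rc error -t <days>d'.

    1d maps to today at midnight and 2d to yesterday at midnight, as the
    note describes. Larger values follow the same pattern (an assumption).
    """
    if days < 1:
        raise ValueError("days must be at least 1")
    midnight_today = datetime.combine(now.date(), time.min)
    return midnight_today - timedelta(days=days - 1)


# Hypothetical current time used only for the illustration.
now = datetime(2024, 5, 10, 15, 30)
print(earliest_time_for_days(1, now))
print(earliest_time_for_days(2, now))
```
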
rc view [-c "instances | policies | templates ..."] [-p "provider ..."]
Shows LSF resource connector information from the host providers.
-c "instances | policies | templates ..."
Specifies whether to view information on instances, policies, or templates. Use a space to separate multiple types of information. By default, this command shows information on instances only. If policies is selected with the -c option, the -p option is ignored because all policies are displayed, not just for the specified providers.
-p "provider ..."
Specifies the host providers from which to view information. Use a space to separate multiple host providers. If policies is selected with the -c option, the -p option is ignored because all policies are displayed, not just for the specified providers.
reconfig [-v] [-f]
Dynamically reconfigures LSF.

Configuration files are checked for errors and the results are displayed to stderr. If no errors are found in the configuration files, a reconfiguration request is sent to the mbatchd daemon and configuration files are reloaded. When live configuration with the bconf command is enabled (the LSF_LIVE_CONFDIR parameter is defined in the lsf.conf file), the badmin reconfig command uses the configuration files that are generated by the bconf command.

Important: The reconfig subcommand does not restart the mbatchd daemon and does not replay the lsb.events file. To restart the mbatchd daemon and replay the lsb.events file, use the badmin mbdrestart command.

When you use this command, the mbatchd daemon is available to service requests while configuration files are reloaded. Configuration changes made since system boot or the last reconfiguration take effect.

If warning errors are found, the badmin command prompts you to display detailed messages. If unrecoverable errors are found, reconfiguration fails, and the badmin command exits.

If you add a host to a queue or to a host group or compute unit, the new host is not recognized by jobs that were submitted before you reconfigured. If you want the new host to be recognized, you must use the command badmin mbdrestart.

Resource requirements that are determined by the queue no longer apply to a running job after you use the badmin reconfig command. For example, if you change the RES_REQ parameter in a queue and reconfigure the cluster, the previous queue-level resource requirements for running jobs are lost.

-v
Verbose mode. Displays detailed messages about the status of the configuration files. Without this option, the default is to display the results of configuration file checking. All messages from the configuration file check are printed to stderr.
-f
Disables interaction and proceeds with reconfiguration if configuration files contain no unrecoverable errors.
sbddebug [-c class_name ...] [-l debug_level] [-f logfile_name] [-o] [host_name ...]
Sets the message log level for the sbatchd daemon to include additional information in log files. You must be root or the LSF administrator to use this command.

In LSF multicluster capability, debug levels can be set only for hosts within the same cluster. For example, you cannot set debug or timing levels from a host in clusterA for a host in clusterB. You need to be on a host in clusterB to set up debug or timing levels for clusterB hosts.

If the command is used without any options, the following default values are used:
class_name=0
No additional classes are logged.
debug_level=0
LOG_DEBUG level in parameter LSF_LOG_MASK.
logfile_name=daemon_name.log.host_name
LSF system log file in the LSF system log file directory, in the format daemon_name.log.host_name.
host_name=local_host
Host from which the command was submitted.
-c class_name ...
Specify software classes for which debug messages are to be logged.
Note: Classes are also listed in the lsf.h header file.

By default, no additional classes are logged (class name 0).

The following log classes are supported:
LC_ADVRSV and LC2_ADVRSV
Log advance reservation modifications.
LC2_AFFINITY
Log messages that are related to affinity.
LC_AFS and LC2_AFS
Log AFS messages.
LC_AUTH and LC2_AUTH
Log authentication messages.
LC_CHKPNT and LC2_CHKPNT
Log checkpointing messages.
LC_COMM and LC2_COMM
Log communication messages.
LC_DCE and LC2_DCE
Log messages that pertain to DCE support.
LC_EEVENTD and LC2_EEVENTD
Log eeventd daemon messages.
LC_ELIM and LC2_ELIM
Log ELIM messages.
LC_EXEC and LC2_EXEC
Log significant steps for job execution.
LC_FAIR
Log fairshare policy messages.
LC_FILE and LC2_FILE
Log file transfer messages.
LC2_GUARANTEE
Log messages that are related to guaranteed SLAs.
LC_HANG and LC2_HANG
Mark where a program might hang.
LC_JARRAY and LC2_JARRAY
Log job array messages.
LC_JLIMIT and LC2_JLIMIT
Log job slot limit messages.
LC_LOADINDX and LC2_LOADINDX
Log load index messages.
LC_M_LOG and LC2_M_LOG
Log multievent log messages.
LC_MEMORY and LC2_MEMORY
Log messages that are related to memory allocation.
LC_MPI and LC2_MPI
Log MPI messages.
LC_MULTI and LC2_MULTI
Log messages that pertain to LSF multicluster capability.
LC_PEND and LC2_PEND
Log messages that are related to job pending reasons.
LC_PERFM and LC2_PERFM
Log performance messages.
LC_PIM and LC2_PIM
Log PIM messages.
LC_PREEMPT and LC2_PREEMPT
Log preemption policy messages.
LC2_RC
Log resource connector messages.
LC_RESOURCE and LC2_RESOURCE
Log messages that are related to resource broker.
LC_RESREQ and LC2_RESREQ
Log resource requirement messages.
LC_SCHED and LC2_SCHED
Log messages that pertain to the batch scheduler.
LC_SIGNAL and LC2_SIGNAL
Log messages that pertain to signals.
LC_SYS and LC2_SYS
Log system call messages.
LC_TRACE and LC2_TRACE
Log significant program walk steps.
LC_XDR and LC2_XDR
Log everything that is transferred by XDR.
LC_XDRVERSION and LC2_XDRVERSION
Log messages for XDR version.
-l debug_level
Specifies level of detail in debug messages. The higher the number, the more detail that is logged. Higher levels include all lower levels.

The default is 0 (LOG_DEBUG level in parameter LSF_LOG_MASK).

The following values are supported:
0
LOG_DEBUG level for parameter LSF_LOG_MASK in the lsf.conf file. 0 is the default.
1
LOG_DEBUG1 level for extended logging. A higher level includes lower logging levels. For example, LOG_DEBUG1 includes LOG_DEBUG levels.
2
LOG_DEBUG2 level for extended logging. A higher level includes lower logging levels. For example, LOG_DEBUG2 includes LOG_DEBUG1 and LOG_DEBUG levels.
3
LOG_DEBUG3 level for extended logging. A higher level includes lower logging levels. For example, LOG_DEBUG3 includes LOG_DEBUG2, LOG_DEBUG1, and LOG_DEBUG levels.
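
Because each debug level includes all lower levels, the set of levels that a given -l value covers can be listed with a small Python sketch. The names DEBUG_LEVELS and included_levels are hypothetical and exist only for this illustration.

```python
# Level names in ascending order, indexed by the numeric -l value.
DEBUG_LEVELS = ["LOG_DEBUG", "LOG_DEBUG1", "LOG_DEBUG2", "LOG_DEBUG3"]


def included_levels(debug_level):
    """Return the log levels that the numeric debug_level includes."""
    if debug_level not in (0, 1, 2, 3):
        raise ValueError("debug_level must be 0, 1, 2, or 3")
    return DEBUG_LEVELS[: debug_level + 1]


print(included_levels(2))
```
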
-f logfile_name
Specify the name of the file into which debugging messages are to be logged. A file name with or without a full path might be specified.

If a file name without a path is specified, the file is saved in the LSF system log directory.

The name of the file that is created has the following format:
logfile_name.daemon_name.log.host_name

On UNIX and Linux, if the specified path is not valid, the log file is created in the /tmp directory.

On Windows, if the specified path is not valid, no log file is created.

By default, the current LSF system log file in the LSF system log file directory is used.
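
The naming rule for the -f option can be sketched in Python as follows. This is one reading of the rule, in which a directory given in logfile_name is kept and only the file name receives the daemon_name.log.host_name suffix. The function name debug_log_path and the directory, daemon, and host values are hypothetical, and the sketch does not model the /tmp fallback for paths that are not valid.

```python
import posixpath


def debug_log_path(logfile_name, daemon_name, host_name, log_dir):
    """Return the path of the log file created for '-f logfile_name'.

    log_dir stands for the LSF system log directory and is used only
    when logfile_name carries no directory part.
    """
    directory, base = posixpath.split(logfile_name)
    file_name = "{}.{}.log.{}".format(base, daemon_name, host_name)
    return posixpath.join(directory or log_dir, file_name)


# Hypothetical log directory and host name.
print(debug_log_path("trace", "sbatchd", "hostA", "/opt/lsf/log"))
print(debug_log_path("/var/tmp/trace", "sbatchd", "hostA", "/opt/lsf/log"))
```
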

-o
Turns off temporary debug settings and resets them to the daemon start state. The message log level is reset back to the value of LSF_LOG_MASK and classes are reset to the value of LSB_DEBUG_MBD, LSB_DEBUG_SBD.

The log file is also reset back to the default log file.

host_name ...
Optional. Sets debug settings on the specified host or hosts.

The default is the local host (the host from which the command was submitted).

sbdtime [-l timing_level] [-f logfile_name] [-o] [host_name ...]
Sets the timing level for the sbatchd daemon to include extra timing information in log files. You must be root or the LSF administrator to use this command.

In LSF multicluster capability, timing levels can be set only for hosts within the same cluster. For example, you cannot set debug or timing levels from a host in clusterA for a host in clusterB. You need to be on a host in clusterB to set up debug or timing levels for clusterB hosts.

If the command is used without any options, the following default values are used:
timing_level=no
No timing information is recorded.
logfile_name=current
The current LSF system log file in the LSF system log file directory, in the format daemon_name.log.host_name.
host_name=local
The local host, from which the command was submitted.
-l timing_level
Specifies detail of timing information that is included in log files. Timing messages indicate the execution time of functions in the software and are logged in milliseconds.

The following values are supported: 1|2|3|4|5

The higher the number, the more functions in the software that are timed and whose execution time is logged. The lower numbers include more common software functions. Higher levels include all lower levels.

By default, no timing information is logged.

-f logfile_name
Specify the name of the file into which timing messages are to be logged. A file name with or without a full path can be specified.

If a file name without a path is specified, the file is saved in the LSF system log file directory.

The name of the file that is created has the following format:
logfile_name.daemon_name.log.host_name

On UNIX and Linux, if the specified path is not valid, the log file is created in the /tmp directory.

On Windows, if the specified path is not valid, no log file is created.

Note: Both timing and debug messages are logged in the same files.

The default is the current LSF system log file in the LSF system log file directory, in the format daemon_name.log.host_name.

-o
Optional. Turn off temporary timing settings and reset them to the daemon start state. The timing level is reset back to the value of the parameter for the corresponding daemon (LSB_TIME_MBD, LSB_TIME_SBD).

The log file is also reset back to the default log file.

host_name ...
Sets the timing level on the specified host or hosts.

The default is the local host (the host from which the command was submitted).
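
A script that sets or resets timing levels might assemble the sbdtime arguments as in the following Python sketch, which checks that the timing level is one of the supported values 1 to 5. The helper name sbdtime_argv and the host name hostA are hypothetical, and the sketch builds the argument list without running badmin.

```python
def sbdtime_argv(timing_level=None, logfile_name=None, reset=False, hosts=()):
    """Build an argument list for 'badmin sbdtime'."""
    argv = ["badmin", "sbdtime"]
    if timing_level is not None:
        if timing_level not in (1, 2, 3, 4, 5):
            raise ValueError("timing_level must be 1, 2, 3, 4, or 5")
        argv += ["-l", str(timing_level)]
    if logfile_name is not None:
        argv += ["-f", logfile_name]
    if reset:
        argv.append("-o")  # return to the daemon start state
    argv += list(hosts)  # no hosts: the local host is used
    return argv


# "hostA" is a hypothetical host name.
print(sbdtime_argv(timing_level=2, hosts=["hostA"]))
print(sbdtime_argv(reset=True))
```
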

schddebug [-c class_name ...] [-l debug_level] [-f logfile_name] [-o] [-s log_queue_size]
Sets the message log level for the mbschd daemon to include additional information in log files. You must be root or the LSF administrator to use this command.
-s log_queue_size
Specifies the maximum number of entries in the logging queue that is used by the mbschd daemon logging thread. Specify an integer in the range 100 - 500000. The logging queue contains the messages to be written to the log files.

This option is ignored if the LSF_LOG_THREAD=N parameter is defined in the lsf.conf file.

See the sbddebug subcommand for an explanation of the other options.

schdtime [-l timing_level] [-f] [-o]
Sets the timing level for the mbschd daemon to include extra timing information in log files. You must be root or the LSF administrator to use this command.

See the description of the sbdtime subcommand for an explanation of options.

security view [-v]
Shows the configuration of the components for the LSF security mechanism.
-v
Verbose mode. Displays a detailed description of the current configuration of the LSF security components. Also displays the optimal configuration if the current setting is not secure.
showconf mbd | [sbd [host_name ... | all] | gpd]
Display all configured parameters and their values set in the lsf.conf or ego.conf file that affect the mbatchd, sbatchd, and gpolicyd daemons.

In LSF multicluster capability, the badmin showconf command displays only the parameters of daemons on the local cluster.

Running the badmin showconf command from a management candidate host reaches all server hosts in the cluster. Running the badmin showconf command from a server-only host might not be able to reach other server-only hosts.

The badmin showconf command displays only the values that are used by LSF.

The badmin showconf command displays the value of the EGO_MASTER_LIST parameter from wherever it is defined. You can define either the LSF_MASTER_LIST parameter or the EGO_MASTER_LIST parameter in the lsf.conf file. If EGO is enabled in the LSF cluster, LIM reads the lsf.conf file first, then the ego.conf file. The value of the LSF_MASTER_LIST parameter is displayed only if the EGO_MASTER_LIST parameter is not defined at all in the ego.conf file.

For example, if you define the LSF_MASTER_LIST parameter in the lsf.conf file, and the EGO_MASTER_LIST parameter in the ego.conf file, the badmin showconf command displays the value of the EGO_MASTER_LIST parameter.

If EGO is enabled in the LSF cluster, and you define the LSF_MASTER_LIST parameter in the lsf.conf file, and the EGO_MASTER_LIST parameter in the ego.conf file, the badmin showconf command displays the value of the EGO_MASTER_LIST parameter in the ego.conf file.

If EGO is disabled, the ego.conf file is not loaded, so parameters that are defined in the lsf.conf file are displayed.
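
The precedence rules for the master list parameters can be summarized in a Python sketch. It is a simplified reading of the paragraphs above, not the logic that LSF itself runs. The function name displayed_master_list, the dictionary representation of the two files, and the host names are hypothetical. The order in which the two parameters are checked when both are defined in the lsf.conf file is an assumption.

```python
def displayed_master_list(lsf_conf, ego_conf, ego_enabled):
    """Return (parameter_name, value) that showconf is expected to display.

    lsf_conf and ego_conf map parameter names to values. ego_conf is
    ignored when EGO is disabled, because the file is not loaded.
    """
    if ego_enabled and "EGO_MASTER_LIST" in ego_conf:
        return ("EGO_MASTER_LIST", ego_conf["EGO_MASTER_LIST"])
    # Assumption: EGO_MASTER_LIST in lsf.conf is checked before LSF_MASTER_LIST.
    if "EGO_MASTER_LIST" in lsf_conf:
        return ("EGO_MASTER_LIST", lsf_conf["EGO_MASTER_LIST"])
    if "LSF_MASTER_LIST" in lsf_conf:
        return ("LSF_MASTER_LIST", lsf_conf["LSF_MASTER_LIST"])
    return None


# Hypothetical host names.
lsf_conf = {"LSF_MASTER_LIST": "hostA hostB"}
ego_conf = {"EGO_MASTER_LIST": "hostC hostD"}
print(displayed_master_list(lsf_conf, ego_conf, ego_enabled=True))
print(displayed_master_list(lsf_conf, ego_conf, ego_enabled=False))
```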

showstatus
Displays current LSF runtime information about the whole cluster, including information about hosts, jobs, users, user groups, simulation-based estimation, and mbatchd daemon startup and reconfiguration.

See also

bhosts, bqueues, lsb.hosts, lsb.params, lsb.queues, lsf.cluster, lsf.conf, sbatchd, mbatchd, mbschd