bhosts

Displays hosts and their static and dynamic resources.

Synopsis

bhosts [-w | -l | -e | -o "[field_name | all][:[-][output_width]] ... [delimiter='character']" [-json]] [-a] [-attr] [-alloc] [-cname] [-x] [-X] [-R "res_req"] [host_name ... | host_group ... | compute_unit ...]
bhosts [-w | -e | -o "[field_name | all][:[-][output_width]] ... [delimiter='character']" [-json]] [-a] [-attr] [-alloc] [-cname] [-noheader] [-x] [-X] [-R "res_req"] [host_name ... | host_group ... | compute_unit ...]
bhosts [-e] [-cname] [-a] [-noheader] [-loc] [-attr] -s | -sl [resource_name ...]
bhosts [-aff] [-attr] [-l] [host_name ... | host_group ... | compute_unit ...] | [cluster_name]
bhosts [-l [-gpu]] [host_name ... | cluster_name]
bhosts [-w] [-rconly]
bhosts [-h | -V]

Description

By default, returns the following information about all hosts: host name, host status, job state statistics, and job slot limits.

The bhosts command displays output for condensed host groups and compute units. These host groups and compute units are defined by CONDENSE in the HostGroup and ComputeUnit sections of the lsb.hosts file. Condensed host groups and compute units are displayed as a single entry with the name as defined by GROUP_NAME or NAME in the lsb.hosts file.

When EGO adds more resources to a running resizable job, the bhosts command displays the added resources. When EGO removes resources from a running resizable job, the bhosts command displays the updated resources.

The -l and -X options display noncondensed output.

The -s option displays information about the numeric shared resources and their associated hosts.

With LSF multicluster capability, displays the information about hosts available to the local cluster. Use the -e option to see information about exported hosts.

Options

-a
Shows information about all hosts, including hosts relinquished to a resource provider (such as EGO or OpenStack) through LSF resource connector. Default output includes only standard LSF hosts.
-aff
Displays host topology information for CPU and memory affinity scheduling.
-alloc
Shows counters for slots in RUN, SSUSP, USUSP, and RSV. The slot allocation is different depending on whether the job is an exclusive job or not.
-attr
Displays information on attributes that are attached to the host. These attributes were created with the battr create command, or automatically created according to attribute requests.
-cname
In LSF Advanced Edition, includes the cluster name for execution cluster hosts in output. The output that is displayed is sorted by cluster and then by host name.
Note: This command option is deprecated and might be removed in a future version of LSF.
-e
LSF multicluster capability only. Displays information about resources that were exported to another cluster.
-gpu [-l]
Displays GPU information on the host.

The -l option shows more detailed information about the GPUs.

-json

Displays the customized output in JSON format.

When specified, bhosts -o displays the customized output in the JSON format.

This option applies only to customized output from the bhosts -o command. It has no effect when bhosts is used without the -o option and neither the LSB_BHOSTS_FORMAT environment variable nor the LSB_BHOSTS_FORMAT parameter is defined.

-l
Displays host information in a long multi-line format. In addition to the default fields, displays information about the CPU factor, the current load, and the load thresholds. Also displays the value of slots for each host. The slots value is the greatest number of unused slots on a host.

The bhosts -l option also displays information about the dispatch windows.

When PowerPolicy is enabled in the lsb.threshold file, the bhosts -l command also displays host power states. Final power states are on or suspend. Intermediate power states are restarting, resuming, and suspending. The final power state under administrator control is closed_Power. The final power state under policy control is ok_Power. If the host status becomes unknown (for example, because a power operation failed), the power state is shown as a dash (-).

If you specified an administrator comment with the -C option of the host control commands (badmin hclose -C or badmin hopen -C), the -l option displays the comment text. If there are any lock IDs that are attached to a closed host, these lock IDs are displayed with any attached comments in a tabular format.

If enhanced energy accounting using Elasticsearch is enabled (with the LSF_ENABLE_BEAT_SERVICE parameter in the lsf.conf file), the output shows the Current Power usage in watts and the total Energy Consumed in joules and kWh.

If attributes are attached to the host, the -l option shows detailed information on these attributes.

-noheader

Removes the column headings from the output.

When specified, bhosts displays the values of the fields without displaying the names of the fields. This option is useful for script parsing, when column headings are not necessary.

This option applies to output for the bhosts command with no options, and to output for all bhosts options with output that uses column headings, including the following options: -a, -alloc, -cname, -e, -o, -R, -s, -w, -x, -X.

This option does not apply to output for bhosts options that do not use column headings, including the following options: -aff, -json, -l.
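
For script parsing, output without column headings can be split on white space. The following is a minimal sketch, assuming the default column order that is listed under "Output: Host-based default" (HOST_NAME, STATUS, JL/U, MAX, NJOBS, RUN, SSUSP, USUSP, RSV). The parse_bhosts_noheader helper and the sample text are hypothetical and were not captured from a real cluster.

```python
# Column order of the default bhosts output, as listed in this reference.
DEFAULT_COLUMNS = ["HOST_NAME", "STATUS", "JL/U", "MAX", "NJOBS",
                   "RUN", "SSUSP", "USUSP", "RSV"]


def parse_bhosts_noheader(text):
    """Turn headerless default bhosts output into a list of dicts.

    Hypothetical helper: lines that do not have exactly one value per
    default column are skipped rather than guessed at.
    """
    hosts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(DEFAULT_COLUMNS):
            continue  # blank or unexpected line
        hosts.append(dict(zip(DEFAULT_COLUMNS, fields)))
    return hosts


# Hypothetical sample. On a cluster, the text might instead come from
# subprocess.run(["bhosts", "-noheader"], capture_output=True, text=True).stdout
SAMPLE = """\
hostA   ok      -   8   2   2   0   0   0
hostB   closed  -   4   4   4   0   0   0
"""

for host in parse_bhosts_noheader(SAMPLE):
    print(host["HOST_NAME"], host["STATUS"], host["RUN"])
```

All values are kept as strings because some columns, such as JL/U, can contain a dash (-) instead of a number.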

-o

Sets the customized output format.

  • Specify which bhosts fields (or aliases instead of the full field names), in which order, and with what width to display.
  • Specify only the bhosts field name or alias to set its output to unlimited width and left justification.
  • (Available starting in Fix Pack 14) Specify all to display all fields. Specify the colon (:) with an output width that applies to all fields.
  • Specify the colon (:) without a width to set the output width to the recommended width for that field.
  • Specify the colon (:) with a width to set the maximum number of characters to display for the field. When its value exceeds this width, bhosts truncates the ending characters.
  • Specify a hyphen (-) to set right justification when bhosts displays the output for the specific field. If not specified, the default is to set left justification when bhosts displays the output for a field.
  • Specify a second colon (:) with a unit to specify a unit prefix for the output for the following fields: mem, max_mem, avg_mem, memlimit, swap, swaplimit, corelimit, stacklimit, and hrusage (for hrusage, the unit prefix is for mem and swap resources only).

    This unit is KB (or K) for kilobytes, MB (or M) for megabytes, GB (or G) for gigabytes, TB (or T) for terabytes, PB (or P) for petabytes, EB (or E) for exabytes, ZB (or Z) for zettabytes, or S to automatically adjust the value to a suitable unit prefix and remove the "bytes" suffix from the unit. The default is to automatically adjust the value to a suitable unit prefix, but keep the "bytes" suffix in the unit.

    The display value keeps two decimals but rounds up the third decimal. For example, if the unit prefix is set to G, 10M displays as 0.01G.

    The unit prefix specified here overrides the value of the LSB_UNIT_FOR_JOBS_DISPLAY environment variable, which also overrides the value of the LSB_UNIT_FOR_JOBS_DISPLAY parameter in the lsf.conf file.

  • Use delimiter= to set the delimiting character to display between different headers and fields. This delimiter must be a single character. By default, the delimiter is a space.
Output customization applies only to the output for certain bhosts options:
  • LSB_BHOSTS_FORMAT and bhosts -o both apply to output for the bhosts command with no options, and for bhosts options with output that filter information, including the following options: -a, -alloc, -cname, -R, -x, -X.
  • LSB_BHOSTS_FORMAT and bhosts -o do not apply to output for bhosts options that use a modified format, including the following options: -aff, -e, -l, -s, -w.

The bhosts -o option overrides the LSB_BHOSTS_FORMAT environment variable, which overrides the LSB_BHOSTS_FORMAT setting in lsf.conf.
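
The precedence described above can be sketched as a small lookup. This is an illustration only, not how bhosts itself is implemented: the effective_bhosts_format function is hypothetical, and the lsf.conf settings are assumed to be already available as a dictionary.

```python
def effective_bhosts_format(cli_format=None, env=None, lsf_conf=None):
    """Return the format string that wins under the documented precedence.

    cli_format: value passed to bhosts -o, or None.
    env:        mapping of environment variables.
    lsf_conf:   mapping of parameters already read from lsf.conf.
    """
    env = env or {}
    lsf_conf = lsf_conf or {}
    if cli_format:                            # 1. bhosts -o
        return cli_format
    if env.get("LSB_BHOSTS_FORMAT"):          # 2. environment variable
        return env["LSB_BHOSTS_FORMAT"]
    return lsf_conf.get("LSB_BHOSTS_FORMAT")  # 3. lsf.conf, or None if unset


conf = {"LSB_BHOSTS_FORMAT": "host_name status"}
env = {"LSB_BHOSTS_FORMAT": "host_name max run"}
print(effective_bhosts_format(None, env, conf))              # host_name max run
print(effective_bhosts_format("host_name cpuf", env, conf))  # host_name cpuf
```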

The following are the field names used to specify the bhosts fields to display, with valid widths and any supported aliases (which you can use instead of the field names). Depending on the field name, the unit of measurement is an automatically chosen unit of bytes (such as gigabytes or megabytes).

Table 1. Output fields for bhosts
Field name Width Alias
host_name 20 hname
status 15 stat
cpuf 10  
jl_u 8 jlu
max 8  
njobs 8  
run 8  
ssusp 8  
ususp 8  
rsv 8  
dispatch_window 50 dispwin
ngpus 8 ng
ngpus_alloc 8 ngu
ngpus_excl_alloc 8 ngx
ngpus_shared_alloc 8 ngs
ngpus_shared_jexcl_alloc 8 ngsjx
ngpus_excl_avail 8 ngfx
ngpus_shared_avail 8 ngfs
attribute 50 attr
mig_alloc 5  
comments 128
available_mem 15
reserved_mem 15
total_mem 15

Note: If combined with the bhosts -json option, the comments field displays full details of host closure events such as event time, administrator ID, lock ID, and comments, as shown in the bhosts -l option. The available_mem, reserved_mem, and total_mem fields are available starting in Fix Pack 14.

Field names and aliases are not case-sensitive. Valid values for the output width are any positive integer 1 - 4096.

For example,

bhosts -o "host_name cpuf: jl_u:- max:-6 delimiter='^'"

This command displays the following fields:

  • HOST_NAME with unlimited width and left justified.
  • CPUF with a maximum width of ten characters (which is the recommended width) and left justified.
  • JL_U with a maximum width of eight characters (which is the recommended width) and right justified.
  • MAX with a maximum width of six characters and right justified.
  • The ^ character is displayed between different headers and fields.
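
The format rules above can be sketched in a script that builds the format string and then splits the delimited output. This is a minimal sketch under stated assumptions: the field_spec, build_format, and parse_delimited helpers are hypothetical, and the sample output is invented for illustration rather than captured from a cluster.

```python
def field_spec(name, width=None, right=False):
    """Format one field for bhosts -o.

    width=None  -> bare name (unlimited width, left justified)
    width="rec" -> colon without a width (recommended width)
    width=int   -> colon with a maximum width (1 - 4096)
    right=True  -> hyphen after the colon (right justified)
    """
    if width is None:
        if right:
            raise ValueError("right justification requires the colon form")
        return name
    if width == "rec":
        suffix = ""
    else:
        if not 1 <= int(width) <= 4096:
            raise ValueError("width must be 1 - 4096")
        suffix = str(int(width))
    return f"{name}:{'-' if right else ''}{suffix}"


def build_format(specs, delimiter=None):
    """Join field specs and an optional single-character delimiter."""
    parts = [field_spec(*spec) for spec in specs]
    if delimiter is not None:
        if len(delimiter) != 1:
            raise ValueError("delimiter must be a single character")
        parts.append(f"delimiter='{delimiter}'")
    return " ".join(parts)


def parse_delimited(text, delimiter):
    """Split the header line and each row on the delimiter."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = [h.strip() for h in lines[0].split(delimiter)]
    return [dict(zip(header, (v.strip() for v in ln.split(delimiter))))
            for ln in lines[1:]]


fmt = build_format(
    [("host_name",), ("cpuf", "rec"), ("jl_u", "rec", True), ("max", 6, True)],
    delimiter="^",
)
print(fmt)  # host_name cpuf: jl_u:- max:-6 delimiter='^'

# Hypothetical output for the format above.
SAMPLE = "HOST_NAME^CPUF^JL_U^MAX\nhostA^60.00^-^8\n"
print(parse_delimited(SAMPLE, "^"))
```

A delimiter that cannot occur in host names or comments makes the split step safer than splitting on spaces.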
-w
Displays host information in wide format. Fields are displayed without truncation.

For condensed host groups and compute units, the -w option displays the overall status and the number of hosts with the ok, unavail, unreach, and busy status in the following format:

host_group_status num_ok/num_unavail/num_unreach/num_busy

Where
  • host_group_status is the overall status of the host group or compute unit. If a single host in the group or unit is ok, the overall status is also ok.
  • num_ok, num_unavail, num_unreach, and num_busy are the number of hosts that are ok, unavail, unreach, and busy.
For example, if five hosts are ok, two unavail, one unreach, and three busy in a condensed host group hg1, the following status is displayed:
hg1 ok 5/2/1/3
If any hosts in the host group or compute unit are closed, the status for the host group is displayed as closed, with no status for the other states:
hg1 closed
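
A script that reads these condensed entries has to handle both forms. The following sketch parses only the two forms shown above; the parse_condensed_status helper is hypothetical, and any additional columns in real output are ignored.

```python
import re

# Matches the num_ok/num_unavail/num_unreach/num_busy token.
_COUNTS = re.compile(r"^(\d+)/(\d+)/(\d+)/(\d+)$")


def parse_condensed_status(line):
    """Parse '<name> <status> [num_ok/num_unavail/num_unreach/num_busy]'."""
    parts = line.split()
    name, status = parts[0], parts[1]
    counts = None
    if len(parts) > 2:
        match = _COUNTS.match(parts[2])
        if match:
            keys = ("ok", "unavail", "unreach", "busy")
            counts = dict(zip(keys, map(int, match.groups())))
    return {"name": name, "status": status, "counts": counts}


print(parse_condensed_status("hg1 ok 5/2/1/3"))
print(parse_condensed_status("hg1 closed"))
```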

The status of LSF resource connector hosts that are closed because of a resource provider reclaim request is closed_RC.

-rc [-l]
Displays the current status of hosts requested from and provisioned by LSF resource connector, as well as a brief history of each provisioned host.
Note: Requires LSF Fix Pack 4.

The -rc and -rconly options make use of the third-party mosquitto message queue application. LSF resource connector publishes additional provider host information that is displayed by these bhosts options. The mosquitto binary file is included as part of the LSF distribution.

To use the -rc option, LSF resource connector must be enabled with the LSB_RC_EXTERNAL_HOST_FLAG parameter in the lsf.conf file.

If you use the MQTT message broker that is distributed with LSF, you must configure the LSF_MQ_BROKER_HOSTS and MQTT_BROKER_HOST parameters in the lsf.conf file. The LSF_MQ_BROKER_HOSTS and MQTT_BROKER_HOST parameters must specify the same host name. The LSF_MQ_BROKER_HOSTS parameter enables LIM to start the mosquitto daemon.

If you use an existing MQTT message broker, you must configure the MQTT_BROKER_HOST parameter. You can optionally specify an MQTT broker port with the MQTT_BROKER_PORT parameter.
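
The broker requirements above can be checked before you run bhosts -rc. The following is a rough sketch, assuming that the settings are available as KEY=VALUE text with # comments; the check_mqtt_settings function and the hostA and hostB names are hypothetical.

```python
def check_mqtt_settings(conf_text, bundled_broker=True):
    """Return a list of problems found in KEY=VALUE configuration text.

    bundled_broker=True  -> the broker distributed with LSF is used, so
                            both parameters must be set to the same host.
    bundled_broker=False -> an existing broker is used, so only
                            MQTT_BROKER_HOST is required.
    """
    params = {}
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blanks
        if "=" in line:
            key, value = line.split("=", 1)
            params[key.strip()] = value.strip().strip('"')

    problems = []
    broker = params.get("MQTT_BROKER_HOST")
    if not broker:
        problems.append("MQTT_BROKER_HOST is not set")
    if bundled_broker:
        mq_hosts = params.get("LSF_MQ_BROKER_HOSTS")
        if not mq_hosts:
            problems.append("LSF_MQ_BROKER_HOSTS is not set")
        elif broker and mq_hosts != broker:
            problems.append("LSF_MQ_BROKER_HOSTS and MQTT_BROKER_HOST differ")
    return problems


GOOD = "LSF_MQ_BROKER_HOSTS=hostA\nMQTT_BROKER_HOST=hostA\n"
BAD = "LSF_MQ_BROKER_HOSTS=hostA\nMQTT_BROKER_HOST=hostB\n"
print(check_mqtt_settings(GOOD))  # []
print(check_mqtt_settings(BAD))
```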

Use the ps command to check that the MQTT message broker daemon (mosquitto) is installed and running: ps -ef | grep mosquitto.

Configure the EBROKERD_HOST_CLEAN_DELAY parameter to specify a delay, in minutes, after which the ebrokerd daemon removes information about relinquished or reclaimed hosts. This parameter allows the bhosts -rc and bhosts -rconly commands to get LSF resource connector provider host information for some time after the hosts are deprovisioned.

The following additional columns are shown in the host list:
RC_STATUS
LSF resource connector status.
Preprovision_Started
Resource connector started the pre-provisioning script for the new host.
Preprovision_Failed
The pre-provisioning script returned an error.
Allocated
The host is ready to join the LSF cluster.
Reclaim_Received
A host reclaim request was received from the provider (for example, for an AWS spot instance).
RelinquishReq_Sent
LSF started to relinquish the host.
Relinquished
LSF finished relinquishing the host.
Deallocated_Sent
LSF sent a return request to the provider.
Postprovision_Started
LSF started the post-provisioning script after the host was returned.
Done
The host life cycle is complete.
PROV_STATUS
Provider status. This status depends on the provider. For example, AWS has pending, running, shutting down, terminated, and others. Check the documentation for the provider to understand the status that is displayed.
UPDATED_AT
Time stamp of the latest status change.
INSTANCE_ID
ID of the created machine instance. This provides a unique ID for each cloud instance of the LSF resource connector host.

For hosts provisioned by resource connector, these columns show appropriate status values and a time stamp. A dash (-) is displayed in these columns for other hosts in the cluster.

For example,
bhosts -rc
HOST_NAME          STATUS       JL/U    MAX  NJOBS    RUN  SSUSP  USUSP    RSV  RC_STATUS      PROV_STATUS    UPDATED_AT              INSTANCE_ID
ec2-35-160-173-192 ok              -      1      0      0      0      0      0  Allocated      running        2017-04-07T12:28:46CDT  i-0244f608fe7b5e014
lsf1.aws.          closed          -      1      0      0      0      0      0          -           -              -

The -l option shows more detailed information about provisioned hosts:
bhosts -rc -l
HOST  ec2-35-160-173-192.us-west-2.compute.amazonaws.com
STATUS           CPUF  JL/U    MAX  NJOBS    RUN  SSUSP  USUSP    RSV RC_STATUS      PROV_STATUS    UPDATED_AT             INSTANCE_ID              DISPATCH_WINDOW
ok              60.00     -      1      0      0      0      0      0 Allocated      running        2017-04-07T12:28:46CDT i-0244f608fe7b5e014      -

 CURRENT LOAD USED FOR SCHEDULING:
                r15s   r1m  r15m    ut    pg    io   ls    it   tmp   swp   mem  slots
 Total           1.0   0.0   0.0    1%   0.0    33    0     3 5504M    0M  385M      1
 Reserved        0.0   0.0   0.0    0%   0.0     0    0     0    0M    0M    0M      -
-rconly
Shows the status of all hosts provisioned by LSF resource connector, regardless of whether they have joined the cluster.
Note: Requires LSF Fix Pack 4.

To use the -rconly option, LSF resource connector must be enabled with the LSB_RC_EXTERNAL_HOST_FLAG parameter in the lsf.conf file. If you use the MQTT message broker that is distributed with LSF, you must configure the LSF_MQ_BROKER_HOSTS and MQTT_BROKER_HOST parameters in the lsf.conf file. The LSF_MQ_BROKER_HOSTS and MQTT_BROKER_HOST parameters must specify the same host name. The LSF_MQ_BROKER_HOSTS parameter enables LIM to start the mosquitto daemon.

If you use an existing MQTT message broker, you must configure the MQTT_BROKER_HOST parameter. You can optionally specify an MQTT broker port with the MQTT_BROKER_PORT parameter.

Use the ps command to check that the MQTT message broker daemon (mosquitto) is installed and running: ps -ef | grep mosquitto.

-x
Displays hosts whose job exit rate is high, that is, hosts whose job exit rate exceeds the threshold that is configured by the EXIT_RATE parameter in the lsb.hosts file for longer than the time that is specified by the JOB_EXIT_RATE_DURATION parameter in the lsb.params file. By default, these hosts are closed the next time LSF checks host exceptions and runs eadmin.

Use with the -l option to show detailed information about host exceptions.

If no hosts exceed the job exit rate, the bhosts -x command has the following output:

There is no exceptional host found
-X
Displays uncondensed output for host groups and compute units.
-R "res_req"
Displays only information about hosts that satisfy the resource requirement expression.
Note: Do not specify resource requirements by using the rusage keyword to select hosts because the criteria are ignored by LSF.

LSF supports ordering of resource requirements on all load indices, including external load indices, either static or dynamic.

-s | -sl [resource_name ...] [-loc]
Displays information about the specified resources. The bhosts -s option shows only consumable resources. This option does not display information about GPU resources (that is, this option does not display gpu_<num>n resources). Use the -gpu option to view GPU information on the host.

If you use the -s option, returns the resource (such as fpga), the total and reserved amounts of the resource (such as 3), and the resource locations (by host name).

As of Fix Pack 14, specifying the -sl option returns the same resource information as the -s option, with the addition of the following information:
  • Specific name for each resource (for example, if there are three types of the fpga resources, you can assign three names: card1, card2, and card3). The names describe the specific resources and are assigned to the job upon dispatch.
  • Which of these names has been assigned to the resource (for example, card1).

Note that if the LOCATION parameter in the lsf.cluster.clustername file is set to all to indicate that the resource is shared by all hosts in the cluster, the LOCATION field in the bhosts -s command output also displays ALL. To display the individual names of all the hosts in the cluster in the bhosts -s command output, specify the -loc option together with the -s option.

When LSF License Scheduler is configured to work with LSF Advanced Edition submission and execution clusters, LSF Advanced Edition considers LSF License Scheduler cluster mode and project mode features to be shared features. When you run the bhosts -s command from a host in the submission cluster, it shows no TOTAL and RESERVED tokens available for the local hosts in the submission cluster, but shows the number of available tokens for TOTAL and the number of used tokens for RESERVED in the execution clusters.

host_name ... | host_group ... | compute_unit ...
Displays only information about the specified hosts. Do not use quotation marks to specify multiple hosts.

For host groups and compute units, the names of the member hosts are displayed instead of the name of the host group or compute unit. Do not use quotation marks to specify multiple host groups or compute units.

cluster_name
LSF multicluster capability only. Displays information about hosts in the specified cluster.
-h
Prints command usage to stderr and exits.
-V
Prints LSF release version to stderr and exits.

Output: Host-based default

Displays the following fields:

HOST_NAME
The name of the host. If a host has running batch jobs, but the host is removed from the configuration, the host name is displayed as lost_and_found.

For condensed host groups, the HOST_NAME value is the name of the host group.

STATUS
With LSF multicluster capability, not shown for fully exported hosts.
The status of the host and the sbatchd daemon. Batch jobs can be dispatched only to hosts with an ok status. Host status has the following values:
ok
The host is available to accept batch jobs.

For condensed host groups, if a single host in the host group is ok, the overall status is also shown as ok.

If any host in the host group or compute unit is not ok, bhosts displays the first host status that it encounters as the overall status for the condensed host group. Use the bhosts -X command to see the status of individual hosts in the host group or compute unit.

unavail
The host is down, or LIM and the sbatchd daemon on the host are unreachable.
unreach
LIM on the host is running but the sbatchd daemon is unreachable.
closed
The host is not allowed to accept any remote batch jobs. The host can be closed for several reasons.
closed_Cu_excl
This host is a member of a compute unit that is running an exclusive compute unit job.
JL/U
With LSF multicluster capability, not shown for fully exported hosts.

The maximum number of job slots that the host can process on a per user basis. A dash (-) indicates no limit.

For condensed host groups or compute units, the JL/U value is the total number of job slots that all hosts in the group or unit can process on a per user basis.

The host does not allocate more than JL/U job slots for one user at the same time. These job slots are used by running jobs, as well as by suspended or pending jobs with reserved slots.

For preemptive scheduling, the accounting is different. These job slots are used by running jobs and by pending jobs with reserved slots.

MAX
The maximum number of job slots available. A dash (-) indicates no limit.

For condensed host groups and compute units, the MAX value is the total maximum number of job slots available in all hosts in the host group or compute unit.

These job slots are used by running jobs, as well as by suspended or pending jobs with reserved slots.

If preemptive scheduling is used, suspended jobs are not counted.

A host does not always have to allocate this many job slots if jobs are waiting. The host must also satisfy its configured load conditions to accept more jobs.

NJOBS
The number of tasks for all jobs that are dispatched to the host. The NJOBS value includes running, suspended, and chunk jobs.

For condensed host groups and compute units, the NJOBS value is the total number of tasks that are used by jobs that are dispatched to any host in the host group or compute unit.

If the -alloc option is used, total is the sum of the RUN, SSUSP, USUSP, and RSV counters.
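
As a worked example of the -alloc rule above, the following sketch sums the four counters for one host. The njobs_with_alloc helper and the counter values are hypothetical.

```python
def njobs_with_alloc(row):
    """Sum the slot counters that make up NJOBS when -alloc is used."""
    return sum(int(row[name]) for name in ("RUN", "SSUSP", "USUSP", "RSV"))


row = {"RUN": "4", "SSUSP": "1", "USUSP": "0", "RSV": "2"}  # hypothetical values
print(njobs_with_alloc(row))  # 7
```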

RUN
The number of tasks for all running jobs on the host.

For condensed host groups and compute units, the RUN value is the total number of tasks for running jobs on any host in the host group or compute unit. If the -alloc option is used, total is the allocated slots for the jobs on the host.

SSUSP
The number of tasks for all system suspended jobs on the host.

For condensed host groups and compute units, the SSUSP value is the total number of tasks for all system suspended jobs on any host in the host group or compute unit. If the -alloc option is used, total is the allocated slots for the jobs on the host.

USUSP
The number of tasks for all user suspended jobs on the host. Jobs can be suspended by the user or by the LSF administrator.

For condensed host groups and compute units, the USUSP value is the total number of tasks for all user suspended jobs on any host in the host group or compute unit. If the -alloc option is used, total is the allocated slots for the jobs on the host.

RSV
The number of tasks for all pending jobs with reserved slots on the host.

For condensed host groups and compute units, the RSV value is the total number of tasks for all pending jobs with reserved slots on any host in the host group or compute unit. If the -alloc option is used, total is the allocated slots for the jobs on the host.

Output: Host-based -l option

In addition to the default output fields, the -l option also displays the following information:
loadSched, loadStop
The scheduling and suspending thresholds for the host. If a threshold is not defined, the threshold from the queue definition applies. If both the host and the queue define a threshold for a load index, the most restrictive threshold is used.

The migration threshold is the time that a job dispatched to this host can remain suspended by the system before LSF attempts to migrate the job to another host.

STATUS
The long format that is shown by the -l option gives the possible reasons for a host to be closed. If a power policy is enabled in the lsb.threshold file, it shows the power state:
closed_Adm
The host is closed by the LSF administrator or root with the badmin hclose command. No job can be dispatched to the host, but jobs that are running on the host are not affected.
closed_Busy
The host is overloaded. At least one load index exceeds the configured threshold. Indices that exceed their threshold are identified by an asterisk (*). No job can be dispatched to the host, but jobs that are running on the host are not affected.
closed_Cu_Excl
This host is a member of a compute unit that is running an exclusive compute unit job (submitted with the bsub -R "cu[excl]" command).
closed_EGO
For EGO-enabled SLA scheduling, the host is closed because it was not allocated by EGO to run LSF jobs. Hosts that are allocated from EGO display the status ok.
closed_Excl
The host is running an exclusive job (submitted with the bsub -x command).
closed_Full
The maximum number of job slots on the host was reached. No job can be dispatched to the host, but jobs that are running on the host are not affected.
closed_LIM
LIM on the host is unreachable, but the sbatchd daemon is running.
closed_Lock
The host is locked by the EGO administrator or root by using the lsadmin limlock command. Running jobs on the host are suspended by EGO (SSUSP state). Use the lsadmin limunlock command to unlock LIM on the local host.
closed_Wind
The host is closed by a dispatch window that is defined in the lsb.hosts file. No job can be dispatched to the host, but jobs that are running on the host are not affected.
closed_RC
The LSF resource connector host is closed because of a resource provider reclaim request. Hosts are also marked as closed_RC before they are returned to a resource provider (such as EGO, OpenStack, Amazon Web Services) when the maximum time-to-live (the LSB_RC_EXTERNAL_HOST_MAX_TTL parameter in the lsf.conf file) or the host idle time (the LSB_RC_EXTERNAL_HOST_IDLE_TIME parameter in the lsf.conf file) was reached.
on
The host power state is on.
Note: Power state on does not mean that the host state is ok. The host state depends on whether the management host can connect to the LIM and sbatchd daemons on the host.
off
The host is powered off by policy or manually.
suspend
The host is suspended by policy or manually with badmin hpower.
restarting
The host is being reset because a resume operation failed.
resuming
The host is being resumed from standby state, which is triggered by either policy or cluster administrator.
suspending
The host is being suspended, which is triggered by either policy or cluster administrator.
closed_Power
The host is put into power saving (suspend) state by the cluster administrator.
ok_Power
Host suspend was triggered by power policy.
CPUF
Displays the CPU normalization factor of the host (see lshosts(1)).
DISPATCH_WINDOW
Displays the dispatch windows for each host. Dispatch windows are the time windows during the week when batch jobs can be run on each host. Jobs that are already started are not affected by the dispatch windows. When the dispatch windows close, jobs are not suspended. Jobs already running continue to run, but no new jobs are started until the windows reopen. The default for the dispatch window is no restriction or always open (that is, twenty-four hours a day and seven days a week). For the dispatch window specification, see the description for the DISPATCH_WINDOWS keyword under the -l option in the bqueues command.
CURRENT LOAD
Displays the total and reserved host load.
Reserved
You specify reserved resources by using the bsub -R option. These resources are reserved by jobs that are running on the host.
Total
The total load has different meanings, depending on whether the load index is increasing or decreasing.

For increasing load indices, such as run queue lengths, CPU usage, paging activity, logins, and disk I/O, the total load is the consumed plus the reserved amount. The total load is calculated as the sum of the current load and the reserved load. The current load is the load that is shown by the lsload command.

For decreasing load indices, such as available memory, idle time, available swap space, and available space in the tmp directory, the total load is the available amount. The total load is the difference between the current load and the reserved load. This difference is the available resource as shown by the lsload command.
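
The two cases above can be written as a short calculation. This is a sketch only: the total_load function and the sample values are hypothetical, and the grouping of load index names follows the descriptions in the preceding paragraphs.

```python
# Load indices grouped by the direction described above.
INCREASING = {"r15s", "r1m", "r15m", "ut", "pg", "io", "ls"}
DECREASING = {"it", "tmp", "swp", "mem"}


def total_load(index, current, reserved):
    """Compute the Total value from the current load and the reserved load.

    Increasing indices: consumed plus reserved.
    Decreasing indices: available amount, that is, current minus reserved.
    """
    if index in INCREASING:
        return current + reserved
    if index in DECREASING:
        return current - reserved
    raise ValueError(f"unknown load index: {index}")


print(total_load("r1m", 1.5, 0.5))     # 2.0
print(total_load("mem", 4096, 1024))   # 3072
```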

LOAD THRESHOLD

Displays the scheduling threshold (loadSched) and the suspending threshold (loadStop). Also displays the migration threshold if defined and the checkpoint support if the host supports checkpointing.

The format for the thresholds is the same as for batch job queues. For an explanation of the thresholds and load indices, see the description for the QUEUE SCHEDULING PARAMETERS keyword under the -l option of the bqueues command.

THRESHOLD AND LOAD USED FOR EXCEPTIONS

Displays the configured threshold of EXIT_RATE for the host and its current load value for host exceptions.

ADMIN ACTION COMMENT

If the EGO administrator specified an administrator comment with the -C option of the badmin host control commands hclose or hopen, the comment text is displayed.

PE NETWORK INFORMATION

Displays network resource information for IBM Parallel Edition (PE) jobs that are submitted with the bsub -network option, or to a queue (defined in the lsb.queues file) or an application profile (defined in the lsb.applications file) with the NETWORK_REQ parameter defined.

The following example shows PE NETWORK INFORMATION:
bhosts -l

...
PE NETWORK INFORMATION
NetworkID                      Status                 rsv_windows/total_windows
1111111                        ok                           4/64 
2222222                        closed_Dedicated             4/64 
...

NetworkID is the physical network ID returned by PE.

One of the following network Status values is displayed:
ok
Normal status.
closed_Full
All network windows are reserved.
closed_Dedicated
A dedicated PE job is running on the network (the usage=dedicated option is specified in the network resource requirement string).
unavail
Network information is not available.
CONFIGURED AFFINITY CPU LIST

The host is configured in the lsb.hosts file to accept jobs for CPU and memory affinity scheduling. If the AFFINITY parameter is configured as Y, the keyword all is displayed. If a CPU list is specified under the AFFINITY column, the configured CPU list for affinity scheduling is displayed.

Output: Resource-based -s option

The -s option displays the following resource information: the amounts that are used for scheduling, the amounts reserved, and the associated hosts for the resources. Only resources (shared or host-based) with numeric values are displayed.

The following fields are displayed:
RESOURCE
The name of the resource.
TOTAL
The total free amount of a resource. This amount is used for scheduling.
RESERVED
The amount that is reserved by jobs. You specify the reserved resource by using the bsub -R option.
LOCATION
The hosts that are associated with the resource.

Output: Host-based -aff option

The -aff option displays host topology information for CPU and memory affinity scheduling. Only the topology nodes that contain CPUs in the list in the CPULIST parameter that is defined in the lsb.hosts file are displayed.

The following fields are displayed:
AFFINITY
If the host is configured in the lsb.hosts file to accept jobs for CPU and memory affinity scheduling, and the host supports affinity scheduling, AFFINITY: Enabled is displayed.

If the host is configured in the lsb.hosts file to accept jobs for CPU and memory affinity scheduling, but the host does not support affinity scheduling, AFFINITY: Disabled (not supported) is displayed. If LIM on the host is not available or sbatchd is unreachable, AFFINITY: UNKNOWN is displayed.

Host[memory] host_name
Maximum available memory on the host. If memory availability cannot be determined, a dash (-) is displayed for the host. If the -l option is specified with the -aff option, the host name is not displayed.

For hosts that do not support affinity scheduling, a dash (-) is displayed for host memory and no host topology is displayed.

NUMA[numa_node: requested_mem / max_mem]
Requested and total NUMA node memory. It is possible for requested memory for the NUMA node to be greater than the maximum available memory displayed.

A socket is a collection of cores with a direct pipe to memory. Each socket contains one or more cores. A socket is not necessarily a physical socket, but rather refers to the memory architecture of the machine.

A core is a single entity capable of performing computations.

A node contains sockets. A socket contains cores, and a core can contain threads if the core is enabled for multithreading.

If no NUMA nodes are present, then the NUMA layer in the output is not shown. Other relevant items such as host, socket, core, and thread are still shown.

If the host is not available, only the host name is displayed. A dash (-) is shown where available host memory would normally be displayed.

The following example shows CONFIGURED AFFINITY CPU LIST:
bhosts -l -aff hostA
HOST  hostA
STATUS           CPUF  JL/U    MAX  NJOBS    RUN  SSUSP  USUSP    RSV DISPATCH_WINDOW
ok              60.00     -      8      0      0      0      0      0      -

 CURRENT LOAD USED FOR SCHEDULING:
                r15s   r1m  r15m    ut    pg    io   ls    it   tmp   swp   mem  slots
 Total           0.0   0.0   0.0   30%   0.0   193   25     0 8605M  5.8G 13.2G      8
 Reserved        0.0   0.0   0.0    0%   0.0     0    0     0    0M    0M    0M      -


 LOAD THRESHOLD USED FOR SCHEDULING:
           r15s   r1m  r15m   ut      pg    io   ls    it    tmp    swp    mem
 loadSched   -     -     -     -       -     -    -     -     -      -      -
 loadStop    -     -     -     -       -     -    -     -     -      -      -


 CONFIGURED AFFINITY CPU LIST: all

 AFFINITY: Enabled
 Host[15.7G]
     NUMA[0: 100M / 15.7G]
         Socket0
             core0(0)
         Socket1
             core0(1)
         Socket2
             core0(2)
         Socket3
             core0(3)
         Socket4
             core0(4)
         Socket5
             core0(5)
         Socket6
             core0(6)
         Socket7
             core0(7)
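
The topology portion of this output can be summarized with a short script. The following Python sketch reads the AFFINITY, NUMA, Socket, and core lines and counts what it finds. The sample lines are copied from the hostA example (first two sockets only). The regular expressions are an assumption based on that example alone, and summarize_topology is an illustrative name, not part of LSF.

```python
import re

# Topology lines copied from the hostA example (first two sockets only).
SAMPLE = """\
 AFFINITY: Enabled
 Host[15.7G]
     NUMA[0: 100M / 15.7G]
         Socket0
             core0(0)
         Socket1
             core0(1)
"""

# NUMA[numa_node: requested_mem / max_mem]
NUMA_RE = re.compile(r"NUMA\[(\d+):\s*(\S+)\s*/\s*(\S+)\]")
# coreN(cpu ...): assumes CPU IDs inside the parentheses are space-separated.
CORE_RE = re.compile(r"core\d+\(([\d\s]+)\)")


def summarize_topology(text):
    """Count sockets, cores, and CPUs in `bhosts -aff` style topology text."""
    summary = {"affinity": None, "numa": [], "sockets": 0, "cores": 0, "cpus": []}
    for line in text.splitlines():
        item = line.strip()
        numa = NUMA_RE.match(item)
        core = CORE_RE.match(item)
        if item.startswith("AFFINITY:"):
            summary["affinity"] = item.split(":", 1)[1].strip()
        elif numa:
            node, requested, maximum = numa.groups()
            summary["numa"].append((int(node), requested, maximum))
        elif item.startswith("Socket"):
            summary["sockets"] += 1
        elif core:
            summary["cores"] += 1
            summary["cpus"].extend(int(cpu) for cpu in core.group(1).split())
    return summary


summary = summarize_topology(SAMPLE)
print(summary)
```

Memory values are kept as strings such as 100M and 15.7G because the example does not specify how the units should be converted.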
  
When EGO detects missing elements in the topology, it attempts to correct the problem by adding the missing levels into the topology. In the following example, sockets and cores are missing on host hostB:
...
Host[1.4G] hostB
    NUMA[0: 1.4G / 1.4G] (*0 *1)
...

A job that requests two cores, two sockets, or two CPUs runs. A job that requests two cores from the same NUMA node also runs. However, a job that requests two cores from the same socket remains pending.

Output: GPU-based -gpu option

The -gpu option displays information about the GPUs on the host.

The following fields are displayed:
HOST_NAME
The host name.
GPU_ID
The GPU IDs on the host. Each GPU is shown as a separate line.
MODEL
The full model name, which consists of the GPU brand name and the model type.
MUSED
The amount of GPU memory currently in use.
MRSV
The amount of GPU memory that is reserved by the job.
NJOBS
The total number of jobs that are using the GPUs.
RUN
The total number of running jobs that are using the GPUs.
SUSP
The total number of suspended jobs that are using the GPUs.
RSV
The total number of pending jobs that reserved the GPUs.
VENDOR
The GPU vendor type (that is, the GPU brand name).
If the -l option is specified with the -gpu option, more details about the GPUs are displayed in the following fields:
NGPUS
The total number of GPUs on the host.
SHARED_AVAIL
The current total number of GPUs that are available for concurrent use by multiple jobs (that is, when the job is submitted with the -gpu mode=shared or -gpu j_exclusive=no options).
EXCLUSIVE_AVAIL
The current total number of GPUs that are available for exclusive use by a job (that is, when the job is submitted with the -gpu mode=exclusive_process or -gpu j_exclusive=yes options).
STATIC ATTRIBUTES
Static GPU information. The following field is specific to this section:
NVLINK/XGMI
The connections with other GPUs on the same host.

The connection flag of each GPU is separated by a slash (/) with the next GPU, with a Y showing that there is a direct NVLink (for Nvidia) or xGMI (for AMD) connection with that GPU.

MIG
A flag to indicate whether the GPU supports Nvidia Multi-Instance GPU (MIG) functions.
DYNAMIC ATTRIBUTES
The latest GPU usage information as maintained by LSF.
GPU JOB INFORMATION
Information on jobs that are using the host's GPUs. The following fields are specific to this section:
JEXCL
Flag to indicate whether the GPU job requested that the allocated GPUs cannot be used by other jobs (that is, whether the job was submitted with -gpu j_exclusive=yes).
RUNJOBIDS
The IDs of the running GPU jobs on the GPU.
SUSPJOBIDS
The IDs of the suspended GPU jobs on the GPU.
RSVJOBIDS
The IDs of the pending GPU jobs that reserved the GPU.
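
The NVLINK/XGMI field described under STATIC ATTRIBUTES packs one connection flag per GPU into a slash-separated string. The following Python sketch shows one way to read such a value. The sample string is hypothetical, and the sketch assumes that each position in the string corresponds to a GPU on the host in GPU ID order, which this reference does not state explicitly. connected_gpus is an illustrative name, not part of LSF.

```python
def connected_gpus(flags):
    """Return the positions whose flag is Y in a slash-separated NVLINK/XGMI value.

    Assumption: position i in the string refers to the GPU with ID i.
    Any flag other than Y is treated as "no direct connection".
    """
    return [i for i, flag in enumerate(flags.split("/")) if flag.strip() == "Y"]


# Hypothetical value for one GPU on a four-GPU host.
links = connected_gpus("N/Y/N/Y")
print(links)  # [1, 3]
```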

Output: Resource connector -rconly option

The -rconly option displays information that is specific to the LSF resource connector.

The following fields are displayed:
PUB_DNS_NAME and PUB_IP_ADDRESS
Public DNS name and IP address of the host.
PRIV_DNS_NAME and PRIV_IP_ADDRESS
Private DNS name and IP address of the host.
RC_STATUS
LSF resource connector status.
PROV_STATUS
Resource provider status.
TAG
The RC_ACCOUNT value that is defined in the lsb.queues or lsb.applications files.
UPDATED_AT
Time stamp of the latest status change.
INSTANCE_ID
ID of the created machine instance. This ID uniquely identifies the host in LSF.
For example,
bhosts -rconly 
PROVIDER : aws 
  TEMPLATE : aws-vm-1 
    PUB_DNS_NAME       PUB_IP_ADDRESS  PRIV_DNS_NAME      PRIV_IP_ADDRESS RC_STATUS             PROV_STATUS           TAG            UPDATED_AT              INSTANCE_ID 
    ec2-52-43-171-109. 52.43.171.109   ip-192-168-0-85.us 192.168.0.85    Done                  terminated            default        2017-05-31T14:30:47CDT  - 
    ec2-35-160-157-112 35.160.157.112  ip-192-168-0-69.us 192.168.0.69    Allocated             running               default        2017-05-31T14:32:00CDT  - 
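
Output in this layout can be turned into one record per host for further filtering. The following Python sketch parses the example above and lists the private IP addresses of hosts whose PROV_STATUS is running. It assumes that no field value contains spaces and that every data row has the same number of fields as the header row. parse_rconly is an illustrative name, not part of LSF.

```python
# Text copied from the example above; the DNS names are truncated there as well.
SAMPLE = """\
PROVIDER : aws
  TEMPLATE : aws-vm-1
    PUB_DNS_NAME       PUB_IP_ADDRESS  PRIV_DNS_NAME      PRIV_IP_ADDRESS RC_STATUS             PROV_STATUS           TAG            UPDATED_AT              INSTANCE_ID
    ec2-52-43-171-109. 52.43.171.109   ip-192-168-0-85.us 192.168.0.85    Done                  terminated            default        2017-05-31T14:30:47CDT  -
    ec2-35-160-157-112 35.160.157.112  ip-192-168-0-69.us 192.168.0.69    Allocated             running               default        2017-05-31T14:32:00CDT  -
"""


def parse_rconly(text):
    """Parse `bhosts -rconly` style text into a list of per-host dictionaries."""
    hosts = []
    provider = template = header = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "PROVIDER":
            provider = fields[-1]
        elif fields[0] == "TEMPLATE":
            template = fields[-1]
        elif fields[0] == "PUB_DNS_NAME":
            header = fields  # column names for the rows that follow
        elif header and len(fields) == len(header):
            row = dict(zip(header, fields))
            row["PROVIDER"] = provider
            row["TEMPLATE"] = template
            hosts.append(row)
    return hosts


hosts = parse_rconly(SAMPLE)
running = [h["PRIV_IP_ADDRESS"] for h in hosts if h["PROV_STATUS"] == "running"]
print(running)  # ['192.168.0.69']
```

Rows that do not match the header width are skipped silently, so a stricter tool might prefer to report them instead.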

Output: Attribute -attr option

The -attr option displays information on attributes that are attached to the host.

The following fields are displayed:
HOSTS
The names of the hosts to which this attribute is attached.
ATTRIBUTE
The name of the attribute.
TTL
The current time-to-live (TTL) value of the attribute.
CREATOR
The name of the user that created the attribute.
DESCRIPTION
User-specified information about the attribute.

Files

Reads the lsb.hosts file.

See also

lsb.hosts, bqueues, lshosts, badmin, lsadmin