lsf.cluster

The cluster configuration file. There is one file for each cluster, called lsf.cluster.cluster_name. The cluster_name suffix is the name of the cluster defined in the Cluster section of the lsf.shared file. All LSF hosts are listed in this file, along with the list of LSF administrators and the installed LSF features.

Changing lsf.cluster configuration

After changing the lsf.cluster.cluster_name file, run the following commands:
  • lsadmin reconfig to reconfigure LIM
  • badmin mbdrestart to restart mbatchd
  • lsadmin limrestart to restart LIM (on all changed non-management hosts)

Location

This file is typically installed in the directory defined by LSF_ENVDIR.

Structure

The lsf.cluster.cluster_name file contains the following configuration sections:
  • Parameters section
  • ClusterAdmins section
  • Host section
  • ResourceMap section
  • RemoteClusters section

Parameters section

About lsf.cluster

The lsf.cluster.cluster_name file contains two types of configuration information:
Cluster definition information
affects all LSF applications. Defines cluster administrators, hosts that make up the cluster, attributes of each individual host such as host type or host model, and resources using the names defined in lsf.shared.
LIM policy information
affects applications that rely on LIM job placement policy. Defines load sharing and job placement policies provided by LIM.

Parameters

  • ADJUST_DURATION
  • ELIM_ABORT_VALUE
  • ELIM_POLL_INTERVAL
  • ELIMARGS
  • EXINTERVAL
  • FLOAT_CLIENTS
  • FLOAT_CLIENTS_ADDR_RANGE
  • HOST_INACTIVITY_LIMIT
  • LSF_ELIM_BLOCKTIME
  • LSF_ELIM_DEBUG
  • LSF_ELIM_RESTARTS
  • LSF_HOST_ADDR_RANGE
  • MASTER_INACTIVITY_LIMIT
  • PROBE_TIMEOUT
  • RETRY_LIMIT

ADJUST_DURATION

Syntax

ADJUST_DURATION=integer

Description

Integer reflecting a multiple of EXINTERVAL that controls the time period during which load adjustment is in effect.

The lsplace and lsloadadj commands artificially raise the load on a selected host. This increase in load decays linearly to 0 over time.
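
As a rough illustration of the decay described above, the following Python sketch computes how much of an artificial load adjustment is still in effect after a given time. It assumes the adjustment falls linearly from its initial value to 0 over ADJUST_DURATION*EXINTERVAL seconds; the function name and the exact decay curve are assumptions for illustration, not LIM internals.

```python
def remaining_adjustment(initial, elapsed_seconds, adjust_duration=3, exinterval=15):
    """Return the load adjustment still in effect after elapsed_seconds.

    Assumption: the adjustment decays linearly from `initial` to 0 over
    adjust_duration * exinterval seconds (45 seconds with the defaults).
    """
    window = adjust_duration * exinterval  # seconds the adjustment lasts
    if elapsed_seconds <= 0:
        return float(initial)
    if elapsed_seconds >= window:
        return 0.0
    return initial * (1 - elapsed_seconds / window)


if __name__ == "__main__":
    for t in (0, 15, 30, 45):
        print(t, round(remaining_adjustment(1.0, t), 3))
```

With the default values, the adjustment is gone after 45 seconds.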

Default

3

ELIM_ABORT_VALUE

Syntax

ELIM_ABORT_VALUE=integer

Description

Integer that triggers an abort for an ELIM.

Default

97 (triggers abort)

ELIM_POLL_INTERVAL

Syntax

ELIM_POLL_INTERVAL=seconds

Description

Time interval, in seconds, at which the LIM samples external load index information. If your elim executable is programmed to report values more frequently than every 5 seconds, set ELIM_POLL_INTERVAL so that the LIM samples information at a corresponding rate.

Valid values

0.001 to 5

Default

5 seconds

ELIMARGS

Syntax

ELIMARGS=cmd_line_args

Description

Specifies command-line arguments required by an elim executable on startup. Used only when the external load indices feature is enabled.

Default

Undefined

EXINTERVAL

Syntax

EXINTERVAL=time_in_seconds

Description

Time interval, in seconds, at which the LIM daemons exchange load information

On extremely busy hosts or networks, or in clusters with a large number of hosts, load may interfere with the periodic communication between LIM daemons. Setting EXINTERVAL to a longer interval can reduce network load and slightly improve reliability, at the cost of slower reaction to dynamic load changes.

Note that if you define the time interval as less than 5 seconds, LSF automatically resets it to 5 seconds.

Default

15 seconds

FLOAT_CLIENTS

Syntax

FLOAT_CLIENTS=number_of_floating_clients

Description

Sets the maximum number of floating clients allowed in a cluster. If FLOAT_CLIENTS is not specified in the lsf.cluster.cluster_name file, the floating LSF client feature is disabled.

CAUTION:

When the LSF floating client feature is enabled, any host can submit jobs to the cluster. You can limit which hosts can be LSF floating clients with the parameter FLOAT_CLIENTS_ADDR_RANGE in the lsf.cluster.cluster_name file.

Default

Undefined

FLOAT_CLIENTS_ADDR_RANGE

Syntax

FLOAT_CLIENTS_ADDR_RANGE=IP_address ...

Description

Optional. IP address or range of addresses of domains from which floating client hosts can submit requests. Multiple ranges can be defined, separated by spaces. The IP address can have either a dotted quad notation (IPv4) or IP Next Generation (IPv6) format. LSF supports both formats; you do not have to map IPv4 addresses to an IPv6 format.
Note:

To use IPv6 addresses, you must define the parameter LSF_ENABLE_SUPPORT_IPV6 in lsf.conf.

If the value of FLOAT_CLIENTS_ADDR_RANGE is undefined, there is no security and any host can be an LSF floating client.

If a value is defined, security is enabled. If there is an error in the configuration of this variable, by default, no hosts will be allowed to be LSF floating clients.

When this parameter is defined, client hosts that do not belong to the domain will be denied access.

If a requesting host belongs to an IP address that falls in the specified range, the host will be accepted to become a floating client.

IP addresses are separated by spaces, which denote OR operations.

If you define FLOAT_CLIENTS_ADDR_RANGE with:
  • No range specified, all IPv4 and IPv6 clients can submit requests.
  • Only an IPv4 range specified, only IPv4 clients within the range can submit requests.
  • Only an IPv6 range specified, only IPv6 clients within the range can submit requests.
  • Both an IPv6 and IPv4 range specified, IPv6 and IPv4 clients within the ranges can submit requests.

The asterisk (*) character indicates any value is allowed.

The dash (-) character indicates an explicit range of values. For example 1-4 indicates 1,2,3,4 are allowed.

Open ranges such as *-30, or 10-*, are allowed.

If a range is specified with fewer fields than an IP address such as 10.161, it is considered as 10.161.*.*.

Address ranges are validated at configuration time so they must conform to the required format. If any address range is not in the correct format, no hosts will be accepted as LSF floating clients, and an error message will be logged in the LIM log.
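
The matching rules above can be sketched in Python for the IPv4 case. This is an illustrative approximation, not LIM's implementation: the function names are hypothetical, only dotted IPv4 ranges are handled (an IPv6 entry would be treated as malformed here), and each field is assumed to be limited to 0-255.

```python
def parse_field(spec):
    """Turn one dotted field ('*', '34', '1-10', '*-30', '10-*') into (low, high)."""
    if spec == "*":
        return 0, 255
    low, sep, high = spec.partition("-")
    if not sep:
        high = low  # single value such as '34'
    # int() raises ValueError for malformed fields such as '*43' or ''
    low = 0 if low == "*" else int(low)
    high = 255 if high == "*" else int(high)
    if not 0 <= low <= high <= 255:
        raise ValueError(f"bad field: {spec!r}")
    return low, high


def parse_ranges(value):
    """Parse a space-separated list of IPv4 ranges; one bad range rejects the line."""
    ranges = []
    for item in value.split():
        fields = item.split(".")
        if len(fields) > 4:
            raise ValueError(f"too many fields: {item!r}")
        fields += ["*"] * (4 - len(fields))  # 10.161 is treated as 10.161.*.*
        ranges.append([parse_field(f) for f in fields])
    return ranges


def is_allowed(address, value):
    """Return True if the IPv4 address falls in any configured range (OR semantics)."""
    if not value.split():
        return True  # no range specified: no restriction
    try:
        ranges = parse_ranges(value)
    except ValueError:
        return False  # malformed configuration: no host is accepted
    octets = [int(part) for part in address.split(".")]
    if len(octets) != 4:
        return False
    return any(
        all(low <= octet <= high for octet, (low, high) in zip(octets, fields))
        for fields in ranges
    )


if __name__ == "__main__":
    print(is_allowed("102.34.3.20", "100-110.34.1-10.4-56"))
    print(is_allowed("100.172.1.13", "100.*43 100.172.1.13"))
```

The second call prints False even though the address matches the second range exactly, because the malformed *43 field invalidates the whole line.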

This parameter is limited to 2048 characters.

For IPv6 addresses, the double colon symbol (::) indicates multiple groups of 16-bits of zeros. You can also use (::) to compress leading and trailing zeros in an address filter, as shown in the following example:

FLOAT_CLIENTS_ADDR_RANGE=1080::8:800:20fc:*

This definition allows hosts with addresses 1080:0:0:0:8:800:20fc:* (three leading zeros).

You cannot use the double colon (::) more than once within an IP address. You cannot use a zero before or after (::). For example, 1080:0::8:800:20fc:* is not a valid address.
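
The double colon rules can be illustrated with a short Python sketch that expands a filter to its full eight-group form. The function name is hypothetical and the checks cover only the two restrictions stated above; wildcard and range fields are passed through untouched.

```python
def expand_double_colon(pattern):
    """Expand '::' in an IPv6 address filter into explicit zero groups.

    Filters without '::' (including short prefixes such as '3ffe')
    are returned unchanged.
    """
    if pattern.count("::") > 1:
        raise ValueError("'::' can be used only once in an address")
    if "::" not in pattern:
        return pattern
    head, tail = pattern.split("::")
    head_groups = head.split(":") if head else []
    tail_groups = tail.split(":") if tail else []
    # A zero group directly before or after '::' is not valid.
    if (head_groups and head_groups[-1] == "0") or (tail_groups and tail_groups[0] == "0"):
        raise ValueError("a zero cannot appear directly before or after '::'")
    missing = 8 - len(head_groups) - len(tail_groups)
    if missing < 1:
        raise ValueError("'::' must stand for at least one group")
    return ":".join(head_groups + ["0"] * missing + tail_groups)


if __name__ == "__main__":
    print(expand_double_colon("1080::8:800:20fc:*"))  # 1080:0:0:0:8:800:20fc:*
```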

Notes

After you configure FLOAT_CLIENTS_ADDR_RANGE, check the lim.log.host_name file to make sure this parameter is correctly set. If this parameter is not set or is wrong, this will be indicated in the log file.

Examples

FLOAT_CLIENTS_ADDR_RANGE=100

All IPv4 and IPv6 hosts with a domain address starting with 100 will be allowed access.
  • To specify only IPv4 hosts, set the value to 100.*
  • To specify only IPv6 hosts, set the value to 100:*

FLOAT_CLIENTS_ADDR_RANGE=100-110.34.1-10.4-56

All client hosts belonging to a domain with an address having the first number between 100 and 110, then 34, then a number between 1 and 10, then a number between 4 and 56 will be allowed access. Example: 100.34.9.45, 100.34.1.4, 102.34.3.20, etc. No IPv6 hosts are allowed.

FLOAT_CLIENTS_ADDR_RANGE=100.172.1.13 100.*.30-54 124.24-*.1.*-34

All client hosts belonging to a domain with the address 100.172.1.13 will be allowed access. All client hosts belonging to domains starting with 100, then any number, then a range of 30 to 54 will be allowed access. All client hosts belonging to domains starting with 124, then from 24 onward, then 1, then from 0 to 34 will be allowed access. No IPv6 hosts are allowed.

FLOAT_CLIENTS_ADDR_RANGE=12.23.45.*

All client hosts belonging to domains starting with 12.23.45 are allowed. No IPv6 hosts are allowed.

FLOAT_CLIENTS_ADDR_RANGE=100.*43

The * character can only be used to indicate any value. In this example, an error will be inserted in the LIM log and no hosts will be accepted to become LSF floating clients. No IPv6 hosts are allowed.

FLOAT_CLIENTS_ADDR_RANGE=100.*43 100.172.1.13

Although one correct address range is specified, because *43 is not correct format, the entire line is considered not valid. An error will be inserted in the LIM log and no hosts will be accepted to become LSF floating clients. No IPv6 hosts are allowed.

FLOAT_CLIENTS_ADDR_RANGE = 3ffe

All client IPv6 hosts with a domain address starting with 3ffe will be allowed access. No IPv4 hosts are allowed.

FLOAT_CLIENTS_ADDR_RANGE = 3ffe:fffe::88bb:*

Expands to 3ffe:fffe:0:0:0:0:88bb:*. All IPv6 client hosts belonging to domains starting with 3ffe:fffe::88bb:* are allowed. No IPv4 hosts are allowed.

FLOAT_CLIENTS_ADDR_RANGE = 3ffe-4fff:fffe::88bb:aa-ff 12.23.45.*

All IPv6 client hosts belonging to domains starting with 3ffe up to 4fff, then fffe::88bb, and ending with aa up to ff are allowed. All IPv4 client hosts belonging to domains starting with 12.23.45 are allowed.

FLOAT_CLIENTS_ADDR_RANGE = 3ffe-*:fffe::88bb:*-ff

All IPv6 client hosts belonging to domains starting with 3ffe up to ffff and ending with 0 up to ff are allowed. No IPv4 hosts are allowed.

Default

Undefined. No security is enabled. Any host in any domain can become an LSF floating client.

See also

LSF_ENABLE_SUPPORT_IPV6

HOST_INACTIVITY_LIMIT

Syntax

HOST_INACTIVITY_LIMIT=integer

Description

Integer that is multiplied by EXINTERVAL, the time period you set for the communication between the parent and server host LIMs to ensure all parties are functioning.

A server host LIM can send its load information any time from EXINTERVAL to (HOST_INACTIVITY_LIMIT-1)*EXINTERVAL seconds. A management host LIM sends an announcement to each host at least every EXINTERVAL*(HOST_INACTIVITY_LIMIT-1) seconds.

The HOST_INACTIVITY_LIMIT must be greater than or equal to 2.
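
As a worked example of the timing above, this Python sketch computes the window in which a server host LIM can send its load information. The function name is hypothetical; the arithmetic follows the formula in the description, and the 5-second floor on EXINTERVAL comes from the EXINTERVAL parameter description.

```python
def server_report_window(exinterval=15, host_inactivity_limit=5):
    """Return (earliest, latest) seconds between load reports from a server host LIM."""
    if host_inactivity_limit < 2:
        raise ValueError("HOST_INACTIVITY_LIMIT must be greater than or equal to 2")
    exinterval = max(exinterval, 5)  # values below 5 seconds are reset to 5
    return exinterval, (host_inactivity_limit - 1) * exinterval


if __name__ == "__main__":
    print(server_report_window())  # (15, 60) with the default values
```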

Increase or decrease the host inactivity limit to adjust for your tolerance for communication between parent and children. For example, if you have hosts that frequently become inactive, decrease the host inactivity limit. Note that to get the right interval, you may also have to adjust your EXINTERVAL.

Default

5

LSF_ELIM_BLOCKTIME

Syntax

LSF_ELIM_BLOCKTIME=seconds

Description

UNIX only; used when the external load indices feature is enabled.

Maximum amount of time the parent external load information manager (MELIM) waits for a complete load update string from an elim executable. After the time period specified by LSF_ELIM_BLOCKTIME, the MELIM writes the last string sent by an elim in the LIM log file (lim.log.host_name) and restarts the elim.

Defining LSF_ELIM_BLOCKTIME also triggers the MELIM to restart elim executables if the elim does not write a complete load update string within the time specified for LSF_ELIM_BLOCKTIME.

Valid values

Non-negative integers. For example, if your elim writes name-value pairs with 1 second intervals between them, and your elim reports 12 load indices, allow at least 12 seconds for the elim to finish writing the entire load update string. In this case, define LSF_ELIM_BLOCKTIME as 15 seconds or more.

A value of 0 indicates that the MELIM expects to receive the entire load string all at once.

If you comment out or delete LSF_ELIM_BLOCKTIME, the MELIM waits 2 seconds for a complete load update string.

Default

4 seconds

See also

LSF_ELIM_RESTARTS to limit how many times the ELIM can be restarted.

LSF_ELIM_DEBUG

Syntax

LSF_ELIM_DEBUG=y

Description

UNIX only; used when the external load indices feature is enabled.

When this parameter is set to y, all external load information received by the load information manager (LIM) from the parent external load information manager (MELIM) is logged in the LIM log file (lim.log.host_name).

Defining LSF_ELIM_DEBUG also triggers the MELIM to restart elim executables if the elim does not write a complete load update string within the time specified for LSF_ELIM_BLOCKTIME.

Default

Undefined; external load information sent by an elim to the MELIM is not logged.

See also

LSF_ELIM_BLOCKTIME to configure how long LIM waits before restarting the ELIM.

LSF_ELIM_RESTARTS to limit how many times the ELIM can be restarted.

LSF_ELIM_RESTARTS

Syntax

LSF_ELIM_RESTARTS=integer

Description

UNIX only; used when the external load indices feature is enabled.

Maximum number of times the parent external load information manager (MELIM) can restart elim executables on a host. Defining this parameter prevents an ongoing restart loop in the case of a faulty elim. The MELIM waits the LSF_ELIM_BLOCKTIME to receive a complete load update string before restarting the elim. The MELIM does not restart any elim executables that exit with ELIM_ABORT_VALUE.

Important:

Either LSF_ELIM_BLOCKTIME or LSF_ELIM_DEBUG must also be defined; defining these parameters triggers the MELIM to restart elim executables.

Valid values

Non-negative integers.

Default

Undefined; the number of elim restarts is unlimited.

See also

LSF_ELIM_BLOCKTIME, LSF_ELIM_DEBUG

LSF_HOST_ADDR_RANGE

Syntax

LSF_HOST_ADDR_RANGE=IP_address ...

Description

Identifies the range of IP addresses that are allowed to be LSF hosts that can be dynamically added to or removed from the cluster.
CAUTION:

To enable dynamically added hosts after installation, you must define LSF_HOST_ADDR_RANGE in lsf.cluster.cluster_name, and LSF_DYNAMIC_HOST_WAIT_TIME in lsf.conf. If you enable dynamic hosts during installation, you must define an IP address range after installation to enable security.

If a value is defined, security for dynamically adding and removing hosts is enabled, and only hosts with IP addresses within the specified range can be added to or removed from a cluster dynamically.

Specify an IP address or range of addresses, using either a dotted quad notation (IPv4) or IP Next Generation (IPv6) format. LSF supports both formats; you do not have to map IPv4 addresses to an IPv6 format. Multiple ranges can be defined, separated by spaces.
Note:

To use IPv6 addresses, you must define the parameter LSF_ENABLE_SUPPORT_IPV6 in lsf.conf.

If there is an error in the configuration of LSF_HOST_ADDR_RANGE (for example, an address range is not in the correct format), no host will be allowed to join the cluster dynamically and an error message will be logged in the LIM log. Address ranges are validated at startup, reconfiguration, or restart, so they must conform to the required format.

If a requesting host belongs to an IP address that falls in the specified range, the host will be accepted to become a dynamic LSF host.

IP addresses are separated by spaces, and considered "OR" alternatives.

If you define the parameter LSF_HOST_ADDR_RANGE with:
  • No range specified, all IPv4 and IPv6 clients are allowed.
  • Only an IPv4 range specified, only IPv4 clients within the range are allowed.
  • Only an IPv6 range specified, only IPv6 clients within the range are allowed.
  • Both an IPv6 and IPv4 range specified, IPv6 and IPv4 clients within the ranges are allowed.

The asterisk (*) character indicates any value is allowed.

The dash (-) character indicates an explicit range of values. For example 1-4 indicates 1,2,3,4 are allowed.

Open ranges such as *-30, or 10-*, are allowed.

For IPv6 addresses, the double colon symbol (::) indicates multiple groups of 16-bits of zeros. You can also use (::) to compress leading and trailing zeros in an address filter, as shown in the following example:

LSF_HOST_ADDR_RANGE=1080::8:800:20fc:*

This definition allows hosts with addresses 1080:0:0:0:8:800:20fc:* (three leading zeros).

You cannot use the double colon (::) more than once within an IP address. You cannot use a zero before or after (::). For example, 1080:0::8:800:20fc:* is not a valid address.

If a range is specified with fewer fields than an IP address such as 10.161, it is considered as 10.161.*.*.

This parameter is limited to 2048 characters.

Notes

After you configure LSF_HOST_ADDR_RANGE, check the lim.log.host_name file to make sure this parameter is correctly set. If this parameter is not set or is wrong, this will be indicated in the log file.

Examples

LSF_HOST_ADDR_RANGE=100

All IPv4 and IPv6 hosts with a domain address starting with 100 will be allowed access.
  • To specify only IPv4 hosts, set the value to 100.*
  • To specify only IPv6 hosts, set the value to 100:*

LSF_HOST_ADDR_RANGE=100-110.34.1-10.4-56

All hosts belonging to a domain with an address having the first number between 100 and 110, then 34, then a number between 1 and 10, then a number between 4 and 56 will be allowed access. No IPv6 hosts are allowed. Example: 100.34.9.45, 100.34.1.4, 102.34.3.20, etc.

LSF_HOST_ADDR_RANGE=100.172.1.13 100.*.30-54 124.24-*.1.*-34

The host with the address 100.172.1.13 will be allowed access. All hosts belonging to domains starting with 100, then any number, then a range of 30 to 54 will be allowed access. All hosts belonging to domains starting with 124, then from 24 onward, then 1, then from 0 to 34 will be allowed access. No IPv6 hosts are allowed.

LSF_HOST_ADDR_RANGE=12.23.45.*

All hosts belonging to domains starting with 12.23.45 are allowed. No IPv6 hosts are allowed.

LSF_HOST_ADDR_RANGE=100.*43

The * character can only be used to indicate any value. The format of this example is not correct, and an error will be inserted in the LIM log and no hosts will be able to join the cluster dynamically. No IPv6 hosts are allowed.

LSF_HOST_ADDR_RANGE=100.*43 100.172.1.13

Although one correct address range is specified, because *43 is not correct format, the entire line is considered not valid. An error will be inserted in the LIM log and no hosts will be able to join the cluster dynamically. No IPv6 hosts are allowed.

LSF_HOST_ADDR_RANGE = 3ffe

All client IPv6 hosts with a domain address starting with 3ffe will be allowed access. No IPv4 hosts are allowed.

LSF_HOST_ADDR_RANGE = 3ffe:fffe::88bb:*

Expands to 3ffe:fffe:0:0:0:0:88bb:*. All IPv6 client hosts belonging to domains starting with 3ffe:fffe::88bb:* are allowed. No IPv4 hosts are allowed.

LSF_HOST_ADDR_RANGE = 3ffe-4fff:fffe::88bb:aa-ff 12.23.45.*

All IPv6 client hosts belonging to domains starting with 3ffe up to 4fff, then fffe::88bb, and ending with aa up to ff are allowed. IPv4 client hosts belonging to domains starting with 12.23.45 are allowed.

LSF_HOST_ADDR_RANGE = 3ffe-*:fffe::88bb:*-ff

All IPv6 client hosts belonging to domains starting with 3ffe up to ffff and ending with 0 up to ff are allowed. No IPv4 hosts are allowed.

Default

Undefined (dynamic host feature disabled). If you enable dynamic hosts during installation, no security is enabled and all hosts can join the cluster.

See also

LSF_ENABLE_SUPPORT_IPV6

MASTER_INACTIVITY_LIMIT

Syntax

MASTER_INACTIVITY_LIMIT=integer

Description

An integer reflecting a multiple of EXINTERVAL. A server host will attempt to become the management host if it does not hear from the previous management host after (HOST_INACTIVITY_LIMIT + host_number*MASTER_INACTIVITY_LIMIT)*EXINTERVAL seconds, where host_number is the position of the host in lsf.cluster.cluster_name.

The management host is host_number 0.
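
The takeover formula can be evaluated directly. The following Python sketch (the function name is hypothetical) shows how the delay grows with a host's position in the file when the default values are used.

```python
def takeover_delay(host_number, exinterval=15, host_inactivity_limit=5,
                   master_inactivity_limit=2):
    """Seconds a host waits without hearing from the management host before
    it attempts to take over, per the formula in the description."""
    return (host_inactivity_limit + host_number * master_inactivity_limit) * exinterval


if __name__ == "__main__":
    for position in (1, 2, 3):
        print(position, takeover_delay(position))  # 105, 135, 165 seconds
```

Because each later host waits MASTER_INACTIVITY_LIMIT*EXINTERVAL seconds longer than the one before it, hosts listed earlier get the first chance to take over.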

Default

2

PROBE_TIMEOUT

Syntax

PROBE_TIMEOUT=time_in_seconds

Description

Specifies the timeout in seconds to be used for the connect(2) system call

Before taking over as the management host, a server host LIM will try to connect to the last known management host via TCP.

Default

2 seconds

RETRY_LIMIT

Syntax

RETRY_LIMIT=integer

Description

Integer reflecting a multiple of EXINTERVAL that controls the number of retries a parent or child LIM makes before assuming that the server or management host is unavailable.

If the management host does not hear from a server host for HOST_INACTIVITY_LIMIT exchange intervals, it will actively poll the server host for RETRY_LIMIT exchange intervals before it will declare the server host as unavailable. If a server does not hear from the management host for HOST_INACTIVITY_LIMIT exchange intervals, it will actively poll the management host for RETRY_LIMIT intervals before assuming that the management host is down.
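
One way to read the paragraph above is that the silent period and the polling period add up. The Python sketch below computes that total; it is an interpretation of the description, not a statement of how LIM schedules its polls, and the function name is hypothetical.

```python
def seconds_until_unavailable(exinterval=15, host_inactivity_limit=5, retry_limit=2):
    """Approximate seconds of silence before a host is assumed unavailable.

    Assumption: HOST_INACTIVITY_LIMIT silent exchange intervals are followed
    by RETRY_LIMIT polling intervals, each lasting EXINTERVAL seconds.
    """
    silent = host_inactivity_limit * exinterval
    polling = retry_limit * exinterval
    return silent + polling


if __name__ == "__main__":
    print(seconds_until_unavailable())  # 105 with the default values
```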

Default

2

ClusterAdmins section

(Optional) The ClusterAdmins section defines the LSF administrators for the cluster. The only keyword is ADMINISTRATORS.

If the ClusterAdmins section is not present, the default LSF administrator is root. Using root as the primary LSF administrator is not recommended.

ADMINISTRATORS

Syntax

ADMINISTRATORS=administrator_name ...

Description

Specify UNIX user names.

You can also specify UNIX user group names, Windows user names, and Windows user group names. To specify a Windows user account or user group, include the domain name in uppercase letters (DOMAIN_NAME\user_name or DOMAIN_NAME\user_group).

The first administrator of the expanded list is considered the primary LSF administrator. The primary administrator is the owner of the LSF configuration files, as well as the working files under LSB_SHAREDIR/cluster_name. If the primary administrator is changed, make sure the owner of the configuration files and the files under LSB_SHAREDIR/cluster_name are changed as well.

Administrators other than the primary LSF administrator have the same privileges as the primary LSF administrator except that they do not have permission to change LSF configuration files. They can perform cluster-wide operations on jobs, queues, or hosts in the system.

For flexibility, each cluster may have its own LSF administrators, identified by a user name, although the same administrators can be responsible for several clusters.

Use the -l option of the lsclusters command to display all of the administrators within a cluster.

Windows domain:
  • If the specified user or user group is a domain administrator, member of the Power Users group or a group with domain administrative privileges, the specified user or user group must belong to the LSF user domain.
  • If the specified user or user group is a user or user group with a lower degree of privileges than outlined in the previous point, the user or user group must belong to the LSF user domain and be part of the Global Admins group.

Windows workgroup:

  • If the specified user or user group is not a workgroup administrator, member of the Power Users group, or a group with administrative privileges on each host, the specified user or user group must belong to the Local Admins group on each host.

Compatibility

For backwards compatibility, ClusterManager and Manager are synonyms for ClusterAdmins and ADMINISTRATORS respectively. It is possible to have both sections present in the same lsf.cluster.cluster_name file to allow daemons from different LSF versions to share the same file.

Example

The following gives an example of a cluster with two LSF administrators. The user listed first, user2, is the primary administrator.
Begin ClusterAdmins 
ADMINISTRATORS = user2 user7 
End ClusterAdmins

Default

lsfadmin

Host section

The Host section is the last section in lsf.cluster.cluster_name and is the only required section. It lists all the hosts in the cluster and gives configuration information for each host.

The order in which the hosts are listed in this section is important, because the first host listed becomes the LSF management host. Since the parent LIM makes all placement decisions for the cluster, set a fast machine as the management host.

The LIM on the first host listed becomes the management host LIM if this host is up; otherwise, the LIM on the second becomes the management host LIM if its host is up, and so on. Also, to avoid the delays involved in switching management host LIMs if the first machine goes down, make sure that the management host is a reliable machine. Arrange the list so that the first few hosts in the list are always in the same subnet. This avoids a situation where the second host takes over as the management host when there are communication problems between subnets.

Example Host section

This example Host section contains descriptive information for three hosts:
Begin Host 
HOSTNAME   model    type   server RESOURCES        RUNWINDOW 
hostA      SparcIPC Sparc  1      (sunos frame)    () 
hostD      Sparc10  Sparc  1      (sunos)          (5:18:30-1:8:30) 
hostD      !        !      1      ()               () 
hostE      !        !      1      (linux !bigmem)  () 
End Host

Descriptive fields

The following fields are required in the Host section:
  • HOSTNAME
  • RESOURCES
  • type
  • model
The following fields are optional:
  • server
  • nd
  • RUNWINDOW
  • REXPRI

HOSTNAME

Description

Official name of the host as returned by hostname(1)

The name must be listed in lsf.shared as belonging to this cluster.

Pattern definition

You can use string literals and special characters when defining host names. Each entry cannot contain any spaces, as the list itself is space delimited.

You can use the following special characters to specify hosts:
  • Use square brackets with a hyphen ([integer1-integer2]) or a colon ([integer1:integer2]) to define a range of non-negative integers anywhere in the host name. The first integer must be less than the second integer.
  • Use square brackets with commas ([integer1, integer2 ...]) to define individual non-negative integers anywhere in the host name.
  • Use square brackets with commas and hyphens or colons (for example, [integer1-integer2, integer3, integer4:integer5, integer6:integer7]) to define different ranges of non-negative integers anywhere in the host name.
  • Use multiple sets of square brackets (with the supported special characters) to define multiple sets of non-negative integers anywhere in the host name. For example, hostA[1,3]B[1-3] includes hostA1B1, hostA1B2, hostA1B3, hostA3B1, hostA3B2, and hostA3B3.
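
The bracket rules above can be sketched as a small Python expander. This is an illustration only: the function names are hypothetical, and the sketch does not preserve leading zeros inside ranges, which the description does not address.

```python
import itertools
import re


def expand_bracket(body):
    """Expand a bracket body such as '1-3, 5, 7:9' into a list of number strings."""
    values = []
    for part in body.split(","):
        part = part.strip()
        bounds = re.split(r"[-:]", part)
        if len(bounds) == 1:
            if not part.isdigit():
                raise ValueError(f"not a non-negative integer: {part!r}")
            values.append(part)
        elif len(bounds) == 2:
            first, last = int(bounds[0]), int(bounds[1])
            if first >= last:
                raise ValueError("the first integer must be less than the second")
            values.extend(str(n) for n in range(first, last + 1))
        else:
            raise ValueError(f"bad range: {part!r}")
    return values


def expand_hosts(pattern):
    """Expand a host name pattern into the list of host names it describes."""
    # re.split with a capturing group alternates literal text and bracket
    # bodies: 'hostA[1,3]B[1-3]' -> ['hostA', '1,3', 'B', '1-3', '']
    pieces = re.split(r"\[([^\]]*)\]", pattern)
    choices = [expand_bracket(p) if i % 2 else [p] for i, p in enumerate(pieces)]
    return ["".join(combo) for combo in itertools.product(*choices)]


if __name__ == "__main__":
    print(expand_hosts("hostA[1,3]B[1-3]"))
```

For the pattern hostA[1,3]B[1-3], the sketch returns the same six names listed in the last bullet above.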

model

Description

Host model

The name must be defined in the HostModel section of lsf.shared. This determines the CPU speed scaling factor applied in load and placement calculations.

Optionally, the ! keyword in the model or type column indicates that the host model or type is to be automatically detected by the LIM running on the host.

nd

Description

Number of local disks

This corresponds to the ndisks static resource. On most host types, LSF automatically determines the number of disks, and the nd parameter is ignored.

nd should only count local disks with file systems on them. Do not count either disks used only for swapping or disks mounted with NFS.

Default

The number of disks determined by the LIM, or 1 if the LIM cannot determine this

RESOURCES

Description

The static Boolean resources and static or dynamic numeric and string resources available on this host.

The resource names are strings defined in the Resource section of lsf.shared. You may list any number of resources, enclosed in parentheses and separated by blanks or tabs. For example:
(fs frame hpux)
Optionally, you can specify an exclusive resource by prefixing the resource with an exclamation mark (!). For example, resource bigmem is defined in lsf.shared, and is defined as an exclusive resource for hostE:
Begin Host
HOSTNAME   model    type   server RESOURCES        RUNWINDOW
...
hostE      !        !      1      (linux !bigmem)  ()
...
End Host

Square brackets are not valid and the resource name must be alphanumeric.

To select a host with an exclusive resource for a job, you must explicitly specify the exclusive resource in the job's resource requirements. For example:
bsub -R "bigmem" myjob
or
bsub -R "defined(bigmem)" myjob
You can specify static and dynamic numeric and string resources in the resource column of the Host clause. For example:
Begin Host
HOSTNAME  model type server RESOURCES  #Keywords
hostA     !     !    1      (mg elimres patchrev=3 owner=user1)
hostB     !     !    1      (specman=5 switch=1 owner=test)
hostC     !     !    1      (switch=2 rack=rack2_2_3 owner=test)
hostD     !     !    1      (switch=1 rack=rack2_2_3 owner=test)
End Host

Static resource information is displayed by lshosts, with exclusive resources prefixed by an exclamation mark (!).

REXPRI

Description

UNIX only

Default execution priority for interactive remote jobs run under the RES

The range is from -20 to 20. REXPRI corresponds to the BSD-style nice value used for remote jobs. For hosts with System V-style nice values with the range 0 - 39, a REXPRI of -20 corresponds to a nice value of 0, and +20 corresponds to 39. Higher values of REXPRI correspond to lower execution priority; -20 gives the highest priority, 0 is the default priority for login sessions, and +20 is the lowest priority.
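
The correspondence between REXPRI and System V nice values can be sketched as follows. The description fixes only the two endpoints (-20 maps to 0, +20 maps to 39); the straight offset used for the values in between is an assumption made for illustration, and the function name is hypothetical.

```python
def rexpri_to_sysv_nice(rexpri):
    """Map a REXPRI value (-20..20) to a System V nice value (0..39).

    Assumption: values are shifted by 20 and capped at 39, so that
    -20 -> 0 and +20 -> 39 as stated in the description.
    """
    if not -20 <= rexpri <= 20:
        raise ValueError("REXPRI must be between -20 and 20")
    return min(rexpri + 20, 39)


if __name__ == "__main__":
    for value in (-20, 0, 20):
        print(value, rexpri_to_sysv_nice(value))
```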

Default

0

RUNWINDOW

Description

Dispatch window for interactive tasks.

When the host is not available for remote execution, the host status is lockW (locked by run window). LIM does not schedule interactive tasks on hosts locked by dispatch windows. Run windows only apply to interactive tasks placed by LIM. The LSF batch system uses its own (optional) host dispatch windows to control batch job processing on batch server hosts.

Format

A dispatch window consists of one or more time windows in the format begin_time-end_time. No blanks can separate begin_time and end_time. Time is specified in the form [day:]hour[:minute]. If only one field is specified, LSF assumes it is an hour. Two fields are assumed to be hour:minute. Use blanks to separate time windows.
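
The format rules above can be illustrated with a small Python parser. The function names are hypothetical, and the sketch checks only the field structure; it does not validate hour, minute, or day ranges, which the description does not spell out.

```python
def parse_time(text):
    """Parse [day:]hour[:minute] into (day, hour, minute); day is None if omitted."""
    fields = [int(field) for field in text.split(":")]
    if len(fields) == 1:       # one field is an hour
        return None, fields[0], 0
    if len(fields) == 2:       # two fields are hour:minute
        hour, minute = fields
        return None, hour, minute
    if len(fields) == 3:       # three fields are day:hour:minute
        day, hour, minute = fields
        return day, hour, minute
    raise ValueError(f"bad time: {text!r}")


def parse_run_window(value):
    """Parse a RUNWINDOW value into a list of (begin, end) time tuples."""
    windows = []
    # The Host section encloses the value in parentheses; strip them if present.
    for window in value.strip("()").split():
        begin, sep, end = window.partition("-")
        if not sep:
            raise ValueError(f"missing '-' in time window: {window!r}")
        windows.append((parse_time(begin), parse_time(end)))
    return windows


if __name__ == "__main__":
    print(parse_run_window("(5:18:30-1:8:30)"))
    print(parse_run_window("9-17 20:30-23"))
```

An empty value such as () yields an empty list, which corresponds to the default of always accepting remote jobs.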

Default

Always accept remote jobs

server

Description

Indicates whether the host can receive jobs from other hosts

Specify 1 if the host can receive jobs from other hosts; specify 0 otherwise. Servers that are set to 0 are LSF clients. Client hosts do not run the LSF daemons. Client hosts can submit interactive and batch jobs to the cluster, but they cannot execute jobs sent from other hosts.

Default

1

type

Description

Host type as defined in the HostType section of lsf.shared

The strings used for host types are determined by the system administrator; for example, SUNSOL, DEC, or HPPA. The host type is used to identify binary-compatible hosts.

The host type is used as the default resource requirement. That is, if no resource requirement is specified in a placement request, the task is run on a host of the same type as the sending host.

Often one host type can be used for many machine models. For example, the host type name SUNSOL6 might be used for any computer with a SPARC processor running SunOS 6. This would include many Sun models and quite a few from other vendors as well.

Optionally, the ! keyword in the model or type column indicates that the host model or type is to be automatically detected by the LIM running on the host.

ResourceMap section

The ResourceMap section defines shared resources in your cluster. This section specifies the mapping between shared resources and their sharing hosts. When you define resources in the Resource section of lsf.shared, there is no distinction between a shared and non-shared resource. By default, resources are not shared and are local to each host. By defining the ResourceMap section in the lsf.cluster.cluster_name file, you can define resources that are shared by all hosts in the cluster or define resources that are shared by only some of the hosts in the cluster.

This section must appear after the Host section of lsf.cluster.cluster_name, because it has a dependency on host names defined in the Host section.

ResourceMap section structure

The first line consists of the keywords RESOURCENAME and LOCATION. Subsequent lines describe the hosts that are associated with each configured resource.

Resources defined in the ResourceMap section can be viewed by using the -s option (and starting in Fix Pack 14, also the -sl option) of the lshosts (for static resources) and lsload (for dynamic resources) commands.

Example ResourceMap section using the number of resources

Begin ResourceMap 
RESOURCENAME   LOCATION 
verilog        (5@[all]) 
local          ([host1 host2] [others]) 
End ResourceMap

This example shows the verilog and local resources. Note that the verilog resource must already be defined in the Resource section of the lsf.shared file. The verilog resource is a static numeric resource shared by all hosts; the value (5) indicates that there are five verilog resources. Likewise, the local resource is a numeric shared resource that contains two instances in the cluster. The first instance is shared by two machines: host1 and host2. The second instance is shared by all other hosts.

Example ResourceMap section using a specific name for the resource

Begin ResourceMap
RESOURCENAME   LOCATION
fpga           ([card1 card2 card3]@[all])
switch         ([switch1 switch2]@[host1] 3@[others])
End ResourceMap

As of Fix Pack 14, you can also define the names of numeric resources. Note that only static decreasing numeric resources that are non-releasable can have resource names. This example shows the fpga and switch resources. Both resources must already be defined in the Resource section of the lsf.shared file. The fpga resource has three specific named resources: card1, card2, and card3, which indicates that there are three types of fpga resources.

When a job is dispatched with the assigned resource, sbatchd sets an environment variable in the format LSF_RESOURCE_resourcename___hostname with a value (for example, LSF_RESOURCE_fpga___host1=card1). The job can then check its environment variable to know which name and value it has been assigned.
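A job can pick up its assignment by scanning its environment for variables in this format. The following Python sketch is one way to do that; the function name assigned_resources is hypothetical, and the sketch assumes that the resource name itself does not contain three consecutive underscores.

```python
import os

PREFIX = "LSF_RESOURCE_"
SEPARATOR = "___"  # three underscores between resource name and host name


def assigned_resources(environ=None):
    """Return {(resource, host): assigned_name} for every LSF_RESOURCE_* variable."""
    environ = os.environ if environ is None else environ
    found = {}
    for key, value in environ.items():
        if not key.startswith(PREFIX):
            continue
        # Assumption: the first "___" separates the resource name from the host name.
        resource, sep, host = key[len(PREFIX):].partition(SEPARATOR)
        if sep and resource and host:
            found[(resource, host)] = value
    return found


if __name__ == "__main__":
    for (resource, host), name in sorted(assigned_resources().items()):
        print(f"{resource} on {host}: {name}")
```

Passing a dictionary instead of relying on os.environ makes the function easy to try outside of a running job.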

LOCATION

Description

Defines the hosts that share the resource

For a static resource, you must define an initial value here as well. Do not define a value for a dynamic resource.

instance is a list of host names that share an instance of the resource. The reserved words all, others, and default can be specified for the instance:

all - Indicates that there is only one instance of the resource in the whole cluster and that this resource is shared by all of the hosts

Use the not operator (~) to exclude hosts from the all specification. For example:
(2@[all ~host3 ~host4])

This means that 2 units of the resource are shared by all server hosts in the cluster made up of host1 host2 ... hostn, except for host3 and host4. This is useful if you have a large cluster but only want to exclude a few hosts.

The parentheses are required in the specification. The not operator can only be used with the all keyword. It is not valid with the keywords others and default.

others - Indicates that the rest of the server hosts not explicitly listed in the LOCATION field comprise one instance of the resource

For example:
2@[host1] 4@[others] 

This indicates that there are 2 units of the resource on host1 and 4 units of the resource shared by all other hosts.

default - Indicates an instance of a resource on each host in the cluster

This specifies a special case where the resource is in effect not shared and is local to every host. default means at each host. Normally, you should not need to use default, because by default, all built-in resources are local to each host. However, resources that you define must always be mapped. You might want to use ResourceMap for a non-shared static resource if you need to specify different values for the resource on different hosts.
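To check how the all, others, and default keywords expand for a given host list, a LOCATION value can be resolved offline. The following Python sketch is an illustration only and is not how LIM parses the file; the function name resolve_location and the cluster_hosts argument (the list of server hosts) are hypothetical, and the value before @ is returned as raw text.

```python
import re

# Optional "value@" prefix (a number or a bracketed list of names), then "[hosts]".
INSTANCE = re.compile(r"(?:(\[[^\]]*\]|[^\s@\[\]()]+)@)?\[([^\]]*)\]")
KEYWORDS = {"all", "others", "default"}


def resolve_location(location, cluster_hosts):
    """Expand a LOCATION string into (value, hosts) pairs, one per resource instance."""
    parsed = [(value, hosts.split()) for value, hosts in INSTANCE.findall(location)]
    # Hosts named explicitly anywhere in the field; "others" covers the remainder.
    explicit = {
        h for _, words in parsed if words and words[0] not in KEYWORDS for h in words
    }
    instances = []
    for value, words in parsed:
        value = value or None  # findall yields "" when no value was given
        head = words[0] if words else ""
        if head == "all":
            # "~host" entries exclude hosts from the single cluster-wide instance.
            excluded = {w[1:] for w in words[1:] if w.startswith("~")}
            instances.append((value, [h for h in cluster_hosts if h not in excluded]))
        elif head == "others":
            instances.append((value, [h for h in cluster_hosts if h not in explicit]))
        elif head == "default":
            # One separate instance for each host.
            instances.extend((value, [h]) for h in cluster_hosts)
        else:
            instances.append((value, words))
    return instances
```

For example, with the hosts host1 through host4, resolve_location("(2@[all ~host3 ~host4])", hosts) yields a single instance with the value "2" shared by host1 and host2.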

RESOURCENAME

Description

Name of the resource

This resource name must be defined in the Resource section of the lsf.shared file. You must specify at least a name and description for the resource, using the keywords RESOURCENAME and DESCRIPTION.
  • A resource name cannot begin with a number
  • A resource name cannot contain any of the following characters:
    : . ( ) [ + - * / ! & | < > @ =
  • A resource name cannot be any of the following reserved names:
    cpu cpuf io logins ls idle maxmem maxswp maxtmp type model status it 
    mem ncpus define_ncpus_cores define_ncpus_procs 
    define_ncpus_threads ndisks pg r15m r15s r1m swap swp tmp ut
  • To avoid conflict with the inf and nan keywords in third-party libraries, resource names should not begin with inf or nan (uppercase or lowercase). Resource requirement strings such as -R "infra" or -R "nano" cause an error. Use -R "defined(infxx)" or -R "defined(nanxx)" to specify these resource names.
  • Resource names are case sensitive
  • Resource names can be up to 39 characters in length
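The naming rules above can be checked before a name is added to the configuration. The following Python sketch encodes only the rules listed in this section; the function name check_resource_name is hypothetical, and LSF itself might apply additional checks.

```python
FORBIDDEN_CHARS = set(":.()[+-*/!&|<>@=")
RESERVED_NAMES = frozenset("""
    cpu cpuf io logins ls idle maxmem maxswp maxtmp type model status it
    mem ncpus define_ncpus_cores define_ncpus_procs
    define_ncpus_threads ndisks pg r15m r15s r1m swap swp tmp ut
""".split())
MAX_LENGTH = 39


def check_resource_name(name):
    """Return a list of rule violations for a proposed resource name (empty if none)."""
    problems = []
    if name[:1].isdigit():
        problems.append("begins with a number")
    bad = sorted(set(name) & FORBIDDEN_CHARS)
    if bad:
        problems.append("contains forbidden characters: " + " ".join(bad))
    # Names are case sensitive, so only an exact match counts as reserved.
    if name in RESERVED_NAMES:
        problems.append("is a reserved name")
    # The inf/nan rule applies to uppercase and lowercase alike.
    if name.lower().startswith(("inf", "nan")):
        problems.append("begins with inf or nan (discouraged)")
    if len(name) > MAX_LENGTH:
        problems.append(f"is longer than {MAX_LENGTH} characters")
    return problems
```

An empty list means that the name passes every rule listed here; otherwise each entry describes one violated rule.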

RemoteClusters section

Optional. This section is used only in a MultiCluster environment. By default, the local cluster can obtain information about all other clusters specified in lsf.shared. The RemoteClusters section limits the clusters that the local cluster can obtain information about.

The RemoteClusters section is required if you want to configure cluster equivalency, cache interval, or daemon authentication across clusters, or if you want to run parallel jobs across clusters. To maintain compatibility in this case, make sure the list includes all clusters specified in lsf.shared, even if you only configure the default behavior for some of the clusters.

The first line consists of keywords. CLUSTERNAME is mandatory and the other parameters are optional.

Subsequent lines configure the remote cluster.

Example RemoteClusters section

Begin RemoteClusters
CLUSTERNAME  EQUIV   CACHE_INTERVAL  RECV_FROM  AUTH
cluster1       Y           60            Y      KRB
cluster2       N           60            Y      -
cluster4       N           60            N      PKI
End RemoteClusters
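For auditing purposes, a section laid out like this one can be read into one dictionary per row, keyed by the keyword line. The following Python sketch is an illustration only; the function name parse_section is hypothetical, it assumes that # starts a comment, and it splits on whitespace, so it does not suit sections such as ResourceMap whose values contain spaces.

```python
def parse_section(text, name):
    """Return the rows of a 'Begin name ... End name' block as a list of dicts."""
    rows, keywords, inside = [], None, False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # assumption: '#' starts a comment
        if not line:
            continue
        words = line.split()
        if words == ["Begin", name]:
            inside, keywords = True, None
        elif words == ["End", name]:
            inside = False
        elif inside and keywords is None:
            keywords = words  # first line of the section holds the keywords
        elif inside:
            rows.append(dict(zip(keywords, words)))
    return rows
```

All values are returned as strings, so a caller that needs CACHE_INTERVAL as a number must convert it.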

CLUSTERNAME

Description

Remote cluster name

Defines the Remote Cluster list. Specify the clusters you want the local cluster to recognize. Recognized clusters must also be defined in lsf.shared. Additional clusters listed in lsf.shared but not listed here will be ignored by this cluster.

EQUIV

Description

Specify Y to make the remote cluster equivalent to the local cluster. Otherwise, specify N. The management host LIM considers all equivalent clusters when servicing requests from clients for load, host, or placement information.

EQUIV changes the default behavior of LSF commands and utilities and causes them to automatically return load (lsload), host (lshosts), or placement (lsplace) information about the remote cluster as well as the local cluster, even when you don’t specify a cluster name.

CACHE_INTERVAL

Description

Specify the load information cache threshold, in seconds. The host information threshold is twice the value of the load information threshold.

To reduce overhead and avoid updating information from remote clusters unnecessarily, LSF displays information in the cache, unless the information in the cache is older than the threshold value.

Default

60 seconds
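The relationship between the two thresholds can be written out as a short calculation. The following Python sketch only restates the rule described above; the function names cache_thresholds and is_stale are hypothetical and do not correspond to any LSF interface.

```python
def cache_thresholds(cache_interval=60):
    """Return (load_threshold, host_threshold) in seconds for a CACHE_INTERVAL value."""
    # The host information threshold is twice the load information threshold.
    return cache_interval, 2 * cache_interval


def is_stale(age_seconds, threshold_seconds):
    """True when cached information is older than its threshold and must be refreshed."""
    return age_seconds > threshold_seconds
```

With the default value of 60, load information that is 90 seconds old is refreshed, while host information of the same age is still served from the cache because its threshold is 120 seconds.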

RECV_FROM

Description

Specifies whether the local cluster accepts parallel tasks that originate in a remote cluster

RECV_FROM does not affect regular or interactive batch jobs.

Specify Y if you want to run parallel jobs across clusters. Otherwise, specify N.

Default

Y

AUTH

Description

Defines the preferred authentication method for LSF daemons communicating across clusters. Specify the same method name that is used to identify the corresponding eauth program (eauth.method_name). If the remote cluster does not prefer the same method, LSF uses default security between the two clusters.

Default

- (only privileged port (setuid) authentication is used between clusters)