Host-related features
The following new features are related to host management and display.
Condensed host format
When you specify host names or host groups with condensed notation, you can now use colons (:) to specify a range of numbers. Colons work the same way as hyphens (-) for specifying ranges, and the two can be used interchangeably in condensed notation. You can also use leading zeros to specify host names.
You can now use multiple square brackets (with the supported special characters) to define multiple sets of non-negative integers anywhere in the host name. For example, hostA[1,3]B[1-3] includes hostA1B1, hostA1B2, hostA1B3, hostA3B1, hostA3B2, and hostA3B3.
The additions to the condensed notation apply to all cases where you can specify condensed notation. These include commands that use the -m option or a host list to specify multiple host names, the lsf.cluster.clustername file (in the HOSTNAME column of the Hosts section), and the lsb.hosts file (in the HOST_NAME column of the Host section, the GROUP_MEMBER column of the HostGroup section, and the MEMBER column of the ComputeUnit section).
For example:
- bsub -m "host[1-100].example.com"
The job is submitted to host1.example.com, host2.example.com, host3.example.com, all the way to host100.example.com.
- bsub -m "host[01-03].example.com"
The job is submitted to host01.example.com, host02.example.com, and host03.example.com.
- bsub -m "host[5:200].example.com"
The job is submitted to host5.example.com, host6.example.com, host7.example.com, all the way to host200.example.com.
- bsub -m "host[05:09].example.com"
The job is submitted to host05.example.com, host06.example.com, all the way to host09.example.com.
- bsub -m "host[1-10,12,20-25].example.com"
The job is submitted to host1.example.com, host2.example.com, host3.example.com, up to and including host10.example.com. It is also submitted to host12.example.com and the hosts between and including host20.example.com and host25.example.com.
- bsub -m "host[1:10,20,30:39].example.com"
The job is submitted to host1.example.com, host2.example.com, host3.example.com, up to and including host10.example.com. It is also submitted to host20.example.com and the hosts between and including host30.example.com and host39.example.com.
- bsub -m "host[10-20,30,40:50].example.com"
The job is submitted to host10.example.com, host11.example.com, host12.example.com, up to and including host20.example.com. It is also submitted to host30.example.com and the hosts between and including host40.example.com and host50.example.com.
- bsub -m "host[01-03,05,07:09].example.com"
The job is submitted to host01.example.com, up to and including host03.example.com. It is also submitted to host05.example.com and the hosts between and including host07.example.com and host09.example.com.
- bsub -m "hostA[1-2]B[1-3,5].example.com"
The job is submitted to hostA1B1.example.com, hostA1B2.example.com, hostA1B3.example.com, hostA1B5.example.com, hostA2B1.example.com, hostA2B2.example.com, hostA2B3.example.com, and hostA2B5.example.com.
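To make the expansion rules concrete, the following Python function is a minimal sketch of how a condensed host name could be expanded. It is not LSF's implementation, and the function names are hypothetical. It treats hyphens and colons as interchangeable range separators and expands each bracket group independently. It also assumes that a range whose start value has a leading zero (such as 05) pads every number in that range to the same width, which matches the examples above but is not otherwise specified.

```python
import itertools
import re

# Matches one bracket group and captures its body, e.g. "1-10,12,20-25".
_BRACKET = re.compile(r"\[([^\]]*)\]")


def _expand_group(spec: str) -> list[str]:
    """Expand one bracket body, such as "01-03,05,07:09", into number strings."""
    values = []
    for item in spec.split(","):
        parts = re.split(r"[-:]", item.strip())  # "-" and ":" are interchangeable
        if len(parts) == 1:
            values.append(parts[0])  # single value, kept exactly as written
            continue
        start, end = parts  # malformed items such as "1-2-3" raise ValueError
        # Assumption: a leading zero on the start value fixes the padding width.
        width = len(start) if start.startswith("0") and len(start) > 1 else 0
        for n in range(int(start), int(end) + 1):
            values.append(str(n).zfill(width))
    return values


def expand_hosts(pattern: str) -> list[str]:
    """Expand every bracket group in a condensed host name."""
    # Splitting on a capturing pattern alternates literal text and bracket bodies.
    pieces = _BRACKET.split(pattern)
    literals = pieces[0::2]
    groups = [_expand_group(body) for body in pieces[1::2]]
    hosts = []
    # itertools.product varies the rightmost group fastest, so the leftmost
    # bracket changes slowest, matching the order used in the examples.
    for combo in itertools.product(*groups):
        name = literals[0]
        for value, literal in zip(combo, literals[1:]):
            name += value + literal
        hosts.append(name)
    return hosts
```

For example, expand_hosts("hostA[1-2]B[1-3,5].example.com") returns the eight host names listed in the last example, in the same order. A name without brackets is returned unchanged as a one-element list.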
Register LSF host names and IP addresses to LSF servers
You can now register the IP address and host name of your local LSF host with LSF servers so that LSF does not need to use the DNS server to resolve your local host. This addresses previous issues with resolving the host names and IP addresses of LSF hosts that have non-static IP addresses, in environments where the DNS server cannot properly resolve these hosts after their IP addresses change.
To enable host registration, specify LSF_REG_FLOAT_HOSTS=Y in the lsf.conf file on each LSF server, or on only one LSF server if all servers have access to the LSB_SHAREDIR directory. This parameter enables LSF daemons to look for records in the reghostscache file when they look up host names or IP addresses.
By default, the reghostscache file is stored in the directory defined by the LSB_SHAREDIR parameter in the lsf.conf file. Define the LSB_SHAREDIR parameter so that as many LSF servers as possible can share the reghostscache file. Of all the LSF servers that have access to the shared directory defined by the LSB_SHAREDIR parameter, only one needs to receive the registration request from the local host. A shared reghostscache file therefore reduces network load, because the registration request must be sent to fewer servers. If all hosts in the cluster can access the shared directory, the registration needs to be sent only to the master LIM, which records the host information in the shared reghostscache file that all other servers can access. If the LSB_SHAREDIR parameter is not defined, the reghostscache file is placed in the LSF_TOP directory.
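The server-selection rule in the previous paragraph can be sketched as follows. This is a hypothetical helper, not part of LSF: it assumes you already know which shared directory (the LSB_SHAREDIR value) each server can reach, with None for a server that has no shared reghostscache file, and it picks one server per shared directory.

```python
from typing import Dict, List, Optional


def registration_targets(servers: Dict[str, Optional[str]]) -> List[str]:
    """Pick the servers that must receive a registration request.

    `servers` maps each (hypothetical) server name to the shared directory
    it can reach, or None if it keeps its own reghostscache file.
    """
    targets: List[str] = []
    seen_dirs = set()
    for name, share_dir in servers.items():
        if share_dir is None:
            # No shared cache: this server must be contacted directly.
            targets.append(name)
        elif share_dir not in seen_dirs:
            # The first server seen for a directory records the host in the
            # shared file; the other servers on that directory can read it.
            seen_dirs.add(share_dir)
            targets.append(name)
    return targets
```

List the master LIM host first in the mapping so that, when every server shares one directory, the master LIM is the only target.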
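For example, the following reghostscache record contains a host name, an IP address, and a Windows computer SID:
MyHost1 192.168.1.2 S-1-5-21-5615612300-9789239785-9879786971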
Windows hosts that register have their computer SID included as part of the record. If a registration request arrives from a host that is already registered, but its SID does not match the SID in the corresponding reghostscache record, the new registration request is rejected. This prevents malicious hosts from imitating another host's name and registering themselves as that host.
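The following Python sketch models the reghostscache behavior described above. Records are parsed from whitespace-separated lines like the example record, a name or IP address can be resolved from them without DNS, and a registration for a known host name is rejected when its SID does not match. The record layout, the in-memory dictionary, and all function names are assumptions for illustration. They are not the actual file format specification or LSF code.

```python
from typing import Dict, NamedTuple, Optional


class HostRecord(NamedTuple):
    name: str
    ip: str
    sid: Optional[str] = None  # assumed to be present only for Windows hosts


def parse_records(text: str) -> Dict[str, HostRecord]:
    """Parse assumed "name ip [sid]" lines into a dictionary keyed by host name."""
    records: Dict[str, HostRecord] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) in (2, 3):  # skip blank or malformed lines
            record = HostRecord(*fields)
            records[record.name] = record
    return records


def lookup_ip(records: Dict[str, HostRecord], name: str) -> Optional[str]:
    """Resolve a host name to an IP address without asking DNS."""
    record = records.get(name)
    return record.ip if record is not None else None


def lookup_name(records: Dict[str, HostRecord], ip: str) -> Optional[str]:
    """Resolve an IP address back to a registered host name."""
    for record in records.values():
        if record.ip == ip:
            return record.name
    return None


def register(records: Dict[str, HostRecord], request: HostRecord) -> bool:
    """Apply a registration request; return False if it is rejected."""
    existing = records.get(request.name)
    # Assumption: only records that already carry a SID are protected this way.
    if existing is not None and existing.sid is not None and existing.sid != request.sid:
        return False  # the name is already registered with a different SID
    records[request.name] = request  # new host, or an IP address change
    return True
```

A request with a matching SID simply replaces the old record, which is how a host with a changed IP address stays resolvable.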
After you enable host registration, you can register LSF hosts by running the lsreghost command from the local host. Specify a path to the hostregsetup file:
- On UNIX, lsreghost -s file_path/hostregsetup
You must run the UNIX command with root privileges. If you want to register the local host at regular intervals, set up a cron job to run this command.
- On Windows, lsreghost -i file_path\hostregsetup
The Windows command installs lsreghost as a Windows service that automatically starts up when the host starts up.
The hostregsetup file is a text file with the names of the LSF servers to which the local host must register itself. Each line in the file contains the host name of one LSF server. Empty lines and #comment text are ignored.
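A reader for the hostregsetup format could look like the following sketch. The function name is hypothetical, and it assumes that # starts a comment that runs to the end of the line, so a comment may also follow a server name.

```python
from typing import List


def read_hostregsetup(text: str) -> List[str]:
    """Return the LSF server names listed in hostregsetup-style text."""
    servers: List[str] = []
    for line in text.splitlines():
        # Assumption: "#" starts a comment that runs to the end of the line.
        line = line.split("#", 1)[0].strip()
        if line:  # empty lines and comment-only lines are ignored
            servers.append(line)
    return servers
```

For example, a file containing a comment line, the (hypothetical) server name lsfmaster, an empty line, and server2 yields ["lsfmaster", "server2"].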
The bmgroup command displays leased-in hosts in the resource leasing model for IBM® Spectrum LSF multicluster capability
The bmgroup command displays compute units, host groups, host names, and administrators for each group or unit. For the resource leasing model, host groups with leased-in hosts were previously displayed as allremote in the HOSTS column.
You can now expand the allremote keyword so that the bmgroup command displays a list of the leased-in hosts in the host group.
By default, the HOSTS column now displays the leased-in hosts in the form host_name@cluster_name.
For example, suppose cluster_1 defines a host group called master_hosts that contains only host_A and a host group called remote_hosts whose members are leased-in hosts, and cluster_2 contains host_B and host_C, which are both leased in by cluster_1. By default, the bmgroup command displays the following output:
GROUP_NAME    HOSTS
master_hosts  host_A
remote_hosts  host_B@cluster_2 host_C@cluster_2
If the leased-in hosts are not expanded, the remote_hosts group is displayed with the allremote keyword instead:
GROUP_NAME    HOSTS
master_hosts  host_A
remote_hosts  allremote
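The two display modes can be sketched in Python as follows. These hypothetical helpers are not part of bmgroup. They represent each group member as a (host, cluster) pair, with None as the cluster for local hosts. Because the display of groups that mix local and leased-in hosts is not described here, the sketch simply lists local hosts first and then the leased-in hosts or the allremote keyword.

```python
from typing import Dict, List, Optional, Tuple

Member = Tuple[str, Optional[str]]  # (host name, cluster name or None if local)


def format_hosts_column(members: List[Member], expand: bool = True) -> str:
    """Build the HOSTS cell for one group."""
    local = [host for host, cluster in members if cluster is None]
    leased = [f"{host}@{cluster}" for host, cluster in members if cluster is not None]
    if leased and not expand:
        leased = ["allremote"]  # collapse all leased-in hosts into one keyword
    return " ".join(local + leased)


def render_bmgroup(groups: Dict[str, List[Member]], expand: bool = True) -> str:
    """Render a GROUP_NAME/HOSTS table with a fixed-width first column."""
    rows = [f"{'GROUP_NAME':<14}HOSTS"]
    for name, members in groups.items():
        rows.append(f"{name:<14}{format_hosts_column(members, expand)}")
    return "\n".join(rows)
```

Calling render_bmgroup with master_hosts containing ("host_A", None) and remote_hosts containing host_B and host_C from cluster_2 prints a table like the first example output; passing expand=False produces the allremote form.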
RUR job accounting replaces CSA for LSF on Cray
In the LSF integration with Cray Linux, Comprehensive System Accounting (CSA) is now deprecated and replaced with Resource Utility Reporting (RUR). The following parameters configure RUR job accounting:
- LSF_CRAY_RUR_ACCOUNTING: Specify N to disable RUR job accounting if RUR is not enabled in your Cray environment, or to increase performance. Default value is Y (enabled).
- LSF_CRAY_RUR_DIR: Location of the RUR data files, which is on a shared file system that is accessible from any potential first execution host. Default value is LSF_SHARED_DIR/<cluster_name>/craylinux/<cray_machine_name>/rur.
- LSF_CRAY_RUR_PROLOG_PATH: File path to the RUR prolog script file. Default value is /opt/cray/rur/default/bin/rur_prologue.py.
- LSF_CRAY_RUR_EPILOG_PATH: File path to the RUR epilog script file. Default value is /opt/cray/rur/default/bin/rur_epilogue.py.
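As an illustration of how these defaults combine, the following sketch resolves the effective RUR settings. It assumes the parameters are set as KEY=VALUE lines in lsf.conf and that the caller supplies the LSF_SHARED_DIR value, the cluster name, and the Cray machine name. The helper names are hypothetical, and this is not LSF's own configuration parser.

```python
from typing import Dict, Union

# Documented defaults; LSF_CRAY_RUR_DIR is derived from its inputs below.
RUR_DEFAULTS = {
    "LSF_CRAY_RUR_ACCOUNTING": "Y",
    "LSF_CRAY_RUR_PROLOG_PATH": "/opt/cray/rur/default/bin/rur_prologue.py",
    "LSF_CRAY_RUR_EPILOG_PATH": "/opt/cray/rur/default/bin/rur_epilogue.py",
}


def parse_conf(text: str) -> Dict[str, str]:
    """Read KEY=VALUE lines, skipping blank lines and # comments."""
    conf: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        conf[key.strip()] = value.strip().strip('"')  # drop optional quotes
    return conf


def rur_settings(conf: Dict[str, str], shared_dir: str, cluster_name: str,
                 machine_name: str) -> Dict[str, Union[str, bool]]:
    """Merge configured values over the defaults."""
    settings: Dict[str, Union[str, bool]] = dict(RUR_DEFAULTS)
    settings["LSF_CRAY_RUR_DIR"] = f"{shared_dir}/{cluster_name}/craylinux/{machine_name}/rur"
    # Explicitly configured RUR parameters override the defaults.
    settings.update({k: v for k, v in conf.items() if k in settings})
    # Only N disables accounting; treating n the same way is an assumption.
    settings["accounting_enabled"] = str(settings["LSF_CRAY_RUR_ACCOUNTING"]).upper() != "N"
    return settings
```

With an empty configuration, the data directory resolves to the documented default path built from the three supplied values, and accounting stays enabled.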
RUR does not support host-based resource usage (LSF_HPC_EXTENSIONS="HOST_RUSAGE").
The LSF administrator must enable RUR plug-ins, including output plug-ins, to ensure that the LSF_CRAY_RUR_DIR directory contains per-job accounting files (rur.<job_id>) or a flat file (rur.output).
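A post-processing script might locate the accounting data for a job as in the following sketch. The function is hypothetical. Checking for the per-job file before falling back to the flat rur.output file is an assumption about how you might handle the two layouts, not documented LSF behavior. A flat file holds data for many jobs, so the caller would still need to select the records for the job.

```python
from pathlib import Path
from typing import Optional


def find_rur_file(rur_dir: str, job_id: int) -> Optional[Path]:
    """Return rur.<job_id> if it exists, else rur.output, else None."""
    base = Path(rur_dir)
    per_job = base / f"rur.{job_id}"  # per-job accounting file
    if per_job.is_file():
        return per_job
    flat = base / "rur.output"  # single flat file shared by all jobs
    if flat.is_file():
        return flat
    return None
```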