What's new in IBM Spectrum LSF Version 10.1 Fix Pack 15
The following topics summarize the new and changed behavior in LSF 10.1 Fix Pack 15.
Download this Fix Pack from IBM Fix Central. For more information, see Getting fixes from IBM Fix Central.
Release date: May 2025.
Operating system versions
When a specific release or version of an operating system reaches end of life (EOL) or its end of support date, the operating system vendor no longer supports the product or releases updates for it, including automatic fixes, updates, and online technical assistance.
LSF, in turn, is not tested on EOL operating systems. If you have extended support that allows you to continue using an EOL operating system, you can use that operating system version with LSF; however, any issue with using LSF will be supported only if it can be reproduced on a supported operating system.
- Support for Apple macOS 13.4 and 14 on ARM64 for compute hosts.
Obsolete items no longer supported
- lsf10.1_linux2.6-glibc2.3-x86_64.tar.Z installation package
- Fix Pack 15 introduces the new linux4.18 packages. As a result, the linux2.6 package (lsf10.1_linux2.6-glibc2.3-x86_64.tar.Z) is no longer available for IBM fix packs or IBM support issues.
- LSF with EGO-enabled SLA scheduling
- LSF with EGO-enabled SLA scheduling is no longer supported and is obsolete. You can, however, continue to configure EGO for LSF daemon management so that EGO serves as the central resource broker for LSF, and you can use EGO to run LSF daemons.
- LSB_GPU_NEW_SYNTAX parameter no longer supports Y or y values
- The Y or y values for the LSB_GPU_NEW_SYNTAX parameter in the lsf.conf configuration file are no longer supported and are obsolete. You can continue to use LSB_GPU_NEW_SYNTAX=N|n|extend.
- LSB_TIME_RESERVE_NUMJOBS parameter no longer supported
- The LSB_TIME_RESERVE_NUMJOBS parameter is no longer supported in the lsf.conf file and time-based slot reservations are obsolete. For more sophisticated reservation behavior, employ plan-based reservations instead.
New linux4.18 packages for LSF on Linux x86
| Product offering | Package name |
|---|---|
| IBM Spectrum LSF | lsf10.1_lnx418-lib228-x86_64.tar.Z |
| IBM Spectrum LSF Data Manager | lsf10.1_data_mgr-lnx418-x64.tar.Z |
| IBM Spectrum LSF License Scheduler | lsf10.1_licsched_lnx418-x64.tar.Z |
- The libtirpc package is required to run LSF daemons and client commands, and is required inside the container image (such as a Podman or Docker image) to be able to run LSF commands inside the container. To install the libtirpc package, run, for example:
yum install libtirpc
- The libnsl package is not required to simply run LSF daemons and client commands; it is, however, required if you use LSF's NIS integration. To install the libnsl package, run, for example:
yum install libnsl
As a result, the linux2.6 package (lsf10.1_linux2.6-glibc2.3-x86_64.tar.Z) is no longer available for IBM fix packs or IBM support issues.
New LSF Web Services offering
IBM Spectrum LSF Web Services (LSF Web Services) is a new offering with Fix Pack 15. Use LSF Web Services to access LSF from anywhere, making it easier for you to connect to your LSF cluster. For installation and usage, see LSF Web Services.
New lsfd.service file compatible with the lsadmin, badmin, and bctrld commands
The /usr/lib/systemd/system/lsfd.service file and the child service files introduced in Fix Pack 14 for each LSF daemon are now obsolete. Fix Pack 15 introduces a new lsfd.service file that is compatible with the lsadmin, badmin, and bctrld commands to start, stop, or restart the key LSF daemons (LIM, RES, and sbatchd).
If any of the key LSF daemons fail, the startup and shutdown script for LSF daemons ($LSF_SERVERDIR/lsf_daemons) now automatically restarts the necessary daemons.
Note that with the lsf_daemons script, some necessary processes run in the background, such as the lsf_daemons service_start process. Do not kill these background processes, as they are required to monitor the LIM, RES, and sbatchd daemons. If you kill such a process, the lsfd.service process automatically restarts; as a result, all LSF processes are killed and new LSF processes are started with a new lsf_daemons service_start process.
Updated hwloc version for portable hardware locality
LSF now uses version 2.11.1 of the hardware locality (hwloc) library for portable hardware locality.
Configure how LSF sets the cpu.shares and cpu.weight values for jobs on hosts enabled for Linux cgroups
If the CPU cgroup subsystem is enabled for v1 or v2 Linux cgroups, and the LSF_LINUX_CGROUP_ACCT=Y setting is configured in the lsf.conf file, LSF creates a cgroup for each job on each execution host. Historically, to define CPU limits, LSF used the OS default setting for cpu.shares (cgroups v1) and cpu.weight (cgroups v2), which meant that regardless of a job's CPU intensity, it received the same share or weight of the CPU as all other running jobs.
In Fix Pack 15, LSF sets the cpu.shares and cpu.weight values to the number of cores the job will be using on the host, so that jobs that are less CPU intensive can be given a fraction of the CPU share or weight of other jobs. The new LSB_CGROUP_CPU_SHARES_OLD_DEFAULT parameter controls the default or initial value of the cpu.shares and cpu.weight cgroup interface values that LSF considers when creating the cgroup for the job. If it is set to N or n, or left undefined, the cpu.shares and cpu.weight values are relative to the number of tasks the job will run on a particular execution host.
To give jobs a percentage of the CPU share or weight of other jobs, set the new CGROUP_CPU_SHARES_FACTOR parameter in the lsb.applications and lsb.queues files (for example, set CGROUP_CPU_SHARES_FACTOR=25). The initial values for cpu.shares and cpu.weight are scaled by this CGROUP_CPU_SHARES_FACTOR value if it is configured. You can view this CGROUP_CPU_SHARES_FACTOR percentage in the output of the bapp -l or bqueues -l commands.
- If CGROUP_CPU_SHARES_FACTOR is not defined:

| Condition | cgroup v1 cpu.shares | cgroup v2 cpu.weight |
|---|---|---|
| MXJ is defined and LSB_CGROUP_CPU_SHARES_OLD_DEFAULT=N | 1024*(number_of_tasks_on_host/MXJ) | 100*(number_of_tasks_on_host/MXJ) |
| Otherwise | 1024 | 100 |

- If CGROUP_CPU_SHARES_FACTOR is defined:

| Condition | cgroup v1 cpu.shares | cgroup v2 cpu.weight |
|---|---|---|
| MXJ is defined and LSB_CGROUP_CPU_SHARES_OLD_DEFAULT=N | 1024*(CGROUP_CPU_SHARES_FACTOR/100)*(number_of_tasks_on_host/MXJ) | 100*(CGROUP_CPU_SHARES_FACTOR/100)*(number_of_tasks_on_host/MXJ) |
| Otherwise | (CGROUP_CPU_SHARES_FACTOR/100)*1024 | (CGROUP_CPU_SHARES_FACTOR/100)*100 |
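As an illustration of the formulas in these tables (the configuration values here are assumed for the example): on an execution host with MXJ=16, with LSB_CGROUP_CPU_SHARES_OLD_DEFAULT=N and CGROUP_CPU_SHARES_FACTOR=50 configured, a job running 8 tasks on that host receives the following initial values:
cpu.shares (cgroups v1) = 1024 * (50/100) * (8/16) = 256
cpu.weight (cgroups v2) = 100 * (50/100) * (8/16) = 25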
Data compressed for all LIM communication
As of Fix Pack 15, all LIM operations and communication between the client and server LIM are compressed, using UDP as the transfer method. With this change, the LSF_SEND_CONFINFO_TCP_THRESHOLD parameter has been deprecated; use the new LSF_UDP_TO_TCP_THRESHOLD parameter instead. If the package size is greater than the threshold specified for the LSF_UDP_TO_TCP_THRESHOLD parameter, then data is transferred using TCP.
- Install the following data compression library:
- For RHEL: zlib-devel
- For Ubuntu: libz-dev
- When linking LSF API programs with the LSF API library, add the -lz option.
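For example, a typical compile and link line might look like the following (the program name and installation paths are illustrative; adjust the include and library locations to match your LSF installation):
gcc -o myapp myapp.c -I$LSF_TOP/10.1/include -L$LSF_LIBDIR -lbat -llsf -lz -lm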
Submit jobs with a cluster affinity request
You can now submit jobs with a cluster affinity request, which allows you to group jobs together by cluster. Specify the cluster affinity attribute name for the enhanced -jobaff option for the bsub command (for example, run bsub -jobaff "cluster_affinity (attribute_name)"). For more details on cluster affinity, configuration, and command usage, see Cluster affinity scheduling with attributes.
Enhanced -sla option for the bsub command
In a multicluster environment, the enhanced -sla option for the bsub command now verifies the following before forwarding a job:
- The defined service class exists on the forwarded cluster.
- The job has appropriate access to the defined service class.
Both submission and execution clusters must be updated with Fix Pack 15 for this to work. See Submit a job to a service class (bsub -sla) for details on usage.
New BJOBS_W_DISPLAY_MEM_SWAP_PIDS environment variable for displaying MEM, SWAP, and PIDS details
Fix Pack 15 provides a new BJOBS_W_DISPLAY_MEM_SWAP_PIDS environment variable. Use it to manage whether to display the MEM, SWAP, and PIDS details when querying jobs with the bjobs -W command.
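For example, assuming the variable takes Y to enable the display (as with similar LSF environment variables):
export BJOBS_W_DISPLAY_MEM_SWAP_PIDS=Y
bjobs -W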
New -json option for the badmin rc view command
Use the new -json option to display the badmin rc view command output in JSON format.
New LSF_TCP_KEEPALIVE_TIME parameter
Configure the new LSF_TCP_KEEPALIVE_TIME parameter, in the lsf.conf configuration file, to override the Linux TCP keepalive time for sockets created by LSF daemons. After the number of seconds defined for the LSF_TCP_KEEPALIVE_TIME value passes, the connection is marked to keep alive and keepalive probes are sent.
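For example, to use an illustrative keepalive time of 300 seconds, add the following line to the lsf.conf file:
LSF_TCP_KEEPALIVE_TIME=300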
New THREADLIMIT_PER_TASK parameter for the lsb.applications and lsb.queues files
Use the new THREADLIMIT_PER_TASK parameter in the lsb.applications and lsb.queues files. The THREADLIMIT_PER_TASK parameter calculates a job's thread limit based on the number of tasks in the job, which gives you a more refined thread limit.
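For example, an illustrative setting in the lsb.queues file (assuming the job's overall thread limit is then scaled by its number of tasks, so a 4-task job would be limited to 16 threads):
THREADLIMIT_PER_TASK=4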
The -M mem_limit option for the ssched command now sets a hard memory limit
Starting in Fix Pack 15, the -M mem_limit option for the ssched command now sets a per-process hard (not soft) memory limit for all the processes that belong to the task.
New Total column added to the badmin lsfproxyd status command output
The output of the badmin lsfproxyd status command has been enhanced to include a Total column in the displayed command metrics. The value displayed represents the total during the lifetime of the current lsfproxyd process.
Enhancements for Docker jobs
- New container_io() and job_pre_post() keywords for the CONTAINER parameter to enhance Docker jobs
- You can now configure LSF to write output files to the Docker container file system; set this by configuring the new container_io() keyword for the CONTAINER parameter in the lsb.applications file (for applications), the lsb.queues file (for queues), or both. If set, you can also specify another new keyword, job_pre_post(), which enables LSF to run the user-level pre-execution and post-execution commands inside your Docker containers. For more information on setting up Docker jobs with your LSF cluster, see Configuring LSF to run Docker jobs.
- Extended the context keyword to recognize Docker job paths
- By default, the PATH variable used by the provided starter, controller, or monitoring scripts for the execution driver is /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin. This can be specified when configuring the EXEC_DRIVER parameter in the lsb.queues file for queues, or in the lsb.applications file for application profiles. Fix Pack 15 extends the context keyword for the EXEC_DRIVER parameter so that it recognizes Docker job paths, specified by a new path keyword for the EXEC_DRIVER parameter. The use of path is optional. If used, you can specify one or more paths (separate multiple paths with a colon (:)). For example:
EXEC_DRIVER=context[user(user_name) path(/path/one:/path/two)] starter[/file_path_serverdir/docker-starter.py] controller[/file_path/to/serverdir/docker-control.py] monitor[/file_path/to/serverdir/docker-monitor.py]
LSF does not check if the paths specified exist.
- New LSB_DOCKER_PASS_ENV parameter for execution driver scripts to inherit the job submission user's environment
- By default, if you configure the EXEC_DRIVER value for queues or application profiles, and the starter, controller, or monitoring scripts are specified, they do not automatically inherit the job submission user’s environment when started for Docker jobs. A cluster administrator can set the new LSB_DOCKER_PASS_ENV parameter to Y in the lsf.conf file, so that the execution driver (starter, controller, or monitoring) scripts inherit and use the job submission user's environment.
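For example, add the following line to the lsf.conf file:
LSB_DOCKER_PASS_ENV=Y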
Forwarded pending and running jobs count towards the defined job array limits in a multicluster environment
A job array job slot limit is used to specify the maximum number of jobs submitted from a job array that is allowed to run at any one time. If job array limits are defined in a multicluster environment, forwarded jobs (in pending and running states) are counted towards the defined limit.
New MC_STRICT_JOBID_CHECKING parameter for the lsb.params file to control global job ID checking
Use the new MC_STRICT_JOBID_CHECKING parameter within the lsb.params file to control global job ID checking in an LSF multicluster environment. When set to MC_STRICT_JOBID_CHECKING=N (which is the default setting), this parameter allows the cluster to relax job ID checking on global job ID indexes. Any jobs forwarded to this cluster that do not match the configured cluster index in the lsf.shared file are, by default, allowed to run on the execution host.
Enhanced dynamic host framework
Fix Pack 15 improves efficiency, performance, reliability, and supportability for the dynamic host framework within LSF, with focus on adding and removing dynamic hosts. These enhancements are especially beneficial to LSF in a cloud environment and the LSF resource connector. Enhancements for dynamic hosts include:
- Schedule jobs to the proper dynamic hosts
- The LSF dynamic host framework has been enhanced to ensure that jobs are always dispatched to the proper dynamic hosts.
- Hosts can leave the cluster seamlessly
- Similar to adding a dynamic host, you can now remove a dynamic host without the need to restart the LSF daemons, decoupling the need to reconfigure or restart the mbatchd daemon. The process is now seamless.
Do note, however, that if you have enabled deprecated features or features that are no longer supported in the cluster, removing a dynamic host will trigger the need to reconfigure daemons.
- New LC2_DYN_HOST log class
- Use the new LC2_DYN_HOST log class to debug LIM (LSF_DEBUG_LIM), mbatchd (LSB_DEBUG_MBD), and mbschd (LSB_DEBUG_SCH) daemon issues related to dynamic hosts. For example, specify this log class as follows:- lsadmin limdebug -c LC2_DYN_HOST
- badmin mbddebug -c LC2_DYN_HOST
- badmin schddebug -c LC2_DYN_HOST
- New LSF_HOST_UUID parameter for the lsf.sudoers file
- Use the new LSF_HOST_UUID parameter within the lsf.sudoers file to manually override the generated UUID value for Linux hosts. The UUID allows you to identify a host by more than just its hostname. For example, a UUID is used to determine if the host is the same one that previously joined, or if the host is simply reusing the same hostname or IP address (which is common for cloud-provisioned machines).
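For example, in the lsf.sudoers file (the UUID value shown is illustrative):
LSF_HOST_UUID=123e4567-e89b-12d3-a456-426614174000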
- New RC_GET_HOST_METHOD parameter for the lsb.params file
- When the LSF resource connector is enabled, use the new RC_GET_HOST_METHOD parameter to tell LSF the exact information (public IP address, private IP address, or instance name) to be used for host lookup. The RC_GET_HOST_METHOD parameter provides this information efficiently, which helps improve LSF performance.
Enhanced performance when querying LSF resource connector host information
For increased efficiency and performance of the ebrokerd daemon, the daemon no longer relies on the MQTT message broker daemon (mosquitto) for resource connector host information upkeep when running the bhosts -rc and bhosts -rconly commands.
Note that upon upgrading to Fix Pack 15, if you have the LSF resource connector enabled, upgrade all management hosts and clients to this fix pack level to ensure that the bhosts -rc and bhosts -rconly commands work properly and leverage this enhancement on all applicable hosts.
New LSF_CLOUD_UI parameter to show LSF resource connector information
Set LSF_CLOUD_UI=Y in the lsf.conf configuration file to show cloud provider details (for example, show the name of the enabled cloud provider in the lsid command output). This setting is supported for clusters with the LSF resource connector feature enabled.
New RC_TEMPLATE_BY_RESREQ_ORDER parameter to order resource connector templates
When choosing a resource connector template, LSF selects among candidate templates as follows:
- The first criterion LSF uses is the priority value in the templates. LSF uses the template with the highest priority.
- If LSF cannot choose a template based on priority (the priority values in the templates are the same, or a value is not defined and defaults to a priority of 0), then:
  - Starting with Fix Pack 15, you can set the new RC_TEMPLATE_BY_RESREQ_ORDER parameter in the lsb.params configuration file so that LSF looks for a template by an ordered set of resource requirements (see the example after this list). The parameter accepts any of the numeric consumable properties defined in the attributes section of the resource connector template.
  - If you do not have the RC_TEMPLATE_BY_RESREQ_ORDER parameter configured, then LSF moves to the templateId value in the templates, and selects a template in alphabetical order.
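For example, an illustrative lsb.params setting (the attribute names, and whether multiple attributes are space-separated, are assumptions that depend on your resource connector template definitions):
Begin Parameters
RC_TEMPLATE_BY_RESREQ_ORDER=ncpus mem
End Parameters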
New LSB_RC_EXTERNAL_HOST_ABNORMAL_TIME parameter for the LSF resource connector to control the time to wait before timing out hosts in an abnormal state
To allow the LSF resource connector to control the number of minutes that LSF waits before timing out resource connector hosts that are in an abnormal state (that is, closed_LIM, unavail, or unreach status), add the new LSB_RC_EXTERNAL_HOST_ABNORMAL_TIME parameter to the lsf.conf configuration file. The LSB_RC_EXTERNAL_HOST_ABNORMAL_TIME parameter sets the timeout values and LSF behavior when a resource connector host reaches soft and hard timeout limits. Specify integers for the values, and separate the values with a colon (:). For details and usage syntax, see LSB_RC_EXTERNAL_HOST_ABNORMAL_TIME.
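For example, an illustrative lsf.conf setting with colon-separated timeout values (see the parameter reference for the exact meaning and order of the values):
LSB_RC_EXTERNAL_HOST_ABNORMAL_TIME=10:30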
New PROV state for the bjobs command and resource connector jobs
Jobs that are waiting for the LSF resource connector to provision hosts are now displayed in the PROV state in the bjobs command output. The bjobs -l command output also shows the provisioning request; for example:
Wed Jan 31 18:43:28: Sent resource connector request for 1*CENTOS-Template-NGVM-1 from provider ibmcloudgen2.
New ASSIGN_SERVICE_ACCOUNT_FROM_LAUNCH_TEMPLATE parameter for the googleprov_config.json file
The new ASSIGN_SERVICE_ACCOUNT_FROM_LAUNCH_TEMPLATE parameter for the googleprov_config.json (LSF
resource connector Google Compute Cloud) configuration file enables LSF to
assign the service account from the launchTemplateId
attribute (defined in the
googleprov_templates.json
file) to newly created instances. If
ASSIGN_SERVICE_ACCOUNT_FROM_LAUNCH_TEMPLATE is set to true
,
for each template in the googleprov_templates.json file, LSF
assigns the service account defined in the launchTemplateId
to newly created
instances, if the launchTemplateId
is defined in the template. Otherwise, LSF
assigns the default service account to the newly created instances. The default parameter value is
false
.
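For example, a minimal sketch of the setting in the googleprov_config.json file (all other required fields in this file are omitted):
{
  "ASSIGN_SERVICE_ACCOUNT_FROM_LAUNCH_TEMPLATE": true
}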
New AWS_TAG_InstanceID and AWS_ENDPOINT_URL parameters for the awsprov_config.json file
The new AWS_TAG_InstanceID parameter for the awsprov_config.json (LSF resource connector AWS) configuration file allows you to adjust performance by controlling InstanceID tagging. If set to true, the AWS LSF resource connector plug-in will add the InstanceID tag to both the instance and its EBS volumes. By default, this parameter is set to false, so that the AWS LSF resource connector plug-in will not add the InstanceID tag, which helps with the AWS plug-in performance.
The new AWS_ENDPOINT_URL parameter for the awsprov_config.json file supports private endpoints. Typically, the AWS provider can determine the endpoint from the region name, but if the provider is running in an environment that does not provide public internet access, this endpoint is unreachable. In this case, you can create a private endpoint (a non-default API endpoint URL used for requesting hosts through AWS); refer to the AWS documentation for reference on creating this endpoint using the Amazon VPC console. Once created, you can then specify that endpoint for the AWS_ENDPOINT_URL parameter.
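For example, a sketch of these settings in the awsprov_config.json file (the endpoint URL is a placeholder, and all other required fields are omitted):
{
  "AWS_TAG_InstanceID": true,
  "AWS_ENDPOINT_URL": "https://vpce-example.execute-api.us-east-1.amazonaws.com"
}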
New computeUnit host attribute for the awsprov_templates.json and ibmcloudgen2_templates.json files
The new computeUnit host attribute is available for the awsprov_templates.json and ibmcloudgen2_templates.json files. Use this attribute to specify any compute unit requirements for the LSF resource connector to provision.
New MinNumber parameter for the policy_config.json file
The policy_config.json file now supports a new MinNumber parameter (within the Policies parameter). Use this parameter to define the minimum number of instances per resource connector template, account, and provider that are created automatically even if there are no job requests; the cluster maintains these minimums at all times.
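For example, a sketch of a policy entry that maintains a minimum of 2 instances (the field names other than MinNumber, and all values, are illustrative and depend on your existing policy_config.json schema):
{
  "Policies": [
    {
      "Name": "MinimumPoolPolicy",
      "Consumer": { "rcAccount": ["all"], "templateName": ["all"], "provider": ["all"] },
      "MaxNumber": "100",
      "MinNumber": "2"
    }
  ]
}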
The ssacct command displays accounting statistics immediately for LSF Session Scheduler jobs
The IBM Spectrum LSF Session Scheduler enables users to run large collections of tasks within the allocation of a single LSF job. The ssacct command displays accounting statistics about finished LSF Session Scheduler jobs. Previously, the command waited for all tasks in the job to complete before showing these statistics. With Fix Pack 15, the command has been enhanced to show the statistics immediately, without having to wait for the tasks to finish.
Enhanced -Q option for the ssched command to easily re-queue all failed LSF Session Scheduler tasks, with specific exceptions
The ssched command allows you to submit tasks through the IBM Spectrum LSF Session Scheduler. The -Q option for this command enables automatic re-queuing of failed tasks, and has been enhanced in Fix Pack 15. Previously, to re-queue failed tasks, you specified individual task exit codes, separating each code with a space, which could become a long manual list if you have multiple failed tasks. With Fix Pack 15, the command supports the -Q "all [~exit_code ...] | exit_code ..." syntax so that you can batch re-queue all failed tasks, except for the task exit codes you explicitly exclude with the tilde (~) syntax.
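For example (the exit codes and task command are illustrative), to re-queue all failed tasks except those that exited with code 1 or 13:
ssched -Q "all ~1 ~13" mytask.sh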
New ssview command to show task details and new LSB_REPORT_SSJOB_FAILURE parameter to report task failures for LSF Session Scheduler jobs
Leverage the new ssview command to show task details of LSF Session Scheduler jobs. This command is equivalent to running the bjobs command for an LSF job.
Additionally, setting the new LSB_REPORT_SSJOB_FAILURE parameter to Y in the lsf.conf file enables the ssview command to obtain more accurate task status information for LSF Session Scheduler jobs, because this parameter reports the reason that tasks were killed or ran unsuccessfully.
New LM_STAT_PER_FEATURE parameter to control retrieving license usage data from FlexNet license servers for LSF License Scheduler jobs
Configure the LM_STAT_PER_FEATURE parameter in the lsf.licensescheduler file to allow LSF License Scheduler to manage a small set of licenses from a license server that handles hundreds or thousands of licenses. The LM_STAT_PER_FEATURE parameter controls how the blcollect (license collector daemon) command retrieves license usage data from FlexNet license servers for LSF License Scheduler jobs.