What's new in IBM Spectrum LSF Version 10.1 Fix Pack 15

The following topics summarize the new and changed behavior in LSF 10.1 Fix Pack 15.

Download this Fix Pack from IBM Fix Central. For more information, see Getting fixes from IBM Fix Central.

Release date: May 2025.

Operating system versions

When a specific release or version of an operating system reaches end of life (EOL) or its end of support date, the operating system vendor no longer supports and releases updates for the product, including automatic fixes, updates, and online technical assistance.

LSF, in turn, is not tested on EOL operating systems. If you have extended vendor support that allows you to continue using an EOL operating system, you can still run LSF on that operating system version; however, any issue with LSF is supported only if it can be reproduced on a supported operating system.

Before applying Fix Pack 15, ensure that you use one of the supported operating systems. Here are the highlights of LSF operating system changes as of Fix Pack 15:
  • Support for Apple macOS 13.4 and 14 on ARM64 for compute hosts.

Obsolete items no longer supported

As of Fix Pack 15, these LSF items are obsolete and no longer supported:
lsf10.1_linux2.6-glibc2.3-x86_64.tar.Z installation package
Fix Pack 15 introduces the new linux4.18 packages. Consequently, the linux2.6 package (lsf10.1_linux2.6-glibc2.3-x86_64.tar.Z) is no longer available for IBM fix packs or IBM support issues.
LSF with EGO-enabled SLA scheduling
LSF with EGO-enabled SLA scheduling is no longer supported and is obsolete. You can, however, continue to configure EGO for LSF daemon management so that EGO serves as the central resource broker for LSF, and you can use EGO to run LSF daemons.
LSB_GPU_NEW_SYNTAX parameter no longer supports Y or y values
The Y and y values for the LSB_GPU_NEW_SYNTAX parameter in the lsf.conf configuration file are obsolete and no longer supported. You can continue to use LSB_GPU_NEW_SYNTAX=N|n|extend.
LSB_TIME_RESERVE_NUMJOBS parameter no longer supported
The LSB_TIME_RESERVE_NUMJOBS parameter is no longer supported in the lsf.conf file, and time-based slot reservation is obsolete. For more sophisticated reservation behavior, use plan-based reservations instead.

New linux4.18 packages for LSF on Linux x86

Fix Pack 15 introduces new LSF binary packages for Linux x86 targeting kernel version 4.18 and glibc 2.28 or later. The following linux4.18 packages are available:
Product offering                       Package name
IBM Spectrum LSF                       lsf10.1_lnx418-lib228-x86_64.tar.Z
IBM Spectrum LSF Data Manager          lsf10.1_data_mgr-lnx418-x64.tar.Z
IBM Spectrum LSF License Scheduler     lsf10.1_licsched_lnx418-x64.tar.Z
Install the appropriate linux4.18 package as a fresh installation (not on top of an existing LSF installation), and note these operating system requirements for your installation:
  • The libtirpc package is required to run LSF daemons and client commands. It is also required inside a container image (such as a Podman or Docker image) to run LSF commands inside the container. For example, to install the libtirpc package:
    yum install libtirpc
  • The libnsl package is not required simply to run LSF daemons and client commands; it is, however, required if you use the LSF NIS integration. For example, to install the libnsl package:
    yum install libnsl

New LSF Web Services offering

IBM Spectrum LSF Web Services (LSF Web Services) is a new offering with Fix Pack 15. Use LSF Web Services to access LSF from anywhere, making it easier for you to connect to your LSF cluster. For installation and usage, see LSF Web Services.

New lsfd.service file compatible with the lsadmin, badmin, and bctrld commands

The /usr/lib/systemd/system/lsfd.service file and the child service files that were introduced in Fix Pack 14 for each LSF daemon are now obsolete. Fix Pack 15 introduces a new lsfd.service file that is compatible with the lsadmin, badmin, and bctrld commands for starting, stopping, and restarting the key LSF daemons (LIM, RES, and sbatchd).

If any of the key LSF daemons fail, the startup and shutdown script for LSF daemons ($LSF_SERVERDIR/lsf_daemons) now automatically restarts the necessary daemons.

Note that with the lsf_daemons script, some required processes run in the background, such as the lsf_daemons service_start process. Do not kill these background processes; they are required to monitor the LIM, RES, and sbatchd daemons. If you kill one of these processes, the lsfd.service process restarts automatically; as a result, all LSF processes are killed and new LSF processes are started with a new lsf_daemons service_start process.

Updated hwloc version for portable hardware locality

LSF now uses version 2.11.1 of the hardware locality (hwloc) library for portable hardware locality.

Configure how LSF sets the cpu.shares and cpu.weight values for jobs on hosts enabled for Linux cgroups

If the CPU cgroup subsystem is enabled for v1 or v2 Linux cgroups, and LSF_LINUX_CGROUP_ACCT=Y is configured in the lsf.conf file, LSF creates a cgroup for each job on each execution host. Historically, to define CPU limits, LSF used the OS default settings for cpu.shares (cgroup v1) and cpu.weight (cgroup v2), which meant that, regardless of a job's CPU intensity, it received the same share or weight of the CPU as all other running jobs.

In Fix Pack 15, LSF sets the cpu.shares and cpu.weight values according to the number of cores that the job uses on the host, so that less CPU-intensive jobs can be given a fraction of the CPU share or weight of other jobs. The new LSB_CGROUP_CPU_SHARES_OLD_DEFAULT parameter controls the default or initial cpu.shares and cpu.weight cgroup interface values that LSF considers when creating the cgroup for the job. If the parameter is set to N or n, or left undefined, the cpu.shares and cpu.weight values are relative to the number of tasks that the job runs on a particular execution host.

To give jobs a percentage of the CPU share or weight of other jobs, set the new CGROUP_CPU_SHARES_FACTOR parameter in the lsb.applications and lsb.queues files (for example, set CGROUP_CPU_SHARES_FACTOR=25). If configured, the initial cpu.shares and cpu.weight values are scaled by the CGROUP_CPU_SHARES_FACTOR value. You can view the configured CGROUP_CPU_SHARES_FACTOR percentage in the output of the bapp -l and bqueues -l commands.
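For example, a minimal lsb.queues stanza that scales the initial cpu.shares and cpu.weight values to 25% (the queue name is illustrative):

    Begin Queue
    QUEUE_NAME = cpu_light
    CGROUP_CPU_SHARES_FACTOR = 25
    End Queue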

The formula that LSF uses to calculate the CPU shares and weights depends on whether the CGROUP_CPU_SHARES_FACTOR parameter is configured:

If CGROUP_CPU_SHARES_FACTOR is not defined:
  • If MXJ is defined and LSB_CGROUP_CPU_SHARES_OLD_DEFAULT=N:
    cgroup v1 cpu.shares = 1024*(number_of_tasks_on_host/MXJ)
    cgroup v2 cpu.weight = 100*(number_of_tasks_on_host/MXJ)
  • Otherwise:
    cgroup v1 cpu.shares = 1024
    cgroup v2 cpu.weight = 100

If CGROUP_CPU_SHARES_FACTOR is defined:
  • If MXJ is defined and LSB_CGROUP_CPU_SHARES_OLD_DEFAULT=N:
    cgroup v1 cpu.shares = 1024*(CGROUP_CPU_SHARES_FACTOR/100)*(number_of_tasks_on_host/MXJ)
    cgroup v2 cpu.weight = 100*(CGROUP_CPU_SHARES_FACTOR/100)*(number_of_tasks_on_host/MXJ)
  • Otherwise:
    cgroup v1 cpu.shares = (CGROUP_CPU_SHARES_FACTOR/100)*1024
    cgroup v2 cpu.weight = (CGROUP_CPU_SHARES_FACTOR/100)*100
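For example, with these formulas, on a host where MXJ=16, a job that runs 4 tasks under a queue with CGROUP_CPU_SHARES_FACTOR=25 and with LSB_CGROUP_CPU_SHARES_OLD_DEFAULT=N gets cpu.shares = 1024*(25/100)*(4/16) = 64 on cgroup v1. On cgroup v2, the corresponding cpu.weight is 100*(25/100)*(4/16) = 6.25, presumably rounded to an integer because the cgroup interface accepts only integer values. The numbers in this example are illustrative.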

Data compressed for all LIM communication

As of Fix Pack 15, all LIM operations and communication between the client and server LIM are compressed, with UDP as the transfer method. As a result, the LSF_SEND_CONFINFO_TCP_THRESHOLD parameter is deprecated; use the new LSF_UDP_TO_TCP_THRESHOLD parameter instead. If the data size is greater than the threshold specified by the LSF_UDP_TO_TCP_THRESHOLD parameter, the data is transferred using TCP.

Additionally, this data compression affects compiling LSF API programs and linking them with the LSF API libraries. If you build programs with the LSF API library at Fix Pack 15, ensure that you complete the following steps (a sample build command follows the list):
  1. Install the following data compression library:
    • For RHEL: zlib-devel
    • For Ubuntu: libz-dev
  2. When linking LSF API programs with the LSF API library, add the -lz option.
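For example, a minimal compile-and-link command for a C program, assuming a default LSF installation layout and the traditional LSF API libraries (the program name, paths, and library list are illustrative; adjust them for your environment):

    gcc -I$LSF_TOP/10.1/include myprog.c -o myprog \
        -L$LSF_TOP/10.1/lib -lbat -llsf -lz -lm -lnsl

The only change from earlier fix pack levels is the added -lz option, which links the zlib data compression library.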

Submit jobs with a cluster affinity request

You can now submit jobs with a cluster affinity request, which allows you to group jobs together by cluster. Specify the cluster affinity attribute name with the enhanced -jobaff option of the bsub command (for example, run bsub -jobaff "cluster_affinity (attribute_name)"). For more details on cluster affinity, configuration, and command usage, see Cluster affinity scheduling with attributes.

Enhanced -sla option for the bsub command

When you define the -sla (service class) option for the bsub command and the job is forwarded, the job now retains the -sla argument in the forwarded cluster if both of the following are true:
  1. The defined service class exists on the forwarded cluster.
  2. The job has appropriate access to the defined service class.

Both submission and execution clusters must be updated with Fix Pack 15 for this to work. See Submit a job to a service class (bsub -sla) for details on usage.
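For example, a submission that names a service class (the service class name and job command are illustrative):

    bsub -sla prod_sla ./myjob

If prod_sla exists on the forwarded cluster and the job has access to it, the forwarded job keeps the -sla prod_sla argument.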

New BJOBS_W_DISPLAY_MEM_SWAP_PIDS environment variable for displaying MEM, SWAP, and PIDS details

Fix Pack 15 provides a new BJOBS_W_DISPLAY_MEM_SWAP_PIDS environment variable. Use it to control whether the MEM, SWAP, and PIDS details are displayed when you query jobs with the bjobs -W command.
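For example, assuming that setting the variable to Y enables the display:

    export BJOBS_W_DISPLAY_MEM_SWAP_PIDS=Y
    bjobs -W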

New -json option for the badmin rc view command

Use the new -json option to display the badmin rc view command output in JSON format.
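For example:

    badmin rc view -json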

New LSF_TCP_KEEPALIVE_TIME parameter

Configure the new LSF_TCP_KEEPALIVE_TIME parameter in the lsf.conf configuration file to override the Linux TCP keepalive time for sockets that are created by LSF daemons. After the number of seconds defined by the LSF_TCP_KEEPALIVE_TIME value passes, the connection is marked to keep alive and keepalive probes are sent.
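For example, in lsf.conf (the value, in seconds, is illustrative):

    LSF_TCP_KEEPALIVE_TIME=300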

New THREADLIMIT_PER_TASK parameter for the lsb.applications and lsb.queues files

Use the new THREADLIMIT_PER_TASK parameter in the lsb.applications and lsb.queues files. Using THREADLIMIT_PER_TASK to calculate a job's thread limit gives you a more refined thread limit that is based on the number of tasks for the job.
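A minimal sketch in lsb.queues, assuming the parameter accepts a limit value in the same form as the existing THREADLIMIT parameter (the queue name and limit are illustrative; check the parameter reference for the exact syntax):

    Begin Queue
    QUEUE_NAME = normal
    THREADLIMIT_PER_TASK = 4
    End Queue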

The -M mem_limit option for the ssched command now sets a hard memory limit

Starting in Fix Pack 15, the -M mem_limit option for the ssched command now sets a per-process hard (not soft) memory limit for all the processes that belong to the task.

New Total column added to the badmin lsfproxyd status command output

The output of the badmin lsfproxyd status command now includes a Total column in the displayed command metrics. The displayed value represents the total over the lifetime of the current lsfproxyd process.

Enhancements for Docker jobs

New container_io() and job_pre_post() keywords for the CONTAINER parameter to enhance Docker jobs
You can now have LSF write output files to the Docker container file system; set this by configuring the new container_io() keyword for the CONTAINER parameter in the lsb.applications file (for application profiles), the lsb.queues file (for queues), or both. If container_io() is set, you can also specify the new job_pre_post() keyword, which enables LSF to run the user-level pre-execution and post-execution commands inside your Docker containers. For more information on setting up Docker jobs with your LSF cluster, see Configuring LSF to run Docker jobs.
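A minimal sketch of an application profile in lsb.applications that uses these keywords, assuming they are placed inside the docker[] definition alongside the existing image() and options() keywords (the application name, image, and keyword placement are illustrative; check the CONTAINER parameter reference for the exact syntax):

    Begin Application
    NAME = dockerapp
    CONTAINER = docker[image(ubuntu:latest) options(--rm) container_io() job_pre_post()]
    End Application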
Extended the context keyword to recognize Docker job paths
By default, the PATH variable used by the provided starter, controller, or monitoring scripts for the execution driver is /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin. These scripts can be specified when you configure the EXEC_DRIVER parameter in the lsb.queues file for queues, or in the lsb.applications file for application profiles.

Fix Pack 15 extends the context keyword for the EXEC_DRIVER parameter so that it recognizes Docker job paths, which are specified by a new path keyword for the EXEC_DRIVER parameter. The use of path is optional. If used, you can specify one or more paths (separate multiple paths with a colon (:)). For example:

EXEC_DRIVER=context[user(user_name) path(/path/one:/path/two)] starter[/file_path_serverdir/docker-starter.py] controller[/file_path/to/serverdir/docker-control.py] monitor[/file_path/to/serverdir/docker-monitor.py]

LSF does not check if the paths specified exist.

New LSB_DOCKER_PASS_ENV parameter for execution driver scripts to inherit the job submission user's environment
By default, if you configure the EXEC_DRIVER value for queues or application profiles, and the starter, controller, or monitoring scripts are specified, those scripts do not automatically inherit the job submission user's environment when they are started for Docker jobs. A cluster administrator can set the new LSB_DOCKER_PASS_ENV parameter to Y in the lsf.conf file so that the execution driver (starter, controller, or monitoring) scripts inherit and use the job submission user's environment.
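For example, in lsf.conf:

    LSB_DOCKER_PASS_ENV=Y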

Forwarded pending and running jobs count towards the defined job array limits in a multicluster environment

A job array job slot limit specifies the maximum number of jobs submitted from a job array that can run at any one time. If job array limits are defined in a multicluster environment, forwarded jobs (in pending and running states) are now counted towards the defined limit.

New MC_STRICT_JOBID_CHECKING parameter for the lsb.params file to control global job ID checking

Use the new MC_STRICT_JOBID_CHECKING parameter in the lsb.params file to control global job ID checking in an LSF multicluster environment. The default setting, MC_STRICT_JOBID_CHECKING=N, relaxes job ID checking on global job ID indexes: by default, jobs forwarded to this cluster that do not match the cluster index configured in the lsf.shared file are allowed to run on the execution host.
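For example, to enforce strict checking instead of the relaxed default (assuming Y is the value that enables it), set the following in lsb.params:

    MC_STRICT_JOBID_CHECKING=Y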

Enhanced dynamic host framework

Fix Pack 15 improves efficiency, performance, reliability, and supportability for the dynamic host framework within LSF, with a focus on adding and removing dynamic hosts. These enhancements are especially beneficial to LSF in a cloud environment and to the LSF resource connector. Enhancements for dynamic hosts include:

Schedule jobs to the proper dynamic hosts
The LSF dynamic host framework has been enhanced to ensure that jobs are always dispatched to the proper dynamic hosts.
Hosts can leave the cluster seamlessly
Similar to adding a dynamic host, you can now remove a dynamic host without restarting the LSF daemons, which removes the need to reconfigure or restart the mbatchd daemon. The process is now seamless.

Do note, however, that if you have enabled deprecated or no-longer-supported features in the cluster, removing a dynamic host triggers the need to reconfigure daemons.

New LC2_DYN_HOST log class
Use the new LC2_DYN_HOST log class to debug LIM (LSF_DEBUG_LIM), mbatchd (LSB_DEBUG_MBD), and mbschd (LSB_DEBUG_SCH) daemon issues related to dynamic hosts. For example, specify this log class as follows:
  • lsadmin limdebug -c LC2_DYN_HOST
  • badmin mbddebug -c LC2_DYN_HOST
  • badmin schddebug -c LC2_DYN_HOST
New LSF_HOST_UUID parameter for the lsf.sudoers file
Use the new LSF_HOST_UUID parameter within the lsf.sudoers file to manually override the generated UUID value for Linux hosts. The UUID allows you to identify a host by more than just its hostname. For example, a UUID is used to determine whether a host is the same one that previously joined the cluster, or whether it is simply reusing the same hostname or IP address (which is common for cloud-provisioned machines).
New RC_GET_HOST_METHOD parameter for the lsb.params file
When the LSF resource connector is enabled, use the new RC_GET_HOST_METHOD parameter to tell LSF the exact information (public IP address, private IP address, or instance name) to use for host lookup. The RC_GET_HOST_METHOD parameter provides this information efficiently, which helps improve LSF performance.

Enhanced performance when querying LSF resource connector host information

To increase efficiency and performance, the ebrokerd daemon no longer relies on the MQTT message broker daemon (mosquitto) for resource connector host information upkeep when you run the bhosts -rc and bhosts -rconly commands.

Note that when you upgrade to Fix Pack 15 with the LSF resource connector enabled, you must upgrade all management hosts and clients to this fix pack level to ensure that the bhosts -rc and bhosts -rconly commands work properly and leverage this enhancement on all applicable hosts.

New LSF_CLOUD_UI parameter to show LSF resource connector information

Set LSF_CLOUD_UI=Y in the lsf.conf configuration file to show cloud provider details (for example, show the name of the enabled cloud provider in the lsid command output). This setting is supported for clusters with the LSF resource connector feature enabled.

New RC_TEMPLATE_BY_RESREQ_ORDER parameter to order resource connector templates

LSF decides which LSF resource connector template (defined in the provider_templates.json file) to use based on criteria defined in the templates. LSF chooses the first template that satisfies the job:
  • The first criterion LSF uses is the priority value in the templates. LSF uses the template with the highest priority.
  • If LSF cannot choose a template based on priority (the priority values in the templates are the same, or a value is not defined and defaults to a priority of 0), then:
    • Starting with Fix Pack 15, you can set the new RC_TEMPLATE_BY_RESREQ_ORDER parameter in the lsb.params configuration file so that LSF looks for a template by an ordered set of resource requirements (see the sketch after this list). The parameter accepts any of the numeric consumable properties defined in the attributes section of the resource connector template.
    • If the RC_TEMPLATE_BY_RESREQ_ORDER parameter is not configured, LSF moves to the templateId value in the templates and selects a template in alphabetical order.
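A minimal sketch in lsb.params, assuming that ncpus and mem are numeric consumable properties defined in the attributes section of your templates and that multiple names are space-separated (both the attribute names and the separator are assumptions; check the parameter reference for the exact syntax):

    RC_TEMPLATE_BY_RESREQ_ORDER = ncpus mem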

New LSB_RC_EXTERNAL_HOST_ABNORMAL_TIME parameter for the LSF resource connector to control the time to wait before timing out hosts in an abnormal state

To control the number of minutes that LSF waits before timing out LSF resource connector hosts that are in an abnormal state (that is, closed_LIM, unavail, or unreach status), add the new LSB_RC_EXTERNAL_HOST_ABNORMAL_TIME parameter to the lsf.conf configuration file. The LSB_RC_EXTERNAL_HOST_ABNORMAL_TIME parameter sets the timeout values and LSF behavior when a resource connector host reaches the soft and hard timeout limits. Specify integers for the values, and separate the values with a colon (:). For details and usage syntax, see LSB_RC_EXTERNAL_HOST_ABNORMAL_TIME.
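For example, in lsf.conf (the integer values are illustrative; see the parameter reference for the meaning of each colon-separated field):

    LSB_RC_EXTERNAL_HOST_ABNORMAL_TIME=10:60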

New PROV state for the bjobs command and resource connector jobs

A new PROV state has been added to the bjobs command output for jobs submitted from clusters that have the LSF resource connector enabled and the new LSF_CLOUD_UI=Y setting configured in lsf.conf. The PROV state shows that a job is requesting a resource (host) from the LSF resource connector cloud provider and is waiting for that host to join the LSF cluster. Additionally, the bjobs -l output for a provisioning job now includes a message that indicates when a request was sent to the cloud provider; for example:
Wed Jan 31 18:43:28: Sent resource connector request for 1*CENTOS-Template-NGVM-1 from provider ibmcloudgen2.

New ASSIGN_SERVICE_ACCOUNT_FROM_LAUNCH_TEMPLATE parameter for the googleprov_config.json file

The new ASSIGN_SERVICE_ACCOUNT_FROM_LAUNCH_TEMPLATE parameter for the googleprov_config.json (LSF resource connector Google Compute Cloud) configuration file enables LSF to assign the service account from the launchTemplateId attribute (defined in the googleprov_templates.json file) to newly created instances. If ASSIGN_SERVICE_ACCOUNT_FROM_LAUNCH_TEMPLATE is set to true, then for each template in the googleprov_templates.json file that defines a launchTemplateId, LSF assigns the service account defined in that launchTemplateId to newly created instances; otherwise, LSF assigns the default service account. The default parameter value is false.
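For example, in googleprov_config.json (assuming the parameter is a top-level entry; other required entries in this file are omitted for brevity):

    {
        "ASSIGN_SERVICE_ACCOUNT_FROM_LAUNCH_TEMPLATE": true
    }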

New AWS_TAG_InstanceID and AWS_ENDPOINT_URL parameters for the awsprov_config.json file

The new AWS_TAG_InstanceID parameter for the awsprov_config.json (LSF resource connector AWS) configuration file allows you to adjust performance by controlling InstanceID tagging. If set to true, the AWS LSF resource connector plug-in adds the InstanceID tag to both the instance and its EBS volumes. By default, this parameter is set to false, so the plug-in does not add the InstanceID tag, which helps with AWS plug-in performance.

The new AWS_ENDPOINT_URL parameter for the awsprov_config.json file supports private endpoints. Typically, the AWS provider can determine the endpoint from the region name, but if the provider is running in an environment without public internet access, this endpoint is unreachable. In this case, you can create a private endpoint (a non-default API endpoint URL used for requesting hosts through AWS); refer to the AWS documentation for creating this endpoint by using the Amazon VPC console. Once created, specify that endpoint as the AWS_ENDPOINT_URL parameter value.
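For example, in awsprov_config.json (the endpoint URL is illustrative, and other required entries in this file are omitted for brevity):

    {
        "AWS_TAG_InstanceID": true,
        "AWS_ENDPOINT_URL": "https://vpce-0a1b2c3d-example.ec2.us-east-1.vpce.amazonaws.com"
    }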

New computeUnit host attribute for the awsprov_templates.json and ibmcloudgen2_templates.json files

The new computeUnit host attribute is available for the awsprov_templates.json and ibmcloudgen2_templates.json files. Use this attribute to specify any compute unit requirements for the LSF resource connector to provision.

New MinNumber parameter for the policy_config.json file

The policy_config.json file now supports a new MinNumber parameter (within the Policies parameter). Use it to define the minimum number of instances per resource connector template, account, and provider that are created automatically and maintained by the cluster even when there are no job requests.
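A minimal sketch of a policy entry in policy_config.json, assuming that MinNumber sits alongside the existing policy fields (the policy name, consumer values, and numbers are illustrative):

    {
        "Policies": [
            {
                "Name": "Policy1",
                "Consumer": {
                    "rcAccount": ["all"],
                    "templateName": ["all"],
                    "provider": ["all"]
                },
                "MaxNumber": 100,
                "MinNumber": 2
            }
        ]
    }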

The ssacct command displays accounting statistics immediately for LSF Session Scheduler jobs

The IBM Spectrum LSF Session Scheduler enables users to run large collections of tasks within the allocation of a single LSF job. The ssacct command displays accounting statistics about finished LSF Session Scheduler jobs. Previously, the command waited for all tasks in the job to complete before showing these statistics. With Fix Pack 15, the command has been enhanced to show the statistics immediately, without having to wait for the tasks to finish.

Enhanced -Q option for the ssched command to easily re-queue all failed LSF Session Scheduler tasks, with specific exceptions

The ssched command allows you to submit tasks through the IBM Spectrum LSF Session Scheduler. The -Q option for this command enables automatic re-queuing of failed tasks, and has been enhanced in Fix Pack 15. Previously, to re-queue failed tasks, you specified individual task exit codes, separating each code with a space, which could become a long manual list if you had many failed tasks. With Fix Pack 15, the command supports the -Q "all [~exit_code ...] | exit_code ..." syntax so that you can re-queue all failed tasks in a batch, except for the task exit codes that you explicitly exclude with the tilde (~) syntax.
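For example, to re-queue all failed tasks except those that exited with codes 1 or 13 (the exit codes and task command are illustrative):

    ssched -Q "all ~1 ~13" mytask.sh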

New ssview command to show task details and new LSB_REPORT_SSJOB_FAILURE parameter to report task failures for LSF Session Scheduler jobs

Leverage the new ssview command to show task details of LSF Session Scheduler jobs. This command is the equivalent of running the bjobs command for an LSF job.

Additionally, setting the new LSB_REPORT_SSJOB_FAILURE parameter to Y in the lsf.conf file enables the ssview command to obtain more accurate task status information for LSF Session Scheduler jobs, because this parameter reports the reason that tasks were killed or ran unsuccessfully.
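For example, in lsf.conf:

    LSB_REPORT_SSJOB_FAILURE=Y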

New LM_STAT_PER_FEATURE parameter to control retrieving license usage data from FlexNet license servers for LSF License Scheduler jobs

Configure the LM_STAT_PER_FEATURE parameter in the lsf.licensescheduler file to allow LSF License Scheduler to manage a small set of licenses from a license server that handles hundreds or thousands of licenses. The LM_STAT_PER_FEATURE parameter controls how the blcollect (license collector daemon) command retrieves license usage data from FlexNet license servers for LSF License Scheduler jobs.