Known issues and limitations

LSF License Scheduler 10.1 has the following known issues and limitations.

Cannot get license information from license server if the protocol connection hangs during the timeout period

When the timeout period as specified in the LM_STAT_TIMEOUT parameter is reached and the remote agent host still does not get a response from ssh, rsh, or the lsrun command, the blcollect command cannot get any license information.

To work around this issue, specify REMOTE_LMSTAT_PROTOCOL=ssh -o ConnectTimeout=value in lsf.licensescheduler to use the ssh protocol timeout.

Incorrect license count if the job is suspended but does not release licenses

In project mode, if the job is suspended but does not release licenses, the license count might be wrong.

To work around this issue, specify the LM_REMOVE_SUSP_JOBS parameter in the lsf.licensescheduler file to use the lmremove or rlmremove commands to remove licenses from the application.

License usage count larger than actual total when a job checks out a license feature that is different from the job request

In project mode, the license usage count of a feature might be larger than the actual total number. This inaccurate count occurs if some licenses of one feature are reserved, then you submit a job that requests another feature but the job checks out the previous feature. When the job runs, the total license use count might be larger than the actual total number.

License usage is incorrect in the bhosts -s command

Token usage is lost after dynamic usage is triggered.

This loss occurs when LSF License Scheduler runs in project mode, the ENABLE_DYNAMIC_RUSAGE parameter is enabled for the feature, and two jobs that are owned by the same user are running on the same execution host. The following example illustrates this problem:
  1. A user submits an LSF License Scheduler job to run on hostA. The job requests the LSF License Scheduler feature f1 in rusage, but checks out f2 in the job script.
  2. The first job is stopped and the same user submits another LSF License Scheduler job that requests the feature f2. This job also runs on hostA.
  3. When the first job is resumed, the problem occurs. The blstat and bhosts -s commands show inconsistent token usage for the f2 feature. The license usage that is displayed in the bhosts -s command is incorrect.

Jobs submitted with -b might have an incorrect pending reason

If two LSF License Scheduler jobs request the same license feature, where one is submitted with the -b option and the other is submitted without the -b option, the pending reason of the job that is submitted with the -b option might be incorrect.

Hierarchical group paths require LSF, Version 9.1.1, or later

To use hierarchical project group paths (by defining the PROJECT_GROUP_PATH parameter in the lsf.licensescheduler file), you need LSF, Version 9.1.1, or later.

lmstat is not included with LSF License Scheduler

The lmstat (or lmutil lmstat) command is no longer included with LSF License Scheduler. This command is included with FlexNet, and is usually in the /etc/flexlm/bin directory.

When a single job requires more tokens than the allocation buffer

In project mode, you must make sure that you set the DEMAND_LIMIT parameter to a value greater than the expected maximum number of license tokens that are required by any single job.

In cluster mode, you must make sure that you set the allocation buffer for dynamic distribution of licenses greater than the expected maximum number of license tokens that are required by any single job.
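The sizing rule above can be sanity-checked with a small script. This is only a sketch: the function name and the sample token requests are hypothetical, and the value passed as the limit stands for whatever you configure as DEMAND_LIMIT (project mode) or as the allocation buffer (cluster mode).

```python
def limit_covers_largest_job(limit, job_token_requests):
    """Return True if `limit` is strictly greater than the largest request.

    `limit` stands for the configured DEMAND_LIMIT or allocation buffer;
    `job_token_requests` is an iterable of per-job token counts.
    """
    largest = max(job_token_requests, default=0)
    return limit > largest


# Hypothetical per-job token requests, for illustration only.
requests = [2, 8, 5]
print(limit_covers_largest_job(10, requests))  # True
print(limit_covers_largest_job(8, requests))   # False: 8 is not greater than 8
```

The comparison is strict because the text asks for a value greater than, not equal to, the largest single-job requirement.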

Jobs that use more than one feature trigger preemption

In project mode, when a job that uses more than one feature triggers a preemption, an over-preemption might occur. For example, only one job needs to be preempted, but the bld command preempts two or more jobs.

To work around this issue, use cluster mode instead of project mode.

Released license tokens are reserved again after restart

For license features that have DYNAMIC=Y enabled and a duration specified in the resource string, released licenses are reserved again for the specified duration after the bld or mbatchd daemons are restarted (with the badmin reconfig, badmin mbdrestart, or bladmin reconfig commands).

Set file descriptor limit large enough

Make sure that the operating system file descriptor limit is large enough to support all taskman tasks, LSF License Scheduler (bl*) commands, and connections between LSF License Scheduler and LSF. Use LS_MAX_TASKMAN_SESSIONS in lsf.licensescheduler to define the maximum number of taskman jobs that can run simultaneously.
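One way to check the current limit on a UNIX-like host is to compare the soft file descriptor limit with a rough estimate of the descriptors that you expect to need. The following is a minimal sketch that uses the Python standard library resource module. The function name and the sample numbers are hypothetical, and the estimate of one descriptor per taskman session is an assumption made for illustration, not a documented figure.

```python
import resource


def fits_fd_limit(expected_descriptors, soft_limit=None):
    """Return True if the estimate fits within the soft RLIMIT_NOFILE.

    If `soft_limit` is None, the limit of the current process is read
    from the operating system.
    """
    if soft_limit is None:
        soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft_limit == resource.RLIM_INFINITY:
        # No limit is enforced, so any finite estimate fits.
        return True
    return expected_descriptors <= soft_limit


# Hypothetical estimate: the value planned for LS_MAX_TASKMAN_SESSIONS
# plus an allowance for bl* commands and connections to LSF.
max_taskman_sessions = 500
other_connections = 200
print(fits_fd_limit(max_taskman_sessions + other_connections))
```

Run the check as the same user, and in the same environment, as the LSF License Scheduler daemons, because limits can differ between sessions.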

Installation

When you install LSF License Scheduler stand-alone, the installer removes EGO environment variables from the cshrc.lsf and profile.lsf files. Specify a different LSF_TOP directory from the LSF installation to install stand-alone LSF License Scheduler.

Preemption

If the JOB_CONTROLS parameter in the lsb.queues file is defined so that job controls (such as the signal SIGTSTP) take effect when LSF License Scheduler preemption occurs, the LIC_SCHED_PREEMPT_STOP=Y parameter in the lsf.conf file must also be defined for LSF License Scheduler preemption to work.

Theoretical limit for license utilization

IBM® Spectrum LSF License Scheduler is often held up as a license utilization optimization engine. Unfortunately, application behavior and interaction with a license server can limit the maximum theoretical utilization that a business can achieve.

Managing licenses has complex interdependencies and behaviors. When a job starts, it does not necessarily check out a license immediately and hold that license for the duration of the job execution. Applications frequently check out a license after the application is started and do not keep it until the job ends. Some applications even check licenses out and in multiple times during a single run.

In project mode, if you do not use the ENABLE_DYNAMIC_RUSAGE and DYNAMIC parameters in LSF License Scheduler, any time that an application runs without a checked-out license is lost license utilization. LSF License Scheduler holds the licenses in a RESERVED state. When you check the license server with the lmstat command, license keys appear to be unallocated, but no additional jobs are dispatched.

Due to unpredictability and license model complexity, loss of license utilization is a fact of license management when license checkout time is not identical to application execution time. Resolve this issue by using cluster mode, or by enabling FAST_DISPATCH when you use project mode.
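The lost utilization that is described in this section can be put into numbers. The sketch below assumes a simplified model in which a license is reserved for the whole job run time but is checked out for only part of it. The function name and the sample times are hypothetical.

```python
def max_license_utilization(checkout_seconds, reserved_seconds):
    """Fraction of the reserved time during which the license is checked out."""
    if reserved_seconds <= 0:
        raise ValueError("reserved_seconds must be positive")
    if not 0 <= checkout_seconds <= reserved_seconds:
        raise ValueError("checkout_seconds must be between 0 and reserved_seconds")
    return checkout_seconds / reserved_seconds


# Hypothetical job: runs for 60 minutes, holds the license for 45 of them.
utilization = max_license_utilization(45 * 60, 60 * 60)
print(f"{utilization:.0%}")  # 75%
```

In this model, the remaining 25% of the reserved time is the utilization that cannot be recovered while the license stays in the RESERVED state.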

The feature.servicedomain.dat file grows too big with large-scale configuration

If, for example, you configure LSF License Scheduler with 500 features, 100 projects, and 50 service domains, LSF License Scheduler records information into data files every minute, causing potential performance issues.
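To get a feel for the scale of this example, the following sketch counts the feature, project, and service domain combinations involved. It is a rough upper bound only: this document does not describe the record layout of the data files, so treating each combination as one record per minute is an assumption made for illustration. The function name is hypothetical.

```python
MINUTES_PER_DAY = 24 * 60


def combinations_per_day(features, projects, service_domains):
    """Return (combinations per minute, combinations per day).

    Assumes, for illustration only, that every combination is recorded
    once per minute.
    """
    per_minute = features * projects * service_domains
    return per_minute, per_minute * MINUTES_PER_DAY


per_minute, per_day = combinations_per_day(500, 100, 50)
print(per_minute)  # 2500000
print(per_day)     # 3600000000
```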

LSF License Scheduler preempted job not redispatched before pending jobs

License preemption is based in part on accumulated in-use tokens. Because preempted jobs might already have accumulated in-use time, new pending jobs might be dispatched first.

brun job is preempted by LSF License Scheduler and resumed by mbschd

In project mode, if a brun job is preempted by another job with project ownership, the brun job runs again. This pattern repeats until the brun job is done.

Incorrect output for hierarchical fair share among Project Groups

When LSF License Scheduler hierarchical fair share is configured, running blinfo without the -G option displays incorrect share information. The same error occurs in output from the blparams and blinfo -p commands.

Same features cannot be merged to the tasks

When a user submits more than one job on the same host, LSF License Scheduler distributes licenses to each job in a round-robin fashion, which gives each job at most one license for a feature that checks out licenses in excess of rusage.

LSF License Scheduler matches license checkouts to jobs based on the user, host, and feature in the checkout. If a user runs multiple jobs on the same host, a checkout might be merged with the wrong job.
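The ambiguity follows directly from the matching key. The sketch below is an illustration of the idea, not the bld daemon's actual algorithm: it groups jobs by user, host, and feature and reports the keys that more than one job shares, which are the cases where a checkout cannot be tied to a single job. The function name and the sample jobs are hypothetical.

```python
from collections import defaultdict


def ambiguous_checkout_keys(jobs):
    """Return the (user, host, feature) keys that more than one job shares.

    `jobs` is an iterable of (job_id, user, host, feature) tuples.
    """
    by_key = defaultdict(list)
    for job_id, user, host, feature in jobs:
        by_key[(user, host, feature)].append(job_id)
    return {key: ids for key, ids in by_key.items() if len(ids) > 1}


# Hypothetical jobs: alice runs two jobs on hostA that use feature f1.
jobs = [
    (101, "alice", "hostA", "f1"),
    (102, "alice", "hostA", "f1"),
    (103, "bob", "hostA", "f1"),
]
print(ambiguous_checkout_keys(jobs))
# {('alice', 'hostA', 'f1'): [101, 102]}
```

A checkout by alice on hostA for f1 could belong to either job 101 or job 102, while the checkout by bob is unambiguous.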

Freed licenses are reserved after bladmin reconfig

After taskman reconnects to the bld daemon (after the bladmin reconfig command is used), the taskman job start time is the current time, not the time at which the taskman job originally started. The bld daemon reserves tokens for the duration.