Managing floating software licenses in LSF

Typically, a pool of floating software licenses is represented by numeric resources in LSF. Every job that requires licenses must include the license requirement in its rusage expression to ensure that enough licenses are free for the job at the time that the job is dispatched.

About this task

There are three recommended approaches to represent a pool of licenses as LSF resources:

  • Manage licenses as static numeric shared resources
  • Manage licenses as dynamic numeric shared resources
  • Use LSF License Scheduler Basic Edition to manage license resources

Manage licenses as static numeric shared resources

Manually encode the total number of licenses that are available to jobs in the cluster in the LSF configuration.

About this task

For each job that requires licenses, specify the requirement in its rusage expression without specifying a duration. This configures LSF to reserve licenses for the job for as long as it is running.

Do not use this approach if there are multiple clusters that share a common pool of licenses. One cluster does not know the number of licenses that are checked out or reserved for jobs in other clusters, which can lead to license checkout failures.

Procedure

  1. Configure a static shared resource in the lsf.shared file to represent the license.
    Begin Resource
    RESOURCENAME  TYPE     INTERVAL  INCREASING  DESCRIPTION
    ...
    lic1          Numeric  ()        N           (application license)
    End Resource
  2. Configure a single instance of the license shared by all hosts in the lsf.cluster file.

    Set the amount equal to the total licenses available to the cluster.

    Begin ResourceMap
     RESOURCENAME  LOCATION
     lic1          (10@[all])
    End ResourceMap

    After making this change, you must restart LIM and mbatchd on the LSF management hosts by running the bctrld restart lim and badmin mbdrestart commands.

  3. When submitting a job that will check out some number of the licenses, specify this requirement in the rusage expression.

    In the following example, job_script checks out one unit of the lic1 license.

    bsub -R "rusage[lic1=1]" job_script

What to do next

One disadvantage of using a static resource to represent a license is that you must reconfigure LSF whenever the total number of licenses changes. To avoid this issue, write an elim that polls license servers to determine the total licenses in the pool (that is, the number of licenses currently available plus the number in use).

For more information on writing an elim, refer to External load indices.

Using a static resource to represent a license only approximates the real license availability. If an application checks out a license outside of an LSF job, or an LSF job checks out more licenses than specified in its rusage, LSF might dispatch a job that requires licenses even though no licenses are available. As a result, the job either fails or occupies allocated compute resources while waiting for licenses to become available.
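The gap between the static count and the real pool can be shown with a small arithmetic sketch. This is a simplified model, not LSF code: the function names and the figures (nine jobs, one checkout outside of LSF) are assumptions chosen to match the lic1 example with 10 configured licenses.

```python
def static_view(configured_total, reserved_by_lsf):
    """Licenses LSF believes are free: configured total minus rusage reservations."""
    return configured_total - reserved_by_lsf


def real_free(pool_total, checked_out_by_jobs, checked_out_outside_lsf):
    """Licenses that are actually free on the license server."""
    return pool_total - checked_out_by_jobs - checked_out_outside_lsf


# Assumed scenario: lic1 is configured as 10@[all], and nine running jobs
# each reserve and check out one license.
lsf_free = static_view(configured_total=10, reserved_by_lsf=9)

# One more license is checked out by a user outside of LSF.
server_free = real_free(pool_total=10, checked_out_by_jobs=9, checked_out_outside_lsf=1)

# LSF still counts one free license, but the license server has none left.
print(lsf_free, server_free)
```

In this scenario LSF would dispatch a tenth job that requests lic1=1, and that job would then fail or wait at checkout.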

Manage licenses as dynamic numeric shared resources

An alternative to the static approach is to write an elim that periodically collects the availability of free licenses from license servers, and reports this number to LSF.

About this task

As with the static approach, a job that requires the license requests it in the job rusage expression. However, you should also specify a duration (generally a few minutes) for the license rusage. LSF reserves licenses for the job for this specified duration, starting from the time that the job is first dispatched. Specify a duration that is long enough that the job can check out the license from the license server before the duration expires.

In the following example, there are two units of a license feature. You want to run a job that requires one of them. The job submission and license checkout process is as follows:

  1. Submit the job to LSF, specifying one unit of the license with a duration in the job’s rusage expression.
  2. The elim reports to LSF that there are two licenses free.
  3. LSF dispatches the job, and internally reserves one license for the job, to avoid dispatching another job that will contend for the license.
  4. The job checks out one license from the license server.
  5. Upon polling the license server, elim detects that there is now only one license free.
  6. The rusage duration expires, and LSF stops reserving a license for the job.

At this point, elim is reporting only one license available. When the job checks the license back in and elim detects that the license is free, the license can then be used by another job.
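The six steps can be traced with a simplified model in which the number of licenses that LSF can still give to other jobs is the value reported by elim minus the licenses that LSF currently reserves. This is a sketch under that assumption, not a description of LSF internals, and the function and variable names are illustrative.

```python
def lsf_available(elim_reported_free, reserved):
    """Simplified model: elim-reported free licenses minus LSF reservations."""
    return max(elim_reported_free - reserved, 0)


timeline = [
    # (event, free licenses reported by elim, licenses reserved by LSF)
    ("1-2. job submitted, elim reports two free", 2, 0),
    ("3. job dispatched, one license reserved", 2, 1),
    ("4. job checks out one license (elim has not polled yet)", 2, 1),
    ("5. elim polls and sees one free", 1, 1),
    ("6. rusage duration expires", 1, 0),
]

availability = [lsf_available(free, reserved) for _, free, reserved in timeline]

for (event, _, _), available in zip(timeline, availability):
    print(f"{event}: {available} available to other jobs")
```

Between steps 5 and 6 the model shows no licenses available to other jobs, even though one unit is still free on the license server. This is the temporary double counting that a well-chosen duration keeps short.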

Procedure

  1. Configure a dynamic numeric shared resource in the lsf.shared file to represent the application licenses.
    Begin Resource
    RESOURCENAME  TYPE     INTERVAL  INCREASING  DESCRIPTION
    ...
    lic2          Numeric  15        N           (application licenses)
    End Resource
  2. Map the external resource to all hosts in the ResourceMap section of the lsf.cluster file. Do not specify an amount for the resource.
    Begin ResourceMap
     RESOURCENAME  LOCATION
     lic2           ([all])
    End ResourceMap
  3. Create an elim executable file that collects available licenses from the license server.

    For more information on writing an elim, refer to External load indices.

  4. For each job that checks out the license, specify the amount in the rusage section of the job's resource requirement expression.

    Set a duration, in minutes, for the license in the rusage expression. LSF reserves the resource for the job until the duration expires. Use a duration that is long enough to let the job check out its licenses and for elim to subsequently poll the license servers.

    bsub -R "rusage[lic2=1:duration=5]" job_script
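The following is a minimal sketch of what an elim for step 3 might look like, written in Python. It rests on several assumptions that you should check against your own site: that the license server is FlexNet-style and queried with an lmstat command, that the summary line has the form shown in the comment, and that 1770@licHost is only a placeholder server address. The elim reports each update on standard output as the number of indices followed by name and value pairs.

```python
import re
import subprocess
import time

# Assumed FlexNet-style summary line, for example:
#   Users of lic2:  (Total of 2 licenses issued;  Total of 1 license in use)
USAGE_LINE = re.compile(
    r"Users of (?P<feature>\S+):\s+\(Total of (?P<issued>\d+) licenses? issued;"
    r"\s+Total of (?P<in_use>\d+) licenses? in use\)"
)


def free_licenses(lmstat_output, feature):
    """Return issued minus in-use for the feature, or None if it is not listed."""
    for match in USAGE_LINE.finditer(lmstat_output):
        if match.group("feature") == feature:
            return int(match.group("issued")) - int(match.group("in_use"))
    return None


def elim_line(resource, value):
    """Format one report as: <number of indices> <name> <value>."""
    return f"1 {resource} {value}"


def run_elim(resource="lic2", server="1770@licHost", interval=15):
    """Poll the license server in a loop and report free licenses on stdout."""
    while True:
        # Hypothetical query command; adjust the path and options for your site.
        result = subprocess.run(
            ["lmstat", "-c", server, "-f", resource],
            capture_output=True, text=True, check=False,
        )
        free = free_licenses(result.stdout, resource)
        if free is not None:
            print(elim_line(resource, free), flush=True)
        time.sleep(interval)


# Demonstration on a sample line instead of a live license server.
sample = "Users of lic2:  (Total of 2 licenses issued;  Total of 1 license in use)"
print(elim_line("lic2", free_licenses(sample, "lic2")))
```

In a real elim, the script would end by calling run_elim() instead of the demonstration lines. The default interval of 15 seconds mirrors the INTERVAL value configured for lic2 in the lsf.shared example.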

What to do next

If you have multiple clusters sharing a common pool of licenses, each cluster should have its own independent elim collecting licenses from the pool. Generally, the dynamic approach results in fewer license checkout failures than the static approach. This is because once a job in one cluster has checked out a license, the license checkout is reflected in the reduced resource availability reported by the elims in other clusters.

You must set an appropriate rusage duration. If it is too short, then LSF stops reserving before the job has checked out a license. The result is that there might not be a license available for the job when it does try to check out.

If the duration is too long, then there is a significant period of time that the job has a license checked out and LSF simultaneously reserves a license for the job. In effect, the job occupies two licenses (one checked out, and one reserved) even though it needs only one.
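The trade-off can be estimated with a rough timing model. The sketch below assumes that the job is dispatched at time zero, that it checks out its license after a known delay, and that elim reflects the checkout one full polling interval later in the worst case. The function name and the 90-second checkout delay are hypothetical values for illustration only.

```python
def reservation_windows(duration_s, checkout_delay_s, poll_interval_s):
    """Return (unprotected_s, double_counted_s) for one job, in seconds.

    unprotected_s: time during which LSF no longer reserves the license but
        elim does not yet report the checkout, so LSF might overcommit.
    double_counted_s: time during which the license is both reserved by LSF
        and reported as checked out by elim.
    """
    # Worst case: elim sees the checkout a full polling interval after it happens.
    visible_at = checkout_delay_s + poll_interval_s
    unprotected = max(visible_at - duration_s, 0)
    double_counted = max(duration_s - visible_at, 0)
    return unprotected, double_counted


# A one-minute duration is too short for a job that needs 90 seconds to check out.
print(reservation_windows(duration_s=60, checkout_delay_s=90, poll_interval_s=15))

# duration=5 (300 seconds) covers the checkout but double counts for a while.
print(reservation_windows(duration_s=300, checkout_delay_s=90, poll_interval_s=15))
```

Under these assumptions, a duration slightly longer than the checkout delay plus one polling interval keeps both windows small.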

Use LSF License Scheduler Basic Edition to manage license resources

With LSF License Scheduler you do not need to write and manage an elim.

About this task

Jobs do not need a duration for the license in the rusage expression. LSF License Scheduler can track how a job checks out licenses, which helps jobs avoid overusing licenses and occupying double the required resources. License resource availability is automatically adjusted based on the real license availability, just as it is with an elim.

LSF License Scheduler periodically gets license usage (including total licenses, available licenses, and licenses in use) from the license server. It tracks the license use of individual jobs by matching license checkouts to jobs. From this, it determines how many licenses are used outside of LSF jobs and how many licenses could be used by the LSF cluster, which is the sum of free licenses and licenses that running LSF jobs use. LSF periodically communicates with LSF License Scheduler to get the number of licenses that are available to the LSF cluster, then reserves licenses for running jobs according to their rusage expressions; the remaining free licenses are used to dispatch pending jobs. Between these communications with LSF License Scheduler, the licenses that are released by finished jobs can be used to dispatch pending jobs immediately.

In the following example, there are three units of a license feature. One unit is used outside of LSF jobs. One unit is used by an LSF job. The job submission and license checkout process is as follows:

  1. LSF License Scheduler queries the license server and finds three units of a license feature.
  2. LSF connects to LSF License Scheduler and finds out that there are three units available.
  3. The user named user1 submits a job to LSF on host1, specifying one unit of the license in the rusage expression. After the job starts running, LSF informs LSF License Scheduler of this running job.
  4. At the same time, outside of LSF, the user named user2 checks out one unit of the license on host2.
  5. LSF License Scheduler gets the latest license usage from the license server. It finds two license checkouts and one free unit. The user@host of one license checkout matches the running job, so LSF License Scheduler determines that the job made this checkout. As a result, one unit is considered to be used outside of LSF jobs, one unit is considered to be used by the LSF job, and one unit is free. Therefore, there are two units available to the LSF cluster.
  6. LSF gets the latest license-type resources from LSF License Scheduler. The new available value is 2. LSF then reserves one license for the running job. The one free unit of the license can be used by other jobs.

Figure: Job submission and license checkout process for LSF License Scheduler

When LSF License Scheduler determines which licenses are used by a job, it only considers whether a license checkout has the same user and host pair as the job; if so, the licenses are deemed to be consumed by the job. When multiple license checkouts share the same user@host, LSF License Scheduler cannot exactly determine the relationship between checkouts and jobs because it does not know which license checkouts are associated with which job processes. This means that the blusers -J command does not always accurately reflect whether a job has actually checked out licenses. However, this does not affect the accuracy of LSF job dispatch.
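The user@host matching and the resulting counts from the three-unit example can be reproduced with a short sketch. This is an illustrative model of the accounting described here, not LSF License Scheduler code; the function name and the returned field names are assumptions.

```python
def license_accounting(total, checkouts, job_owners):
    """Classify checkouts by user@host and compute what the cluster can use.

    total: total units reported by the license server.
    checkouts: one 'user@host' string per unit that is checked out.
    job_owners: 'user@host' strings of running LSF jobs that requested the license.
    """
    owners = set(job_owners)
    # A checkout counts as job use when its user@host matches a running job.
    used_by_jobs = sum(1 for owner in checkouts if owner in owners)
    used_outside = len(checkouts) - used_by_jobs
    free = total - len(checkouts)
    return {
        "lsf_jobs": used_by_jobs,
        "outside_lsf": used_outside,
        "free": free,
        # Available to the cluster: free units plus units used by running LSF jobs.
        "available_to_cluster": free + used_by_jobs,
    }


result = license_accounting(
    total=3,
    checkouts=["user1@host1", "user2@host2"],
    job_owners=["user1@host1"],
)
print(result)
```

The same model also shows the ambiguity: if user1 made a second checkout on host1 outside of LSF, that checkout would carry the same user@host as the job and would be counted as job use as well.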

Procedure

  1. Install LSF License Scheduler Basic Edition within the LSF environment.

    For more information on how to install LSF License Scheduler, refer to Install License Scheduler.

  2. Configure LSF License Scheduler information (including the management host, license server, and cluster licenses) in the $LSF_ENVDIR/lsf.licensescheduler file. After making this change, use the blstartup command to start LSF License Scheduler.

    For example:

    Begin Parameters
    PORT = 9581
    HOSTS = LSManagementHost
    ADMIN = admin
    LM_STAT_INTERVAL=30
    LMSTAT_PATH = /usr/bin
    CLUSTER_MODE=y
    End Parameters
    
    Begin Clusters
    CLUSTERS
    myCluster
    End Clusters
    
    Begin ServiceDomain
    NAME = LanServer
    LIC_SERVERS = ((1770@licHost))
    End ServiceDomain
    
    Begin Feature
    NAME = lic3
    CLUSTER_DISTRIBUTION=LanServer(myCluster 1)
    End Feature

    The LSF License Scheduler management host is LSManagementHost. It manages license lic3, which is provided by license server 1770@licHost, and the cluster named myCluster can use it. LSF License Scheduler gets the license usage from 1770@licHost every 30 seconds.

    For more information on how to configure LSF License Scheduler Basic Edition, refer to Configure LSF License Scheduler Basic Edition.

  3. If you already started LSF before installing LSF License Scheduler, you must restart the mbatchd daemon on the LSF management host with the badmin mbdrestart command.

    After starting up, LSF registers itself with LSF License Scheduler, gets license-type resources, and adds them to its resources table as dynamic numeric shared resources. Use the bhosts -s command to show these resources.

    admin@server1: bhosts -s
    RESOURCE                 TOTAL       RESERVED       LOCATION
    lic3                         2            0.0
                                                         server1
                                                         server2
  4. Submit a job by specifying lic3 in the job rusage expression.

    The job checks out the license from the license server by its actual name.

    bsub -R "rusage[lic3=1]" job_script
    ...
    admin@server1: bhosts -s
    RESOURCE                 TOTAL       RESERVED       LOCATION
    lic3                         1            1
                                                        server1
                                                        server2
  5. Use the LSF License Scheduler command blstat to check license usage details.
    admin@server1: blstat
    FEATURE: lic3@ myCluster
     SERVICE_DOMAIN: LanServer
     TOTAL_TOKENS: 2    TOTAL_ALLOC: 2    TOTAL_USE: 1    OTHERS: 0
     CLUSTER    SHARE   ALLOC  TARGET  INUSE  RESERVE  OVER  PEAK  BUFFER  FREE  DEMAND
     myCluster  100.0%  2      2       1      0        0     1     0       1     0
  6. Use the LSF License Scheduler command blusers -J to check license usage of the job.
    admin@server1: blusers -J
    JOBID  USER   HOST    PROJECT   CLUSTER    START_TIME
    1      admin  server1 default   myCluster  Aug 27 02:49:11
    RESOURCE   RUSAGE  SERVICE_DOMAIN    INUSE    EFFECTIVE_PROJECT
    lic3       1       LanServer         1        default

What to do next

LSF License Scheduler Basic Edition limits license use of jobs to cluster mode features, which are distributed to a single cluster with one service domain per license feature. It is not intended to apply policies on how licenses are shared between clusters or projects. If you have more advanced license management requirements, you can upgrade to LSF License Scheduler Standard Edition, which has full LSF License Scheduler functionality. This includes support for all modes (cluster mode and project mode), sharing licenses across multiple clusters, sharing licenses between project teams and departments around the globe, scheduling licenses within extremely large clusters, license preemption across departments, feature groups, project groups, multiple service domains per license, and taskman jobs. For details about the features of LSF License Scheduler Standard Edition, see IBM Spectrum LSF License Scheduler.