Global job IDs for forwarding and forwarded clusters using LSF multicluster capability

Global job IDs let an LSF multicluster environment use the same job ID for a job on both the forwarding and forwarded clusters, which keeps the IDs uniform and unique across clusters. To guarantee uniqueness, starting in Fix Pack 14, LSF introduces indexes for clusters: each job that is submitted from a cluster has that cluster's index appended as the ending digits of its job ID (for example, job ID 100 with an index value of 22 has a global job ID of 10022).
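
The numbering scheme in the example above can be sketched in a few lines of Python. This is an illustration only, not LSF code: the function names are hypothetical, and the sketch assumes that the index always occupies exactly the last two digits of the job ID (so that index 1 would appear as 01), which this topic implies but does not state outright.

```python
# Illustrative sketch only; these helpers are hypothetical and not part of LSF.
# Assumption: the cluster index always occupies the last two decimal digits.
INDEX_MODULUS = 100


def make_global_job_id(local_job_id: int, cluster_index: int) -> int:
    """Append a cluster index (1-99) as the ending digits of a job ID."""
    if local_job_id < 1:
        raise ValueError("local_job_id must be a positive integer")
    if not 1 <= cluster_index <= 99:
        raise ValueError("cluster_index must be a number from 1 to 99")
    return local_job_id * INDEX_MODULUS + cluster_index


def split_global_job_id(global_job_id: int) -> tuple[int, int]:
    """Split a global job ID into (leading digits, cluster index)."""
    return divmod(global_job_id, INDEX_MODULUS)


print(make_global_job_id(100, 22))  # 10022, matching the example in the text
```

Under the same assumption, split_global_job_id(10022) returns (100, 22), so the index can be recovered from the ID by looking at its last two digits.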

About this task

To configure global job IDs for your LSF multicluster environment, add an index column to the lsf.shared configuration file, then as a best practice, increase both the MAX_JOBID value in the lsb.params file (to 99999999) and the LSB_JOBID_DISP_LENGTH value in the lsf.conf file (to 8).
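
As a rough illustration of why the larger values are recommended, the following Python sketch estimates how many distinct leading-digit sequences remain for a single cluster once two digits are reserved for the index, and how many digits are needed to display the largest ID. The function names are hypothetical, and the two-digit reservation is an assumption based on the 1-99 index range; how LSF actually allocates the ID space is not described in this topic.

```python
# Illustrative arithmetic only; not LSF code.
# Assumption: two trailing digits of every job ID are reserved for the index.


def local_id_capacity(max_jobid: int, index_digits: int = 2) -> int:
    """Largest leading-digit value that still fits under max_jobid."""
    return max_jobid // 10 ** index_digits


def display_width(max_jobid: int) -> int:
    """Number of digits needed to show the largest possible job ID."""
    return len(str(max_jobid))


print(local_id_capacity(99999999))  # 999999
print(display_width(99999999))      # 8
```

With MAX_JOBID set to 99999999, the sketch leaves six digits for the leading part of each ID, and the eight-digit result of display_width matches the suggested LSB_JOBID_DISP_LENGTH value of 8.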

Procedure

  1. Log on to any host in the cluster as the LSF administrator.
  2. Edit the lsf.shared configuration file:
    1. Add a new column called Index to the Cluster section and assign an index to each cluster. The index can be a number from 1 to 99, and must be unique among the clusters within the same cluster group.
      For example:
      Begin Cluster
      ClusterName  Servers        Index 
      cluster1     (hostA hostB)  1
      cluster2     (hostD)        2
      End Cluster
      
      Tip: Typically, the lsf.shared file is centrally located so that any changes can be accessed by all clusters; however, if you do not have a shared file system, ensure that you update the lsf.shared file on each cluster.

      With the Index column set, the cluster that reads the configuration generates global job IDs that end with the index. Also, when jobs are forwarded to the cluster, the cluster tries to match the job ID with the configured index. If the index does not match the job ID, the cluster rejects the job. If the index matches the job ID, the cluster accepts the job with the same job ID that the forwarding cluster assigned. If that job ID is already in use, the job ID is changed to end in 00.

      Starting in Fix Pack 15, to support a hybrid of global job IDs and non-global job IDs, you can specify a hyphen (-) in the Index column instead of a number from 1 to 99. In this case, also ensure that the MC_STRICT_JOBID_CHECKING parameter is set to N in the lsb.params file. By default, MC_STRICT_JOBID_CHECKING is set to N, so forwarded jobs that do not match the cluster index set in the lsf.shared file are allowed to run on the execution host. Typically, this parameter is left as N; however, you can set it to Y to enable the previous behavior.

    2. Save the changes to the lsf.shared file.
    3. Run lsadmin reconfig to reconfigure LIM.
    4. Run badmin reconfig to reconfigure the mbatchd daemon.
  3. To maximize the number of available global job IDs that LSF can assign, as a best practice, increase both the MAX_JOBID value in the lsb.params file (to 99999999) and the LSB_JOBID_DISP_LENGTH value in the lsf.conf file (to 8).
  4. To view the new Index column, run the lsclusters cluster_name command, which shows all configuration information for your cluster (including the new column).
    Here is example output from running lsclusters cluster1:
    
    CLUSTER_NAME   STATUS   MASTER_HOST           ADMIN    HOSTS  SERVERS INDEX
    cluster1       ok       hostA                 jsmith   1      1       1
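
The way a cluster handles a forwarded job when an index is configured, as described in step 2, can be summarized with a short Python sketch. It is a simplified model rather than LSF's implementation: the function and parameter names are hypothetical, the index is assumed to be the last two digits of the job ID, and "changed to end in 00" is interpreted as replacing those two digits with 00. The sketch models only the strict matching case and does not cover the hyphen (-) index or the MC_STRICT_JOBID_CHECKING parameter.

```python
# Simplified model of the acceptance rules in step 2; not LSF code.
# Assumptions: the index is the last two digits of the job ID, and an ID
# that is already in use keeps its leading digits but ends in 00 instead.
from typing import Optional, Set


def accept_forwarded_job(
    job_id: int, configured_index: int, used_job_ids: Set[int]
) -> Optional[int]:
    """Return the job ID the receiving cluster would use, or None if rejected."""
    if job_id % 100 != configured_index:
        return None  # the ending digits do not match the configured index
    if job_id in used_job_ids:
        return job_id - configured_index  # same leading digits, ending in 00
    return job_id  # accepted with the forwarding cluster's job ID


print(accept_forwarded_job(10022, 22, set()))    # 10022
print(accept_forwarded_job(10022, 1, set()))     # None
print(accept_forwarded_job(10022, 22, {10022}))  # 10000
```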