Installing LSF Connector for Kubernetes

Verify the prerequisites, install LSF Connector for Kubernetes, and deploy jobs.

Before you begin

Note:
  • LSF Connector for Kubernetes supports Kubernetes 1.20.15 or earlier.
  • LSF Connector for Kubernetes supports only NVIDIA GPUs; other types of GPUs are not supported.
  • The container runtime must have the NVIDIA_VISIBLE_DEVICES environment variable set.
  • To avoid scheduling conflicts, only use one scheduler to schedule GPUs. When using the LSF Connector for Kubernetes, specify LSF as the scheduler and not the Kubernetes scheduler.
  • By default, any user who is allowed to submit pods to Kubernetes can impersonate any other user. This is an issue with Kubernetes, not with LSF.
  1. Install IBM Spectrum LSF, LSF Suite for Enterprise, or LSF Suite for HPC.
  2. Install Kubernetes on a subset of machines in the LSF cluster. If you have LSF Application Center, do not use the LSF Application Center host as the Kubernetes management host.
  3. Back up the <LSF_TOP>/10.1 directory from the LSF management host.
  4. Configure the GPUs in your cluster to work with Kubernetes.

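    You can quickly confirm that Kubernetes can see the GPUs before you continue; for example (this assumes that the NVIDIA device plugin runs in the kube-system namespace, which might differ in your cluster):

    # Confirm that the NVIDIA device plugin pods are running
    kubectl get pods -n kube-system | grep -i nvidia

    # Confirm that the nodes advertise the nvidia.com/gpu resource
    kubectl describe nodes | grep -i "nvidia.com/gpu"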

About this task

LSF is installed on all machines in the cluster. Kubernetes is installed on a subset of machines in the LSF cluster, with the LSF management host and the Kubernetes management host kept on separate machines. This configuration allows users to run both batch and Kubernetes workloads on the same infrastructure.

An example of a cluster that runs LSF Connector for Kubernetes.

Procedure

  1. Create the Custom Resource Definition (CRD) for parallel jobs by running the Kubernetes kubectl create command.

    kubectl create -f $LSF_BINDIR/../../misc/kubernetes/parallelJob-v1alpha1.yaml

    Note: You must be the cluster administrator to run this command.
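    To confirm that the CRD was created, you can list the custom resource definitions; the exact CRD name depends on the definition in parallelJob-v1alpha1.yaml, so the filter here is only an example:

    # List CRDs and filter for the parallel job definition
    kubectl get crd | grep -i paralleljob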
  2. Edit the lsf.conf file and set the following parameters:
    LSB_KUBE_ENABLE=Y
    LSF_ENABLE_EXTSCHEDULER=Y
    LSB_KUBE_CONFIG=<path_to_LSF_ENVDIR>/kubernetes.config
    LSB_MAX_PACK_JOBS=500
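    For example, these settings can be appended with a here-document in the same way as step 4. This sketch assumes that the LSF_ENVDIR environment variable is set and that the parameters are not already defined in lsf.conf; adjust the path in LSB_KUBE_CONFIG if your kubernetes.config file is elsewhere:

    # The unquoted here-document expands $LSF_ENVDIR to the actual path
    cat >> $LSF_ENVDIR/lsf.conf << END
    LSB_KUBE_ENABLE=Y
    LSF_ENABLE_EXTSCHEDULER=Y
    LSB_KUBE_CONFIG=$LSF_ENVDIR/kubernetes.config
    LSB_MAX_PACK_JOBS=500
    END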
  3. Edit the lsb.modules file in the $LSF_ENVDIR/lsbatch/<cluster_name>/configdir directory and uncomment the line that specifies the schmod_kubernetes module, which is near the end of the PluginModule list.
    Begin PluginModule
    ...
    schmod_affinity            ()                   ()
    #schmod_demand             ()                   ()
    schmod_kubernetes          ()                   ()
    End PluginModule
  4. Edit the lsb.resources file and configure per-task allocation for GPUs.
    cat >> $LSF_ENVDIR/lsbatch/<cluster_name>/configdir/lsb.resources << END
    
    Begin ReservationUsage
    RESOURCE             METHOD        RESERVE
    ngpus_physical       PER_TASK      N
    End ReservationUsage
    END
  5. Edit the lsb.params file and set the following parameters:
    RELAX_JOB_DISPATCH_ORDER=N
    MAX_USER_PRIORITY=1000
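    These parameters belong inside the existing Parameters section of the lsb.params file, for example (other entries elided):

    Begin Parameters
    ...
    RELAX_JOB_DISPATCH_ORDER=N
    MAX_USER_PRIORITY=1000
    End Parameters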
  6. Enable LSF user impersonation in the lsf.sudoers file.

    Enabling LSF user impersonation means that when a job is submitted through Kubernetes, LSF creates a control job on behalf of the submitting user. To enable user impersonation, run the following commands:

    echo 'LSB_IMPERSONATION_USERS="lsfadmin"' > /etc/lsf.sudoers
    chown root /etc/lsf.sudoers
    chmod 500 /etc/lsf.sudoers
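    You can then confirm the ownership and permissions of the file:

    # Expect root ownership and mode 500 (-r-x------)
    ls -l /etc/lsf.sudoers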
  7. Edit the lsf.shared file and add kubernetes to the list of resources in the Resource section.
    Begin Resource
    RESOURCENAME  TYPE    INTERVAL INCREASING CONSUMABLE DESCRIPTION
    ...
    kubernetes    Boolean ()       ()         ()         (Kubernetes node)
    End Resource
  8. Edit the lsf.cluster.cluster_name file and use the kubernetes resource keyword to identify the Kubernetes hosts to LSF.

    The following configuration identifies all three LSF hosts as Kubernetes hosts.

    Begin   Host
    HOSTNAME    model    type    server  RESOURCES
    lsfmanager  !        !       1       (kubernetes mg)
    lsfcompute1 !        !       1       (kubernetes)
    lsfcompute2 !        !       1       (kubernetes)
    End     Host
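    After you restart the LSF daemons in step 10, you can confirm that the hosts report the new Boolean resource by checking the RESOURCES column of the lshosts output:

    # The kubernetes resource should appear for each Kubernetes host
    lshosts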
  9. Configure the Kubernetes application profile.

    LSF Connector for Kubernetes includes an example Kubernetes template file (example-template.yaml). Copy this file to your cluster as kube-template.yaml to use it with the Kubernetes application profile.

    Ensure that you created a namespace called lsf, and that docker.io/centos is registered to the lsf namespace as an Image policy.
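    If the lsf namespace does not exist yet, you can create it with kubectl; configuring the image policy depends on your admission controller and is not shown here:

    kubectl create namespace lsf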

    1. Copy the example Kubernetes template file to your cluster.
      • IBM Spectrum LSF Suite: The example-template.yaml file is in the /opt/ibm/lsfsuite/lsf/10.1/misc/kubernetes directory.
      • IBM Spectrum LSF: The example-template.yaml file is in the $LSF_BINDIR/../../misc/kubernetes directory.

      For example, copy the example-template.yaml file to the /share/lsf/conf/lsbatch/cluster0/configdir/ directory:

      • IBM Spectrum LSF Suite:
        $ cp /opt/ibm/lsfsuite/lsf/10.1/misc/kubernetes/example-template.yaml /share/lsf/conf/lsbatch/cluster0/configdir/kube-template.yaml
      • IBM Spectrum LSF:
        $ cp $LSF_BINDIR/../../misc/kubernetes/example-template.yaml /share/lsf/conf/lsbatch/cluster0/configdir/kube-template.yaml
    2. Edit the lsb.applications file and configure the Kubernetes application profile.

      Uncomment the kube application profile and edit the relevant parameters. Change the file path in the CONTAINER parameter so that it specifies the path to the Kubernetes template file (kube-template.yaml).

      Begin Application
      NAME = kube
      DESCRIPTION = K8S job container
      CONTAINER = kubernetes[template(/share/lsf/conf/lsbatch/cluster0/configdir/kube-template.yaml)]
      End Application
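      Before you restart the LSF daemons, you can check the batch configuration files that you edited (lsb.modules, lsb.resources, lsb.params, and lsb.applications) for syntax errors, for example:

      # Validate the lsbatch configuration files and report any problems
      badmin ckconfig -v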
  10. Restart the LSF daemons.
    • If you are using one of the LSF Suites, use the following command to restart all LSF daemons:
      systemctl restart lsfd
    • If you are using LSF, restart the LSF daemons manually by using the following commands:
      bctrld start lim
      bctrld start res
      bctrld start sbd
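    After the daemons are restarted, you can verify the end-to-end configuration by submitting a small test job to the kube application profile; the command and options here are only an example and depend on your cluster and template file:

    # Submit a test job through the Kubernetes application profile
    bsub -app kube sleep 60

    # Check the job status
    bjobs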