Deploying Kubernetes jobs in LSF Connector for Kubernetes

LSF Connector for Kubernetes enables new options for jobs that are run through Kubernetes. These options are needed for running High Performance Computing (HPC) applications and large AI jobs.

The options extend the pod specification with annotations to define which scheduling and placement policies to use. To enable these features, the pod specification must have the schedulerName parameter set to lsf, for example:

spec.template.spec.schedulerName: lsf

This provides the following new features for Kubernetes jobs:

  • Job priority
  • Application profiles
  • Fair sharing of resources
  • Parallel jobs (see the sketch after this list)
  • GPU management
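
For example, a parallel job can be expressed with the standard Kubernetes parallelism and completions fields while still directing scheduling to LSF. The following is a minimal sketch; exactly how LSF places the resulting pods depends on your LSF Connector and cluster configuration:
apiVersion: batch/v1
kind: Job
metadata:
  name: myjob-parallel
spec:
  parallelism: 4       # Run four pods at the same time
  completions: 4       # The job is complete when four pods finish
  template:
    metadata:
      name: myjob-parallel
    spec:
      schedulerName: lsf     # This directs scheduling to the LSF scheduler
      containers:
      - name: worker
        image: ubuntu
        command: ["sleep", "60"]
      restartPolicy: Never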

The following table shows the pod specification fields and annotations, along with the equivalent LSF job submission options.

Pod spec field | Description | LSF job submission option
spec.template.metadata.name | A name to assign to the job. | Job name (-J)
spec.template.metadata.annotations.lsf.ibm.com/dependency | A job that must complete before this job is dispatched. | Job dependency (-w)
spec.template.metadata.annotations.lsf.ibm.com/project | A project name to assign to the job. | Project name (-P)
spec.template.metadata.annotations.lsf.ibm.com/application | An application profile to use. | Application profile (-app)
spec.template.metadata.annotations.lsf.ibm.com/gpu | The GPU requirements for the job. | GPU requirement (-gpu)
spec.template.metadata.annotations.lsf.ibm.com/queue | The name of the job queue in which to run the job. | Queue (-q)
spec.template.metadata.annotations.lsf.ibm.com/jobGroup | A job group to assign to the job. | Job group (-g)
spec.template.metadata.annotations.lsf.ibm.com/fairshareGroup | The fair share group to which to assign the job. | Fair share group (-G)
spec.template.metadata.annotations.lsf.ibm.com/user | The user to run the application as, and for accounting. | Job submission user
spec.template.metadata.annotations.lsf.ibm.com/serviceClass | The service class to apply to the job. | Service class (-sla)
spec.template.metadata.annotations.lsf.ibm.com/reservation | The resources to reserve before running the job. | Advance reservation (-U)
spec.template.spec.containers[].resources.requests.memory | The amount of memory to reserve for the job. | Memory reservation (-R "rusage[mem=...]")
spec.template.spec.schedulerName | Set to lsf to direct scheduling to LSF. | N/A

For more information on the annotations and their meanings, refer to IBM Spectrum LSF cluster management essentials.

These capabilities are accessed by modifying the pod specifications for jobs. The following specification is a minimal example:
apiVersion: batch/v1
kind: Job
metadata:
  name: myjob-001
spec:
  template:
    metadata:
      name: myjob-001
    spec:
      schedulerName: lsf     # This directs scheduling to the LSF scheduler
      containers:
      - name: ubuntutest
        image: ubuntu
        command: ["sleep", "60"]
        resources:
          requests:
            memory: 5Gi
      restartPolicy: Never

This example enables Kubernetes to use lsf as the job scheduler. The LSF job scheduler can then apply its policies to choose when and where the job will run.

Additional parameters can be added to the pod YAML file to control the job. The following example shows how to use the additional annotations:
apiVersion: batch/v1
kind: Job
metadata:
  name: myjob-001
spec:
  template:
    metadata:
      name: myjob-001
      # The following annotations provide additional scheduling
      # information to better place the pods on the worker nodes.
      # NOTE: Some annotations require additional LSF configuration
      annotations:
        lsf.ibm.com/project: "big-project-1000"
        lsf.ibm.com/queue: "normal"
        lsf.ibm.com/jobGroup: "/my-group"
        lsf.ibm.com/fairshareGroup: "gold"
    spec:
      schedulerName: lsf     # This directs scheduling to the LSF scheduler
      containers:
      - name: ubuntutest
        image: ubuntu
        command: ["sleep", "60"]
      restartPolicy: Never

The annotations provide the LSF scheduler with more information about the job and how it should be run.
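
The gpu and dependency annotations from the table can be used in the same way. The following is an illustrative sketch: the gpu value follows the LSF -gpu option syntax, and the dependency value is assumed here to name a job that must complete first (check your LSF Connector documentation for the exact expression format):
apiVersion: batch/v1
kind: Job
metadata:
  name: myjob-002
spec:
  template:
    metadata:
      name: myjob-002
      annotations:
        lsf.ibm.com/queue: "normal"
        lsf.ibm.com/gpu: "num=1"            # Request one GPU (LSF -gpu syntax)
        lsf.ibm.com/dependency: "myjob-001" # Assumed value format: dispatch after myjob-001 completes
    spec:
      schedulerName: lsf
      containers:
      - name: gputest
        image: ubuntu
        command: ["sleep", "60"]
      restartPolicy: Never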

Users who submit jobs through Kubernetes are typically trusted to run services and workloads as other users. For example, the following pod specification allows the pod to run as another user:
apiVersion: batch/v1
kind: Job
metadata:
  name: myjob-uid1003-0002
spec:
  template:
    metadata:
      name: myjob-uid1003-0002
    spec:
      schedulerName: lsf
      containers:
      - name: ubuntutest
        image: ubuntu
        command: ["id"]
      restartPolicy: Never
      securityContext:
        runAsUser: 1003
        fsGroup: 100
        runAsGroup: 1001

The pod is run as UID 1003 and produces the following output:

uid=1003(billy) gid=0(root) groups=0(root),1001(users)

Note the GID and groups in the output, and ensure that you limit who can create pods. Alternatively, you can use LSF application profiles so that the administrator predefines the pod specification file.
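
For example, if the administrator defines an application profile in LSF (here assumed to be named secureapp) that fixes the execution user and related settings, a job needs only to reference that profile. The following fragment is a sketch:
spec:
  template:
    metadata:
      annotations:
        # secureapp is an assumed profile name; it must be
        # configured in LSF by the administrator
        lsf.ibm.com/application: "secureapp"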

For further information and examples, refer to https://github.com/IBMSpectrumComputing/lsf-kubernetes.