Verify the prerequisites, install LSF Connector for Kubernetes and deploy
the jobs.
Before you begin
Note:
- LSF Connector for Kubernetes supports Kubernetes 1.20.15 or earlier.
- LSF Connector for Kubernetes supports only NVIDIA GPUs. Other types of GPUs
are not supported.
- The container runtime must have the NVIDIA_VISIBLE_DEVICES environment
variable set.
- To avoid scheduling conflicts, use only one scheduler to schedule GPUs. When you use
LSF Connector for Kubernetes, specify LSF as the scheduler, not the Kubernetes scheduler.
- By default, any user who is allowed to submit pods to Kubernetes can impersonate
any other user. This is an issue with Kubernetes and not with LSF.
- Install IBM Spectrum LSF, LSF Suite for Enterprise, or LSF Suite for HPC.
- Install Kubernetes on a subset of machines in the LSF
cluster. If you have LSF Application Center, do
not use the LSF Application Center host
as the Kubernetes management host.
- Back up the <LSF_TOP>/10.1 directory from the
LSF management host.
- Configure the GPUs in your cluster to work with Kubernetes.
About this task
LSF is installed on all machines in the cluster. Kubernetes is installed on a subset of
machines in the LSF cluster, with the LSF management host and the Kubernetes management
host on separate machines. This configuration allows users to run both batch and
Kubernetes workloads on the same infrastructure.
Procedure
-
Create the Custom Resource Definition (CRD) for parallel jobs by running the Kubernetes
kubectl create command.
kubectl create -f $LSF_BINDIR/../../misc/kubernetes/parallelJob-v1alpha1.yaml
Note: You must be the cluster administrator to run this command.
-
Edit the lsf.conf file and set the following parameters:
LSB_KUBE_ENABLE=Y
LSF_ENABLE_EXTSCHEDULER=Y
LSB_KUBE_CONFIG=<path_to_LSF_ENVDIR>/kubernetes.config
LSB_MAX_PACK_JOBS=500
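After editing lsf.conf, a quick check can confirm that the four parameters are in place. The following is a minimal sketch, not part of LSF: check_lsf_conf is a hypothetical helper name, and it assumes each parameter sits on its own unquoted NAME=value line. LSB_KUBE_CONFIG is only checked for presence because its path is site-specific.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with LSF): report which parameters from
# this step are missing from, or set differently in, the given lsf.conf file.
check_lsf_conf() {
    conf_file=$1
    missing=0
    for setting in LSB_KUBE_ENABLE=Y LSF_ENABLE_EXTSCHEDULER=Y LSB_MAX_PACK_JOBS=500; do
        # Look for the exact NAME=value line, allowing surrounding whitespace.
        if ! grep -Eq "^[[:space:]]*${setting}[[:space:]]*$" "$conf_file"; then
            echo "not set as documented: ${setting}"
            missing=1
        fi
    done
    # The configuration file path differs per site, so only require a value.
    if ! grep -Eq '^[[:space:]]*LSB_KUBE_CONFIG=.+' "$conf_file"; then
        echo "not set: LSB_KUBE_CONFIG"
        missing=1
    fi
    return "$missing"
}
```

For example, check_lsf_conf "$LSF_ENVDIR/lsf.conf" prints one line per problem and returns a non-zero status if anything is missing.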
-
Edit the lsb.modules file in the
$LSF_ENVDIR/lsbatch/<cluster_name>/configdir directory
and uncomment the line that specifies the schmod_kubernetes module, which is
near the end of the PluginModule list.
Begin PluginModule
...
schmod_affinity () ()
#schmod_demand () ()
schmod_kubernetes () ()
End PluginModule
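If you prefer to script this edit, the uncommenting can be sketched with sed. This is an illustration under assumptions: enable_kube_module is a hypothetical helper name, and the pattern assumes the module line is commented with a leading # as in the shipped file. The helper keeps a .bak copy of the original file.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with LSF): uncomment the schmod_kubernetes
# line in an lsb.modules file, leaving every other line untouched.
enable_kube_module() {
    modules_file=$1
    # Keep a backup so that the original file can be restored.
    cp "$modules_file" "$modules_file.bak"
    # Strip the leading "#" (and surrounding blanks) from that one line only.
    sed 's/^[[:space:]]*#[[:space:]]*\(schmod_kubernetes\)/\1/' \
        "$modules_file.bak" > "$modules_file"
}
```

A call might look like enable_kube_module "$LSF_ENVDIR/lsbatch/<cluster_name>/configdir/lsb.modules", with <cluster_name> replaced by your cluster name.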
-
Edit the lsb.resources file and configure per-task allocation for
GPUs.
cat >> $LSF_ENVDIR/lsbatch/<cluster_name>/configdir/lsb.resources << END
Begin ReservationUsage
RESOURCE METHOD RESERVE
ngpus_physical PER_TASK N
End ReservationUsage
END
-
Edit the lsb.params file and set the following parameters:
RELAX_JOB_DISPATCH_ORDER=N
MAX_USER_PRIORITY=1000
-
Enable LSF user
impersonation in the lsf.sudoers file.
Enabling LSF user impersonation means that jobs that are submitted through Kubernetes
cause LSF to create a control job on behalf of the user. To enable user impersonation,
run the following commands:
echo 'LSB_IMPERSONATION_USERS="lsfadmin"' > /etc/lsf.sudoers
chown root /etc/lsf.sudoers
chmod 500 /etc/lsf.sudoers
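Because the echo command uses > and therefore replaces any existing /etc/lsf.sudoers file, a more cautious variant appends the setting only when it is not already present. The following is a sketch under that assumption: set_impersonation_user is a hypothetical helper name, not an LSF command, and the chown and chmod commands still need to be run afterward.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with LSF): add LSB_IMPERSONATION_USERS to
# an lsf.sudoers file without discarding settings that are already there.
set_impersonation_user() {
    sudoers_file=$1
    admin_user=$2
    # Refuse to add a second definition; an existing one needs a manual edit.
    if grep -q '^LSB_IMPERSONATION_USERS=' "$sudoers_file" 2>/dev/null; then
        echo "LSB_IMPERSONATION_USERS is already set in $sudoers_file" >&2
        return 1
    fi
    # Append rather than overwrite, so other parameters are preserved.
    echo "LSB_IMPERSONATION_USERS=\"$admin_user\"" >> "$sudoers_file"
}
```

For example, set_impersonation_user /etc/lsf.sudoers lsfadmin, run as root, has the same effect as the echo command on a system with no existing lsf.sudoers file.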
-
Edit the lsf.shared file and add kubernetes to the
list of resources in the Resource section.
Begin Resource
RESOURCENAME TYPE INTERVAL INCREASING CONSUMABLE DESCRIPTION
...
kubernetes Boolean () () (Kubernetes node)
End Resource
-
Edit the lsf.cluster.cluster_name file and use the
kubernetes resource keyword to identify the Kubernetes hosts to LSF.
The following configuration identifies all three LSF hosts
as Kubernetes hosts.
Begin Host
HOSTNAME model type server RESOURCES
lsfmanager ! ! 1 (kubernetes mg)
lsfcompute1 ! ! 1 (kubernetes)
lsfcompute2 ! ! 1 (kubernetes)
End Host
-
Configure the Kubernetes application profile.
LSF Connector for Kubernetes includes an example Kubernetes template file
(example-template.yaml). Copy this file to your cluster as kube-template.yaml
to use with the Kubernetes application profile.
Ensure that you created a namespace called lsf, and that
docker.io/centos is registered to the lsf namespace as
an image policy.
-
Copy the example Kubernetes template file to your cluster.
- IBM Spectrum
LSF Suite: The
example-template.yaml file is in the
/opt/ibm/lsfsuite/lsf/10.1/misc/kubernetes directory.
- IBM Spectrum
LSF: The example-template.yaml file is in the
$LSF_BINDIR/../../misc/kubernetes directory.
For example, copy the example-template.yaml file to the
/share/lsf/conf/lsbatch/cluster0/configdir/ directory:
- IBM Spectrum
LSF Suite:
$ cp /opt/ibm/lsfsuite/lsf/10.1/misc/kubernetes/example-template.yaml /share/lsf/conf/lsbatch/cluster0/configdir/kube-template.yaml
- IBM Spectrum
LSF:
$ cp $LSF_BINDIR/../../misc/kubernetes/example-template.yaml /share/lsf/conf/lsbatch/cluster0/configdir/kube-template.yaml
-
Edit the lsb.applications file and configure the Kubernetes application
profile.
Uncomment the kube application profile and edit the relevant parameters.
Change the file path in the CONTAINER parameter so that it specifies the
path to the Kubernetes template file (kube-template.yaml).
Begin Application
NAME = kube
DESCRIPTION = K8S job container
CONTAINER = kubernetes[template(/share/lsf/conf/lsbatch/cluster0/configdir/kube-template.yaml)]
End Application
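A mistyped template path is easy to miss, so it can help to read the path back from lsb.applications and confirm that the file exists. The following is a minimal sketch: template_path is a hypothetical helper name, and the pattern assumes the CONTAINER line uses exactly the kubernetes[template(...)] form shown above.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with LSF): print the template path from
# each uncommented CONTAINER = kubernetes[template(...)] line in a file.
template_path() {
    sed -n 's/^[[:space:]]*CONTAINER[[:space:]]*=[[:space:]]*kubernetes\[template(\([^)]*\))].*/\1/p' "$1"
}
```

For example, test -f "$(template_path /share/lsf/conf/lsbatch/cluster0/configdir/lsb.applications)" succeeds only if the referenced template file is present. Lines that are still commented out with # are ignored.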
-
Restart the LSF
daemons.
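The restart can be sketched as a small wrapper around the LSF administration commands. This is an assumption about one common sequence, not the only supported method: restart_lsf_daemons is a hypothetical name, the commands may prompt for confirmation, and newer LSF 10.1 fix packs may provide different commands for restarting daemons, so check the command reference for your version.

```shell
#!/bin/sh
# Hypothetical wrapper (assumed sequence): restart LIM and RES on all hosts,
# then sbatchd on all hosts, then mbatchd on the management host.
# Run as the LSF administrator or root, with the LSF environment sourced.
restart_lsf_daemons() {
    lsadmin limrestart all &&
    lsadmin resrestart all &&
    badmin hrestart all &&
    badmin mbdrestart
}
```

Each command runs only if the previous one succeeded, so a failure stops the sequence at the step that needs attention.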