Requirements for deploying custom foundation models on MIG-enabled clusters
Review the considerations and requirements for deploying a custom foundation model on an MIG-enabled cluster.
You can deploy custom foundation models on an MIG-enabled cluster in either the lightweight or the full-service watsonx.ai™ installation mode. As you prepare to deploy a custom foundation model, review these requirements:
- Consider the type of model that you are deploying. Tasks differ slightly depending on whether you are downloading a model from a public repository like Hugging Face or a model located in your environment. For each deployment task, follow the steps for your scenario.
- Review the Role requirements for the tasks that are associated with deploying a custom foundation model.
- Determine whether your model requires a custom hardware specification. See Hardware requirements.
- Configure MIG support to deploy custom foundation models. See Configuring MIG support to deploy custom foundation models.
Hardware requirements
The standard supported hardware configurations to deploy custom foundation models on MIG-enabled clusters are as follows:
- NVIDIA A100 GPUs with 80 GB RAM
- NVIDIA H100 GPUs with 80 GB RAM
- NVIDIA H200 GPUs with 141 GB RAM
Restriction: You cannot use NVIDIA L40S GPUs with 48 GB RAM to deploy custom foundation models on MIG-enabled clusters.
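As a rough pre-check, the hardware list above can be expressed as a small shell helper. This is only a sketch under stated assumptions: `is_supported_gpu` is a hypothetical function name, not part of any product CLI, and it matches on the GPU product string only (for example, the value of the `nvidia.com/gpu.product` node label that NVIDIA GPU Feature Discovery typically sets). It does not check the memory size, which you still need to confirm separately.

```shell
#!/usr/bin/env bash
# Hypothetical helper: returns 0 when a GPU product string names a model family
# from the supported list (A100, H100, H200), and 1 otherwise.
# Assumption: matching is by substring only; memory size is NOT checked here.
is_supported_gpu() {
  local product="$1"
  case "$product" in
    *L40S*)                return 1 ;;  # restricted for MIG deployments
    *A100*|*H100*|*H200*)  return 0 ;;  # model families from the list above
    *)                     return 1 ;;  # anything else: treat as unsupported
  esac
}

# Example usage (needs cluster access, so it is left commented out):
# product=$(oc get node "${NODE_NAME}" -o jsonpath='{.metadata.labels.nvidia\.com/gpu\.product}')
# is_supported_gpu "$product" && echo "supported family" || echo "not supported"
```

Keeping the restriction as its own `case` branch makes the L40S exclusion explicit rather than relying on the fall-through.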
Configuring MIG support to deploy custom foundation models
The cluster administrator must perform the following tasks to deploy custom foundation models on MIG-enabled clusters:
- Enable MIG partitioning on the required GPU nodes at the cluster level. To learn more about MIG partitioning, see NVIDIA documentation for configuring MIG Support in OpenShift Container Platform.
  MIG profiles and their resource names:
    1g.10gb: nvidia.com/mig-1g.10gb
    2g.20gb: nvidia.com/mig-2g.20gb
    3g.40gb: nvidia.com/mig-3g.40gb
    7g.80gb: nvidia.com/mig-7g.80gb
- Validate and add support for the NVIDIA MIG single strategy. With the single strategy, each GPU is divided into partitions of a single fixed size. For more information, see Configuring single strategy for MIG support.
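The profile-to-resource pairs listed above follow one naming pattern, which a short helper can make explicit. This is a minimal sketch: `mig_resource_name` is a hypothetical function name, and it accepts only the four profiles that appear in the list.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the resource name for a MIG profile, using only
# the four profile-to-resource pairs listed above. Unknown profiles are rejected.
mig_resource_name() {
  local profile="$1"
  case "$profile" in
    1g.10gb|2g.20gb|3g.40gb|7g.80gb)
      printf 'nvidia.com/mig-%s\n' "$profile" ;;
    *)
      printf 'unknown MIG profile: %s\n' "$profile" >&2
      return 1 ;;
  esac
}
```

For example, `mig_resource_name 3g.40gb` prints `nvidia.com/mig-3g.40gb`, while a profile outside the list returns a non-zero status.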
Configuring single strategy for MIG support
Follow these steps to configure the single strategy for MIG support:
- Set the MIG advertisement strategy to single.
  Specify the host name, strategy, and configuration label in environment variables:
    NODE_NAME=myworker.redhat.com
    STRATEGY=single
    MIG_CONFIGURATION=all-3g.40gb
- Apply the desired MIG partitioning profile.
  For example, label a node to create two 3g.40gb instances on each GPU with the following command:
    oc label node/${NODE_NAME} nvidia.com/mig.config=${MIG_CONFIGURATION} --overwrite
- Verify the MIG configuration:
  - Confirm that the correct label is applied to the node:
      oc get node/${NODE_NAME} -o json | jq '.metadata.labels'
  - Check that the configuration was applied successfully:
      nvidia-smi -L
To learn more about configuring single strategy for MIG support, see Example of configuring single strategy for MIG in NVIDIA documentation.
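The steps above can be sketched as a set of shell functions. Treat this as an illustration, not a definitive procedure. The function names are hypothetical. The ClusterPolicy name `gpu-cluster-policy` is an assumption about how the NVIDIA GPU Operator is installed in your cluster. The `nvidia-smi -L` line format that `count_mig_devices` matches (`MIG <profile> Device N: ...`) is also assumed. Nothing runs when the file is loaded; each function must be called explicitly.

```shell
#!/usr/bin/env bash
# Sketch of the single-strategy workflow as functions. Nothing runs on load.

# Set the MIG strategy on the GPU Operator ClusterPolicy.
# Assumption: the ClusterPolicy resource is named "gpu-cluster-policy".
set_mig_strategy() {
  local strategy="$1"
  oc patch clusterpolicy/gpu-cluster-policy --type merge \
    -p "{\"spec\":{\"mig\":{\"strategy\":\"${strategy}\"}}}"
}

# Label a node with the MIG configuration (same command as in the steps above).
apply_mig_config() {
  local node="$1" config="$2"
  oc label "node/${node}" "nvidia.com/mig.config=${config}" --overwrite
}

# Count MIG devices of one profile in `nvidia-smi -L` output read from stdin.
# Assumption: each MIG device line contains "MIG <profile> " as a literal string.
count_mig_devices() {
  local profile="$1"
  # grep -c exits 1 when the count is 0; "|| true" keeps the function's status 0.
  grep -c -F "MIG ${profile} " || true
}

# Example usage (needs cluster access, so it is left commented out):
# set_mig_strategy "${STRATEGY}"
# apply_mig_config "${NODE_NAME}" "${MIG_CONFIGURATION}"
# nvidia-smi -L | count_mig_devices 3g.40gb
```

With `MIG_CONFIGURATION=all-3g.40gb`, the count should equal two times the number of GPUs on the node. `nvidia-smi` is typically run where the NVIDIA driver is available, for example inside the driver pod on the GPU node.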