Creating custom hardware specifications for deployment on dedicated GPUs

Some foundation model types and hardware configurations might require a custom hardware specification. Learn about the requirements for creating a custom hardware specification.

General considerations and limitations

  • For a list of standard hardware specifications that you can use to deploy your custom foundation models on dedicated GPUs, see Predefined hardware specifications.
  • You cannot use predefined hardware specifications with quantized models. For quantized models and in other non-standard cases, use a custom hardware specification.
  • Custom hardware specifications are not validated against the hardware that is installed in the cluster. If you use a hardware specification that requests more resources than your deployment can get, the deployment fails with the following message:
    Failed to deploy the custom foundation model due to an internal error. The runtime failed to start due to 'insufficient resources'. Retry the operation. Contact IBM support if the problem persists.
    
  • When you're creating a custom hardware specification for your model, follow the Resource utilization guidelines for custom foundation models.
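Because the hardware specification is not checked against the cluster, it can help to compare the resources you plan to request with what a single GPU node provides before you create it. The following is a minimal sketch, assuming you already know the node's GPU count and memory (for example, from your cluster administrator). The functions `to_gib` and `spec_fits_node` are hypothetical helpers, not part of watsonx.ai, and only sizes with the `Gi` suffix are handled.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: watsonx.ai does NOT run this for you.
# Compare the GPU count and memory you plan to request against what one
# GPU node can provide (values that you look up yourself).

# Convert a size such as "128Gi" to a whole number of GiB.
# Only the Gi suffix is handled in this sketch.
to_gib() {
  local size="$1"
  if [[ "$size" =~ ^([0-9]+)Gi$ ]]; then
    echo "${BASH_REMATCH[1]}"
  else
    echo "unsupported size: $size" >&2
    return 1
  fi
}

# Usage: spec_fits_node <num_gpu> <mem_size> <node_gpus> <node_mem_size>
# Returns 0 when the request fits on the node, 1 otherwise.
spec_fits_node() {
  local want_gpu="$1" want_mem node_gpu="$3" node_mem
  want_mem=$(to_gib "$2") || return 1
  node_mem=$(to_gib "$4") || return 1
  if (( want_gpu > node_gpu )); then
    echo "requested $want_gpu GPU(s), node has $node_gpu" >&2
    return 1
  fi
  if (( want_mem > node_mem )); then
    echo "requested ${want_mem}Gi memory, node has ${node_mem}Gi" >&2
    return 1
  fi
  return 0
}

# Example: 1 GPU and 128Gi requested, node offers 2 GPUs and 256Gi.
spec_fits_node 1 128Gi 2 256Gi && echo "request fits on the node"
```

A passing check does not guarantee that the deployment starts, because other workloads might already be using the node's resources.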

Creating custom hardware specifications in projects

Use the following code sample to create a custom hardware specification for your model in a project:

curl -ik -X POST -H "Authorization: Bearer $TOKEN" "https://<cluster_url>/v2/hardware_specifications?project_id=$project_id" \
-H "Content-Type:application/json" \
--data '{
  "name": "custom_hw_spec",
  "description": "Custom hardware specification for foundation models",
  "nodes": {
    "cpu": {
      "units": "2"
    },
    "mem": {
      "size": "128Gi"
    },
    "gpu": {
      "num_gpu": 1
    }
  }
}'
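If you create several specifications that differ only in CPU, memory, or GPU count, you might generate the request body instead of editing the JSON by hand. The following is a minimal sketch; `hw_spec_payload` is a hypothetical helper that prints the same body as the sample above. It does not JSON-escape its arguments, so it assumes plain values without quotes or backslashes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the hardware specification request body.
# Usage: hw_spec_payload <name> <cpu_units> <mem_size> <num_gpu> [description]
# Arguments are inserted as-is (no JSON escaping).
hw_spec_payload() {
  local name="$1" cpu="$2" mem="$3" gpus="$4"
  local desc="${5:-Custom hardware specification for foundation models}"
  printf '{"name": "%s", "description": "%s", "nodes": {"cpu": {"units": "%s"}, "mem": {"size": "%s"}, "gpu": {"num_gpu": %d}}}\n' \
    "$name" "$desc" "$cpu" "$mem" "$gpus"
}

# Example use with the project endpoint (not executed here):
# curl -ik -X POST -H "Authorization: Bearer $TOKEN" \
#   "https://<cluster_url>/v2/hardware_specifications?project_id=$project_id" \
#   -H "Content-Type:application/json" \
#   --data "$(hw_spec_payload custom_hw_spec 2 128Gi 1)"
```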

Creating custom hardware specifications in deployment spaces

Use the following code sample to create a custom hardware specification for your model in a deployment space:

curl -ik -X POST -H "Authorization: Bearer $TOKEN" "https://<cluster_url>/v2/hardware_specifications?space_id=$space_id" \
-H "Content-Type:application/json" \
--data '{
  "name": "custom_hw_spec",
  "description": "Custom hardware specification for foundation models",
  "nodes": {
    "cpu": {
      "units": "2"
    },
    "mem": {
      "size": "128Gi"
    },
    "gpu": {
      "num_gpu": 1
    }
  }
}'
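The project and deployment space requests use the same endpoint and body; only the query parameter changes (`project_id` or `space_id`). The following minimal sketch builds the URL for either scope. `hw_spec_url` is a hypothetical helper, and it assumes that you pass the cluster host name without the `https://` prefix.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the hardware specifications endpoint URL.
# Usage: hw_spec_url <cluster_host> project|space <id>
hw_spec_url() {
  local base="${1%/}" scope="$2" id="$3"   # strip one trailing slash from the host
  case "$scope" in
    project) echo "https://${base}/v2/hardware_specifications?project_id=${id}" ;;
    space)   echo "https://${base}/v2/hardware_specifications?space_id=${id}" ;;
    *)       echo "scope must be 'project' or 'space'" >&2; return 1 ;;
  esac
}

# Example: print the URL for a deployment space.
hw_spec_url cluster.example.com space "$space_id"
```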

Creating custom hardware specifications for selected GPU nodes

To create a custom hardware specification that targets specific GPU nodes, add the node_selector field and specify the label_name and label_value of the GPU node that you want to use. You can use an existing node label, or you can create a custom label for the GPU node and provide its name and value in the node_selector field when you create the custom hardware specification.

For example, suppose that your cluster has two NVIDIA A100 GPU nodes with the labels nvidia.com/gpu.product=NVIDIA-A100-SXM4-80GB and nvidia.com/gpu.product=NVIDIA-A100-80GB-PCIe. If you do not provide values for node_selector when you create the custom hardware specification, watsonx.ai automatically chooses one of the available GPUs to deploy your custom foundation model. To deploy your custom foundation model on a specific GPU node, specify the label_name and label_value in the node_selector field of the custom hardware specification that you use to deploy the model.
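On an OpenShift cluster, you can typically see the label values with `oc get nodes -L nvidia.com/gpu.product`, and add a custom label with `oc label node <node_name> <label_name>=<label_value>`. The following minimal sketch turns `label_name=label_value` pairs into a node_selector array. `node_selector_json` is a hypothetical helper; because node_selector is an array, it accepts more than one pair, but how watsonx.ai combines multiple entries is an assumption that this document does not confirm.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the node_selector array from
# "label_name=label_value" arguments.
# Usage: node_selector_json <name=value> [<name=value> ...]
node_selector_json() {
  local pair entries=()
  for pair in "$@"; do
    if [[ "$pair" != *=* ]]; then
      echo "expected label_name=label_value, got: $pair" >&2
      return 1
    fi
    # Split on the first "=" (label names such as nvidia.com/gpu.product contain none).
    entries+=("{\"label_name\": \"${pair%%=*}\", \"label_value\": \"${pair#*=}\"}")
  done
  local IFS=,                 # join array elements with commas
  echo "[${entries[*]}]"
}

# Example: the selector for the NVIDIA-A100-SXM4-80GB nodes.
node_selector_json "nvidia.com/gpu.product=NVIDIA-A100-SXM4-80GB"
```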

The following code sample creates a custom hardware specification that targets nodes with the label nvidia.com/gpu.product=NVIDIA-A100-SXM4-80GB by adding its label_name and label_value to the node_selector field:

curl -ik -X POST -H "Authorization: Bearer $TOKEN" "https://<cluster_url>/v2/hardware_specifications?project_id=$project_id" \
-H "Content-Type:application/json" \
--data '{
  "name": "custom_hw_spec",
  "description": "Custom hardware specification for foundation models",
  "nodes": {
    "cpu": {
      "units": "2"
    },
    "mem": {
      "size": "128Gi"
    },
    "gpu": {
      "num_gpu": 1
    },
    "node_selector": [
      {
        "label_name": "nvidia.com/gpu.product",
        "label_value": "NVIDIA-A100-SXM4-80GB"
      }
    ]
  }
}'

As a result, the new hardware specification appears in the Select a hardware specification dropdown menu when you deploy the model in the UI.