Creating customized hardware specifications

You might need to create a customized hardware specification if your model requires a specific configuration that the predefined specifications don't provide, if its size falls outside the predefined ranges, or if you want to optimize for performance or cost.

Supported hardware specifications

For a list of supported hardware specifications that you can use to deploy a PEFT model, see Supported hardware specifications.

Calculating GPU memory for LoRA and QLoRA models

You can create a customized GPU hardware specification by calculating the required GPU memory based on the number of billion parameters, the quantization level, and the LoRA adapter size. The formulas for calculating GPU memory for LoRA (non-quantized) and QLoRA (quantized) models are as follows:

Non-quantized models (LoRA)

To create a custom hardware specification for deploying LoRA adapters with non-quantized models, calculate the required hardware resources as follows:

Calculating resources required for non-quantized models:
GPU memory: ((Number of billion parameters * 2) + (Number of LoRA adapters * Size of LoRA adapter in GB)) + 50% additional memory
Number of GPUs: Depends on GPU memory requirements (1 GPU = 80 GB)
Number of CPUs: Number of GPUs + 1
CPU memory: Equal to GPU memory

Example

Suppose we have a non-quantized model with 10 billion parameters, 2 LoRA adapters, and an adapter size of 128 MB. The required GPU memory would be:

((10 * 2) + (2 * 0.125)) + 50% additional memory = 30.375 GB, rounded up to 31 GB

This formula multiplies the number of billion parameters by 2 (roughly 2 GB of weights per billion parameters at 16-bit precision), adds the memory required for the LoRA adapters, and then adds 50% additional memory for overhead.
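For illustration, the following Python sketch applies the calculation from the table above. The function name, the rounding of GPU memory up to the next whole gigabyte, and the 80 GB GPU size are assumptions drawn from this topic, not part of any product API.

import math

def estimate_lora_resources(billion_params, num_adapters, adapter_size_gb,
                            gb_per_billion_params=2.0, gpu_size_gb=80):
    # gb_per_billion_params: 2.0 for non-quantized (LoRA), 0.5 for 4-bit QLoRA,
    # 1.0 for 8-bit QLoRA, following the tables in this topic.
    gpu_memory_gb = (billion_params * gb_per_billion_params
                     + num_adapters * adapter_size_gb) * 1.5  # + 50% overhead
    gpu_memory_gb = math.ceil(gpu_memory_gb)           # round up to a whole GB
    num_gpus = math.ceil(gpu_memory_gb / gpu_size_gb)   # 1 GPU = 80 GB
    num_cpus = num_gpus + 1                             # Number of GPUs + 1
    cpu_memory_gb = gpu_memory_gb                       # equal to GPU memory
    return gpu_memory_gb, num_gpus, num_cpus, cpu_memory_gb

# 10 billion parameters, 2 LoRA adapters of 128 MB (0.125 GB) each
print(estimate_lora_resources(10, 2, 0.125))  # (31, 1, 2, 31)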

Quantized models (QLoRA)

To create a custom hardware specification for deploying QLoRA adapters with quantized models, calculate the required hardware resources based on the quantization level (4-bit or 8-bit).

For 4-bit quantized models:

Calculating resources required for 4-bit quantized models:
GPU memory: ((Number of billion parameters * 0.5) + (Number of LoRA adapters * Size of LoRA adapter in GB)) + 50% additional memory
Number of GPUs: Depends on GPU memory requirements (1 GPU = 80 GB)
Number of CPUs: Number of GPUs + 1
CPU memory: Equal to GPU memory

Example:

Suppose we have a 4-bit quantized model with 10 billion parameters, 2 QLoRA adapters, and an adapter size of 128 MB. The required GPU memory would be:

((10 * 0.5) + (2 * 0.125)) + 50% additional memory = 7.875 GB, rounded up to 8 GB

This formula is similar to the non-quantized model formula, but the number of billion parameters is multiplied by 0.5, reflecting the reduced memory footprint of 4-bit weights.
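The same arithmetic, with the 0.5 factor for 4-bit weights, can be checked in Python (values from the example above):

# 4-bit QLoRA: 10 billion parameters, 2 adapters of 0.125 GB, plus 50% overhead
gpu_memory_gb = (10 * 0.5 + 2 * 0.125) * 1.5
print(gpu_memory_gb)  # 7.875, rounded up to 8 GB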

For 8-bit quantized models:

Calculating resources required for 8-bit quantized models:
GPU memory: ((Number of billion parameters) + (Number of LoRA adapters * Size of LoRA adapter in GB)) + 50% additional memory
Number of GPUs: Depends on GPU memory requirements (1 GPU = 80 GB)
Number of CPUs: Number of GPUs + 1
CPU memory: Equal to GPU memory

Example:

Suppose we have an 8-bit quantized model with 10 billion parameters, 2 QLoRA adapters, and an adapter size of 128 MB. The required GPU memory would be:

((10) + (2 * 0.125)) + 50% additional memory = 15.375 GB, rounded up to 16 GB

This formula is similar to the non-quantized model formula, but the number of billion parameters is multiplied by 1 instead of 2, reflecting the reduced memory footprint of 8-bit weights.
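Again, the arithmetic with the factor of 1 for 8-bit weights can be checked in Python (values from the example above):

# 8-bit QLoRA: 10 billion parameters, 2 adapters of 0.125 GB, plus 50% overhead
gpu_memory_gb = (10 * 1 + 2 * 0.125) * 1.5
print(gpu_memory_gb)  # 15.375, rounded up to 16 GB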

Creating a customized hardware specification

To create a customized hardware specification for deploying a PEFT model that uses specialized hardware resources, you must follow a programmatic approach.

The following code sample shows how to create a custom hardware specification by using the REST API:

curl -ik -X POST -H "Authorization: Bearer $TOKEN" "https://<HOST>/v2/hardware_specifications?space_id=$space_id" \
-H "Content-Type:application/json" \
--data '{
  "name": "custom_hw_spec",
  "description": "Custom hardware specification for foundation models",
  "nodes": {
    "cpu": {
      "units": "2"
    },
    "mem": {
      "size": "128Gi"
    },
    "gpu": {
      "num_gpu": 1
    }
  }
}'
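The same request can also be sent from Python. The following sketch assumes the host, bearer token, and space ID are supplied through environment variables (the variable names are illustrative) and reuses the endpoint and payload from the curl example:

import os
import requests

host = os.environ["HOST"]          # assumed environment variables;
token = os.environ["TOKEN"]        # the names are illustrative
space_id = os.environ["SPACE_ID"]

payload = {
    "name": "custom_hw_spec",
    "description": "Custom hardware specification for foundation models",
    "nodes": {
        "cpu": {"units": "2"},
        "mem": {"size": "128Gi"},
        "gpu": {"num_gpu": 1},
    },
}

response = requests.post(
    f"https://{host}/v2/hardware_specifications",
    params={"space_id": space_id},
    headers={"Authorization": f"Bearer {token}"},
    json=payload,
)
print(response.status_code, response.json())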

Parent topic: Requirements for deploying PEFT models