Creating customized hardware specifications
You might need to create a customized hardware specification if your model requires a configuration that no predefined specification provides, if its size falls outside the predefined ranges, or if you want to optimize performance or cost-effectiveness.
Supported hardware specifications
For a list of supported hardware specifications that you can use to deploy a PEFT model, see Supported hardware specifications.
Calculating GPU memory for LoRA and QLoRA models
You can create a customized GPU hardware specification by calculating the required GPU memory based on the number of billion parameters, the quantization level, and the LoRA adapter size. The formulas for calculating GPU memory for LoRA (non-quantized) and QLoRA (quantized) models are as follows:
Non-quantized models (LoRA)
To create a custom hardware specification for non-quantized models to deploy LoRA adapters, the required hardware resources can be calculated as follows:
Resource | Calculation
---|---
GPU memory | ( (Number of billion parameters * 2) + (Number of LoRA adapters * Size of LoRA adapter in GB) ) + 50% additional memory
Number of GPUs | GPU memory divided by 80 GB, rounded up (1 GPU = 80 GB)
Number of CPUs | Number of GPUs + 1
CPU memory | Equal to GPU memory
Example
Suppose we have a non-quantized model with 10 billion parameters, 2 LoRA adapters, and an adapter size of 128 MB. The required GPU memory would be:
( (10 * 2) + (2 * 0.125) ) + 50% additional memory = 30.375 GB ~= 31 GB
This formula for calculating the GPU memory takes into account the number of billion parameters in the model, adds 50% additional memory for overhead, and includes the memory required for LoRA adapters.
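The calculation above can be sketched in Python. The function and variable names are illustrative, not part of any product API; the 80 GB-per-GPU figure comes from the table above.

```python
import math

def lora_gpu_memory_gb(params_billion, num_adapters, adapter_size_gb):
    """GPU memory (GB) for a non-quantized LoRA deployment:
    2 bytes per parameter (fp16/bf16) plus adapter memory, plus 50% overhead."""
    base = (params_billion * 2) + (num_adapters * adapter_size_gb)
    return base * 1.5

def num_gpus(gpu_memory_gb):
    """Number of 80 GB GPUs needed to hold the required memory."""
    return math.ceil(gpu_memory_gb / 80)

# 10-billion-parameter model, 2 adapters of 128 MB (0.125 GB) each:
mem = lora_gpu_memory_gb(10, 2, 0.125)
print(mem)            # 30.375 -> round up to 31 GB
print(num_gpus(mem))  # 1
```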
Quantized models (QLoRA)
To create a custom hardware specification for quantized models to deploy QLoRA adapters, the required hardware resources can be calculated as follows.
For 4-bit quantized models:
Resource | Calculation
---|---
GPU memory | ( (Number of billion parameters * 0.5) + (Number of LoRA adapters * Size of LoRA adapter in GB) ) + 50% additional memory
Number of GPUs | GPU memory divided by 80 GB, rounded up (1 GPU = 80 GB)
Number of CPUs | Number of GPUs + 1
CPU memory | Equal to GPU memory
Example:
Suppose we have a quantized model with 10 billion parameters, 2 QLoRA adapters, and an adapter size of 128 MB. The required GPU memory would be:
( (10 * 0.5) + (2 * 0.125) ) + 50% additional memory = 7.875 GB ~= 8 GB
This formula is similar to the non-quantized model formula, but the number of billion parameters is multiplied by 0.5, reflecting the reduced memory requirements of quantized models.
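A minimal Python sketch of the 4-bit calculation, following the table above; the function name is illustrative.

```python
def qlora4_gpu_memory_gb(params_billion, num_adapters, adapter_size_gb):
    """GPU memory (GB) for a 4-bit QLoRA deployment:
    0.5 bytes per parameter plus adapter memory, plus 50% overhead."""
    base = (params_billion * 0.5) + (num_adapters * adapter_size_gb)
    return base * 1.5

# 10-billion-parameter model, 2 adapters of 128 MB (0.125 GB) each:
print(qlora4_gpu_memory_gb(10, 2, 0.125))  # 7.875 -> round up to 8 GB
```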
For 8-bit quantized models:
Resource | Calculation
---|---
GPU memory | ( (Number of billion parameters) + (Number of LoRA adapters * Size of LoRA adapter in GB) ) + 50% additional memory
Number of GPUs | GPU memory divided by 80 GB, rounded up (1 GPU = 80 GB)
Number of CPUs | Number of GPUs + 1
CPU memory | Equal to GPU memory
Example:
Suppose we have a quantized model with 10 billion parameters, 2 QLoRA adapters, and an adapter size of 128 MB. The required GPU memory would be:
( (10) + (2 * 0.125) ) + 50% additional memory = 15.375 GB ~= 16 GB
This formula is similar to the non-quantized model formula, but each parameter counts as 1 byte instead of 2, reflecting the reduced memory requirements of 8-bit quantized models.
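The three formulas differ only in bytes per parameter (2 for fp16/bf16, 1 for 8-bit, 0.5 for 4-bit), so they can be generalized into one hypothetical helper:

```python
def gpu_memory_gb(params_billion, bytes_per_param, num_adapters, adapter_size_gb):
    """Generalized GPU memory (GB) estimate: weight memory at the given
    precision plus adapter memory, plus 50% overhead."""
    base = (params_billion * bytes_per_param) + (num_adapters * adapter_size_gb)
    return base * 1.5

# 10-billion-parameter model, 2 adapters of 128 MB (0.125 GB) each:
print(gpu_memory_gb(10, 2,   2, 0.125))  # LoRA  (fp16): 30.375
print(gpu_memory_gb(10, 1,   2, 0.125))  # QLoRA (8-bit): 15.375
print(gpu_memory_gb(10, 0.5, 2, 0.125))  # QLoRA (4-bit): 7.875
```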
Creating a customized hardware specification
To create a customized hardware specification for deploying a PEFT model that uses specialized hardware resources, you must follow a programmatic approach.
The following code sample shows how to create a custom hardware specification with REST API:
curl -ik -X POST -H "Authorization: Bearer $TOKEN" "https://<HOST>/v2/hardware_specifications?space_id=$space_id" \
  -H "Content-Type: application/json" \
  --data '{
    "name": "custom_hw_spec",
    "description": "Custom hardware specification for foundation models",
    "nodes": {
      "cpu": {
        "units": "2"
      },
      "mem": {
        "size": "128Gi"
      },
      "gpu": {
        "num_gpu": 1
      }
    }
  }'
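If you prefer Python over curl, the same request body can be built programmatically and sent to the same endpoint (for example, with the requests library). The helper below is an illustrative sketch, not a product API; the field names mirror the curl example above.

```python
import json

def build_hw_spec_payload(name, description, cpu_units, mem_size, num_gpu):
    """Build the JSON body for POST /v2/hardware_specifications.
    Field names follow the curl example: cpu units and mem size are
    strings, num_gpu is an integer."""
    return {
        "name": name,
        "description": description,
        "nodes": {
            "cpu": {"units": str(cpu_units)},
            "mem": {"size": mem_size},
            "gpu": {"num_gpu": num_gpu},
        },
    }

payload = build_hw_spec_payload(
    "custom_hw_spec",
    "Custom hardware specification for foundation models",
    2, "128Gi", 1,
)
print(json.dumps(payload, indent=2))
```

The resulting JSON can then be POSTed with the same Authorization and Content-Type headers shown in the curl example.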
Learn more
Parent topic: Requirements for deploying PEFT models