Resource utilization guidelines for custom foundation models
Depending on your model version, you might have to assign specific resources, such as memory or the number of GPUs, to deploy it successfully.
Resource utilization formulas
Follow these formulas when you assign resources to deploy your custom foundation model.
Important:
Failure to follow these formulas might result in unexpected model behavior.
Non-quantized models:
| Resource | Calculation |
|---|---|
| GPU memory (GB) | (Number of parameters in billions * 2) + 50% additional memory |
| Number of GPUs | Depends on the GPU memory requirement: 1 GPU = 80 GB |
| Number of CPUs | Number of GPUs + 1 |
| CPU memory | Equal to GPU memory |
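For example, a 70-billion-parameter non-quantized model needs roughly (70 * 2) + 50% = 210 GB of GPU memory, which rounds up to 3 GPUs at 80 GB each, 4 CPUs, and 210 GB of CPU memory.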
Quantized models:
4-bit quantized models:
| Resource | Calculation |
|---|---|
| GPU memory (GB) | (Number of parameters in billions * 0.5) + 50% additional memory |
| Number of GPUs | Depends on the GPU memory requirement: 1 GPU = 80 GB |
| Number of CPUs | Number of GPUs + 1 |
| CPU memory | Equal to GPU memory |
8-bit quantized models:
| Resource | Calculation |
|---|---|
| GPU memory (GB) | Number of parameters in billions + 50% additional memory |
| Number of GPUs | Depends on the GPU memory requirement: 1 GPU = 80 GB |
| Number of CPUs | Number of GPUs + 1 |
| CPU memory | Equal to GPU memory |
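To make the formulas concrete, the following Python sketch estimates the resources for a given parameter count and quantization level. The function name `estimate_resources` and the dictionary of model types are assumptions for illustration; the per-billion-parameter values and the 80 GB-per-GPU assumption come from the tables above.

```python
import math

# GB of GPU memory per billion parameters, before the 50% additional memory,
# taken from the tables above.
GB_PER_BILLION_PARAMS = {
    "non-quantized": 2.0,
    "8-bit": 1.0,
    "4-bit": 0.5,
}

GPU_MEMORY_GB = 80  # 1 GPU = 80 GB


def estimate_resources(billion_params: float, model_type: str = "non-quantized") -> dict:
    """Estimate deployment resources for a custom foundation model.

    Applies the formulas above: base GPU memory plus 50% additional memory,
    one 80 GB GPU per 80 GB required, CPUs = GPUs + 1, and CPU memory
    equal to GPU memory.
    """
    base_gb = billion_params * GB_PER_BILLION_PARAMS[model_type]
    gpu_memory_gb = base_gb * 1.5                       # + 50% additional memory
    num_gpus = math.ceil(gpu_memory_gb / GPU_MEMORY_GB)  # round up to whole GPUs
    return {
        "gpu_memory_gb": gpu_memory_gb,
        "num_gpus": num_gpus,
        "num_cpus": num_gpus + 1,
        "cpu_memory_gb": gpu_memory_gb,                 # equal to GPU memory
    }


if __name__ == "__main__":
    # Example: a 13-billion-parameter model, non-quantized vs. 4-bit quantized.
    print(estimate_resources(13))                      # ~39 GB GPU memory, 1 GPU, 2 CPUs
    print(estimate_resources(13, model_type="4-bit"))  # ~9.75 GB GPU memory, 1 GPU, 2 CPUs
```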