Resource utilization guidelines for custom foundation models

Depending on your model version, you might have to assign specific resources, such as memory or a number of GPUs, to deploy it successfully.

Resource utilization formulas

Follow these formulas when you assign resources to deploy your custom foundation model.

Important:

Failure to follow these formulas might result in unexpected model behavior.

Non-quantized models:

Guidelines for custom hardware specifications: non-quantized models
Resource | Calculation
GPU memory | (Number of billion parameters * 2) + 50% additional memory
Number of GPUs | Depends on GPU memory requirements: 1 GPU = 80 GB
Number of CPUs | Number of GPUs + 1
CPU memory | Equal to GPU memory
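
As a worked example of the non-quantized formulas, the following minimal Python sketch estimates resources for a hypothetical 13-billion-parameter model. The helper name `non_quantized_resources` is illustrative and not part of any product API. The sketch assumes that "50% additional memory" means 50% of the base figure, that results are in GB, and that the GPU count is rounded up to a whole number; none of these assumptions is confirmed by the guidelines themselves.

```python
import math

GPU_CAPACITY_GB = 80  # from the guidelines: 1 GPU = 80 GB


def non_quantized_resources(billion_params):
    """Estimate resources for a non-quantized model (illustrative helper)."""
    base_gb = billion_params * 2
    # Assumption: "+ 50% additional memory" is 50% of the base figure.
    gpu_memory_gb = base_gb * 1.5
    # Assumption: round up, because a fraction of a GPU cannot be assigned.
    num_gpus = math.ceil(gpu_memory_gb / GPU_CAPACITY_GB)
    return {
        "gpu_memory_gb": gpu_memory_gb,
        "num_gpus": num_gpus,
        "num_cpus": num_gpus + 1,         # Number of GPUs + 1
        "cpu_memory_gb": gpu_memory_gb,   # equal to GPU memory
    }


estimate = non_quantized_resources(13)
print(estimate)
```

Under these assumptions, a 13-billion-parameter model needs 39 GB of GPU memory (26 GB base plus 13 GB additional), which fits on 1 GPU and calls for 2 CPUs and 39 GB of CPU memory.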

Quantized models:

4-bit quantized models:

Guidelines for custom hardware specifications: 4-bit quantized models
Resource | Calculation
GPU memory | (Number of billion parameters * 0.5) + 50% additional memory
Number of GPUs | Depends on GPU memory requirements: 1 GPU = 80 GB
Number of CPUs | Number of GPUs + 1
CPU memory | Equal to GPU memory

8-bit quantized models:

Guidelines for custom hardware specifications: 8-bit quantized models
Resource | Calculation
GPU memory | Number of billion parameters + 50% additional memory
Number of GPUs | Depends on GPU memory requirements: 1 GPU = 80 GB
Number of CPUs | Number of GPUs + 1
CPU memory | Equal to GPU memory
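
The three sets of formulas differ only in the multiplier applied to the parameter count (2, 1, or 0.5), so they can be compared side by side. The following Python sketch does this for a hypothetical 70-billion-parameter model. The names `MULTIPLIER_BY_PRECISION` and `estimate_resources` are illustrative, and the sketch assumes that "50% additional memory" means 50% of the base figure and that the GPU count is rounded up to a whole number.

```python
import math

GPU_CAPACITY_GB = 80  # from the guidelines: 1 GPU = 80 GB

# Multiplier applied to the number of billion parameters in each formula.
MULTIPLIER_BY_PRECISION = {
    "non-quantized": 2,
    "8-bit": 1,
    "4-bit": 0.5,
}


def estimate_resources(billion_params, precision):
    """Apply the guideline formulas for one precision (illustrative helper)."""
    base_gb = billion_params * MULTIPLIER_BY_PRECISION[precision]
    gpu_memory_gb = base_gb * 1.5  # assumption: 50% of the base is added
    num_gpus = math.ceil(gpu_memory_gb / GPU_CAPACITY_GB)  # assumption: round up
    return {
        "gpu_memory_gb": gpu_memory_gb,
        "num_gpus": num_gpus,
        "num_cpus": num_gpus + 1,
        "cpu_memory_gb": gpu_memory_gb,
    }


comparison = {
    precision: estimate_resources(70, precision)
    for precision in MULTIPLIER_BY_PRECISION
}
for precision, resources in comparison.items():
    print(precision, resources)
```

Under these assumptions, the 70-billion-parameter model needs 210 GB of GPU memory on 3 GPUs when non-quantized, 105 GB on 2 GPUs at 8 bits, and 52.5 GB on 1 GPU at 4 bits.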