Requirements for deploying PEFT models
Review supported model architectures, software requirements, and hardware requirements for deploying fine-tuned models that are trained with PEFT techniques.
Supported model architectures
Models that are trained with supported architectures can be deployed by using watsonx.ai.
The following base models are supported for fine tuning with LoRA and QLoRA techniques and can be deployed with watsonx.ai:
Model architecture | Model | PEFT technique |
---|---|---|
Granite | ibm/granite-3-1-8b-base | LoRA |
Llama | meta-llama/llama-3-1-8b, meta-llama/llama-3-1-70b | LoRA |
Llama | meta-llama/llama-3-1-70b-gptq | QLoRA |
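As a quick illustration, the supported combinations listed above can be encoded as a lookup and checked before a deployment is attempted. This is a minimal sketch, not part of any watsonx.ai client library: the names `SUPPORTED_PEFT_MODELS` and `is_supported` are hypothetical, and only the model IDs and techniques come from the table.

```python
# Hypothetical helper: the mapping mirrors the support table in this topic.
# Neither the dictionary nor the function belongs to a watsonx.ai SDK.
SUPPORTED_PEFT_MODELS = {
    "ibm/granite-3-1-8b-base": ("Granite", "LoRA"),
    "meta-llama/llama-3-1-8b": ("Llama", "LoRA"),
    "meta-llama/llama-3-1-70b": ("Llama", "LoRA"),
    "meta-llama/llama-3-1-70b-gptq": ("Llama", "QLoRA"),
}


def is_supported(model_id: str, technique: str) -> bool:
    """Return True if the base model is listed with the given PEFT technique."""
    entry = SUPPORTED_PEFT_MODELS.get(model_id)
    if entry is None:
        return False
    _architecture, listed_technique = entry
    # Compare case-insensitively so "qlora" and "QLoRA" are treated alike.
    return listed_technique.lower() == technique.lower()


if __name__ == "__main__":
    print(is_supported("meta-llama/llama-3-1-70b-gptq", "QLoRA"))
    print(is_supported("meta-llama/llama-3-1-8b", "QLoRA"))
```

A check like this only reflects the table as written here; the list of supported models can change between releases.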
Software requirements
You can use the watsonx-cfm-caikit-1.1 software specification, which is based on the vLLM runtime engine, to deploy your fine-tuned model that is trained with a PEFT technique.
Hardware requirements
Although PEFT uses less memory compared to instruction fine-tuning, it is still a resource-intensive process that requires you to have GPU resources available for deployment.
The standard supported hardware configurations to deploy base foundation models are:
- NVIDIA A100 with 80 GB of GPU memory
- NVIDIA H100 with 80 GB of GPU memory
- NVIDIA L40S with 48 GB of GPU memory
Supported hardware specifications
When you deploy a base foundation model with PEFT models (LoRA or QLoRA adapters), you must select a hardware specification that matches the parameter count of the base foundation model and the number of adapters to be used.
You can use the following predefined hardware specifications for deployment:
Parameter range | Hardware specification | Resources available |
---|---|---|
1B to 20B | WX-S | 1 GPU, 2 CPU and 60 GB |
21B to 40B | WX-M | 2 GPU, 3 CPU and 120 GB |
41B to 80B | WX-L | 4 GPU, 5 CPU and 240 GB |
81B to 200B | WX-XL | 8 GPU, 9 CPU and 600 GB |
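The table above can be read as a simple lookup from parameter count to predefined specification. The following sketch shows one way to express that lookup. The names `HardwareSpec`, `PREDEFINED_SPECS`, and `pick_hardware_spec` are hypothetical and not part of any watsonx.ai API. Two assumptions are made: a fractional count that falls between two ranges (for example 20.5B) is assigned to the larger specification, and the number of adapters is not modeled because the table does not quantify its effect.

```python
from typing import NamedTuple, Optional


class HardwareSpec(NamedTuple):
    """One row of the predefined hardware specification table."""
    name: str
    min_params_b: int  # lower bound of the range, in billions of parameters
    max_params_b: int  # upper bound of the range, in billions of parameters
    gpus: int
    cpus: int
    memory_gb: int


# Values copied from the table; ordered from smallest to largest.
PREDEFINED_SPECS = [
    HardwareSpec("WX-S", 1, 20, 1, 2, 60),
    HardwareSpec("WX-M", 21, 40, 2, 3, 120),
    HardwareSpec("WX-L", 41, 80, 4, 5, 240),
    HardwareSpec("WX-XL", 81, 200, 8, 9, 600),
]


def pick_hardware_spec(params_billion: float) -> Optional[HardwareSpec]:
    """Return the smallest predefined spec that covers the parameter count.

    Returns None when the count is below 1B or above 200B, in which case a
    custom hardware specification may be needed.
    Assumption: counts between two ranges go to the larger specification.
    """
    if params_billion < PREDEFINED_SPECS[0].min_params_b:
        return None
    for spec in PREDEFINED_SPECS:
        if params_billion <= spec.max_params_b:
            return spec
    return None


if __name__ == "__main__":
    for size in (8, 70, 250):
        spec = pick_hardware_spec(size)
        print(size, spec.name if spec else "no predefined specification")
```

Because adapters add to the memory footprint, a deployment that serves several adapters might need a larger specification than this lookup suggests.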
Creating customized hardware specifications
You might need to create a custom hardware specification if your model requires a configuration that the predefined specifications do not provide, if its size falls outside the predefined parameter ranges, or if you want to optimize performance or cost. For more information, see Creating custom hardware specifications.
Supported deployment types
You can create an online deployment for PEFT models. Online deployment allows for real-time inferencing and is suitable for applications that require low-latency predictions.
Batch deployments are not currently supported for deploying PEFT models.