Requirements for deploying PEFT models

Review supported model architectures, software requirements, and hardware requirements for deploying fine-tuned models that are trained with PEFT techniques.

Supported model architectures

Models that are trained with supported architectures can be deployed by using watsonx.ai.

The following base models are supported for fine tuning with LoRA and QLoRA techniques and can be deployed with watsonx.ai:

Supported model architectures for PEFT techniques

| Model architecture | Model | PEFT technique |
|---|---|---|
| Granite | ibm/granite-3-1-8b-base | LoRA |
| Llama | meta-llama/llama-3-1-8b | LoRA |
| Llama | meta-llama/llama-3-1-70b | LoRA |
| Llama | meta-llama/llama-3-1-70b-gptq | QLoRA |
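
The supported combinations can be captured in a small lookup, which is handy for checking a base model and technique before you start a deployment. This is a minimal sketch: the `SUPPORTED_PEFT_MODELS` mapping and the `is_supported` helper are hypothetical names used for illustration, not part of any watsonx.ai API.

```python
# Hypothetical lookup built from the supported-architectures table;
# not a watsonx.ai API.
SUPPORTED_PEFT_MODELS = {
    "ibm/granite-3-1-8b-base": "LoRA",
    "meta-llama/llama-3-1-8b": "LoRA",
    "meta-llama/llama-3-1-70b": "LoRA",
    "meta-llama/llama-3-1-70b-gptq": "QLoRA",
}


def is_supported(model_id: str, technique: str) -> bool:
    """Return True if the base model is listed with the given PEFT technique."""
    expected = SUPPORTED_PEFT_MODELS.get(model_id)
    # Compare case-insensitively so "lora" and "LoRA" are treated alike.
    return expected is not None and expected.lower() == technique.lower()
```

For example, `is_supported("meta-llama/llama-3-1-70b", "QLoRA")` is false because the table lists QLoRA only for the GPTQ variant of that model.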

Software requirements

You can use the watsonx-cfm-caikit-1.1 software specification, which is based on the vLLM runtime engine, to deploy your fine-tuned model that is trained with a PEFT technique.

Hardware requirements

Although PEFT uses less memory than instruction fine-tuning, it is still resource-intensive, and you must have GPU resources available for deployment.

The standard supported hardware configurations to deploy base foundation models are:

  • NVIDIA A100 with 80 GB of GPU memory
  • NVIDIA H100 with 80 GB of GPU memory
  • NVIDIA L40S with 48 GB of GPU memory
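
To get a feel for why GPU memory matters, the following sketch estimates how much memory the base model weights alone occupy and how many GPUs of a given type that implies. It is a rough lower bound under stated assumptions: the 16-bit and 4-bit weight widths are illustrative choices, 1 GB is taken as 10^9 bytes, and runtime overhead such as the KV cache, activations, and adapter weights is ignored. The helper names are hypothetical.

```python
# Rough, weights-only estimate; not an official sizing method.
import math

# GPU memory per device in GB, taken from the list of supported GPUs.
GPU_MEMORY_GB = {"A100": 80, "H100": 80, "L40S": 48}


def weights_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


def min_gpus_for_weights(num_params: float, bits_per_param: int, gpu: str) -> int:
    """Lower bound on the GPU count needed just to hold the weights."""
    return math.ceil(weights_gb(num_params, bits_per_param) / GPU_MEMORY_GB[gpu])
```

Under these assumptions, an 8B-parameter model stored in 16 bits needs about 16 GB for weights, while a 70B-parameter model needs about 140 GB and therefore more than one 80 GB GPU. Real deployments need additional headroom beyond this estimate.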

Supported hardware specifications

When you deploy a base foundation model with PEFT models (LoRA or QLoRA adapters), select a hardware specification that matches the parameter count of the base model and the number of adapters that you plan to use.

You can use the following predefined hardware specifications for deployment:

Predefined hardware configurations

| Parameter range | Hardware specification | Resources available |
|---|---|---|
| 1B to 20B | WX-S | 1 GPU, 2 CPU and 60 GB |
| 21B to 40B | WX-M | 2 GPU, 3 CPU and 120 GB |
| 41B to 80B | WX-L | 4 GPU, 5 CPU and 240 GB |
| 81B to 200B | WX-XL | 8 GPU, 9 CPU and 600 GB |
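
The mapping from parameter count to predefined specification can be sketched as a simple lookup. The `pick_hardware_spec` helper is hypothetical and makes two assumptions that the table does not state: a size that falls between two ranges (for example, 20.5B) is rounded up to the larger specification, and the number of adapters is not taken into account.

```python
# Hypothetical helper; the bounds mirror the predefined hardware specifications.
HARDWARE_SPECS = [
    # (upper bound in billions of parameters, specification name)
    (20, "WX-S"),
    (40, "WX-M"),
    (80, "WX-L"),
    (200, "WX-XL"),
]


def pick_hardware_spec(params_billions: float) -> str:
    """Return the smallest predefined spec whose range covers the model."""
    if params_billions < 1:
        raise ValueError("below the smallest predefined range (1B)")
    for upper_bound, name in HARDWARE_SPECS:
        # Assumption: sizes between two ranges round up to the larger spec.
        if params_billions <= upper_bound:
            return name
    raise ValueError(
        "above the largest predefined range (200B); "
        "a custom hardware specification is needed"
    )
```

For example, `pick_hardware_spec(8)` returns `"WX-S"` and `pick_hardware_spec(70)` returns `"WX-L"`. If you plan to serve many adapters, you might need a larger specification than this lookup suggests.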

Creating customized hardware specifications

You might need to create a custom hardware specification if your model requires a configuration that the predefined specifications do not provide, if its size falls outside the predefined ranges, or if you want to optimize performance or cost. For more information, see Creating custom hardware specifications.

Supported deployment types

You can create an online deployment for PEFT models. Online deployment allows for real-time inferencing and is suitable for applications that require low-latency predictions.

Batch deployments are not currently supported for deploying PEFT models.
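
A pre-flight check along these lines can catch an unsupported request before it is submitted. This is a minimal sketch with hypothetical names, not a watsonx.ai API. It treats `watsonx-cfm-caikit-1.1` as the expected software specification, which is an assumption: this topic says you can use that specification, not that it is the only option.

```python
# Hypothetical pre-flight check; names are illustrative, not a watsonx.ai API.
EXPECTED_SOFTWARE_SPEC = "watsonx-cfm-caikit-1.1"  # assumed to be the one in use
SUPPORTED_DEPLOYMENT_TYPES = {"online"}  # batch is not currently supported


def check_deployment_request(software_spec: str, deployment_type: str) -> list[str]:
    """Return a list of problems; an empty list means the request looks valid."""
    problems = []
    if software_spec != EXPECTED_SOFTWARE_SPEC:
        problems.append(f"unexpected software specification: {software_spec}")
    if deployment_type.lower() not in SUPPORTED_DEPLOYMENT_TYPES:
        problems.append(
            f"unsupported deployment type for PEFT models: {deployment_type}"
        )
    return problems
```

Calling `check_deployment_request("watsonx-cfm-caikit-1.1", "batch")` returns a single problem that flags the batch deployment type.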
