Requirements for deployment

Review supported model architectures, software requirements, and hardware requirements for deploying custom foundation models that are fine-tuned with Parameter-Efficient Fine-Tuning (PEFT) techniques.

Supported model architectures

You can deploy custom foundation models that are fine-tuned with PEFT techniques for the following model architectures:

Supported model architectures and fine-tuning techniques
Model architecture      Model name and size   LoRA  QLoRA
GraniteForCausalLM      Granite PowerLM 3B    Yes   Yes
GraniteForCausalLM      Granite 3.1 1B        Yes   Yes
GraniteForCausalLM      Granite 3.1 2B        Yes   Yes
GraniteForCausalLM      Granite 3.1 3B        Yes   Yes
GraniteForCausalLM      Granite 3.1 8B        Yes   Yes
GraniteForCausalLM      Granite 3.0 2B        Yes   Yes
GraniteForCausalLM      Granite 3.0 8B        Yes   Yes
GraniteMoeForCausalLM   GraniteMoE 1B         Yes   N/A
GraniteMoeForCausalLM   GraniteMoE 3B         Yes   N/A
LlamaForCausalLM        Granite 3B            Yes   Yes
LlamaForCausalLM        Granite 8B            Yes   Yes
GPTBigCodeForCausalLM   Granite 13B           Yes   Yes
GPTBigCodeForCausalLM   Granite 20B           Yes   Yes
GPTBigCodeForCausalLM   Granite 34B           Yes   Yes
Llama 3.1               Llama3.1-8B           Yes   Yes
Llama 3.1               Llama3.1-70B          Yes   Yes
Llama 3.1               Llama3.1-405B         No    Yes
Llama 3                 Llama3-8B             Yes   Yes
Llama 3                 Llama3-70B            Yes   Yes
LlamaForCausalLM        aLLaM-13b             Yes   Yes
Mixtral                 Mixtral 8x7B          Yes   Yes
Mistral                 Mistral-7b            Yes   Yes
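
As an illustration only, the support table above can be encoded as a lookup so that a script can check a model and tuning technique before attempting a deployment. This is a minimal sketch: the names SUPPORT and is_supported are hypothetical helpers, not part of any product API, and the data is transcribed by hand from the table.

```python
# Hypothetical lookup transcribed from the support table above.
# Key: model name as listed in the table. Value: (LoRA, QLoRA) support.
# "N/A" entries in the table are treated as not supported (False).
SUPPORT = {
    "Granite PowerLM 3B": (True, True),
    "Granite 3.1 1B": (True, True),
    "Granite 3.1 2B": (True, True),
    "Granite 3.1 3B": (True, True),
    "Granite 3.1 8B": (True, True),
    "Granite 3.0 2B": (True, True),
    "Granite 3.0 8B": (True, True),
    "GraniteMoE 1B": (True, False),
    "GraniteMoE 3B": (True, False),
    "Granite 3B": (True, True),
    "Granite 8B": (True, True),
    "Granite 13B": (True, True),
    "Granite 20B": (True, True),
    "Granite 34B": (True, True),
    "Llama3.1-8B": (True, True),
    "Llama3.1-70B": (True, True),
    "Llama3.1-405B": (False, True),
    "Llama3-8B": (True, True),
    "Llama3-70B": (True, True),
    "aLLaM-13b": (True, True),
    "Mixtral 8x7B": (True, True),
    "Mistral-7b": (True, True),
}


def is_supported(model_name: str, technique: str) -> bool:
    """Return True if the table lists `technique` as supported for `model_name`."""
    if technique not in ("LoRA", "QLoRA"):
        raise ValueError(f"Unknown technique: {technique!r}")
    if model_name not in SUPPORT:
        return False  # Model is not listed in the table at all.
    lora, qlora = SUPPORT[model_name]
    return lora if technique == "LoRA" else qlora
```

For example, is_supported("Llama3.1-405B", "LoRA") evaluates to False, while the same model with "QLoRA" evaluates to True, matching the table.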

Software requirements

You can use the watsonx-cfm-caikit-1.1 software specification, which is based on the vLLM runtime engine, to deploy your custom foundation model that is fine-tuned with a PEFT technique.

Hardware requirements

Deploying custom foundation models that are fine-tuned with a PEFT technique is a resource-intensive process that requires you to have GPU resources available for deployment.

When you choose the hardware specification for your custom foundation model, consider the number of LoRA or QLoRA adapters that you want to deploy with the model and the size of each adapter.
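
To get a rough sense of how adapter count and size add up, the following sketch estimates the memory taken by the adapter weights alone. It relies on the standard LoRA construction, in which an adapted weight matrix of shape (d_out, d_in) gains two low-rank factors with rank * (d_in + d_out) parameters in total. The function names, the layer shapes, the rank, and the 2-bytes-per-parameter (16-bit) storage are all assumptions for illustration; the estimate excludes the base model, the KV cache, and runtime overhead.

```python
def lora_adapter_params(layer_shapes, rank):
    """Count LoRA parameters for one adapter.

    Each adapted weight of shape (d_out, d_in) adds two factors,
    (d_out x rank) and (rank x d_in), so rank * (d_in + d_out) parameters.
    """
    return sum(rank * (d_in + d_out) for d_out, d_in in layer_shapes)


def adapters_memory_gib(num_adapters, layer_shapes, rank, bytes_per_param=2):
    """Estimate memory (GiB) for the adapter weights only.

    Assumes every adapter targets the same layers with the same rank and
    stores weights in 16-bit precision unless bytes_per_param says otherwise.
    """
    total_bytes = num_adapters * lora_adapter_params(layer_shapes, rank) * bytes_per_param
    return total_bytes / 2**30


# Hypothetical example: 32 transformer layers, two adapted 4096 x 4096
# projections per layer, rank 16, four adapters deployed together.
shapes = [(4096, 4096)] * (32 * 2)
params_per_adapter = lora_adapter_params(shapes, rank=16)
total_gib = adapters_memory_gib(4, shapes, rank=16)
```

In this hypothetical setup the adapters are small relative to the base model, but the total grows linearly with the number of adapters, the rank, and the number of adapted layers.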

The standard supported hardware configurations to deploy custom foundation models are:

  • NVIDIA A100 with 80 GB of GPU memory
  • NVIDIA H100 with 80 GB of GPU memory
  • NVIDIA L40S with 48 GB of GPU memory

Supported hardware specifications

Choose a hardware specification for your fine-tuned custom foundation model based on the number of parameters in the model.

You can use the following predefined hardware specifications for deployment:

Predefined hardware configurations
Parameter range   Hardware specification   Resources available
1B to 20B         WX-S                     1 GPU, 2 CPU, and 60 GB
21B to 40B        WX-M                     2 GPU, 3 CPU, and 120 GB
41B to 80B        WX-L                     4 GPU, 5 CPU, and 240 GB
81B to 200B       WX-XL                    8 GPU, 9 CPU, and 600 GB
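
The mapping in the table above can be sketched as a small helper that returns the predefined specification for a given parameter count. The function name pick_hardware_spec is hypothetical. The sketch also assumes that a fractional size that falls between two listed ranges (for example, 20.5B) belongs to the next larger specification, which the table does not state.

```python
# (upper bound in billions of parameters, specification name), from the table above.
PREDEFINED_SPECS = [
    (20, "WX-S"),
    (40, "WX-M"),
    (80, "WX-L"),
    (200, "WX-XL"),
]


def pick_hardware_spec(params_billions):
    """Return the predefined specification name for a model size.

    Returns None when the size is outside the 1B to 200B range covered by
    the predefined specifications, which suggests a customized specification.
    Assumption: sizes between listed ranges round up to the larger specification.
    """
    if params_billions < 1 or params_billions > 200:
        return None
    for upper_bound, name in PREDEFINED_SPECS:
        if params_billions <= upper_bound:
            return name
    return None
```

With this sketch, an 8B model maps to WX-S and a 70B model maps to WX-L, while a 405B model returns None because it is larger than any predefined range.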

Creating customized hardware specifications

You might need to create a customized hardware specification if your model requires a configuration that the predefined specifications do not provide, if its size falls outside the predefined parameter ranges, or if you want to optimize performance or cost. For more information, see Creating custom hardware specifications.

Supported deployment types

You can create an online deployment to deploy custom foundation models that are tuned with PEFT techniques. Online deployment allows for real-time inferencing and is suitable for applications that require low-latency predictions.

Batch deployments are not currently supported for deploying custom foundation models tuned with PEFT.
