Requirements for deployment
Review supported model architectures, software requirements, and hardware requirements for deploying custom foundation models that are fine-tuned with Parameter-Efficient Fine-Tuning (PEFT) techniques.
Supported model architectures
You can deploy custom foundation models that are fine-tuned with PEFT techniques for the following model architectures:
| Model architecture | Model name and size | LoRA | QLoRA |
|---|---|---|---|
| GraniteForCausalLM | Granite PowerLM 3B, Granite 3.1 1B, Granite 3.1 2B, Granite 3.1 3B, Granite 3.1 8B, Granite 3.0 2B, Granite 3.0 8B | Yes | Yes |
| GraniteMoeForCausalLM | GraniteMoE 1B, GraniteMoE 3B | Yes | N/A |
| LlamaForCausalLM | Granite 3B, Granite 8B | Yes | Yes |
| GPTBigCodeForCausalLM | Granite 13B, Granite 20B | Yes | Yes |
| GPTBigCodeForCausalLM | Granite 34B | Yes | Yes |
| Llama 3.1 | Llama3.1-8B | Yes | Yes |
| Llama 3.1 | Llama3.1-70B | Yes | Yes |
| Llama 3.1 | Llama3.1-405B | No | Yes |
| Llama 3 | Llama3-8B | Yes | Yes |
| Llama 3 | Llama3-70B | Yes | Yes |
| LlamaForCausalLM | aLLaM-13b | Yes | Yes |
| Mixtral | Mixtral 8x7B | Yes | Yes |
| Mistral | Mistral-7b | Yes | Yes |
Software requirements
You can use the watsonx-cfm-caikit-1.1 software specification, which is based on the vLLM runtime engine, to deploy your custom foundation model that is fine-tuned with a PEFT technique.
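For example, if you work with the ibm_watsonx_ai Python SDK, you can look up the ID of this software specification by name before you create the deployment. The following sketch is illustrative only; the URL, API key, and deployment space ID are placeholders that you must replace with your own values.

```python
# Minimal sketch, assuming the ibm_watsonx_ai Python SDK and an existing
# deployment space. The credentials and IDs below are placeholders.
from ibm_watsonx_ai import APIClient, Credentials

credentials = Credentials(
    url="https://<your-watsonx-instance-url>",  # placeholder
    api_key="<your-api-key>",                   # placeholder
)
client = APIClient(credentials, space_id="<your-deployment-space-id>")

# Look up the ID of the watsonx-cfm-caikit-1.1 software specification,
# which is based on the vLLM runtime engine.
sw_spec_id = client.software_specifications.get_id_by_name("watsonx-cfm-caikit-1.1")
print(sw_spec_id)
```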
Hardware requirements
Deploying a custom foundation model that is fine-tuned with a PEFT technique is a resource-intensive process that requires GPU resources.
When you choose the hardware specification for deploying your custom foundation model, consider the number of LoRA or QLoRA adapters that you want to deploy with the model and the size of each adapter.
The standard supported hardware configurations to deploy custom foundation models are:
- NVIDIA A100 with 80 GB of GPU memory
- NVIDIA H100 with 80 GB of GPU memory
- NVIDIA L40S with 48 GB of GPU memory
Supported hardware specifications
Based on the size and the number of parameters used in your model, choose a hardware specification to deploy your fine-tuned custom foundation model.
You can use the following predefined hardware specifications for deployment:
| Parameter range | Hardware specification | Resources available |
|---|---|---|
| 1B to 20B | WX-S | 1 GPU, 2 CPU and 60 GB |
| 21B to 40B | WX-M | 2 GPU, 3 CPU and 120 GB |
| 41B to 80B | WX-L | 4 GPU, 5 CPU and 240 GB |
| 81B to 200B | WX-XL | 8 GPU, 9 CPU and 600 GB |
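If you work with the ibm_watsonx_ai Python SDK, you can reference a predefined hardware specification from the preceding table by name. The following sketch assumes the APIClient from the earlier software specification example and uses the WX-S specification as an example for a model in the 1B to 20B parameter range.

```python
# Minimal sketch, assuming an existing APIClient (see the earlier example).
# Look up the ID of the predefined WX-S hardware specification by name.
hw_spec_id = client.hardware_specifications.get_id_by_name("WX-S")
print(hw_spec_id)
```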
Creating customized hardware specifications
You might need to create a customized hardware specification if your model requires a configuration that the predefined specifications do not provide, if its size falls outside the predefined parameter ranges, or if you want to optimize for performance or cost-effectiveness. For more information, see Creating custom hardware specifications.
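If you work with the ibm_watsonx_ai Python SDK, a customized hardware specification can be stored as shown in the following sketch. The metadata field names and node layout are based on the SDK's hardware specification metadata and might differ in your SDK version; the specification name and resource values are illustrative placeholders.

```python
# Minimal sketch, assuming an existing APIClient (see the earlier examples).
# The field names (NAME, DESCRIPTION, NODES) and the node layout are assumptions;
# check the SDK reference for the exact schema that your version supports.
custom_hw_meta = {
    client.hardware_specifications.ConfigurationMetaNames.NAME: "custom-peft-gpu-spec",
    client.hardware_specifications.ConfigurationMetaNames.DESCRIPTION:
        "Custom specification for a fine-tuned model between the WX-S and WX-M ranges",
    client.hardware_specifications.ConfigurationMetaNames.NODES: {
        "cpu": {"units": "3"},
        "mem": {"size": "90Gi"},
        "gpu": {"num_gpu": 2},
    },
}

custom_hw_details = client.hardware_specifications.store(custom_hw_meta)
custom_hw_id = client.hardware_specifications.get_id(custom_hw_details)
```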
Supported deployment types
You can create an online deployment to deploy custom foundation models that are tuned with PEFT techniques. Online deployment allows for real-time inferencing and is suitable for applications that require low-latency predictions.
Batch deployments are not currently supported for deploying custom foundation models tuned with PEFT.
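If you work with the ibm_watsonx_ai Python SDK, an online deployment can be created as shown in the following sketch. The model asset ID and deployment name are placeholders, and the hardware specification ID comes from the earlier examples.

```python
# Minimal sketch, assuming an existing APIClient, a stored custom foundation
# model asset, and the hardware specification ID from the earlier examples.
deployment_meta = {
    client.deployments.ConfigurationMetaNames.NAME: "peft-cfm-online-deployment",
    client.deployments.ConfigurationMetaNames.ONLINE: {},
    client.deployments.ConfigurationMetaNames.HARDWARE_SPEC: {"id": hw_spec_id},
}

deployment_details = client.deployments.create(
    "<your-model-asset-id>",  # placeholder for the stored model asset ID
    meta_props=deployment_meta,
)
deployment_id = client.deployments.get_id(deployment_details)
```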
Parent topic: Deploying fine-tuned custom foundation models