Methods for tuning foundation models

Learn more about different tuning methods in watsonx.ai and how to choose the method that's right for your solution.

Foundation models can be tuned in the following ways:

Full fine tuning

Full fine tuning starts from the base model’s existing knowledge and tailors the model by training it on a smaller, task-specific dataset. The method updates the parameter weights that were set through prior training to customize the model for a task.

The result of full fine tuning is an entirely new model. Because all of the model weights are tuned, full fine tuning is more expensive than parameter-efficient tuning techniques such as LoRA and QLoRA. More compute and storage resources are also required to host the new tuned model that you create by fine tuning a foundation model. See Full fine tuning.

Low-rank adaptation (LoRA) fine tuning

LoRA adapts a foundation model for a task by tuning low-rank adapters, which represent a subset of the model parameters, instead of changing the base model weights. At inference time, weights from the tuned adapters are added to the weights from the base foundation model to generate output that is tuned for a task. See Low-rank adaptation (LoRA) fine tuning.
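
The way adapter weights are added to the base weights at inference time can be illustrated with a minimal sketch. The sketch uses toy matrix sizes and plain Python lists, and the names merge_lora, lora_a, lora_b, and scale are hypothetical names chosen for illustration; they are not part of any watsonx.ai API.

```python
# Illustrative sketch only: toy sizes and plain Python lists instead of tensors.

def matmul(a, b):
    """Multiply two matrices that are given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def merge_lora(base, lora_b, lora_a, scale=1.0):
    """Return base + scale * (lora_b @ lora_a) without modifying base."""
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(base, delta)]

# Frozen base weight matrix: 4 x 4 = 16 parameters.
base = [[1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0]]

# Rank-1 adapter: lora_b is 4 x 1 and lora_a is 1 x 4, so 8 tuned parameters.
lora_b = [[0.5], [0.0], [0.0], [0.0]]
lora_a = [[0.0, 2.0, 0.0, 0.0]]

merged = merge_lora(base, lora_b, lora_a)
trainable = len(lora_b) * len(lora_b[0]) + len(lora_a) * len(lora_a[0])
total = len(base) * len(base[0])

print(merged[0])         # [1.0, 1.0, 0.0, 0.0]
print(trainable, total)  # 8 16
```

In this toy case the adapter holds half as many values as the base matrix. For realistic layer sizes the ratio is far smaller, because the adapter size grows with the rank rather than with the full matrix dimensions.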

Quantized low-rank adaptation (QLoRA) fine tuning

QLoRA is a variant of LoRA that quantizes the base model weights to further reduce the memory footprint and compute resources that are required during tuning. See Quantized low-rank adaptation (QLoRA) fine tuning.
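
The following sketch shows the general idea behind quantization: weights are stored as small integer codes plus a scale factor, which saves memory at the cost of a small rounding error. It assumes a simple symmetric round-to-nearest scheme for illustration, which is not the 4-bit data type that actual QLoRA implementations use, and the function names are hypothetical.

```python
# Illustrative sketch only: simple absmax quantization, not the QLoRA data type.

def quantize_absmax(weights, bits=4):
    """Map floats to signed integer codes in [-(2**(bits-1) - 1), 2**(bits-1) - 1]."""
    levels = 2 ** (bits - 1) - 1           # 7 for 4-bit codes
    absmax = max(abs(w) for w in weights)
    scale = absmax / levels if absmax else 1.0
    codes = [round(w / scale) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate float weights from the integer codes."""
    return [c * scale for c in codes]

weights = [0.7, -0.3, 0.12, 0.0]
codes, scale = quantize_absmax(weights)
restored = dequantize(codes, scale)

# Largest difference between an original weight and its restored value.
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print(codes)  # [7, -3, 1, 0]
```

The value 0.12 is stored as code 1 and restored as approximately 0.1. This rounding error is the kind of quality loss that quantization can introduce.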

Tuning method comparison

The following table compares the available tuning methods based on common criteria for choosing a tuning method.

Foundation model tuning method comparison
Criteria	Full fine tuning	LoRA fine tuning	QLoRA fine tuning
Tuning technique	All base model parameters are fine-tuned on the target task.	Adapters that represent a subset of model parameters are tuned; base model parameters remain fixed during tuning.	Adapters that represent a subset of model parameters are tuned; base model parameters remain fixed during tuning.
Tuned model outcomes	Effective at customizing a model for a new task or domain when given sufficient data and compute resources.	High performance with reduced risk of overfitting; might not reach the performance level of full fine tuning.	High performance with reduced risk of overfitting; might not reach the performance level of full fine tuning. Quantization can introduce some quality degradation.
Required compute resources	High. Large computational resources and memory are required to fully update the model parameters.	Moderate. Requires fewer resources than full fine tuning because only the adapters are tuned; the underlying model is unaltered during tuning.	Low. Requires fewer resources than LoRA fine tuning because the model weights are quantized to reduce computational and storage needs.
Tuning time duration	Long. Exact duration depends on the model and dataset sizes.	Moderate. Faster than full fine tuning, but takes time to modify the adapters; can range from one to many hours.	Moderate. Faster than full fine tuning, but takes time to modify the adapters; can range from one to many hours.
Cost	Factor in the cost of the extra resources that are required both to fine tune the model and to deploy and host the new fine-tuned model that is generated.	Requires fewer storage and compute resources than full fine tuning. Multiple LoRA adapters can be served using the same base model to save costs.	Requires fewer storage and compute resources than LoRA. Multiple LoRA adapters can be served using the same quantized base model to save costs.
Purpose	Best for scenarios where maximum accuracy and task-specific adaptation are critical and extra resources and cost are justified for the use case.	A good option for creating task-specific adapters to tune a foundation model for multiple tasks.	A good option for creating task-specific adapters to tune a quantized foundation model for multiple tasks.
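
The cost difference between hosting one fully tuned model per task and serving several adapters from one shared base model can be estimated with simple arithmetic. The sketch below counts stored parameters only. The model size, adapter size, and task count are hypothetical values chosen for illustration, not measurements of any specific foundation model.

```python
# Illustrative sketch only: all sizes are hypothetical.

def params_full_fine_tuning(base_params, num_tasks):
    """Each task gets its own full copy of the model weights."""
    return base_params * num_tasks

def params_with_adapters(base_params, adapter_params, num_tasks):
    """One shared base model plus one small adapter per task."""
    return base_params + adapter_params * num_tasks

BASE_PARAMS = 1_000_000_000   # hypothetical 1-billion-parameter base model
ADAPTER_PARAMS = 5_000_000    # hypothetical adapter size
NUM_TASKS = 4

full = params_full_fine_tuning(BASE_PARAMS, NUM_TASKS)
shared = params_with_adapters(BASE_PARAMS, ADAPTER_PARAMS, NUM_TASKS)

print(full)    # 4000000000
print(shared)  # 1020000000
```

Under these assumptions, serving four tasks with adapters stores roughly a quarter of the parameters that four fully tuned models require.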