Methods for tuning foundation models
Learn more about different tuning methods in watsonx.ai and how to choose the method that's right for your solution.
Foundation models can be tuned in the following ways:
- Full fine tuning
Full fine tuning starts from the knowledge that the base model acquired during prior training and tailors the model by training it further on a smaller, task-specific dataset. The method updates all of the parameter weights that were set through prior training to customize the model for a task.
The result of full fine tuning is an entirely new model. Because all of the model weights are tuned, full fine tuning is more expensive than parameter-efficient tuning techniques such as LoRA and QLoRA. More compute and storage resources are required both to tune the model and to host the new tuned model that you create. See Full fine tuning.
- Low-rank adaptation (LoRA) fine tuning
Adapts a foundation model for a task by tuning low-rank adapters, a small set of parameters that represents a subset of the model parameters, while the base model weights stay unchanged during tuning. At inference time, the weights from the tuned adapters are added to the weights of the base foundation model to generate output that is tuned for the task. See Low-rank adaptation (LoRA) fine tuning.
- Quantized low-rank adaptation (QLoRA) fine tuning
QLoRA is a variant of LoRA that quantizes the base model weights to further reduce the memory footprint and compute resources that are required during tuning. See Quantized low-rank adaptation (QLoRA) fine tuning.
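The adapter and quantization ideas described above can be illustrated with a minimal sketch in plain Python. It is not how watsonx.ai implements tuning. The function names (`merge_adapter`, `lora_param_counts`, `quantize_absmax`), the matrix sizes, and the rank are all hypothetical. The sketch leaves out the scaling factor that LoRA implementations commonly apply to the adapter update, and it uses simple absolute-maximum integer rounding as a stand-in for the quantization scheme that a real QLoRA implementation would use.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def merge_adapter(base, adapter_b, adapter_a):
    """Return base + B @ A, the effective weights used at inference time."""
    update = matmul(adapter_b, adapter_a)
    return [[w + u for w, u in zip(w_row, u_row)] for w_row, u_row in zip(base, update)]

def lora_param_counts(d_out, d_in, rank):
    """Compare trainable values: full weight matrix vs. a rank-r adapter pair."""
    return d_out * d_in, rank * (d_out + d_in)

def quantize_absmax(values, bits=4):
    """Map floats to signed integer codes plus one shared scale (simplified)."""
    levels = 2 ** (bits - 1) - 1          # 7 for 4-bit codes
    scale = (max(abs(v) for v in values) or 1.0) / levels
    return [round(v / scale) for v in values], scale

def dequantize(codes, scale):
    """Recover approximate floats from integer codes."""
    return [c * scale for c in codes]

random.seed(0)
# Hypothetical frozen base weights (4 x 4); these are never updated.
base = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(4)]
# Hypothetical trained adapter pair of rank 1: B is 4 x 1, A is 1 x 4.
adapter_b = [[0.5], [0.0], [-0.5], [1.0]]
adapter_a = [[0.1, 0.2, 0.0, -0.1]]
merged = merge_adapter(base, adapter_b, adapter_a)

full, adapter = lora_param_counts(4096, 4096, 8)
print(f"full: {full:,} trainable weights, adapter: {adapter:,}")

# Quantize one row of the base weights and measure the round-trip error.
codes, scale = quantize_absmax(base[0])
restored = dequantize(codes, scale)
print("max round-trip error:", max(abs(a - b) for a, b in zip(base[0], restored)))
```

For the assumed 4096 x 4096 layer with rank 8, the adapter pair holds 65,536 trainable values compared with 16,777,216 in the full matrix, which is roughly 0.4%. The round-trip error printed at the end is a small-scale view of the quality loss that quantization can introduce.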
Tuning method comparison
The following table compares the available tuning methods based on common criteria for choosing a tuning method.
| Criteria | Full fine tuning | LoRA fine tuning | QLoRA fine tuning |
|---|---|---|---|
| Tuning technique | All base model parameters are fine-tuned on the target task. | Adapters that represent a subset of model parameters are tuned; base model parameters remain fixed during tuning. | Adapters that represent a subset of model parameters are tuned; base model parameters remain fixed during tuning. |
| Tuned model outcomes | Effective at customizing a model for a new task or domain when given sufficient data and compute resources. | High performance with reduced risk of overfitting; might not reach the performance level of full fine tuning. | High performance with reduced risk of overfitting; might not reach the performance level of full fine tuning. Quantization can also degrade quality. |
| Required compute resources | High. Large computational resources and memory are required to fully update the model parameters. | Moderate. Requires fewer resources than full fine tuning because only the adapters are tuned; the underlying model is unaltered during tuning. | Low. Requires fewer resources than LoRA fine tuning because the model weights are quantized to reduce computational and storage needs. |
| Tuning time | Long. Exact duration depends on the model and dataset sizes. | Moderate. Faster than full fine tuning, but tuning the adapters still takes time; can range from one to many hours. | Moderate. Faster than full fine tuning, but tuning the adapters still takes time; can range from one to many hours. |
| Cost | Factor in the cost of the extra resources that are required both to fine tune the model and to deploy and host the new fine-tuned model that is generated. | Requires fewer storage and compute resources than full fine tuning. Multiple LoRA adapters can be served from the same base model to save costs. | Requires fewer storage and compute resources than LoRA. Multiple LoRA adapters can be served from the same quantized base model to save costs. |
| Purpose | Best for scenarios where maximum accuracy and task-specific adaptation are critical and extra resources and cost are justified for the use case. | A good option for creating task-specific adapters to tune a foundation model for multiple tasks. | A good option for creating task-specific adapters to tune a quantized foundation model for multiple tasks. |
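
The cost row can be made concrete with some back-of-the-envelope arithmetic. The following sketch estimates weight storage only, for a scenario where one base model is tuned for several tasks. Every number in it is a hypothetical assumption chosen for illustration: a 7-billion-parameter base model, 20 million parameters per adapter, 16-bit weights for the unquantized case, and 4-bit weights for the quantized base. It ignores optimizer state, activations, and quantization metadata, so the real requirements will be higher.

```python
def weight_bytes(num_params, bits_per_param):
    """Bytes needed to store num_params weights at the given precision."""
    return num_params * bits_per_param // 8

def full_fine_tuning_storage(base_params, num_tasks, bits=16):
    """Each task produces an entirely new copy of the model."""
    return num_tasks * weight_bytes(base_params, bits)

def lora_storage(base_params, adapter_params, num_tasks, base_bits=16, adapter_bits=16):
    """One shared base model plus one small adapter per task."""
    return weight_bytes(base_params, base_bits) + num_tasks * weight_bytes(adapter_params, adapter_bits)

def qlora_storage(base_params, adapter_params, num_tasks, adapter_bits=16):
    """Same as LoRA, but the shared base model is stored in 4 bits."""
    return lora_storage(base_params, adapter_params, num_tasks, base_bits=4, adapter_bits=adapter_bits)

# Hypothetical sizes, not measurements of any specific model.
BASE_PARAMS = 7_000_000_000
ADAPTER_PARAMS = 20_000_000
NUM_TASKS = 5

estimates = {
    "full fine tuning": full_fine_tuning_storage(BASE_PARAMS, NUM_TASKS),
    "LoRA": lora_storage(BASE_PARAMS, ADAPTER_PARAMS, NUM_TASKS),
    "QLoRA": qlora_storage(BASE_PARAMS, ADAPTER_PARAMS, NUM_TASKS),
}
for method, size in estimates.items():
    print(f"{method}: {size / 1e9:.1f} GB of weights for {NUM_TASKS} tasks")
```

Under these assumptions, five fully fine-tuned models need about 70 GB of weight storage, one shared base model with five LoRA adapters needs about 14.2 GB, and a 4-bit base model with five adapters needs about 3.7 GB.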