Deploying parameter-efficient fine-tuned (PEFT) foundation models
You can use parameter-efficient fine-tuning (PEFT) techniques to fine-tune pre-trained LLMs without updating all of the model's weights. PEFT methods update only a small subset of parameters, typically the model's embeddings or adapter layers, to adapt the model to a specific task or dataset.
PEFT techniques
Several PEFT techniques are available, including:
- Low-Rank Adaptation (LoRA):
LoRA adapts a pre-trained language model to a specific task or dataset by freezing the original weights and training small low-rank matrices that represent the weight updates. This provides efficient adaptation without modifying the full weight matrices of the model.
LoRA is designed as a more efficient alternative to traditional full fine-tuning. Because only a small number of adapter parameters are trained, LoRA reduces the computational resources and memory that fine-tuning requires, which makes it useful when resources are limited.
- Quantized Low-Rank Adaptation (QLoRA):
QLoRA is a variant of LoRA that adds quantization to further reduce the memory footprint and compute that fine-tuning requires. The weights of the frozen base model are quantized from higher-precision floating-point formats to lower precision, such as 4-bit or 8-bit, while the LoRA adapter weights are trained in higher precision.
By combining the low-rank adaptation of LoRA with quantization of the base model, QLoRA achieves significant additional reductions in memory usage, which makes it useful when resources are extremely limited. For a configuration sketch that covers both techniques, see the example that follows this list.
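The following sketch shows how these techniques are typically configured in training code. It is a minimal example that assumes the Hugging Face transformers and peft libraries; the model name, target modules, and hyperparameter values are illustrative placeholders, not recommended settings.

```python
# Minimal sketch of LoRA and QLoRA configuration with the Hugging Face
# transformers and peft libraries. Model name, target modules, and
# hyperparameters are illustrative placeholders, not recommendations.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

base_model_name = "bigscience/bloom-560m"  # placeholder base model

# LoRA: freeze the base weights and train small low-rank adapter matrices.
lora_config = LoraConfig(
    r=8,                                 # rank of the low-rank update matrices
    lora_alpha=16,                       # scaling factor for the adapter output
    target_modules=["query_key_value"],  # layers to adapt (model-specific)
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

# LoRA variant: load the base model in half precision and attach the adapters.
model = AutoModelForCausalLM.from_pretrained(base_model_name, torch_dtype=torch.float16)
lora_model = get_peft_model(model, lora_config)
lora_model.print_trainable_parameters()  # only a small fraction of parameters is trainable

# QLoRA variant: quantize the frozen base weights to 4-bit, then attach the
# same LoRA adapters, which remain in higher precision during training.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
quantized_model = AutoModelForCausalLM.from_pretrained(
    base_model_name, quantization_config=bnb_config, device_map="auto"
)
qlora_model = get_peft_model(quantized_model, lora_config)
```

In both cases only the adapter parameters are updated during training; the QLoRA variant additionally stores the frozen base weights in 4-bit precision, which is where the extra memory savings come from.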
Methods for deploying PEFT models
You can deploy base foundation models that are hosted by IBM and fine-tuned with PEFT techniques. For more information, see Deploying base foundation models fine-tuned with PEFT.
In addition to working with foundation models that are curated by IBM, you can upload and deploy your own custom foundation models. These custom foundation models can be fine-tuned with a parameter-efficient fine-tuning technique and deployed with watsonx.ai. For more information, see Deploying custom foundation models fine-tuned with PEFT.
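Depending on the deployment path, you might need to package a trained adapter together with its base model before you upload it. The following sketch, which assumes the Hugging Face peft library and a LoRA adapter saved locally, shows one common way to merge adapter weights into the base model so that the result can be handled as a single custom foundation model; the model name and paths are placeholders.

```python
# Hypothetical example: merge a trained LoRA adapter into its base model so
# the result can be uploaded and deployed as a single custom foundation model.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_name = "bigscience/bloom-560m"   # placeholder base model
adapter_path = "./lora-adapter"             # placeholder path to the trained adapter
output_dir = "./merged-model"               # directory containing the merged model

base_model = AutoModelForCausalLM.from_pretrained(base_model_name)
peft_model = PeftModel.from_pretrained(base_model, adapter_path)

# Fold the low-rank adapter weights into the base weights and drop the adapter layers.
merged_model = peft_model.merge_and_unload()

merged_model.save_pretrained(output_dir)
AutoTokenizer.from_pretrained(base_model_name).save_pretrained(output_dir)
```

Whether you upload a merged model or work with the adapter on its own depends on the deployment option you choose; see the linked topics for the supported formats.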
Parent topic: Deploying fine-tuned models