Published: 15 August 2024
Contributors: Ivan Belcic, Cole Stryker
Parameter-efficient fine-tuning (PEFT) is a method of improving the performance of pretrained large language models (LLMs) and neural networks for specific tasks or data sets. By training a small set of parameters and preserving most of the large pretrained model’s structure, PEFT saves time and computational resources.
Neural networks trained for general tasks such as natural language processing (NLP) or image classification can specialize in a related new task without being entirely retrained. PEFT is a resource-efficient way to build highly specialized models without starting from scratch each time.
PEFT works by freezing most of the pretrained language model’s parameters and layers while inserting a small number of trainable parameters, known as adapters, into the model for predetermined downstream tasks.
The fine-tuned models retain all the learning gained during pretraining while specializing in their respective downstream tasks. Many PEFT methods further enhance efficiency with gradient checkpointing, a memory-saving technique in which the model recomputes intermediate activations during backpropagation rather than storing them all at once.
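For illustration, here is a minimal sketch of this freeze-then-train pattern using PyTorch and the Hugging Face transformers library. The model name and the choice of task head are placeholder assumptions, not specifics from this article:

```python
from transformers import AutoModelForSequenceClassification

# Placeholder setup: adapt a general pretrained model to a two-class task.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Freeze every pretrained parameter...
for param in model.parameters():
    param.requires_grad = False

# ...then unfreeze only the small task-specific classification head.
for param in model.classifier.parameters():
    param.requires_grad = True

# Gradient checkpointing (mentioned above) trades extra compute for
# memory by recomputing activations during the backward pass.
model.gradient_checkpointing_enable()

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Training {trainable:,} of {total:,} parameters")
```

Only the unfrozen head is updated during training, so the optimizer state and gradients for the vast majority of the network never need to be stored.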
Parameter-efficient fine-tuning balances efficiency and performance to help organizations maximize computational resources while minimizing storage costs. When tuned with PEFT methods, transformer-based models such as GPT-3, LLaMA and BERT can draw on all the knowledge contained in their pretrained parameters while performing better on specialized tasks than they would without fine-tuning.
PEFT is often used during transfer learning, where models trained on one task are applied to a second, related task. For example, a model trained on image classification might be put to work on object detection. If a base model is too large to retrain in full, or if the new task differs from the original, PEFT can be an ideal solution.
Traditional full fine-tuning methods involve slight adjustments to all the parameters in pretrained LLMs to adapt them for specific tasks. But as developments in artificial intelligence (AI) and deep learning (DL) have led models to grow larger and more complex, full fine-tuning has become prohibitively demanding in computational resources and energy.
Also, each fine-tuned model is the same size as the original. All these models take up significant amounts of storage space, further driving up costs for the organizations that use them. While fine-tuning does create more efficient machine learning (ML), the process of fine-tuning LLMs has itself become inefficient.
PEFT adjusts only the handful of parameters most relevant to the model’s intended use case, delivering specialized model performance while shrinking the number of trainable weights, for significant savings in computational cost and time.
Parameter-efficient fine-tuning brings a wealth of benefits that have made it popular with organizations that use LLMs in their work:
Most large language models used in generative AI (gen AI) run on expensive graphics processing units (GPUs) made by manufacturers such as Nvidia. Each LLM consumes large amounts of computational resources and energy. Adjusting only the most relevant parameters yields large savings in energy and cloud computing costs.
Time-to-value is the amount of time that it takes to develop, train and deploy an LLM so it can begin generating value for the organization that uses it. Because PEFT tweaks only a few trainable parameters, it takes far less time to update a model for a new task. PEFT can deliver comparable performance to a full fine-tuning process at a fraction of the time and expense.
Catastrophic forgetting happens when LLMs lose or “forget” the knowledge gained during the initial training process as they are retrained or tuned for new use cases. Because PEFT preserves most of the initial parameters, it also safeguards against catastrophic forgetting.
Overfitting is when a model hews too closely to its training data during the training process, making it unable to generate accurate predictions in other contexts. Transformer models tuned with PEFT are much less prone to overfitting as most of their parameters remain static.
By focusing on a few parameters, PEFT lowers the training data requirements for the fine-tuning process. Full fine-tuning requires a much larger training data set because all the model’s parameters will be adjusted during the fine-tuning process.
Without PEFT, the costs of developing a specialized LLM are too high for many small and medium-sized organizations to bear. PEFT makes LLMs available to teams who might not otherwise have the time or resources to train and fine-tune models.
PEFT enables data scientists and other professionals to customize general LLMs to individual use cases. AI teams can experiment with model optimization without worrying as much about burning through computational, energy and storage resources.
AI teams have various PEFT techniques and algorithms at their disposal, each with its relative advantages and specializations. Many of the most popular PEFT tools are available on Hugging Face and in numerous GitHub repositories.
Adapters were among the first PEFT techniques applied to natural language processing (NLP) models. Researchers sought to overcome the challenge of training a model for multiple downstream tasks while minimizing the number of new weights added per task. Adapter modules were the answer: small add-ons that insert a handful of trainable, task-specific parameters into each transformer layer of the model.
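A minimal sketch of such a bottleneck adapter in PyTorch appears below; the hidden and bottleneck sizes are illustrative assumptions, not values from the original work:

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """A minimal bottleneck adapter: a down-projection, a nonlinearity
    and an up-projection, wrapped in a residual connection."""

    def __init__(self, hidden_size: int = 768, bottleneck_size: int = 64):
        super().__init__()
        self.down_project = nn.Linear(hidden_size, bottleneck_size)
        self.activation = nn.GELU()
        self.up_project = nn.Linear(bottleneck_size, hidden_size)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # The residual connection means a near-identity adapter leaves
        # the frozen transformer layer's output almost unchanged.
        return hidden_states + self.up_project(
            self.activation(self.down_project(hidden_states))
        )

# Inserted after a frozen transformer layer, only the adapter's
# roughly 2 * hidden_size * bottleneck_size parameters are trained.
adapter = BottleneckAdapter()
out = adapter(torch.randn(1, 10, 768))  # (batch, sequence, hidden)
```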
Introduced in 2021, low-rank adaptation of large language models (LoRA) represents weight updates as the product of two smaller low-rank decomposition matrices, shrinking the subset of trainable parameters even further.
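The arithmetic behind the decomposition can be sketched in a few lines of PyTorch; the dimensions, rank and scaling factor below are illustrative placeholders:

```python
import torch

# Illustrative sizes: a 768 x 768 attention weight with rank r = 8.
d, k, r, alpha = 768, 768, 8, 16

W = torch.randn(d, k)           # frozen pretrained weight
A = torch.randn(r, k) * 0.01    # trainable low-rank factor
B = torch.zeros(d, r)           # zero-initialized, so the update starts at 0

# The weight update is the product of the two small matrices,
# scaled by alpha / r.
delta_W = (alpha / r) * (B @ A)
W_effective = W + delta_W       # used in the forward pass

print(f"Full weight: {W.numel():,} values; "
      f"trainable LoRA factors: {A.numel() + B.numel():,}")
# Full weight: 589,824 values; trainable LoRA factors: 12,288
```

Because only A and B are trained, the example above needs fewer than 2.1% as many trainable values as updating the full weight matrix would.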
QLoRA is an extended version of LoRA that quantizes the weight of each pretrained parameter to just 4 bits of precision, down from the typical 32 bits. As such, QLoRA offers significant memory savings and makes it possible to fine-tune a large LLM on a single GPU.
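As a sketch of how this is commonly set up with the Hugging Face transformers, peft and bitsandbytes libraries (the model name and hyperparameters here are placeholders, and a CUDA GPU is required for 4-bit loading):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit quantization settings in the style of the QLoRA recipe.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # 4-bit NormalFloat data type
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute still runs in 16-bit
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
)

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m", quantization_config=bnb_config
)
model = prepare_model_for_kbit_training(model)

# LoRA adapters are then trained on top of the frozen 4-bit base model.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # attention projections in OPT
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```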
Specifically created for natural language generation (NLG) models, prefix-tuning appends a task-specific continuous vector, known as a prefix, to each transformer layer while keeping all pretrained parameters frozen. As a result, prefix-tuned models require storing over a thousandfold fewer task-specific parameters than fully fine-tuned models, with comparable performance.
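Here is a minimal sketch using the Hugging Face peft library; the model name and prefix length are illustrative assumptions:

```python
from transformers import AutoModelForSeq2SeqLM
from peft import PrefixTuningConfig, TaskType, get_peft_model

model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

config = PrefixTuningConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    num_virtual_tokens=20,  # length of the trainable prefix added at each layer
)

model = get_peft_model(model, config)
model.print_trainable_parameters()  # prints the tiny trainable fraction
```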
Prompt-tuning simplifies prefix-tuning, training models by injecting tailored prompts into the input or training data. Hard prompts are manually created, while soft prompts are learned sequences of numbers, or embedding vectors, that draw knowledge from the base model. Soft prompts have been found to outperform human-generated hard prompts during tuning.
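A minimal configuration sketch with the peft library follows; the virtual-token count, initialization text and tokenizer name are placeholder assumptions:

```python
from peft import PromptTuningConfig, PromptTuningInit, TaskType

# A soft prompt of 8 virtual tokens, warm-started from a hard prompt.
config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    num_virtual_tokens=8,
    prompt_tuning_init=PromptTuningInit.TEXT,
    prompt_tuning_init_text="Classify the sentiment of this review:",
    tokenizer_name_or_path="gpt2",
)
# Passed to get_peft_model(), this trains only the 8 prompt embeddings
# while every weight of the base model stays frozen.
```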
P-tuning is a variation of prompt-tuning designed for natural language understanding (NLU) tasks. Rather than relying on manually created prompts, P-tuning introduced trainable prompt embeddings that are generated and refined automatically, yielding more effective prompts over the course of training.
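The peft library exposes this technique through a prompt-encoder configuration; the sizes below are illustrative, not prescribed values:

```python
from peft import PromptEncoderConfig, TaskType

# P-tuning learns its virtual tokens through a small trainable
# prompt encoder rather than direct embedding lookup.
config = PromptEncoderConfig(
    task_type=TaskType.SEQ_CLS,  # a natural language understanding task
    num_virtual_tokens=20,
    encoder_hidden_size=128,     # hidden size of the prompt encoder
)
# As with prompt-tuning, only the prompt encoder's parameters are
# updated; the base model's weights remain frozen.
```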