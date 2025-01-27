Foundation models and large language models (LLMs) such as OpenAI's GPT, Llama 2, Claude and others have dramatically advanced the state of the art in natural language processing. These models show exceptional capabilities in applications ranging from text generation to language understanding, but those capabilities depend on very large numbers of model parameters. For example, GPT-3 has 175 billion parameters. Parameters are numbers stored in matrices, and storing so many of them requires a large amount of memory. This storage requirement presents problems when fine-tuning. Full fine-tuning means that all of the parameters are trained, which requires an extraordinary amount of compute resources. As a result, fine-tuning large language models is sometimes out of reach for teams without significant resources.
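To make the storage requirement concrete, the arithmetic can be sketched in a few lines of Python. This is a rough, back-of-the-envelope estimate: the function name `parameter_memory_gib` and the choice of 4 bytes (32-bit floats) and 2 bytes (16-bit floats) per parameter are assumptions made for illustration, and the figure covers only the stored weights. Training needs additional memory for gradients and optimizer state on top of this.

```python
def parameter_memory_gib(num_parameters: int, bytes_per_parameter: int) -> float:
    """Memory needed to hold the weights alone, in GiB (2**30 bytes)."""
    return num_parameters * bytes_per_parameter / 2**30


# GPT-3's parameter count, as quoted in the text.
GPT3_PARAMETERS = 175_000_000_000

# Assumed storage precisions: 4 bytes for 32-bit floats, 2 bytes for 16-bit floats.
for label, num_bytes in [("32-bit", 4), ("16-bit", 2)]:
    gib = parameter_memory_gib(GPT3_PARAMETERS, num_bytes)
    print(f"{label} weights: about {gib:.0f} GiB")
```

Even at 16-bit precision the weights alone come to several hundred GiB, which is why training every parameter is impractical on modest hardware.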

Low-rank adaptation (LoRA) is a technique for adapting machine learning models to new contexts. It adapts large models to specific uses by adding lightweight pieces to the original model rather than changing the entire model. This lets a data scientist quickly expand the ways a model can be used without having to build an entirely new model.
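The idea of "adding lightweight pieces" can be illustrated with a minimal pure-Python sketch. It assumes the formulation from the LoRA paper: a frozen weight matrix W is left untouched, and two small trainable matrices B and A of rank r are added so that the layer computes W·x + (alpha / r)·B·A·x, with B starting at zero. The class name `LoRALinear`, the helper `matvec`, and the toy sizes are hypothetical choices for illustration, not part of any library, and no training loop is shown.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRALinear:
    """A frozen linear layer plus a trainable low-rank update (sketch only)."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight          # frozen: never updated during adaptation
        self.scale = alpha / rank     # scaling factor applied to the update
        # A starts with small random values, B starts at zero, so the
        # adapted layer initially behaves exactly like the original layer.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_parameters(self):
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


# Toy example: an 8x8 frozen weight matrix adapted with rank 1.
rng = random.Random(42)
W = [[rng.uniform(-1.0, 1.0) for _ in range(8)] for _ in range(8)]
layer = LoRALinear(W, rank=1)
x = [1.0] * 8

full_parameters = len(W) * len(W[0])
print("full fine-tuning parameters:", full_parameters)
print("LoRA trainable parameters:  ", layer.trainable_parameters())
print("output unchanged at start:  ", layer.forward(x) == matvec(W, x))
```

In this toy case the adapter trains 16 values instead of 64. The gap widens quickly as the matrices grow, because the adapter's size scales with r·(d_in + d_out) rather than d_in·d_out.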

The technique was originally published by Edward Hu, Yelong Shen and collaborators in their paper "LoRA: Low-Rank Adaptation of Large Language Models" [1]. In the paper, they showed that models retrained using LoRA outperformed base models on a variety of benchmark tasks. Model performance could be improved without full fine-tuning while using a significantly smaller number of trainable parameters.