Reparameterization-based methods like Low-Rank Adaptation (LoRA) leverage low-rank transformations of high-dimensional matrices, such as the massive weight matrices of a pre-trained transformer model. These low-rank representations omit inconsequential higher-dimensional detail in order to capture the underlying low-dimensional structure of the model weights, greatly reducing the number of trainable parameters. This dramatically speeds up fine-tuning and reduces the memory needed to store model updates.
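To get a sense of the scale of that reduction, consider a single large weight matrix and a hypothetical rank-8 factorization of its update; the short snippet below simply counts parameters (the dimensions are illustrative, not taken from any particular model):

```python
# Illustrative parameter count: a full-rank weight update vs. its low-rank factorization.
# The dimensions are hypothetical, roughly the size of one attention weight matrix
# in a large transformer; the rank r is a typical LoRA setting.
d, k, r = 4096, 4096, 8

full_update_params = d * k            # one full-rank delta-weight matrix
low_rank_params = (d * r) + (r * k)   # two low-rank factors: (d x r) and (r x k)

print(f"Full-rank update: {full_update_params:,} parameters")   # 16,777,216
print(f"Low-rank (r={r}):  {low_rank_params:,} parameters")      # 65,536
print(f"Reduction factor: {full_update_params / low_rank_params:.0f}x")  # 256x
```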
LoRA eschews direct optimization of the matrix of model weights and instead optimizes a matrix of updates to model weights (or delta weights), which is inserted into the model. That matrix of weight updates is, in turn, represented as the product of two much smaller low-rank matrices, greatly reducing the number of parameters to be updated. The pre-trained model weights themselves remain frozen.
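As a rough sketch of the mechanism (written in PyTorch with illustrative dimensions, not any particular library's implementation), a LoRA-augmented linear layer keeps the pre-trained weight frozen and trains only the two small factor matrices:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A linear layer whose frozen weight W is adapted by a low-rank update B @ A."""

    def __init__(self, in_features: int, out_features: int, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        # Frozen stand-in for a pre-trained weight matrix (in practice, loaded from a checkpoint).
        self.weight = nn.Parameter(torch.randn(out_features, in_features), requires_grad=False)
        # Trainable low-rank factors: A maps the input down to `rank` dimensions, B maps back up.
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))  # zero init: the update starts at zero
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        base = x @ self.weight.T                      # frozen pre-trained path
        delta = (x @ self.lora_A.T) @ self.lora_B.T   # low-rank update path
        return base + self.scaling * delta

layer = LoRALinear(in_features=4096, out_features=4096, rank=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"Trainable parameters: {trainable:,}")  # only the two low-rank factors
```

Initializing one of the factors to zero means the update itself starts at zero, so training begins from the unmodified behavior of the pre-trained model.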
An added benefit of LoRA is that, since what’s being optimized and stored are not new model weights but rather the difference (or delta) between the original pre-trained weights and fine-tuned weights, different task-specific LoRAs can be “swapped in” as needed to adapt the pre-trained model—whose actual parameters remain unchanged—to a given use case.
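For example, using the Hugging Face PEFT library (the model name, adapter paths, and adapter names below are hypothetical), two adapters trained for different tasks can be attached to a single copy of the frozen base model and switched at will:

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the frozen pre-trained base model once (the model name is a placeholder).
base_model = AutoModelForCausalLM.from_pretrained("my-org/base-model")

# Attach a first task-specific LoRA adapter from a (hypothetical) saved checkpoint.
model = PeftModel.from_pretrained(base_model, "adapters/summarization", adapter_name="summarization")

# Load a second adapter and switch between them; the base weights never change.
model.load_adapter("adapters/translation", adapter_name="translation")
model.set_adapter("translation")    # the model now behaves as the translation fine-tune
model.set_adapter("summarization")  # swap back without reloading the base model
```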
A variety of LoRA derivatives have been developed, such as QLoRA, which further reduces the memory footprint of fine-tuning by quantizing the pre-trained transformer model (typically to 4-bit precision) before applying LoRA.
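A minimal QLoRA-style setup with the Hugging Face Transformers and PEFT libraries might look like the following sketch; the model name, target modules, and hyperparameters are placeholders to adapt as needed:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Quantize the frozen base model to 4-bit on load, then attach LoRA adapters on top.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # 4-bit NormalFloat, as used in the QLoRA paper
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 while weights are stored in 4-bit
)
base_model = AutoModelForCausalLM.from_pretrained("my-org/base-model", quantization_config=bnb_config)
base_model = prepare_model_for_kbit_training(base_model)

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # which layers receive adapters varies by model
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the LoRA factors are trainable; the 4-bit base stays frozen
```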