LoRA: Low-Rank Adaptation of Large Language Models

J. Edward Hu, Yelong Shen, Phillip Wallis et al.

2021 · arXiv (Cornell University) · 2,438 citations

An important paradigm of natural language processing consists of large-scale pre-training on general domain data and adaptation to particular tasks or domains. As we pre-train larger models, full fine-tuning, which retrains all model parameters, becomes less feasible. Using GPT-3 175B as an example -- deploying independent instances of fine-tuned models, each with 175B parameters, is prohibitively expensive. We propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, gre…

Read the paper →

Explore this paper's citation graph on Constellation.