Question

In Parameter-Efficient Fine-Tuning (PEFT) using Low-Rank Adaptation (LoRA), which specific structural change is made to the pre-trained weight matrices to reduce the number of trainable parameters?

Accepted Answer

In LoRA, the pre-trained weight matrix, a large two-dimensional grid of numbers that encodes what the model has learned, is frozen and never updated during fine-tuning. To adapt the model without changing these original weights, LoRA adds a parallel path made of two much smaller matrices, A and B. If the original weight matrix is D by D, then A is r by D and B is D by r, so the product B times A has the same D by D shape, where the inner dimension r, called the rank, is much smaller than D. For example, a 1000 by 1000 matrix contains one million parameters. With a rank of 8, matrix A is 8 by 1000 and matrix B is 1000 by 8, for a total of only 16,000 trainable parameters, or 1.6% of the original count. During fine-tuning, gradients are computed and updates stored only for these two smaller matrices. In the forward pass, the model adds the output of the frozen original matrix to the output of the B times A path. This low-rank decomposition of the weight update approximates an update to the full matrix while training only a small fraction of the original parameter count.
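
The arithmetic and the forward pass described above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the helper names (matvec, lora_forward), the tiny 6 by 6 demo size, and the initialization (small random values for A, zeros for B so the adapter starts as a no-op) are assumptions made for this example and do not come from the answer itself.

```python
import random

def matvec(M, x):
    """Multiply matrix M (a list of rows) by vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def lora_forward(W0, A, B, x):
    """Compute W0 x + B (A x). W0 is frozen; only A and B would be trained."""
    base = matvec(W0, x)                 # output of the frozen original matrix
    low_rank = matvec(B, matvec(A, x))   # project down to rank r, then back up
    return [b + l for b, l in zip(base, low_rank)]

# Parameter count for the example in the answer: D = 1000, r = 8.
D, r = 1000, 8
full_params = D * D          # parameters in the frozen D x D matrix
lora_params = r * D + D * r  # A is r x D, B is D x r

# Tiny demo (hypothetical sizes, chosen only to keep the example fast).
random.seed(0)
d_small, r_small = 6, 2
W0 = [[random.gauss(0, 1) for _ in range(d_small)] for _ in range(d_small)]
A = [[random.gauss(0, 0.01) for _ in range(d_small)] for _ in range(r_small)]
B = [[0.0] * r_small for _ in range(d_small)]  # assumed zero init: adapter adds nothing yet
x = [random.gauss(0, 1) for _ in range(d_small)]

y = lora_forward(W0, A, B, x)

print(full_params, lora_params)
```

With B initialized to zeros, the adapter path contributes nothing, so y equals the frozen model's output until training moves B away from zero. Note that the code never forms the full D by D product B times A; applying A and then B to the input keeps the extra computation small as well.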
