LoRA

Open In Colab

Reading time: 3 min

Low-rank adaptation (LoRA) is a parameter-efficient fine-tuning (PEFT) technique introduced by Hu, Edward J., et al. (2021). The creators of LoRA posited that since trained deep learning models reside in a low intrinsic dimension, perhaps their weight-update matrices do as well.

Specifically, with LoRA, we learn a low-rank representation of the weight-update matrices of the dense, linear layers of a pre-trained LLM. The original weights of the LLM are frozen during fine-tuning, and only the low-rank weight-update matrices are learned. This reduction in dimensionality helps to amplify the most important or influential features of the model.

Figure: Illustrating a forward pass with LoRA

Some Math

Let \(W\) represent the \(d\times d\) weight matrix for a dense, linear layer. We can then loosely represent an updated version (i.e. after fine-tuning) of this matrix as follows:

$$W_{\text{updated}} = W + \Delta W,$$

where \(\Delta W\) is the update matrix. With LoRA, it is \(\Delta W\) which we project into a low-rank space:

$$\Delta W \approx AB,$$

where \(A\) and \(B^T\) are both matrices of dimension \(d \times r\) with \(r \ll d\). During fine-tuning, \(W\) is frozen and only \(A\) and \(B\) are updated.
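To see why this factorization is parameter-efficient, consider a quick back-of-the-envelope check (the sizes \(d = 4096\) and \(r = 8\) below are illustrative, not taken from the paper): the full update matrix \(\Delta W\) has \(d^2\) entries, while \(A\) and \(B\) together have only \(2dr\).

# Illustrative parameter counts for a hypothetical layer with d = 4096, r = 8.
d, r = 4096, 8

full_update_params = d * d          # entries in a full Delta W
lora_params = d * r + r * d         # entries in A (d x r) plus B (r x d)

print(full_update_params)                # 16777216
print(lora_params)                       # 65536
print(full_update_params / lora_params)  # 256.0, i.e. a 256x reduction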

For inference (i.e., the forward pass), let \(x\) be an input embedding; then, by the distributive property,

$$xW_{\text{updated}} = xW + x\Delta W \approx xW + xAB.$$
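As a small numerical sanity check (a sketch with arbitrary shapes, not taken from the original paper), the adapted forward pass can be computed either by keeping the low-rank branch separate or by merging \(AB\) into \(W\) ahead of time; the two give the same result, which is why merged LoRA weights add no inference latency.

import torch

torch.manual_seed(0)
d, r = 16, 2

W = torch.randn(d, d)   # frozen pre-trained weights
A = torch.randn(d, r)   # low-rank factor A (d x r)
B = torch.randn(r, d)   # low-rank factor B (r x d)
x = torch.randn(1, d)   # a single input embedding

# Separate branches: x W + x A B
out_separate = x @ W + x @ A @ B

# Merged weights: x (W + A B)
out_merged = x @ (W + A @ B)

print(torch.allclose(out_separate, out_merged, atol=1e-5))  # True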

Implementation Details

One modular implementation of LoRA introduces a LoRALayer that comprises only the \(A\) and \(B\) dense weights. In this way, a LoRALayer can be used to adapt a pre-trained Linear layer.

import torch


class LoRALayer(torch.nn.Module):
    """A basic LoRALayer implementation."""

    def __init__(self, d_in: int, d_out: int, rank: int):
        super().__init__()
        # A is randomly initialized (here with Kaiming-uniform init), while B
        # starts at zero so that A @ B, and hence the weight update, is zero
        # at the start of fine-tuning.
        self.A = torch.nn.Parameter(torch.empty(d_in, rank))
        torch.nn.init.kaiming_uniform_(self.A, a=5**0.5)
        self.B = torch.nn.Parameter(torch.zeros(rank, d_out))

    def forward(self, x):
        return x @ self.A @ self.B

With the LoRALayer defined in this way, one can combine it with a Linear layer to implement the LoRA technique, as sketched below. See the supplementary Colab notebook linked at the top of this pocket reference for more details.
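For instance, a minimal sketch of such a combination (the wrapper name LinearWithLoRA and the sizes below are illustrative, not from the original paper) freezes the pre-trained Linear layer and adds the LoRA branch to its output:

class LinearWithLoRA(torch.nn.Module):
    """Wraps a frozen pre-trained Linear layer with a trainable LoRA branch."""

    def __init__(self, linear: torch.nn.Linear, rank: int):
        super().__init__()
        self.linear = linear
        self.lora = LoRALayer(linear.in_features, linear.out_features, rank)
        # Freeze the original weights; only A and B receive gradients.
        for param in self.linear.parameters():
            param.requires_grad = False

    def forward(self, x):
        return self.linear(x) + self.lora(x)


# Usage: adapt a pre-trained dense layer with rank-8 LoRA matrices.
pretrained = torch.nn.Linear(768, 768)
adapted = LinearWithLoRA(pretrained, rank=8)
out = adapted(torch.randn(4, 768))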

Performance

In the original paper, the authors reported performance on par with, and in some cases better than, full fine-tuning on various natural language generation and understanding tasks, despite training only a small fraction of the parameters.

Additional Benefits

Since LoRA matrices can be stored efficiently and separately from the pre-trained LLM weights, customization of these large models is highly scalable. Organizations can build libraries of specialized LoRA matrices for different datasets and domains, switching between them as needed for specific applications.
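Continuing the toy implementation above, swapping a specialized adapter in or out amounts to replacing the lora submodule while the base weights stay untouched (a hypothetical sketch for illustration, not the API of any particular library):

# Train one LoRALayer per domain against the same frozen base layer...
medical_lora = LoRALayer(768, 768, rank=8)
legal_lora = LoRALayer(768, 768, rank=8)

# ...then switch adapters at serving time without reloading the base model.
adapted.lora = medical_lora   # serve medical queries
adapted.lora = legal_lora     # later, swap in the legal adapter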

Limitations

References

  1. Hu, Edward J., et al. "LoRA: Low-Rank Adaptation of Large Language Models." arXiv preprint arXiv:2106.09685, 2021.
  2. Raschka, Sebastian. Build a Large Language Model (From Scratch). Simon and Schuster, 2024.
  3. Mangrulkar, Sourab, et al. "PEFT: State-of-the-Art Parameter-Efficient Fine-Tuning Methods." 2022.
  4. Huang, Chengsong, et al. "LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition." arXiv preprint arXiv:2307.13269, 2023.
  5. Fajardo, V. A. "LoRA PaperCard." 2023.
