### VeLoRA

> [!NOTE]
> This is a variant of LoRA, so everything that is possible with LoRA also applies to this method unless otherwise stated on this page.

[VeLoRA](https://huggingface.co/papers/2405.17991) is a LoRA variant that reduces training memory by compressing the activations saved for the LoRA layers during the forward pass and reconstructing them during the backward pass to compute the parameter updates. In PEFT, VeLoRA is enabled through the `velora_config` argument of [LoraConfig](/docs/peft/main/en/package_reference/lora#peft.LoraConfig).

```py
from peft import LoraConfig, VeloraConfig

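config = LoraConfig(
    target_modules=["q_proj", "v_proj"],
    velora_config=VeloraConfig(
        num_groups=64,  # how the input activation depth is split before compression
        scale=0.2,  # rescales the reconstructed activations in the backward pass
        init_type="batch_average",  # update the projection on every training forward pass
    ),
)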
```

VeLoRA is applied to every LoRA layer selected by `target_modules`. `num_groups` controls how the input activation depth is split before compression. If the activation depth is not evenly divisible by `num_groups`, VeLoRA pads the grouped representation internally and removes the padding after reconstruction. `scale` rescales the reconstructed activations during the backward pass, and `init_type` chooses how the projection is initialized.
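The grouping, padding, and reconstruction steps described above can be illustrated with a small, framework-free sketch. This is not PEFT's implementation: the helper names `compress` and `reconstruct` are hypothetical, and the sketch assumes that each group is zero-padded to `ceil(depth / num_groups)` elements and projected onto a single unit vector, so that only one scalar per group needs to be stored.

```python
import math


def compress(x, num_groups, v):
    """Hypothetical helper: split one activation vector into `num_groups`
    groups (zero-padding the tail if needed) and keep one scalar per group,
    the dot product of the group with the unit vector `v`."""
    group_size = math.ceil(len(x) / num_groups)  # assumed padding rule
    padded = list(x) + [0.0] * (group_size * num_groups - len(x))
    groups = [padded[i * group_size:(i + 1) * group_size] for i in range(num_groups)]
    return [sum(a * b for a, b in zip(g, v)) for g in groups]


def reconstruct(coeffs, v, depth, scale=1.0):
    """Hypothetical helper: rebuild an approximation of the activation from
    the stored scalars, rescale it, and drop the padding again."""
    flat = [scale * c * b for c in coeffs for b in v]
    return flat[:depth]  # remove the padded tail


# Depth 5 is not divisible by 2 groups, so one zero is padded internally.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
v = [1.0 / math.sqrt(3)] * 3  # unit vector of length group_size
coeffs = compress(x, num_groups=2, v=v)
approx = reconstruct(coeffs, v, depth=len(x))
```

With a uniform unit vector, each group collapses to its mean, so `approx` is approximately `[2, 2, 2, 3, 3]`. Only two scalars are kept between the forward and backward pass instead of five values, which is where the memory saving comes from in this toy setting.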

Use `batch_average_once` to initialize the projection from the first training batch, `batch_average` to update it from every training forward pass, or `random` to initialize it immediately from a random normalized vector.
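The difference between the three `init_type` values can be sketched as follows. The class `ProjectionSketch` and its `observe` method are hypothetical names used only for illustration. The sketch assumes that a batch-based projection is the normalized mean of the groups in the batch, and that `batch_average` simply replaces the projection on every call. PEFT's actual update rule may differ.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v] if norm > 0 else list(v)


class ProjectionSketch:
    """Hypothetical illustration of the three initialization modes."""

    def __init__(self, group_size, init_type, seed=0):
        self.init_type = init_type
        self.vector = None
        if init_type == "random":
            # Available immediately, before any batch has been seen.
            rng = random.Random(seed)
            self.vector = normalize([rng.gauss(0.0, 1.0) for _ in range(group_size)])

    def observe(self, groups):
        """Called with the list of group vectors of one training batch."""
        first_batch = self.vector is None
        if self.init_type == "batch_average" or (
            self.init_type == "batch_average_once" and first_batch
        ):
            # Assumed rule: normalized mean of all groups in this batch.
            mean = [sum(col) / len(groups) for col in zip(*groups)]
            self.vector = normalize(mean)
        return self.vector


once = ProjectionSketch(group_size=2, init_type="batch_average_once")
every = ProjectionSketch(group_size=2, init_type="batch_average")
for batch in ([[3.0, 4.0], [3.0, 4.0]], [[1.0, 0.0]]):
    once.observe(batch)   # fixed after the first batch
    every.observe(batch)  # follows the most recent batch
```

After both batches, `once.vector` still reflects the first batch, while `every.vector` reflects the second. A `random` projection never depends on the data at all.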

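The table below shows results on the [MetaMathQA benchmark](https://github.com/huggingface/peft/tree/main/method_comparison/MetaMathQA). Compared with plain LoRA, VeLoRA lowers peak memory by about 28% (27.69 to 19.94 GiB) at roughly 13% lower throughput, with the same training loss. Gradient checkpointing (GC) saves more memory (about 52%) but reduces throughput by about 29%.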

| Variant | Training Loss | Max Memory (GiB) | Tokens/sec |
|---|---:|---:|---:|
| LoRA | 0.5427 | 27.69 | 2366.2 |
| LoRA + GC | 0.5426 | 13.17 | 1671.8 |
| LoRA + VeLoRA | 0.5427 | 19.94 | 2057.6 |

#### Caveats

- VeLoRA is currently supported on standard LoRA linear layers only.

