feat: Add LoRA (Low-Rank Adaptation) for parameter-efficient fine-tuning #4321
base: main
Conversation
Implements a LoRA module in burn-nn, enabling efficient fine-tuning of large models by adding trainable low-rank matrices while keeping base weights frozen. Key additions:

- LoRA layer support for Linear via the `LoraAdaptable` trait
- Adapter persistence for saving/loading trained LoRA weights independently
- Comprehensive configuration (rank, alpha, dropout, bias modes, RSLoRA)
- `merge()` for zero-overhead inference after training
- burn-book documentation and a lora-finetuning example

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
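To ground the discussion below, here is a minimal, self-contained sketch of the mechanism this PR adds, written with plain Vec-based matrices rather than the actual burn-nn types or API (struct and method names here are illustrative only): the base weight W stays frozen, only the low-rank factors A and B are trained, and merging folds the scaled update back into W so inference carries no extra cost.

```rust
// Conceptual sketch of the LoRA idea, NOT the burn-nn API.
type Matrix = Vec<Vec<f64>>;

fn matmul(a: &Matrix, b: &Matrix) -> Matrix {
    let (n, k, m) = (a.len(), b.len(), b[0].len());
    let mut out = vec![vec![0.0; m]; n];
    for i in 0..n {
        for p in 0..k {
            for j in 0..m {
                out[i][j] += a[i][p] * b[p][j];
            }
        }
    }
    out
}

fn add_scaled(base: &Matrix, delta: &Matrix, scaling: f64) -> Matrix {
    base.iter()
        .zip(delta)
        .map(|(b_row, d_row)| b_row.iter().zip(d_row).map(|(b, d)| b + scaling * d).collect())
        .collect()
}

struct LoraLinear {
    weight: Matrix, // frozen base weight W: d_in x d_out
    lora_a: Matrix, // trainable low-rank factor A: d_in x rank
    lora_b: Matrix, // trainable low-rank factor B: rank x d_out
    scaling: f64,   // alpha / rank (or alpha / sqrt(rank) with RSLoRA)
}

impl LoraLinear {
    /// Forward pass during fine-tuning: x @ W + scaling * (x @ A) @ B.
    fn forward(&self, x: &Matrix) -> Matrix {
        let base = matmul(x, &self.weight);
        let delta = matmul(&matmul(x, &self.lora_a), &self.lora_b);
        add_scaled(&base, &delta, self.scaling)
    }

    /// After training, fold scaling * A @ B into W once; inference then
    /// runs a plain Linear with no extra matmuls ("zero-overhead").
    fn merge(&self) -> Matrix {
        let delta = matmul(&self.lora_a, &self.lora_b);
        add_scaled(&self.weight, &delta, self.scaling)
    }
}
```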
Open questions:

i. Unsure if the lora section in the burn-book should be under [...]

ii. Should `LoraConfig` look something like this?

pub struct LoraConfig {
    pub init_a: Initializer,
    pub init_b: Initializer,
    // ...
}
nathanielsimard left a comment:
I believe we should implement a more "generalized" approach to LoRA. Specifically, the logic for the update should reside within the Param<Tensor<B, 2>> type itself. In the current Linear layer implementation, parameters are retrieved using the self.weights.val() function. If this function were responsible for performing the weight update, calculating [...]
Yes, I assumed this would be another approach. So do you mean something like this?

let output = self.weight.val(); // Returns W + A @ B * scaling (if LoRA attached/enabled)

Would lora matrices [...]? Something like this?

pub struct Param<T: Parameter> {
    pub id: ParamId,
    // Base weight W (frozen)
    state: OnceCell<T>,
    // LoRA state
    lora_a: Option<Param<T>>,
    lora_b: Option<Param<T>>,
    lora_scaling: f64,
    // ... other fields
}

But this would drastically change the visiting/traversal logic, and would turn this into a recursive type, which is probably a terrible idea. Did you have something clever/cleaner in mind?
I would not make it recursive, but yes, I would put extra tensors in the parameter, but hardcoded for simplicity:

trait Parameter {
    type LoraParams;
}

pub struct Param<T: Parameter> {
    pub id: ParamId,
    // Base weight W (frozen)
    state: OnceCell<T>,
    // LoRA state
    loras: Option<T::LoraParams>,
    // ... other fields
}

Then when we implement Parameter for [...]
Completely agree with you. I attempted this originally because I thought it wouldn't be such a burden for users, but also because I didn't think we would want LoRA embedded in the parameters directly, since I believed the Parameter shouldn't know about any particular training method. I will look into the architecture a little more in the direction you were thinking and devise something feasible with the same level of control and features, so we can discuss it further before implementation. Thanks for your input.
Checklist
The `cargo run-checks` command has been executed.

Related Issues/PRs
#2943
Changes
Added a new `lora` module in the `burn-nn` crate which implements a considerable set of features inspired by the PEFT library.

Features
- RSLoRA scaling (`alpha / sqrt(rank)` instead of `alpha / rank`; see the sketch just below)
- Bias modes: `None` (fully frozen) or `LoraOnly` (train bias only)
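As a quick illustration of the RSLoRA option above, the only difference from standard LoRA is the effective scaling applied to the low-rank update (a small sketch, not code from this PR):

```rust
/// Effective scaling applied to the low-rank update A @ B.
/// Standard LoRA uses alpha / rank; RSLoRA uses alpha / sqrt(rank),
/// which keeps the update magnitude stable as the rank grows.
fn lora_scaling(alpha: f64, rank: usize, rslora: bool) -> f64 {
    if rslora {
        alpha / (rank as f64).sqrt()
    } else {
        alpha / rank as f64
    }
}

fn main() {
    // With alpha = 16 and rank = 64: 0.25 for standard LoRA, 2.0 for RSLoRA.
    println!("{}", lora_scaling(16.0, 64, false)); // 0.25
    println!("{}", lora_scaling(16.0, 64, true));  // 2.0
}
```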
Not implemented for the sake of concision

I figure these can be implemented in separate PRs; this particular change is to lay out the foundations.
- Only `Linear` supported so far (Conv, Embedding not yet)

Testing
An example crate `examples/lora-finetuning/` is implemented to test the features, and results are visualized using the existing `SupervisedTraining` implementation.

Non-exhaustive list of features tested:
i. Saving adapters, which saves the LoRA matrices and the configuration
ii. Loading adapters, which loads the LoRA matrices and the configuration and applies them to the base model for training
iii. Roundtrip verification, ensuring that a model loaded from disk (base model + adapters) produces identical inference results to the original trained model (a toy illustration follows below)
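For illustration, here is a toy version of that roundtrip check (iii), using plain text files and Vec-based matrices instead of the PR's actual adapter format, just to show the save, load, and compare intent:

```rust
use std::fs;

type Matrix = Vec<Vec<f64>>;

// Toy "adapter persistence": one row per line, values space-separated.
// The real PR presumably goes through Burn's record/recorder machinery;
// this only illustrates the save -> load -> compare roundtrip.
fn save_matrix(path: &str, m: &Matrix) -> std::io::Result<()> {
    let text: Vec<String> = m
        .iter()
        .map(|row| row.iter().map(f64::to_string).collect::<Vec<_>>().join(" "))
        .collect();
    fs::write(path, text.join("\n"))
}

fn load_matrix(path: &str) -> std::io::Result<Matrix> {
    Ok(fs::read_to_string(path)?
        .lines()
        .map(|line| line.split_whitespace().map(|v| v.parse().unwrap()).collect())
        .collect())
}

fn main() -> std::io::Result<()> {
    // Pretend these are trained LoRA factors A (2x1) and B (1x3).
    let lora_a = vec![vec![0.5], vec![-1.0]];
    let lora_b = vec![vec![1.0, 2.0, 3.0]];

    save_matrix("lora_a.txt", &lora_a)?;
    save_matrix("lora_b.txt", &lora_b)?;

    // Roundtrip: loading the adapter must reproduce the exact same matrices,
    // which in turn yields identical inference once applied to the base model.
    assert_eq!(load_matrix("lora_a.txt")?, lora_a);
    assert_eq!(load_matrix("lora_b.txt")?, lora_b);
    println!("adapter roundtrip ok");
    Ok(())
}
```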