
Different outputs when upgrading from adapter-transformers with LoRA #760

Closed
jblamare opened this issue Nov 19, 2024 · 3 comments · Fixed by #770
Labels: bug (Something isn't working)

Environment info

  • adapters version: 1.0.1
  • transformers version: 4.45.2
  • Platform: Linux-5.15.0-91-generic-x86_64-with-glibc2.35
  • Python version: 3.10.12
  • Huggingface_hub version: 0.26.2
  • Safetensors version: 0.4.5
  • Accelerate version: not installed
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.5.1+cu124 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?: No
  • Using GPU in script?: Yes
  • GPU type: NVIDIA RTX A6000

Information

Model I am using (Bert, XLNet ...): google/flan-t5-small

Language I am using the model on (English, Chinese ...): English

Adapter setup I am using (if any): LoRA

The problem arises when using:

  • the official example scripts: (give details below)
  • my own modified scripts: (give details below)

The task I am working on is:

  • an official GLUE/SQUaD task: (give the name)
  • my own task or dataset: (give details below)

To reproduce

I have two environments:

  • env1
adapter-transformers==3.1.0
torch==1.13.1
  • env2
adapters==1.0.1
torch==2.5.1
transformers==4.45.2

I have some input_ids

input_ids = [262, 4, 4815, 10, 8668, 3, 5359, 27415, 5332, 3430, 276, 3577, 20186, 11951, 8472, 11359, 4209, 205, 20931, 23936, 3388, 27447, 8015]

I have a model checkpoint checkpoint.pth which has the T5 weights plus a LoRA adapter and was saved in env1:

with open("checkpoint.pth", "wb") as f:
    torch.save(model.state_dict(), f)

From there I want to make sure I can load the model, run inference, and get the same outputs in env2. But the outputs are different. I run the following experiments:

  1. Create a T5 model, add the empty (untrained) LoRA adapter, run inference - env1 and env2 produce the same output.
  2. Create a T5 model, load the non-LoRA weights, run inference - env1 and env2 produce the same output.
  3. Create a T5 model, add the LoRA adapter, load all the weights, run inference - env1 produces the expected output, but env2 differs.

Here is the code I use (in env1 I just remove import adapters and adapters.init(model), and use adapter_config = transformers.adapters.LoRAConfig(r=8, alpha=16)):

import adapters
import torch
import transformers

input_ids = [262, 4, 4815, 10, 8668, 3, 5359, 27415, 5332, 3430, 276, 3577, 20186, 11951, 8472, 11359, 4209, 205, 20931, 23936, 3388, 27447, 8015]

# Build the base model and attach the (still untrained) LoRA adapter.
model = transformers.AutoModel.from_pretrained("google/flan-t5-small")
adapters.init(model)
adapter_config = adapters.LoRAConfig(r=8, alpha=16)
model.add_adapter("ct", config=adapter_config)
model.set_active_adapters("ct")

# Only the encoder is needed; load the env1 checkpoint weights on top of it.
model = model.encoder
checkpoint = torch.load("checkpoint.pth", map_location=torch.device("cpu"))
model.load_state_dict(checkpoint, strict=False)

# Run inference on the fixed input_ids.
model = model.eval()
outputs = model(input_ids=torch.IntTensor([input_ids]))
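
For completeness, a minimal way to verify the mismatch across environments is to save the env1 output and diff it in env2 (the file name and print format below are just illustrative):

# In env1: save the encoder output for the fixed input_ids above.
torch.save(outputs.last_hidden_state, "env1_outputs.pt")

# In env2: reload the env1 reference and compare it to the freshly computed output.
reference = torch.load("env1_outputs.pt", map_location="cpu")
max_diff = (outputs.last_hidden_state.cpu() - reference).abs().max().item()
print(f"max abs diff: {max_diff:.6f}")  # ~0 for experiments 1 and 2, clearly non-zero for experiment 3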

Unfortunately I can't share the model weights. Any thoughts on why I get different outputs only when I use LoRA and load my weights?

Expected behavior

Getting the same output in env1 and env2.

jblamare added the bug label on Nov 19, 2024

jblamare commented Nov 19, 2024

After diving into the codebase, I think I understand the difference. This looks like a bug in the LoRALinear class, but I might be missing something.

  • In adapter-transformers, the linear output and the delta are combined with result = lora.com(result, delta_w, scaling=gate). In particular, if lora.use_gating == False, then gate is None, which means the scaling used is alpha / r = 16 / 8 = 2 in my case.
  • In adapters, they are combined with scaling=1.0. There is a comment saying "scaling already applied in compose", but I don't think it is: compose runs compose_stack, which runs LoRALinear's compose_single, which runs LoRA's forward, and none of these apply any scaling.

Am I missing something?
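
To make the difference concrete, here is a simplified sketch of the LoRA math with standalone tensors (not the actual library classes); with alpha = 16 and r = 8 the missing factor is exactly 2:

import torch

r, alpha = 8, 16
hidden = torch.randn(1, 23, 512)        # dummy encoder hidden states
lora_A = torch.randn(r, 512) * 0.01     # illustrative LoRA weights
lora_B = torch.randn(512, r) * 0.01

delta_w = hidden @ lora_A.T @ lora_B.T

# adapter-transformers: gate is None when use_gating == False,
# so lora.com falls back to the default scaling alpha / r = 2.0.
out_old = hidden + (alpha / r) * delta_w

# adapters 1.0.1: compose passes scaling=1.0 and LoRA's forward applies none either.
out_new = hidden + 1.0 * delta_w

print((out_old - out_new).abs().max())  # non-zero whenever alpha != r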

jblamare changed the title from "Different outputs when upgrading from adapter-transformers with T5, LoRA, and loaded weights" to "Different outputs when upgrading from adapter-transformers with LoRA" on Nov 19, 2024
calpt self-assigned this on Dec 22, 2024

calpt commented Dec 22, 2024

Hey @jblamare,

Thank you so much for reporting this and for providing the detailed investigation! I was able to reproduce the issue and believe your suspicion that this is a bug in our current implementation is correct. The application of the default scaling unfortunately got lost in a recent larger refactoring of the LoRA code.

I think the proper way to re-add it would be in the LoRA module forward, like this: calpt@95be3cf. This removed the output diff between adapter-transformers and adapters for me; you might want to check on your side as well. I'll work on patching it in the main code.

Thanks again for bringing this up!
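
For readers following along, the gist of the linked commit is to re-apply the default alpha / r scaling to the delta inside the LoRA forward pass. A simplified, library-agnostic sketch (the names below are illustrative, not the exact diff):

def lora_forward(hidden_states, lora_A, lora_B, alpha, r, gate=None):
    # Low-rank update: x @ A^T @ B^T
    delta = hidden_states @ lora_A.T @ lora_B.T
    # The default scaling alpha / r is applied again here; a learned gate,
    # when gating is enabled, takes precedence as before.
    scaling = gate if gate is not None else alpha / r
    return delta * scaling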

@jblamare (Author)

Hi @calpt, thanks a lot for looking into this! I can confirm that I've tested your change and it works for me. I'll look out for the next release!

calpt added a commit that referenced this issue Dec 29, 2024
Resolves issue described in #760.

**IMPORTANT**: this fix restores weights compatibility with
adapter-transformers. Compatibility with previous adapters versions is
kept via a compat patch.

## Details

The current implementation of LoRA / (IA)^3 in `adapters` versions <
1.1.0 does not correctly implement adapter state scaling via the LoRA
`alpha` attribute, effectively ignoring `alpha` and always applying a
scaling of 1.0.
This PR restores the correct original behavior as found in
adapter-transformers' original LoRA implementation.

As this change breaks all adapters pre-trained using `adapters` versions
0.1.0 - 1.0.1, a backward compatibility patch is introduced that
automatically sets `alpha = r` for LoRA adapters that were trained
using affected versions. This ensures all previous adapters continue to
behave exactly as trained (i.e., they give the exact same output with newer
versions).
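
A condensed illustration of the compatibility logic (the version list and attribute names are illustrative, not the library's exact internals):

AFFECTED_VERSIONS = {"0.1.0", "0.2.0", "1.0.0", "1.0.1"}  # illustrative subset

def patch_legacy_lora_config(config, trained_with_version):
    # Affected versions effectively used a scaling of 1.0 regardless of alpha.
    # Forcing alpha = r makes the corrected scaling alpha / r evaluate to 1.0,
    # so adapters trained with those versions keep their trained behavior.
    if trained_with_version in AFFECTED_VERSIONS:
        config.alpha = config.r
    return config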

---------

Co-authored-by: TimoImhof <[email protected]>