BertGeneration training yields "RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn" #578
Comments
It seems like this is caused by the trainer freezing the model weights while no adapter modules other than the invertible ones are added via the EncoderDecoderMixin. As a result, the trainer does not find any parameters with requires_grad=True when it builds the optimizer. I tried to add some of these layers to the mixin, similar to what is done in the BARTMixin, but without success; a sketch of the attempted change follows:
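For illustration, a minimal sketch of the kind of mixin change described above, assuming an `iter_layers` hook like the one the other model mixins expose. The class name and attribute paths (`encoder.encoder.layer`, `decoder.bert.encoder.layer`) are assumptions for a BertGeneration-based EncoderDecoderModel, not the exact adapter-transformers code:

```python
from typing import Iterable, Tuple
import torch.nn as nn

# Sketch only: class name and attribute paths are assumptions, not library code.
class EncoderDecoderModelAdaptersMixin:
    def iter_layers(self) -> Iterable[Tuple[int, nn.Module]]:
        # Walk the encoder's transformer layers, then the decoder's, so that
        # adapter modules would be injected into both stacks (as the BART mixin
        # does for its encoder and decoder).
        for i, layer in enumerate(self.encoder.encoder.layer):
            yield i, layer
        offset = len(self.encoder.encoder.layer)
        for i, layer in enumerate(self.decoder.bert.encoder.layer):
            yield offset + i, layer
```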
Thanks for reporting these issues and sorry for not getting back to you earlier. Unfortunately, our current encoder-decoder implementation is very hacky and has all sorts of issues. We'll try to look into this.
I am wondering, are there any updates on this problem?
Environment info
adapter-transformers version: 3.2.1
I'm trying to train an EncoderDecoder adapter with BertGeneration using the Seq2SeqAdapterTrainer:
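A rough reconstruction of the setup described above, not the exact code from the report: the checkpoint name, token ids, adapter name, and `train_dataset` are placeholders, and `Seq2SeqAdapterTrainer` is imported from `transformers.adapters` as in adapter-transformers 3.x.

```python
from transformers import (
    BertGenerationDecoder,
    BertGenerationEncoder,
    EncoderDecoderModel,
    Seq2SeqTrainingArguments,
)
from transformers.adapters import Seq2SeqAdapterTrainer

# Build an encoder-decoder model from two BertGeneration checkpoints.
encoder = BertGenerationEncoder.from_pretrained("bert-base-uncased")
decoder = BertGenerationDecoder.from_pretrained(
    "bert-base-uncased", add_cross_attention=True, is_decoder=True
)
model = EncoderDecoderModel(encoder=encoder, decoder=decoder)
model.config.decoder_start_token_id = 101  # placeholder token ids
model.config.pad_token_id = 0

# Add and activate an adapter; train_adapter() freezes all other weights,
# which is where the problem shows up for this model class.
model.add_adapter("seq2seq_adapter")
model.train_adapter("seq2seq_adapter")

training_args = Seq2SeqTrainingArguments(output_dir="out", num_train_epochs=1)
trainer = Seq2SeqAdapterTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,  # placeholder: a tokenized seq2seq dataset
)
trainer.train()  # raises the RuntimeError described below
```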
This results in the following RuntimeError: "element 0 of tensors does not require grad and does not have a grad_fn".
The same issue is also present in transformers, where changing the optimizer is discussed as a solution. Changing the optimizer in the Seq2SeqAdapterTrainer did not work either:
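For reference, a sketch of the kind of optimizer swap that was attempted, using the Trainer's `optimizers` argument; the specific optimizer here (plain SGD over the currently trainable parameters) is only an illustration, not the exact one discussed in the linked transformers issue.

```python
from torch.optim import SGD

# Pass a custom optimizer (and no scheduler) to the trainer. If no adapter
# modules were actually added, there are essentially no trainable parameters,
# so swapping the optimizer does not address the root cause.
custom_optimizer = SGD(
    [p for p in model.parameters() if p.requires_grad], lr=1e-4
)
trainer = Seq2SeqAdapterTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,  # same placeholder dataset as above
    optimizers=(custom_optimizer, None),
)
```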
Training with the same settings, but without adapters and with the plain Seq2SeqTrainer, does work.