Freeze Weights #22
Hello,

I can see from the Training Details in the paper that during supervised fine-tuning, backpropagation went through the entire model, including the language model portion. I also see from the code that you had some functionality for freezing weights. If you did experiment with this, I was curious what magnitude of difference you saw between freezing and training the language model portion during supervised fine-tuning, especially for the Transformer.

Thanks again!
Scott

We did not test this thoroughly for every downstream task, but for secondary structure we generally saw 1-2 percentage points of improvement when fine-tuning the whole model. I suspect the difference will depend a great deal on the task.
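For anyone finding this later: the comparison being discussed is between fine-tuning the full model and freezing the pretrained language-model backbone so that only the task head is trained. Below is a minimal PyTorch sketch of the freezing approach. It is not the repository's actual freezing code; the `backbone` and `head` attribute names are hypothetical stand-ins for whatever submodules your model exposes.

```python
import torch

# Minimal sketch (not the repository's code): freeze the pretrained language-model
# backbone so only the task head receives gradient updates during fine-tuning.
# `model.backbone` is a hypothetical attribute name; substitute the actual
# submodule of the model you are fine-tuning.

def freeze_backbone(model: torch.nn.Module) -> None:
    for param in model.backbone.parameters():
        param.requires_grad = False

def build_optimizer(model: torch.nn.Module, lr: float = 1e-4) -> torch.optim.Optimizer:
    # Pass only trainable parameters so frozen weights are skipped entirely.
    trainable = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.Adam(trainable, lr=lr)

# Usage during supervised fine-tuning:
# freeze_backbone(model)             # freeze the language-model portion
# optimizer = build_optimizer(model)
# ...train as usual; only the task head is updated.
```

Filtering the parameter list before building the optimizer also avoids allocating Adam state for the frozen weights, which is most of the memory saving you get from freezing.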