
The Learning Rate in 5-2.BERT must be reduced. #77

Open

Cheng0829 opened this issue Sep 22, 2022 · 0 comments

Cheng0829 commented Sep 22, 2022

In Line 209:

optimizer = optim.Adam(model.parameters(), lr=0.001)

In practice, this BERT model gets stuck in a poor local optimum if the learning rate is 0.001; I think the learning rate should be reduced to 0.0001.
The experimental results below show that with a learning rate of 0.0001 the loss drops to about 0.1 after roughly 100 epochs, while with a learning rate of 0.001 the loss almost never falls below 2.0.
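Concretely, the proposed fix is a one-line change to the optimizer setup (a minimal sketch, assuming `model` is the BERT instance built earlier in 5-2.BERT):

import torch.optim as optim

# Line 209, with a 10x smaller step size for Adam
# (assumes `model` is the BERT model defined earlier in the script)
optimizer = optim.Adam(model.parameters(), lr=0.0001)

Nothing else in the training loop needs to change; only the step size differs.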

when lr=0.001

Epoch: 0010 cost = 15.205759
Epoch: 0020 cost = 16.236261
Epoch: 0030 cost = 18.436878
Epoch: 0040 cost = 4.077913
Epoch: 0050 cost = 12.703120
Epoch: 0060 cost = 10.411244
Epoch: 0070 cost = 1.640913
Epoch: 0080 cost = 10.753708
Epoch: 0090 cost = 8.370532
Epoch: 0100 cost = 1.624577
Epoch: 0110 cost = 8.537676
Epoch: 0120 cost = 7.453298
Epoch: 0130 cost = 1.659591
Epoch: 0140 cost = 7.092763
Epoch: 0150 cost = 6.843360
Epoch: 0160 cost = 1.688111
Epoch: 0170 cost = 6.052425
Epoch: 0180 cost = 6.395712
Epoch: 0190 cost = 1.707749
Epoch: 0200 cost = 5.263054
······
Epoch: 5000 cost = 2.523541

when lr=0.0001

Epoch: 0010 cost = 13.998453
Epoch: 0020 cost = 6.168099
Epoch: 0030 cost = 3.504844
Epoch: 0040 cost = 2.312538
Epoch: 0050 cost = 1.723783
Epoch: 0060 cost = 1.412463
Epoch: 0070 cost = 0.930549
Epoch: 0080 cost = 0.671946
Epoch: 0090 cost = 0.745429
Epoch: 0100 cost = 0.139699
Epoch: 0110 cost = 0.187208
Epoch: 0120 cost = 0.075726