Concrete weight decay configuration for GPT-2 pretraining #40
Comments
The figure above compares Adam and Adan on GPT-2 345M pre-trained on the OpenWebText dataset. As you mentioned, you may consider referring to the config in #32; there is no need to tune beta1 and beta2, and using the default values is okay. The most sensitive hyperparams are the lr and wd: you can choose wd from [0.02, 0.05, 0.1], choose beta3 from [0.95, 0.999], and use a larger lr and warmup fraction for Adan. We follow this rule to tune the parameters for the 7B and even 65B models. If you still get an inferior result, I can try to reproduce your experiment on my side.
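To make that tuning rule concrete, here is a minimal sketch of the sweep it implies. The variable names and the baseline lr/warmup numbers are illustrative placeholders, not values taken from the repo's configs:

```python
from itertools import product

# Sweep grid implied by the rule above (illustrative, not from the repo's configs):
# wd and lr are the most sensitive knobs, beta3 has two candidate values, and
# beta1/beta2 stay at the optimizer defaults.
weight_decays = [0.02, 0.05, 0.1]
beta3s = [0.95, 0.999]

base_lr, base_warmup_frac = 6e-4, 0.01   # placeholder AdamW-baseline settings
adan_lr = 2 * base_lr                    # "larger lr ... for Adan"; the factor is a guess
adan_warmup_frac = 2 * base_warmup_frac  # likewise for the warmup fraction

for wd, beta3 in product(weight_decays, beta3s):
    print(f"run: lr={adan_lr:g}, warmup_frac={adan_warmup_frac:g}, wd={wd}, beta3={beta3}")
```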
Dear authors:

According to the README.md of this amazing project, the `weight_decay` param should be `0.02`, while in the configuration file attached in #32, the `WD` seems to be `0.05`. Also, only `beta3` is explicitly specified in the aforementioned configuration file; the remaining hyperparams I can only infer from https://github.com/sail-sg/Adan/blob/main/gpt2/README.md. However, `weight_decay=0.02` together with the other hyperparams above yields an inferior val loss curve compared with [that of the AdamW baseline](https://github.com/karpathy/nanoGPT/blob/master/config/train_gpt2.py). Thus, do you have any suggestions about the hyperparams I mentioned? Thanks!
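For reference, this is roughly how the hyperparams in question would be wired into the optimizer. It is a minimal sketch assuming the `Adan` class from this repo accepts a three-element `betas` tuple; the lr and the beta1/beta2 values are placeholders, since only beta3 is spelled out in the config from #32:

```python
import torch.nn as nn
from adan import Adan  # the optimizer provided by this repo

model = nn.Linear(768, 768)  # stand-in for the GPT-2 345M model

# weight_decay=0.02 is the README value that yields the inferior val loss curve;
# beta3=0.95 is one of the suggested candidates; beta1/beta2 and lr are guesses.
optimizer = Adan(
    model.parameters(),
    lr=1e-3,
    betas=(0.98, 0.92, 0.95),
    weight_decay=0.02,
)
```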