I was wondering which settings the learning curves in Supplementary B correspond to. In this notebook there is a hint (screenshot below) that suggests you trained only with context lengths of 2048, 16K, 32K and 131K for the indicated number of steps/tokens. Is this correct?
Thank you for your help!
Oops, those lines of code shouldn't have been shared; they are just an ugly way we used to convert from steps to tokens so we could compare some of the checkpoints more directly. Figure 4 of Supplementary B shows the full training of the Foundation Long model, up to the $2^{17}$ context.
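For anyone reading along, the steps-to-tokens conversion itself is just a multiplication by the effective batch size in tokens. Here is a minimal sketch in Python; the numeric values are placeholders, since the actual batch sizes and step counts are not stated in this thread:

```python
# Minimal sketch of a steps -> tokens conversion.
# The values below are hypothetical placeholders, NOT the actual training configuration.

def steps_to_tokens(steps: int, sequences_per_batch: int, context_length: int) -> int:
    """Tokens seen = optimizer steps x sequences per (global) batch x tokens per sequence."""
    return steps * sequences_per_batch * context_length

# Example usage with made-up numbers:
print(steps_to_tokens(steps=10_000, sequences_per_batch=64, context_length=2048))
# -> 1310720000 tokens
```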
Thank you for the quick response! Unfortunately, without knowing the exact number of tokens at which the context window was extended, it is difficult to compare models at different stages of training. It would be great if you could share that information. Thanks again!
What do you mean? The only model we share is "Foundation Long", which is the one trained up to a $2^{17}$-token context. We don't share earlier checkpoints trained with shorter contexts. If you tell me more precisely what kind of comparison you plan to do, I can help you better. You can send me an email at [email protected] so we can continue the conversation there.