Training Schedule - Learning Curves #12

Open
lisa-schneckenreiter opened this issue Sep 10, 2024 · 3 comments

Comments

@lisa-schneckenreiter

Hi!

I was wondering which settings the learning curves in Supplementary B correspond to. In this notebook there is a hint (screenshot below) that suggests you trained only with context lengths of 2048, 16K, 32K and 131K for the indicated number of steps/tokens. Is this correct?

Thank you for your help!

image

@damiano-sg
Collaborator

Oops, those lines of code shouldn't have been shared; they are just an ugly way we used to convert steps to tokens so that some of the checkpoints could be compared more directly. Figure 4 of Supplementary B shows the full training of the Foundation Long model, up to the $2^{17}$-token context.
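
For readers following the thread, a steps-to-tokens conversion of this kind could be sketched as below. This is only an illustration, not the code from the notebook: the `Stage` class, both helper functions, and every step count and batch size are hypothetical placeholders. Only the four context lengths (2048, 16K, 32K, 131K) come from the question above, and the sketch assumes each stage uses a fixed batch size and a fixed context length.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    """One training stage with a fixed context length (hypothetical schema)."""
    context_length: int  # tokens per sequence
    steps: int           # optimizer steps spent in this stage
    batch_size: int      # sequences per optimizer step


def cumulative_tokens(stages):
    """Return the running token total at the end of each stage."""
    totals = []
    running = 0
    for stage in stages:
        running += stage.steps * stage.batch_size * stage.context_length
        totals.append(running)
    return totals


def tokens_at_step(stages, step):
    """Return the number of tokens seen after `step` global optimizer steps."""
    tokens = 0
    remaining = step
    for stage in stages:
        used = min(remaining, stage.steps)
        tokens += used * stage.batch_size * stage.context_length
        remaining -= used
        if remaining == 0:
            break
    return tokens


# Context lengths as mentioned in the question; the step counts and batch
# sizes are made-up placeholders, NOT the values used for the actual model.
EXAMPLE_STAGES = [
    Stage(context_length=2048, steps=1000, batch_size=64),
    Stage(context_length=16384, steps=500, batch_size=8),
    Stage(context_length=32768, steps=250, batch_size=4),
    Stage(context_length=131072, steps=100, batch_size=1),  # 2**17 tokens
]

if __name__ == "__main__":
    for stage, total in zip(EXAMPLE_STAGES, cumulative_tokens(EXAMPLE_STAGES)):
        print(f"context {stage.context_length}: {total} tokens seen so far")
```

With the real per-stage step counts and batch sizes filled in, `cumulative_tokens` would give the token counts at which each context extension happened, which is the information needed to place checkpoints on a common token axis.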

@lisa-schneckenreiter
Author

Thank you for your fast response! Unfortunately, not knowing the exact number of tokens at which context windows were extended makes it difficult to compare models at different stages of training. It would be great if you could share that information. Thanks again!

@damiano-sg
Collaborator

What do you mean? The only model that we share is "Foundation Long", which is the one that was trained up to a context of $2^{17}$ tokens. We don't share earlier checkpoints trained with shorter contexts. If you tell me more precisely what kind of comparison you plan to do, I can help you better. You can send me an email at [email protected] so we can continue the conversation there.
