I was wondering which settings the learning curves in Supplementary B correspond to. In this notebook there is a hint (screenshot below) that suggests you trained only with context lengths of 2048, 16K, 32K and 131K for the indicated number of steps/tokens. Is this correct?
Thank you for your help!
Oops, those lines of code shouldn't have been shared; they are just an ugly way we used to convert from steps to tokens so we could compare some of the checkpoints more directly. Figure 4 of Supplementary B shows the full training of the Foundation Long model, up to the $2^{17}$ context.
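For anyone reading along, the steps-to-tokens conversion itself is just a multiplication by the effective batch size in tokens. Here is a minimal sketch in Python; the numeric values are placeholders, since the actual batch sizes and step counts are not stated in this thread:

```python
# Minimal sketch of a steps -> tokens conversion.
# The values below are hypothetical placeholders, NOT the actual training configuration.

def steps_to_tokens(steps: int, sequences_per_batch: int, context_length: int) -> int:
    """Tokens seen = optimizer steps x sequences per (global) batch x tokens per sequence."""
    return steps * sequences_per_batch * context_length

# Example usage with made-up numbers:
print(steps_to_tokens(steps=10_000, sequences_per_batch=64, context_length=2048))
# -> 1310720000 tokens
```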
Thank you for the quick response! Unfortunately, without knowing the exact number of tokens at which the context window was extended, it is difficult to compare models at different stages of training. It would be great if you could share that information. Thanks again!
What do you mean? The only model we share is "Foundation Long", which is the one trained up to a $2^{17}$-token context. We don't share earlier checkpoints trained with shorter contexts. If you tell me more precisely what kind of comparison you plan to do, I can help you better. You can send me an email at [email protected] so we can continue the conversation there.