What is a reasonable range for the loss at 400k and 600k steps for the repo's default ~124M model on OpenWebText (OWT)?
My val loss is around 2.95 at 400k and 2.90 at 600k steps.
Is this reasonable, or should I be concerned that I am doing something wrong?
I am asking because in Karpathy's run the plotted val loss is already around 2.905 at 400k steps, a value my run only reaches at about 600k steps.
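
To put the gap in numbers, here is a quick sketch. It assumes the reported losses are mean cross-entropy in nats per token (an assumption on my side about what the training script logs), and the 2.905 reference is read off a plot, so it is approximate. The variable and function names are mine, not from the repo.

```python
import math

# Values from this post. Losses are ASSUMED to be mean cross-entropy
# in nats per token; the reference value is read off a plot.
my_val_loss_400k = 2.95
my_val_loss_600k = 2.90
reference_val_loss_400k = 2.905


def loss_gap(mine: float, reference: float) -> float:
    """Absolute difference in loss (positive means my run is worse)."""
    return mine - reference


def perplexity(loss: float) -> float:
    """Per-token perplexity for a loss measured in nats."""
    return math.exp(loss)


gap_400k = loss_gap(my_val_loss_400k, reference_val_loss_400k)
ppl_ratio_400k = perplexity(my_val_loss_400k) / perplexity(reference_val_loss_400k)

print(f"gap at 400k steps: {gap_400k:.3f} nats")
print(f"perplexity ratio at 400k steps: {ppl_ratio_400k:.3f}")
```

So the question is whether a gap of this size at 400k steps is within normal run-to-run variation or points to a setup difference.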