Loss Distribution at 400k and 600k steps - OWT - 124M model #644

@MarkoKarbevski

What is a reasonable range for the val loss at 400k and 600k steps when training the repo's default ~124M model on OpenWebText (OWT)?

My val loss is around 2.95 at 400k and 2.90 at 600k steps.
Should I be concerned that I might be doing something wrong, or is this reasonable?

I am asking because the plotted val loss is around 2.905 in Karpathy's run at 400k steps.
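
To put the gap at 400k steps in perspective, here is a minimal sketch that compares the two numbers above. It assumes both values are mean cross-entropy losses in nats per token, so `exp(loss)` gives perplexity; `compare_val_loss` is a hypothetical helper written for this question, not something from the repo.

```python
import math


def compare_val_loss(mine: float, reference: float) -> dict:
    """Compare two mean cross-entropy losses (assumed to be nats per token)."""
    abs_diff = mine - reference
    return {
        "abs_diff": abs_diff,                          # gap in nats
        "rel_diff_pct": 100.0 * abs_diff / reference,  # gap relative to the reference loss
        "ppl_mine": math.exp(mine),                    # perplexity implied by my loss
        "ppl_reference": math.exp(reference),          # perplexity implied by the reference loss
    }


# Values from this issue: my run vs. the plotted reference run, both at 400k steps.
gap = compare_val_loss(mine=2.95, reference=2.905)
for name, value in gap.items():
    print(f"{name}: {value:.3f}")
```

The absolute gap is 0.045 nats, which is the difference I am asking about.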
