GPT-2 pretrain loss. #313

Open
jiangix-paper opened this issue Feb 8, 2023 · 1 comment

Comments
@jiangix-paper

Hello, thanks for your great work. I want to know how to calculate the loss given the raw text. For example:
I have a sample in the training data: "I want to go to school". When I feed this string into the GPT-2 model, the logits at every output position give a loss value. Is the total loss just the sum of all these per-position losses?

@imostafizur

No, the overall loss is not simply the sum of the losses over all output logits. In GPT-2 and other neural network models, the loss function measures the difference between the target output and the predicted output; cross-entropy is the standard choice for language models (mean squared error is the usual counterpart for regression tasks).
Given the input "I want to go to school", GPT-2 produces, at each position in the sequence, a probability distribution over the vocabulary (computed from the logits) that says how likely each possible next token is given the context so far.
To compute the loss, you compare the predicted distribution at each position with the token that actually comes next in the training data. The per-token cross-entropy values are then averaged into a single scalar, and it is this averaged loss that is minimized to adjust the model's parameters during training.
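For a concrete picture, here is a minimal sketch (not code from this repository) of that computation, assuming PyTorch and the Hugging Face transformers package: the model emits one cross-entropy value per predicted token, and the scalar training loss is their mean rather than their sum.

```python
# Minimal sketch of the GPT-2 per-token loss, assuming the Hugging Face `transformers` package.
import torch
import torch.nn.functional as F
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

text = "I want to go to school"
input_ids = tokenizer(text, return_tensors="pt").input_ids   # shape: (1, T)

with torch.no_grad():
    logits = model(input_ids).logits                          # shape: (1, T, vocab_size)

# Each position predicts the *next* token, so shift logits and targets by one.
shift_logits = logits[:, :-1, :]      # predictions for tokens 2..T
shift_targets = input_ids[:, 1:]      # the tokens that actually come next

# Cross-entropy between each predicted distribution and its target token.
per_token_loss = F.cross_entropy(
    shift_logits.reshape(-1, shift_logits.size(-1)),
    shift_targets.reshape(-1),
    reduction="none",
)
print(per_token_loss)         # one loss value per predicted token
print(per_token_loss.mean())  # the training loss: the mean, not the sum
```

For reference, Hugging Face's GPT2LMHeadModel does this same shift-and-average internally when you pass labels=input_ids and returns the scalar loss directly.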
