
Log softmax vs softmax #87

Open
harish-kamath opened this issue Feb 11, 2024 · 1 comment
Comments

@harish-kamath

self.kl_loss = self.kl_lossf(F.log_softmax(logits, dim=-1), F.softmax(teacher_logits, dim=-1))

Why use log softmax on the model logits, but softmax on the teacher logits?

@jpc
Contributor

jpc commented Feb 11, 2024

You can configure the PyTorch loss function to take targets either in log-space or as plain probabilities. By default the targets are expected as plain probabilities (not in log-space), so that is what I used. There may be numerical stability benefits to one choice or the other, but honestly I don't remember whether there was a specific rationale behind it.

There are examples of both in the docs: https://pytorch.org/docs/stable/generated/torch.nn.KLDivLoss.html
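A minimal sketch of the two equivalent configurations (shapes and the random logits here are made up for illustration): `KLDivLoss` always expects its *input* in log-space, which is why `log_softmax` is applied to the model logits; the *target* is interpreted as plain probabilities unless `log_target=True` is set.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)          # hypothetical student/model logits
teacher_logits = torch.randn(4, 10)  # hypothetical teacher logits

# Default: input in log-space, target as plain probabilities.
kl = torch.nn.KLDivLoss(reduction="batchmean")
loss_default = kl(F.log_softmax(logits, dim=-1),
                  F.softmax(teacher_logits, dim=-1))

# Equivalent: target also in log-space, signalled via log_target=True.
kl_log = torch.nn.KLDivLoss(reduction="batchmean", log_target=True)
loss_log = kl_log(F.log_softmax(logits, dim=-1),
                  F.log_softmax(teacher_logits, dim=-1))

# The two formulations agree up to floating-point error.
print(loss_default.item(), loss_log.item())
```

Passing log-space targets can be slightly more stable for very small teacher probabilities, since `softmax` followed by an internal `log` is avoided, but both forms compute the same KL divergence.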
