
Log softmax vs softmax #87

Open
harish-kamath opened this issue Feb 11, 2024 · 1 comment
Comments

@harish-kamath

self.kl_loss = self.kl_lossf(F.log_softmax(logits, dim=-1), F.softmax(teacher_logits, dim=-1))

Why use log softmax on the model logits, but softmax on the teacher logits?

@jpc
Contributor

jpc commented Feb 11, 2024

You can configure the PyTorch loss function to take targets either in log-space or as plain probabilities. By default the targets are expected as plain probabilities (not in log-space), so that is what I used. There may be numerical stability benefits to one choice or the other, but honestly I don't remember whether there was a specific rationale behind it.

There are examples of both in the docs: https://pytorch.org/docs/stable/generated/torch.nn.KLDivLoss.html
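A minimal sketch of the two equivalent configurations (shapes and the random logits here are made up for illustration): `KLDivLoss` always expects its *input* in log-space, which is why `log_softmax` is applied to the model logits; the *target* is interpreted as plain probabilities unless `log_target=True` is set.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)          # hypothetical student/model logits
teacher_logits = torch.randn(4, 10)  # hypothetical teacher logits

# Default: input in log-space, target as plain probabilities.
kl = torch.nn.KLDivLoss(reduction="batchmean")
loss_default = kl(F.log_softmax(logits, dim=-1),
                  F.softmax(teacher_logits, dim=-1))

# Equivalent: target also in log-space, signalled via log_target=True.
kl_log = torch.nn.KLDivLoss(reduction="batchmean", log_target=True)
loss_log = kl_log(F.log_softmax(logits, dim=-1),
                  F.log_softmax(teacher_logits, dim=-1))

# The two formulations agree up to floating-point error.
print(loss_default.item(), loss_log.item())
```

Passing log-space targets can be slightly more stable for very small teacher probabilities, since `softmax` followed by an internal `log` is avoided, but both forms compute the same KL divergence.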
