1 parent 58fd2bc commit ce71d59
src/liger_kernel/ops/fused_linear_cross_entropy.py
@@ -97,7 +97,7 @@ def fused_linear_cross_entropy_forward(

         # gradient of logits_chunk is computed in-place by the above triton kernel.
         # Following HuggingFace model source code, we do the forward and backward
-        # w.r.t. logits in fp32 for numerical stability especially as the num classes (vocab size) os huge.
+        # w.r.t. logits in fp32 for numerical stability especially as the num classes (vocab size) is huge.
         # (reference: https://github.com/huggingface/transformers/blob/v4.42.4/src/transformers/models/llama/modeling_llama.py#L1194)
         # Propagating to lm_head's backward, we'll switch back to the original dtype.
         logits_chunk = logits_chunk.to(dtype)
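The pattern the patched comment describes (computing the loss w.r.t. logits in fp32 and then returning to the original dtype) can be illustrated with a minimal sketch. This is not Liger's actual Triton kernel; the function name and shapes are hypothetical, and plain `torch.nn.functional.cross_entropy` stands in for the fused in-place kernel:

```python
import torch


def chunked_ce_fp32_sketch(logits_chunk: torch.Tensor,
                           target_chunk: torch.Tensor) -> torch.Tensor:
    """Hypothetical sketch of the fp32-upcast pattern, not the real fused kernel."""
    orig_dtype = logits_chunk.dtype
    # Upcast to fp32: softmax over a huge vocab (num classes) is
    # numerically unstable in fp16/bf16.
    logits_fp32 = logits_chunk.float()
    loss = torch.nn.functional.cross_entropy(logits_fp32, target_chunk)
    # Before propagating to lm_head's backward, switch back to the
    # original dtype, mirroring `logits_chunk.to(dtype)` in the diff.
    logits_chunk = logits_fp32.to(orig_dtype)
    return loss
```

The upcast matters because a bf16 softmax over a vocabulary of 100k+ classes can lose enough precision to visibly degrade the loss; the cast back keeps the rest of the backward pass in the model's working dtype.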