Issues: NVIDIA/TransformerEngine
[BUG] Weight gradients with TransformerEngine v2.1 don't match those with TransformerEngine v1.12
Labels: bug, tp_overlap. #1616, opened Mar 26, 2025 by okoge-kaz
[BUG] Wrong attention gradient in Transformer Engine
Labels: bug. #1615, opened Mar 26, 2025 by i-love-megatron
Can we replace only some nn.Linear layers with te.Linear and keep the others unchanged?
#1595, opened Mar 20, 2025 by zigzagcai
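For context, a minimal sketch (not taken from the issue thread) of what partial replacement could look like: te.Linear and plain nn.Linear layers can live in the same module, and only the Transformer Engine layer participates in FP8 inside fp8_autocast. The module name and shapes below are illustrative assumptions; FP8 execution additionally requires supported hardware and FP8-friendly dimensions.

```python
# Hypothetical sketch: mix a Transformer Engine linear layer with a plain
# PyTorch linear layer in one module. Only te.Linear runs in FP8 when the
# forward pass is wrapped in fp8_autocast; nn.Linear is left unchanged.
import torch
import torch.nn as nn
import transformer_engine.pytorch as te

class MixedMLP(nn.Module):
    def __init__(self, hidden: int = 1024):
        super().__init__()
        self.fc1 = te.Linear(hidden, 4 * hidden)   # replaced with Transformer Engine
        self.fc2 = nn.Linear(4 * hidden, hidden)   # kept as plain PyTorch

    def forward(self, x):
        return self.fc2(torch.nn.functional.gelu(self.fc1(x)))

model = MixedMLP().cuda()
x = torch.randn(32, 1024, device="cuda")
with te.fp8_autocast(enabled=True):   # affects only the te.Linear layer
    y = model(x)
```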
How to debug tex.fused_attn_bwd getting cuDNN Error: [cudnn_frontend] Error: No execution plans support the graph
Labels: bug. #1591, opened Mar 19, 2025 by Ir1d
Does TransformerEngine support FP8 communication such as all-gather or all-to-all?
#1579, opened Mar 14, 2025 by zigzagcai
Is it necessary to replace layers with te.* modules? If not, is it effective to use te.fp8_autocast directly?
#1556, opened Mar 11, 2025 by wangli68
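A related clarification, sketched under the assumption of current Transformer Engine behavior: fp8_autocast only changes how Transformer Engine modules execute, so wrapping unmodified torch.nn layers in the context does not make them run in FP8; replacing a layer with its te.* counterpart is what opts it in.

```python
# Sketch: fp8_autocast by itself does not convert plain PyTorch layers to FP8.
import torch
import torch.nn as nn
import transformer_engine.pytorch as te

torch_linear = nn.Linear(1024, 1024).cuda()   # unaffected by fp8_autocast
te_linear = te.Linear(1024, 1024).cuda()      # FP8-capable replacement

x = torch.randn(16, 1024, device="cuda")
with te.fp8_autocast(enabled=True):
    y_torch = torch_linear(x)  # still runs in FP32: not a Transformer Engine module
    y_te = te_linear(x)        # runs its GEMM in FP8 (on supported hardware)
```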
When I import the package 'transformer_engine.pytorch', the error message is as follows
#1541, opened Mar 6, 2025 by wangli68
Causal mask ignored in DotProductAttention
Labels: good first issue. #1524, opened Feb 28, 2025 by anthony-Neo
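For reference, a minimal sketch of requesting causal masking from DotProductAttention via attn_mask_type; tensor shapes and dtypes below are illustrative assumptions, not taken from the issue report.

```python
# Sketch: ask DotProductAttention for causal masking explicitly.
import torch
import transformer_engine.pytorch as te

seq, batch, heads, head_dim = 128, 2, 16, 64
attn = te.DotProductAttention(
    num_attention_heads=heads,
    kv_channels=head_dim,
    attn_mask_type="causal",  # causal masking requested here
    qkv_format="sbhd",        # tensors laid out as [seq, batch, heads, head_dim]
)

q = torch.randn(seq, batch, heads, head_dim, device="cuda", dtype=torch.bfloat16)
k = torch.randn_like(q)
v = torch.randn_like(q)
out = attn(q, k, v)  # expected shape: [seq, batch, heads * head_dim]
```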
How can we integrate the DeepGEMM FP8 GEMM implementation into TE's block-wise scaling?
#1509, opened Feb 26, 2025 by BolongLin
Question about the performance of GroupedLinear
Labels: performance. #1499, opened Feb 20, 2025 by XLzed
Float8Quantizer::create_tensor calculates scale_inv instead of creating an empty buffer
Labels: performance. #1491, opened Feb 18, 2025 by yaox12
Qwen1.5-0.5B fails to save the model with Hugging Face Transformers
Labels: bug. #1482, opened Feb 13, 2025 by xinpengzz
Using context parallelism with flash-attn > 2.6.1 causes an error
#1467, opened Feb 9, 2025 by south-ocean
HF Accelerate FP8 uses more GPU memory than FP16 when training an LLM
#1429, opened Jan 28, 2025 by Liufeiran123