[ENHANCEMENT] Integrating torch.compile with Megatron/TransformerEngine #1089
yanboliang started this conversation in Ideas
Replies: 2 comments
-
Hi, thanks for the issue. Is there anything more specific you'd like to contribute?
-
@ericharper Thanks for your reply! We have a list of enhancements we could contribute to Megatron/TransformerEngine, mainly focused on integrating torch.compile with Megatron/TransformerEngine (detailed in the feature request below).
Let me know if you have any questions! Thank you!
-
Is your feature request related to a problem? Please describe.
This feature request proposes integrating torch.compile, which captures computation/communication ops into an FX graph and generates an optimized execution plan by fusing ops and leveraging computation/communication overlap. New features have also been built on top of torch.compile, like FlexAttention, which provides a flexible API that can automatically generate high-performance kernels for many attention variants. We believe torch.compile + Megatron can unleash even greater power in both the LLM training and inference space.
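To make this concrete, below is a minimal, hypothetical sketch (a toy module, not Megatron code) of what torch.compile does: Dynamo captures the forward into an FX graph and Inductor can fuse the surrounding pointwise ops into a single kernel.

```python
# Minimal sketch (not Megatron code): torch.compile captures this forward
# into an FX graph and can fuse the pointwise ops into one kernel.
import torch


class ToyBlock(torch.nn.Module):
    def __init__(self, dim: int = 1024):
        super().__init__()
        self.proj = torch.nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # bias-add + GELU + residual are candidates for fusion
        return x + torch.nn.functional.gelu(self.proj(x))


model = ToyBlock()
compiled = torch.compile(model)      # Dynamo traces, Inductor generates code
y = compiled(torch.randn(8, 1024))   # first call triggers compilation
```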
Describe the solution you'd like
- Enable torch.compile on top of Megatron modules, tensor parallel, context parallel, and the underlying TransformerEngine; capture computation/communication graphs; investigate better fusion and computation/communication overlap optimizations; etc.
- Integrate FlexAttention into the Megatron attention module (see the sketch after this list).
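As an illustration of the second item, here is a minimal sketch of the FlexAttention API (torch.nn.attention.flex_attention, available in recent PyTorch releases). The shapes and the causal score_mod are illustrative and not taken from Megatron's attention module; depending on the PyTorch version, a CUDA device may be required.

```python
# Sketch of FlexAttention for a causal attention variant; shapes and the
# score_mod are illustrative, not Megatron's attention module.
import torch
from torch.nn.attention.flex_attention import flex_attention

B, H, S, D = 2, 8, 128, 64  # batch, heads, sequence length, head dim
q, k, v = (torch.randn(B, H, S, D) for _ in range(3))


def causal(score, b, h, q_idx, kv_idx):
    # Mask out future positions; FlexAttention folds this into the kernel.
    return torch.where(q_idx >= kv_idx, score, float("-inf"))


# flex_attention is usually wrapped in torch.compile to get the fused kernel.
attn = torch.compile(flex_attention)
out = attn(q, k, v, score_mod=causal)
```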
Additional context
TransformerEngine has no_torch_dynamo, which just skips Dynamo tracing. However, it seems Megatron supports torch.compile via jit_fuser = torch.compile. I'd like to know more context on this discrepancy over whether torch.compile is allowed between these two repos. We believe torch.compile can provide even more benefit than fusion only, like better fusion and leveraging computation/communication overlap in the distributed setup. We'd be glad to work on integrating torch.compile with Megatron/TransformerEngine.
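For the discrepancy mentioned above, the two hooks look roughly like the following paraphrased sketch (not the exact TransformerEngine/Megatron source; the function bodies are illustrative):

```python
# Paraphrased sketch of the two patterns (not the exact library source).
import torch


# TransformerEngine-style: skip Dynamo tracing for a function entirely.
def no_torch_dynamo(fn):
    # torch._dynamo.disable makes torch.compile run this fn eagerly.
    return torch._dynamo.disable(fn)


# Megatron-style: use torch.compile as the "jit_fuser" for small pointwise
# helpers, so each decorated function gets its own fused kernel.
jit_fuser = torch.compile


@jit_fuser
def bias_gelu(bias: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # tanh-approximate GELU applied after a bias add (illustrative body)
    x = bias + y
    return x * 0.5 * (1.0 + torch.tanh(0.79788456 * x * (1 + 0.044715 * x * x)))
```

Decorating individual helpers like this fuses each one in isolation; compiling larger Megatron modules would also expose cross-op fusion and communication overlap opportunities, which is the main motivation here.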