[QUESTION] Does V-ZB still not support overlap_grad_reduce and transformer_engine? #53

Open
LitLeo opened this issue Dec 5, 2024 · 3 comments

Comments

LitLeo commented Dec 5, 2024

overlap_grad_reduce and transformer_engine can both bring significant performance benefits. Are they still not supported?

QPHutu commented Dec 6, 2024

transformer_engine is not supported because it does not expose a way to split the backward pass into separate weight-gradient and activation-gradient computations.
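
To illustrate what "splitting the backward pass" means here, below is a minimal pure-Python sketch for a toy linear layer `y = x @ w`. It is not code from Megatron, Transformer Engine, or this repository. All names (`matmul`, `transpose`, `SplitLinear`, `backward_input`, `backward_weight`) are hypothetical and chosen only for illustration. The idea is that the activation gradient is needed right away by the previous pipeline stage, while the weight gradient can be computed later to fill idle time.

```python
# Illustrative sketch only. All names here are hypothetical and are not
# part of Megatron, Transformer Engine, or the zero-bubble repository.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


class SplitLinear:
    """Toy linear layer y = x @ w whose backward pass is split in two."""

    def __init__(self, w):
        self.w = w
        self.grad_w = None
        self._x = None        # input saved by forward()
        self._pending = None  # (x, grad_y) kept until backward_weight() runs

    def forward(self, x):
        self._x = x
        return matmul(x, self.w)

    def backward_input(self, grad_y):
        # Activation-gradient step: grad_x = grad_y @ w^T.
        # The previous pipeline stage is waiting for this result.
        self._pending = (self._x, grad_y)
        return matmul(grad_y, transpose(self.w))

    def backward_weight(self):
        # Weight-gradient step: grad_w = x^T @ grad_y.
        # Nothing downstream depends on it, so a scheduler may defer it.
        x, grad_y = self._pending
        self.grad_w = matmul(transpose(x), grad_y)
        self._pending = None
        return self.grad_w


layer = SplitLinear([[1, 2], [3, 4]])
y = layer.forward([[1, 2], [3, 4]])
grad_x = layer.backward_input([[1, 1], [1, 1]])  # computed immediately
# ... other work could be scheduled here ...
grad_w = layer.backward_weight()                 # computed later
```

A fused backward pass computes both gradients in one call, so a scheduler has no point at which to place other work between them. That is the kind of hook a dependency would need to expose for this scheme to apply.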

For overlap_grad_reduce, I think it's already well supported. Please provide more information if you run into any issues. Thanks.

veritas9872 commented Feb 20, 2025

Hello! @QPHutu,

May I ask which parts of Transformer Engine are causing issues?

This would be very helpful for applying Zero Bubble Pipeline Parallel to the current version of Megatron.

Thanks!

QPHutu commented Feb 21, 2025

> Hello! @QPHutu,
>
> May I ask which parts of Transformer Engine are causing issues?
>
> This would be very helpful for applying Zero Bubble Pipeline Parallel to the current version of Megatron.
>
> Thanks!

To support Transformer Engine, you would first need to modify the TransformerEngine code and then build it from source, so that the backward pass can be split in Megatron. It's doable if you want to try.

The reason we don't support Transformer Engine is not a technical limitation; it's mainly that we don't want to make code changes in dependencies.
