Conversation

tianshijing (Contributor):

What does this PR do?

This PR adds carefully designed distributed training support for the Muon optimizer:

  1. Distributed Training Support: gradient synchronization via reduce_scatter_tensor and parameter updates via all_gather_into_tensor, so Muon works correctly under distributed training (see the sketch after this list).
  2. Performance Optimization: communication-computation overlap using asynchronous collectives when overlap_comm=True is enabled.
  3. Memory Efficiency: communication buffers are allocated only in distributed mode, and gradients are sharded to minimize memory usage.
  4. Robustness: stronger error handling, with assertions that Muon parameters are 2D and better handling of None gradients.
  5. Backward Compatibility: the original behavior is preserved in the non-distributed case while the distributed path is added alongside it.
    Fixes # (issue)
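
The sketch below is not the PR's code; it is a minimal illustration of the reduce_scatter_tensor / all_gather_into_tensor update pattern and the optional asynchronous overlap described in points 1 and 2, assuming a single process group and a 2D parameter whose first dimension divides evenly by the world size. The helper orthogonalized_update is a hypothetical stand-in for Muon's Newton-Schulz step.

    import torch
    import torch.distributed as dist


    def orthogonalized_update(grad_shard: torch.Tensor) -> torch.Tensor:
        # Hypothetical placeholder for Muon's Newton-Schulz orthogonalization.
        return grad_shard / (grad_shard.norm() + 1e-7)


    @torch.no_grad()
    def distributed_muon_step(param: torch.Tensor, lr: float, overlap_comm: bool = False) -> None:
        # Muon updates are defined for matrix-shaped parameters only.
        assert param.ndim == 2, "Muon expects 2D parameters."
        if param.grad is None:  # skip parameters that received no gradient
            return

        world_size = dist.get_world_size()
        rank = dist.get_rank()
        rows_per_rank = param.shape[0] // world_size  # assumes even divisibility

        # 1) Gradient synchronization: reduce-scatter leaves each rank with the
        #    summed gradient for its own row shard only.
        grad_shard = torch.empty(
            rows_per_rank, param.shape[1], device=param.device, dtype=param.dtype
        )
        work = dist.reduce_scatter_tensor(
            grad_shard, param.grad.contiguous(), op=dist.ReduceOp.SUM, async_op=overlap_comm
        )
        if overlap_comm and work is not None:
            # With async_op=True, independent computation could be scheduled here
            # before waiting on the communication handle.
            work.wait()
        grad_shard.div_(world_size)  # turn the sum into an average

        # 2) Local update on this rank's shard (the real Muon math would go here).
        updated_shard = param[rank * rows_per_rank:(rank + 1) * rows_per_rank].clone()
        updated_shard.add_(orthogonalized_update(grad_shard), alpha=-lr)

        # 3) Parameter update: all-gather reassembles the full parameter on every rank.
        dist.all_gather_into_tensor(param.data, updated_shard)

In the non-distributed case the same step would simply skip the collectives and apply the update to the full gradient, which is how backward compatibility (point 5) can be preserved.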

Review comment from a contributor on the added logging call:

    logger.info_rank0(
        f"Using Muon optimizer with {len(muon_params)} Muon params and {len(adamw_params)} AdamW params."
        f"Using Muon optimizer with {len(muon_params)} Muon params and {len(adamw_params)} AdamW params. "
This looks duplicated.
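
If the second f-string fragment was indeed added by mistake, the resolution would presumably be to keep a single message (a sketch, not the PR's final change):

    logger.info_rank0(
        f"Using Muon optimizer with {len(muon_params)} Muon params and {len(adamw_params)} AdamW params."
    )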

@hiyouga added the "pending (This problem is yet to be addressed)" label on Jun 16, 2025.