Multi-node AllReduce tree/ring algo switching fix for MI300 #1624

Closed


mustafabar
Contributor

@mustafabar mustafabar commented Apr 2, 2025

Details

Do not mention proprietary info or link to internal work items in this PR.

Work item: Internal

What were the changes?
Adjust the tree/ring correction factors so that AllReduce switches from Tree to Ring at the right point for large messages (above 64 MB). This gives a performance boost of up to 1.25x for the affected message sizes.

Why were the changes made?
Tree was being selected for larger messages where Ring performs better.

How was the outcome achieved?
Tuned the correction factors for MI300X and used topo_explorer to mimic the real runs.
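
For readers unfamiliar with how a correction factor can move the Tree->Ring switching point, here is a minimal sketch. It assumes a simple latency-plus-bandwidth cost estimate per algorithm, with a per-size-bucket multiplier applied to the bandwidth term. The table layout, the function names (`predicted_time_us`, `pick_algo`), and every number below are hypothetical and chosen only for illustration; they are not the values or code from this PR.

```python
# Hypothetical per-algorithm parameters; NOT measured MI300X values.
BASE = {
    "tree": {"latency_us": 30.0, "bw_gbps": 100.0},
    "ring": {"latency_us": 200.0, "bw_gbps": 95.0},
}

MB = 1 << 20


def bucket(nbytes):
    """Size bucket used to index the correction tables: floor(log2(nbytes))."""
    return nbytes.bit_length() - 1


def predicted_time_us(algo, nbytes, correction):
    """Estimated time = latency + bytes / (bandwidth * correction factor)."""
    base = BASE[algo]
    factor = correction[algo].get(bucket(nbytes), 1.0)
    # 1 GB/s is 1e3 bytes per microsecond.
    return base["latency_us"] + nbytes / (base["bw_gbps"] * 1e3 * factor)


def pick_algo(nbytes, correction):
    """Choose the algorithm with the lowest estimated time."""
    return min(BASE, key=lambda algo: predicted_time_us(algo, nbytes, correction))


# Without any correction, the (hypothetical) model always prefers tree.
before = {"tree": {}, "ring": {}}

# Assumed tuning: derate tree bandwidth for buckets at 64 MB (2**26) and up,
# so that the estimate favors ring for the larger messages.
after = {"tree": {b: 0.8 for b in range(26, 33)}, "ring": {}}

for size in (1 * MB, 128 * MB):
    print(size // MB, "MB:", pick_algo(size, before), "->", pick_algo(size, after))
```

In this toy model, lowering the tree factor only for the large-message buckets leaves small-message selection unchanged while moving large messages to ring, which is the kind of effect a correction-factor change is meant to have.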

Additional Documentation:

Approval Checklist

Do not approve until these items are satisfied.

  • Verify the CHANGELOG has been updated, if
    • there are any NCCL API version changes,
    • any changes impact library users, and/or
    • any changes impact any other ROCm library.
