-
Notifications
You must be signed in to change notification settings - Fork 841
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Base Allreduce Algorithm Selection/Performance Issue #12550
Labels
Comments
The decision to call another algorithm under certain conditions is out of scope in a base function, because technically it shall be done by a higher level decision function (such as |
@mjwilkins18 Curious if you could try out #12552 and see if that helps? |
@wenduwan Apologies, yes that PR fixes my issue, thanks! |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Background information
I was investigating an Allreduce performance issue for 64k processes @ 32kB message size.
What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)
v4.1.6 (reproduced w/ v5.0.3)
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
Built from tarball
If you are building/installing from a git clone, please copy-n-paste the output from
git submodule status
.N/A
Please describe the system on which you are running
Details of the problem
I traced it back to this conditional in the redscat_allgather algorithm:
ompi/ompi/mca/coll/base/coll_base_allreduce.c
Lines 993 to 1000 in 11e1d8e
The default dtype of the microbenchmark (OSU) is MPI_CHAR, so
count
= 32kB, and it triggers this conditional. As a result, it falls back to the basic linear algorithm, which performs very poorly at this larger scale. Because this code is in the redscat_allgather function, the algorithm change is triggered even if I force the algorithm selection using the environment variable.I am happy to open a PR with a fix, but I not sure what the preferred solution is. I understand the need to take care regarding non-commutative operations, but the first half of the
if
that is causing my problem seems like it should be removed and replicated in the default algorithm selection logic (if at all). Otherwise, could we add an additional requirement that the count is below some value, to avoid scaling issues?The text was updated successfully, but these errors were encountered: