Template Multinode GPU Error #566
Labels: bug
While #722 produces a proper resource request for sbatch, it fails to work correctly:
produces:
When I use this instead:
mpirun is able to launch hoomd, but somehow SLURM_LOCALID is 0 on all ranks. I will troubleshoot that further when testing the solution in #777. In the meantime, multi-GPU jobs are still broken on Great Lakes.
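For anyone else chasing the SLURM_LOCALID symptom, here is a minimal diagnostic sketch that prints the rank-related environment variables each process sees. The helper name rank_report is hypothetical. The OMPI_* variables are only set when mpirun comes from Open MPI, which this thread does not confirm, so they may show up as unset.

```python
import os
import socket

# Rank-related variables set by SLURM (srun) and, if Open MPI is in use, by mpirun.
# Which launcher sets which variable on Great Lakes is an assumption to verify.
VARS = (
    "SLURM_PROCID",
    "SLURM_LOCALID",
    "SLURM_NODEID",
    "OMPI_COMM_WORLD_RANK",
    "OMPI_COMM_WORLD_LOCAL_RANK",
)


def rank_report(environ=os.environ):
    """Return one line describing the rank variables visible to this process."""
    fields = [f"{name}={environ.get(name, '<unset>')}" for name in VARS]
    return f"{socket.gethostname()}: " + " ".join(fields)


if __name__ == "__main__":
    print(rank_report())
```

Launching this with the same mpirun line as the failing job, in place of the hoomd script, should show whether SLURM_LOCALID really is identical on every rank or whether another variable carries the local rank.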
Description
For the GreatLakes template (and potentially others), multi-node GPU submissions incorrectly set --ntasks-per-node to the total number of tasks, disregarding individual node size.
To Reproduce
Just request a multi-GPU node submission with --pretend and view the output. Here is an example
Given a request with nranks=4 and ngpu=8, this should be --ntasks-per-node=2, as there are 2 GPUs per node on the GPU cluster of GreatLakes.
This problem may exist in other environments and was propagated to #561, so we should check other templates for this logical error.