Description
In the GreatLakes template (and potentially others), multi-node GPU submissions incorrectly set --ntasks-per-node
to the total number of tasks, disregarding the size of an individual node.
To Reproduce
Request a multi-node GPU submission with --pretend and view the output. Here is an example:
#SBATCH --job-name="TempProject/42b7b4f2921788ea14dac5566e6f06d0/foo/13ee8c7cb17a11b218fe41a3e31afab3"
#SBATCH --partition=gpu
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=8
#SBATCH --gpus=8
Given a request with nranks=4 and ngpu=8, this should be --ntasks-per-node=2 (8 tasks spread across 4 nodes), as there are 2 GPUs per node on the GreatLakes GPU cluster.
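The expected values can be worked out from the total task count and the number of GPUs per node. The following is only a minimal sketch of that arithmetic: the function name `expected_sbatch_layout` and its parameters are hypothetical and are not taken from the actual template code, and it assumes the node count is driven by the GPU request.

```python
import math
from typing import Tuple


def expected_sbatch_layout(total_tasks: int, gpus: int, gpus_per_node: int) -> Tuple[int, int]:
    """Return (nodes, ntasks_per_node) for a multi-node GPU request.

    Hypothetical helper for illustration only. Assumes the number of
    nodes is determined by the GPU request and that tasks are spread
    evenly across those nodes.
    """
    # Nodes needed to satisfy the GPU request.
    nodes = math.ceil(gpus / gpus_per_node)
    # Tasks per node, rounded up so every task has a slot.
    ntasks_per_node = math.ceil(total_tasks / nodes)
    return nodes, ntasks_per_node


# Values from the example above: 8 tasks, 8 GPUs, 2 GPUs per node.
nodes, per_node = expected_sbatch_layout(total_tasks=8, gpus=8, gpus_per_node=2)
print(f"#SBATCH --nodes={nodes}")
print(f"#SBATCH --ntasks-per-node={per_node}")
```

With the numbers from this report, the sketch yields 4 nodes and 2 tasks per node, rather than the 8 tasks per node seen in the generated script.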
This problem may exist in other environments and was propagated to #561, so we should check other templates for this logical error.