Template Multinode GPU Error #566

Open
@b-butler

Description

In the GreatLakes template (and potentially others), multi-node GPU submissions incorrectly set --ntasks-per-node to the total number of tasks, disregarding the size of an individual node.

To Reproduce

Request a multi-node GPU submission with --pretend and inspect the output. Here is an example:

#SBATCH --job-name="TempProject/42b7b4f2921788ea14dac5566e6f06d0/foo/13ee8c7cb17a11b218fe41a3e31afab3"
#SBATCH --partition=gpu
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=8
#SBATCH --gpus=8

Given a request of nranks=4 and ngpu=8, this should be --ntasks-per-node=2, since each node in the GreatLakes GPU partition has 2 GPUs.
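To make the expected arithmetic concrete, here is a minimal sketch of how the per-node task count could be derived. It is not the signac-flow implementation: the function name `slurm_gpu_layout` and its parameters are hypothetical, and it assumes the total task count is 8 (the value the template currently emits as --ntasks-per-node) and that tasks are spread evenly over the nodes needed to hold the requested GPUs.

```python
import math


def slurm_gpu_layout(total_tasks, total_gpus, gpus_per_node):
    """Return (nodes, ntasks_per_node) for a GPU job.

    Hypothetical helper for illustration only; not part of signac-flow.
    Assumes tasks are distributed evenly across the nodes required to
    provide the requested number of GPUs.
    """
    if gpus_per_node <= 0:
        raise ValueError("gpus_per_node must be positive")
    # Number of nodes needed to supply all requested GPUs.
    nodes = math.ceil(total_gpus / gpus_per_node)
    # Tasks on each node, rounded up so every task has a slot.
    ntasks_per_node = math.ceil(total_tasks / nodes)
    return nodes, ntasks_per_node


# Example from this issue: 8 tasks, 8 GPUs, 2 GPUs per GreatLakes GPU node.
nodes, ntasks_per_node = slurm_gpu_layout(total_tasks=8, total_gpus=8, gpus_per_node=2)
print(f"#SBATCH --nodes={nodes}")
print(f"#SBATCH --ntasks-per-node={ntasks_per_node}")
```

Under these assumptions the sketch yields 4 nodes with 2 tasks each, which matches the --nodes=4 line above and the expected --ntasks-per-node=2.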

This problem may exist in other environments and was propagated to #561, so we should check other templates for this logical error.

Labels

bug (Something isn't working)