
How to start multi-GPU training on a single machine #89

Open
kevinhuangxf opened this issue Jan 26, 2025 · 1 comment

@kevinhuangxf

Thanks for the excellent work!

I've run into a problem starting multi-GPU training. I have 8 GPUs, but each time I run the training command, only one GPU is used:

[Two screenshots attached showing only a single GPU in use]

I use this command:

python -m src.main +experiment=re10k data_loader.train.batch_size=14

Does this mean that even when training on a single node with multiple GPUs, I still need to use SLURM to launch multi-GPU training?
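
For reference, a quick way to check how many devices the training process can actually see (a minimal sketch, assuming a standard PyTorch environment; it is not part of this repository):

import torch

# If CUDA_VISIBLE_DEVICES (or a launcher) is restricting the process,
# this will report fewer than the 8 physical GPUs.
print(torch.cuda.is_available())      # True if any CUDA device is usable
print(torch.cuda.device_count())      # number of GPUs visible to this process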

@donydchen
Owner

Hi @kevinhuangxf, thanks for the kind words. Normally, the current setting should automatically use all available GPUs for training, so I'm not sure what is causing this issue. You could try explicitly specifying the training devices to use all GPUs by following the instructions here.
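
For instance, if the trainer is a standard PyTorch Lightning Trainer (an assumption; the exact config keys this repository exposes may differ), single-node training across all GPUs with DDP looks like this minimal sketch:

import pytorch_lightning as pl

# Minimal sketch, not the repository's actual code.
# "model" and "datamodule" are hypothetical placeholders.
trainer = pl.Trainer(
    accelerator="gpu",   # run on CUDA devices
    devices=-1,          # -1 = use every GPU visible to the process
    strategy="ddp",      # one process per GPU via DistributedDataParallel
)
# trainer.fit(model, datamodule=datamodule)

On the shell side, making all eight devices visible explicitly (e.g. CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 before the python command) can also rule out an environment variable silently restricting the process to one GPU.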
