I am training on 4 GPUs, each of which fits 2 examples. I am launching training with TRAIN.BATCH_SIZE_TOTAL 8 and TRAIN.BATCH_SIZE_PER_GPU 2.
It seems that in the code, TRAIN.BATCH_SIZE_PER_GPU actually doesn't matter: the batch size per GPU is determined by TRAIN.BATCH_SIZE_TOTAL divided by the number of GPUs. Can you confirm that?
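To make my assumption concrete, here is roughly the derivation I have in mind, following the usual PyTorch DDP convention. This is my own sketch, not the repo's code, and the function name is mine:

```python
import torch.distributed as dist

def per_gpu_batch_size(batch_size_total: int) -> int:
    # Fall back to a single process when torch.distributed isn't initialized.
    world_size = dist.get_world_size() if dist.is_initialized() else 1
    assert batch_size_total % world_size == 0, "BATCH_SIZE_TOTAL must divide evenly across GPUs"
    return batch_size_total // world_size  # e.g. 8 // 4 = 2 examples per GPU

print(per_gpu_batch_size(8))  # 8 in a single-process run; 2 per rank on 4 GPUs
```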
I also want to confirm that, when computing gradient steps, the effective batch size is 8 (not 2) for my setting.
Finally, I found there is a parameter GRADIENT_ACCUMULATE_STEP, which is set to 1 by default. Should I specify this as well if I want a larger effective batch size, e.g. 16?
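For reference, here is the standard gradient-accumulation pattern I have in mind. This is a generic PyTorch sketch under my own assumptions, not the repo's actual trainer:

```python
import torch
import torch.nn.functional as F

def train_epoch(model, optimizer, loader, accumulate_steps: int = 2):
    optimizer.zero_grad()
    for i, (inputs, targets) in enumerate(loader):
        loss = F.cross_entropy(model(inputs), targets)
        # Scale the loss so the accumulated gradient averages over
        # accumulate_steps micro-batches, matching one large batch.
        (loss / accumulate_steps).backward()
        if (i + 1) % accumulate_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```

If that matches what the repo does, then with TRAIN.BATCH_SIZE_TOTAL 8 and GRADIENT_ACCUMULATE_STEP 2, each optimizer step would see 8 × 2 = 16 examples.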
Thanks in advance for answering so many questions!