I am training on 4 GPUs, each of which fits 2 examples. I am launching training with TRAIN.BATCH_SIZE_TOTAL 8 and TRAIN.BATCH_SIZE_PER_GPU 2.
It seems that in the code, TRAIN.BATCH_SIZE_PER_GPU actually doesn't matter: the batch size per GPU is determined by TRAIN.BATCH_SIZE_TOTAL divided by the number of GPUs. Can you confirm that?
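To make my assumption concrete, here is roughly the derivation I have in mind, following the usual PyTorch DDP convention. This is my own sketch, not the repo's code, and the function name is mine:

```python
import torch.distributed as dist

def per_gpu_batch_size(batch_size_total: int) -> int:
    # Fall back to a single process when torch.distributed isn't initialized.
    world_size = dist.get_world_size() if dist.is_initialized() else 1
    assert batch_size_total % world_size == 0, "BATCH_SIZE_TOTAL must divide evenly across GPUs"
    return batch_size_total // world_size  # e.g. 8 // 4 = 2 examples per GPU

print(per_gpu_batch_size(8))  # 8 in a single-process run; 2 per rank on 4 GPUs
```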
I also want to confirm that, when computing gradient steps, the effective batch size is 8 (not 2) for my setting.
Finally, I found there is a parameter GRADIENT_ACCUMULATE_STEP, which is set to 1 by default. Should I specify this as well if I want a larger effective batch size, e.g. 16?
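For reference, here is the standard gradient-accumulation pattern I have in mind. This is a generic PyTorch sketch under my own assumptions, not the repo's actual trainer:

```python
import torch
import torch.nn.functional as F

def train_epoch(model, optimizer, loader, accumulate_steps: int = 2):
    optimizer.zero_grad()
    for i, (inputs, targets) in enumerate(loader):
        loss = F.cross_entropy(model(inputs), targets)
        # Scale the loss so the accumulated gradient averages over
        # accumulate_steps micro-batches, matching one large batch.
        (loss / accumulate_steps).backward()
        if (i + 1) % accumulate_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```

If that matches what the repo does, then with TRAIN.BATCH_SIZE_TOTAL 8 and GRADIENT_ACCUMULATE_STEP 2, each optimizer step would see 8 × 2 = 16 examples.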
Thanks in advance for answering so many questions!