When I use n_accumulated_grads with a value greater than 1 together with batch norm layers, the batch normalization statistics are computed only over each micro-batch rather than over the whole accumulated batch. This can hurt training stability and validation results.
I think batch norm layers would need special treatment so that the mean and variance are ultimately computed over the whole accumulated batch. This is a hard problem and I don't have a solution; it may be worth looking at how the PyTorch Lightning library handles it.
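For context, here is a minimal sketch of the interaction in a plain PyTorch training loop (this is a hypothetical standalone example, not this library's actual Trainer internals): each micro-batch passes through the BatchNorm layer on its own, so the normalization statistics and the running-stat updates only ever see micro_batch_size samples, even though the optimizer step uses gradients accumulated over the full effective batch.

import torch
import torch.nn as nn

# Hypothetical model with a BatchNorm layer, trained with gradient
# accumulation over n_accumulated_grads micro-batches.
model = nn.Sequential(nn.Linear(16, 32), nn.BatchNorm1d(32), nn.ReLU(), nn.Linear(32, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

n_accumulated_grads = 4   # effective batch = 4 micro-batches
micro_batch_size = 8

optimizer.zero_grad()
for step in range(n_accumulated_grads):
    x = torch.randn(micro_batch_size, 16)
    y = torch.randint(0, 2, (micro_batch_size,))

    # BatchNorm1d normalizes with the mean/variance of these 8 samples only,
    # and updates its running statistics once per micro-batch. Accumulating
    # gradients below does nothing to merge those statistics.
    loss = loss_fn(model(x), y) / n_accumulated_grads
    loss.backward()

# A single optimizer step for the effective batch of 32 samples, but the
# BatchNorm statistics were never computed over those 32 samples jointly.
optimizer.step()
optimizer.zero_grad()

One possible workaround, at the cost of changing the model, is to replace BatchNorm with a normalization whose statistics do not depend on batch size, such as GroupNorm or LayerNorm, which compute their statistics per sample and are therefore unaffected by the micro-batch split.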
matteocao changed the title from "[BUG] Accumulate gradients is not compatible with BatchNorm" to "Accumulate gradients is not compatible with BatchNorm" on May 8, 2022.