When I use n_accumulated_grads with a value greater than 1 together with batch norm layers, the batch normalization statistics are computed only over each micro-batch rather than over the whole accumulated batch. This can hurt training stability and validation results.
I think batch norm layers would need special treatment so that the mean and variance are ultimately computed over the whole accumulated batch. This is a hard problem and I don't have a solution; it may be worth looking at how the PyTorch Lightning library handles it.
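For context, here is a minimal sketch of the interaction in a plain PyTorch training loop (this is a hypothetical standalone example, not this library's actual Trainer internals): each micro-batch passes through the BatchNorm layer on its own, so the normalization statistics and the running-stat updates only ever see micro_batch_size samples, even though the optimizer step uses gradients accumulated over the full effective batch.

import torch
import torch.nn as nn

# Hypothetical model with a BatchNorm layer, trained with gradient
# accumulation over n_accumulated_grads micro-batches.
model = nn.Sequential(nn.Linear(16, 32), nn.BatchNorm1d(32), nn.ReLU(), nn.Linear(32, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

n_accumulated_grads = 4   # effective batch = 4 micro-batches
micro_batch_size = 8

optimizer.zero_grad()
for step in range(n_accumulated_grads):
    x = torch.randn(micro_batch_size, 16)
    y = torch.randint(0, 2, (micro_batch_size,))

    # BatchNorm1d normalizes with the mean/variance of these 8 samples only,
    # and updates its running statistics once per micro-batch. Accumulating
    # gradients below does nothing to merge those statistics.
    loss = loss_fn(model(x), y) / n_accumulated_grads
    loss.backward()

# A single optimizer step for the effective batch of 32 samples, but the
# BatchNorm statistics were never computed over those 32 samples jointly.
optimizer.step()
optimizer.zero_grad()

One possible workaround, at the cost of changing the model, is to replace BatchNorm with a normalization whose statistics do not depend on batch size, such as GroupNorm or LayerNorm, which compute their statistics per sample and are therefore unaffected by the micro-batch split.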
matteocao changed the title from "[BUG] Accumulate gradients is not compatible with BatchNorm" to "Accumulate gradients is not compatible with BatchNorm" on May 8, 2022.