This paper dives into sources of non-determinism in machine learning frameworks: https://arxiv.org/abs/2104.07651
It would be a great reference for our paragraph that starts:
"One specific reproducibility pitfall that is often missed when applying deep learning is the default use of non-deterministic algorithms by CUDA/cuDNN backends when using GPUs."
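In case it's useful alongside that paragraph, here is a minimal sketch of the usual mitigation, assuming we mean PyTorch (1.8 or later); the seed value and function name are just illustrative:

```python
import os
import random

import numpy as np
import torch

# cuBLAS requires this workspace setting (set before CUDA initializes)
# to use deterministic GEMM kernels on CUDA 10.2+.
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

def seed_everything(seed: int = 0) -> None:
    """Seed the common RNGs and force deterministic GPU kernels."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)  # seeds CPU and all CUDA devices

    # Disable cuDNN autotuning (it can select different conv kernels
    # run-to-run) and request deterministic cuDNN algorithms.
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True

    # Raise an error if any op falls back to a
    # non-deterministic implementation.
    torch.use_deterministic_algorithms(True)

seed_everything(42)
```

Note the trade-off: deterministic kernels are often slower, and `use_deterministic_algorithms(True)` will error on ops that have no deterministic implementation, which might be worth a sentence in the paragraph too.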
I can add the citation now, or hold it for the next revision if we're treating this content as frozen for submission.