RuntimeError: svd_cuda: the updating process of SBDSDC did not converge (error: 22) #11

yqwu94 · 2020-11-25T13:54:55Z

Hi, I ran into the following CUDA runtime error:
RuntimeError: svd_cuda: the updating process of SBDSDC did not converge (error: 22)
I have recently been studying normalizing flows such as Glow, and this SVD error appeared when I tried to train Glow from scratch. My guess is that, because Glow calls "tensor.slogdet()" in the affine coupling layer, an SVD decomposition may be involved, which causes the error above.
Specifically, I start with a small learning rate such as 1e-6, and the training loss falls slowly at first. However, once the learning rate reaches 0.0004, the training loss suddenly jumps to inf and the error above is raised.
How can I avoid this error when training Glow?

kamenbliznashki · 2020-12-31T16:20:40Z

Hi - when you say 'when the learning rate reaches 0.0004', it sounds like you are increasing the learning rate during training. Is that what you are doing or are you starting training with a new learning rate and keeping it fixed for the duration of training? What dataset are you using?
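To make the distinction in the question above concrete, here is a minimal sketch of a linear learning-rate warmup, in which the rate climbs from a small value up to the target during training rather than staying fixed. This is only an illustration of what "the learning rate reaches 0.0004" could mean; `warmup_factor` and `warmup_steps` are hypothetical names and values, not taken from this repository.

```python
def warmup_factor(step: int, warmup_steps: int) -> float:
    """Multiplier in (0, 1] applied to the base learning rate at a given step."""
    if warmup_steps <= 0:
        # No warmup requested: use the base learning rate from the first step.
        return 1.0
    # Grow linearly, then stay at 1.0 once warmup is over.
    return min(1.0, (step + 1) / warmup_steps)


base_lr = 4e-4       # the rate mentioned in the report
warmup_steps = 1000  # hypothetical warmup length

for step in (0, 499, 999, 5000):
    # Effective learning rate at this optimizer step.
    print(step, base_lr * warmup_factor(step, warmup_steps))
```

In PyTorch, a function like this can be passed as `lr_lambda` to `torch.optim.lr_scheduler.LambdaLR`. If the loss only diverges once the multiplier approaches 1.0, that would suggest the target rate itself is too high for this configuration.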


Naagar · 2021-01-05T16:30:08Z

Hi,
I'm also facing a similar problem.
RuntimeError: svd_cuda: the updating process of SBDSDC did not converge (error: 11)

Dataset: mnist
torchvision 0.8.2
python 3.8.5
PyTorch 1.6.0
module load cudnn/7-cuda-10.0
model: Glow

python -m torch.distributed.launch --nproc_per_node=3 \
    flow_main.py --train \
    --distributed \
    --dataset=mnist \
    --n_levels=3 \
    --depth=32 \
    --width=512 \
    --batch_size=16 \
    --generate \
    --n_epochs=10

Error

  File "flow_main.py", line 489, in train_epoch
    loss.backward()
  File "/home/sandeep.nagar/anaconda3/lib/python3.8/site-packages/torch/tensor.py", line 221, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/sandeep.nagar/anaconda3/lib/python3.8/site-packages/torch/autograd/__init__.py", line 130, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/sandeep.nagar/anaconda3/lib/python3.8/site-packages/torch/autograd/__init__.py", line 130, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/sandeep.nagar/anaconda3/lib/python3.8/site-packages/torch/autograd/__init__.py", line 130, in backward
    Variable._execution_engine.run_backward(
RuntimeError: svd_cuda: the updating process of SBDSDC did not converge (error: 11)
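Since the traceback fails inside `loss.backward()`, and the first report mentions the loss jumping to inf just before the error, one possible mitigation is to skip the update whenever the loss is non-finite and to clip gradients otherwise. The sketch below is an assumption about what might help, not a confirmed fix and not code from this repository; `guarded_step` and the `max_grad_norm` value are hypothetical.

```python
import torch
from torch import nn


def guarded_step(model, optimizer, loss, max_grad_norm=50.0):
    """Backpropagate and step only if the loss is finite. Returns True if a step was taken."""
    optimizer.zero_grad()
    if not torch.isfinite(loss).all():
        # Skip this batch: backpropagating inf/NaN would push non-finite
        # values into the weights and into later decompositions.
        return False
    loss.backward()
    # Limit the update size so one bad batch cannot blow up the weights.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
    optimizer.step()
    return True


# Tiny stand-in model, only to show the call pattern (not Glow).
model = nn.Linear(2, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.ones(4, 2)

stepped = guarded_step(model, optimizer, model(x).pow(2).mean())
skipped = guarded_step(model, optimizer, model(x).pow(2).mean() + float("inf"))
print(stepped, skipped)
```

Skipping batches only hides the symptom. If many steps are skipped, a lower learning rate or a longer warmup is probably still needed.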

pandya6988 · 2021-10-29T14:38:52Z

Any updates on this issue?
