RuntimeError: svd_cuda: the updating process of SBDSDC did not converge (error: 22) #11
Comments
Hi - when you say 'when the learning rate reaches 0.0004', it sounds like
you are increasing the learning rate during training. Is that what you are
doing or are you starting training with a new learning rate and keeping it
fixed for the duration of training? What dataset are you using?
On Wed, Nov 25, 2020 at 8:55 AM yqwu94 wrote:
Hi, I ran into the following CUDA runtime error:
RuntimeError: svd_cuda: the updating process of SBDSDC did not converge (error: 22)
I have recently been studying normalizing flows such as Glow, and this SVD error appears when I try to train Glow from scratch. My guess is that, because Glow calls “tensor.slogdet()” in the affine coupling layer, an SVD decomposition may be involved, which causes the error above.
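One way to narrow down the slogdet guess above is to inspect the matrix right before the call. The sketch below is not code from this repository: `checked_slogdet` is a hypothetical helper name, and the idea that a non-finite or singular weight is what triggers the SVD failure is an assumption that this thread does not confirm. It only relies on `torch.isfinite` and `torch.slogdet`.

```python
import torch


def checked_slogdet(weight: torch.Tensor) -> torch.Tensor:
    """Return log|det(weight)|, failing early with a readable message.

    Hypothetical debugging helper: it raises before the matrix reaches
    any later decomposition if the input is already broken.
    """
    # NaN or inf entries usually mean an earlier update has diverged.
    if not torch.isfinite(weight).all():
        raise FloatingPointError("weight contains NaN or inf before slogdet")

    sign, logabsdet = torch.slogdet(weight)

    # sign == 0 means the determinant is zero, so log|det| is -inf.
    if sign.item() == 0:
        raise FloatingPointError("weight is singular; log|det| is -inf")

    return logabsdet


if __name__ == "__main__":
    # A well-conditioned matrix passes through unchanged.
    print(checked_slogdet(torch.eye(3) * 2.0))
```

The `.item()` call forces a host synchronisation on CUDA, so a check like this is better suited to debugging runs than to the final training loop.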
Specifically, I start with a small learning rate such as 1e-6, and the training loss falls slowly. However, when the learning rate reaches 0.0004, the training loss suddenly jumps to inf and the error above is raised.
How can I avoid this error when training Glow?
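Since the loss only blows up once the learning rate has ramped to 0.0004, two common mitigations are a slower warmup and skipping updates whose loss is not finite. The sketch below is a minimal illustration under assumptions: `warmup_factor`, `should_step`, and the 1000-step warmup length are hypothetical and not taken from this repository, and nothing in this thread confirms that they resolve the error.

```python
import math


def warmup_factor(step: int, warmup_steps: int) -> float:
    """Linear warmup multiplier in (0, 1]; stays at 1.0 after warmup."""
    if warmup_steps <= 0:
        return 1.0
    return min(1.0, (step + 1) / warmup_steps)


def should_step(loss_value: float) -> bool:
    """Return False when the loss is NaN or inf, so the update can be skipped."""
    return math.isfinite(loss_value)


if __name__ == "__main__":
    target_lr = 4e-4      # the rate at which the loss diverged in the report
    warmup_steps = 1000   # assumed value, tune for the actual run
    for step in (0, 499, 999, 5000):
        print(step, target_lr * warmup_factor(step, warmup_steps))
    print(should_step(float("inf")))
```

In PyTorch, a function like `warmup_factor` can be passed as `lr_lambda` to `torch.optim.lr_scheduler.LambdaLR`, which multiplies the optimizer's base learning rate by the returned value. Gradient clipping with `torch.nn.utils.clip_grad_norm_` is often combined with the finite-loss check.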
Hi,
Dataset: MNIST
Command: python -m torch.distributed.launch --nproc_per_node=3
Error: File "flow_main.py", line 489, in train_epoch
Any updates on this issue?