
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED when loss.backward() in train.py #1

Open
chiendoanngoc opened this issue Jan 8, 2022 · 3 comments

Comments

@chiendoanngoc

Thanks for your great work — your code is so clean that I could easily understand it.
I just hit an error in train.py at loss.backward(): RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED.
Have you seen this before, and do you have any suggestions for fixing it? Thanks a lot!
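A quick way to narrow this down is a sanity check like the following sketch (standard PyTorch introspection calls; the values in the comments are illustrative, not taken from this report) — it shows whether the GPU and the cuDNN library are visible to PyTorch at all:

```python
import torch

# Environment sanity check before debugging loss.backward():
print(torch.__version__)                    # installed PyTorch build, e.g. 1.9.0+cu111
print(torch.version.cuda)                   # CUDA version the wheel was built against (None on CPU builds)
print(torch.cuda.is_available())            # True if a GPU is visible
print(torch.backends.cudnn.is_available())  # True if the cuDNN library can be loaded
```

If `torch.backends.cudnn.is_available()` is False while a GPU is visible, the installed wheel and the system's CUDA/cuDNN libraries are likely mismatched.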

@amlarraz
Owner

amlarraz commented Jan 10, 2022

Hi @chiendoanngoc! You're welcome! I've faced the same issue and fixed it by using another version of PyTorch. I'm currently using version 1.9.0+cu111, but the right build depends on your CUDA version. You can find all previous PyTorch versions here.

I just updated the README file to avoid confusion about the PyTorch version.
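For reference, the matching install command for that build (as listed on the PyTorch previous-versions page; adjust the cu111 tag to your CUDA version) looks like:

```shell
# Install PyTorch 1.9.0 built against CUDA 11.1 (swap cu111 for your CUDA
# version; see the PyTorch previous-versions page for the full list of builds)
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 \
    -f https://download.pytorch.org/whl/torch_stable.html
```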

@mk-hassan

mk-hassan commented Jul 2, 2022

Hello @amlarraz @chiendoanngoc, I changed the torch version to 1.9.0+cu111 but I still get the same error. I'm using Colab as my working environment.

```
Logdir: ./logs/combination-2_7_2022-18h40m33s
Train epoch: 1:   0%|          | 0/1113 [00:00<?, ?it/s]/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py:481: UserWarning: This DataLoader will create 4 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  cpuset_checked))
/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at  /pytorch/c10/core/TensorImpl.h:1156.)
  return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-1-13d8a8766d4b> in <module>()
     60         loss = criterion(pred_3, pred_canny, pred_1, pred_2, msk, canny_label)
     61         loss = loss/accumulation_steps
---> 62         loss.backward()
     63         # accumulative gradient
     64         if (i + 1) % accumulation_steps == 0:  # Wait for several backward steps

1 frames
/usr/local/lib/python3.7/dist-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    147     Variable._execution_engine.run_backward(
    148         tensors, grad_tensors_, retain_graph, create_graph, inputs,
--> 149         allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
    150
    151

RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED
```

```python
import torch
print(torch.__version__)  # 1.9.0+cu111
```

@amlarraz
Owner

amlarraz commented Jul 4, 2022

Hi @Twixii99, which CUDA version are you using? Remember that the PyTorch version depends on the CUDA version you're using. If you're using this PyTorch version and the Colab environment is using a CUDA version other than 11.1, PyTorch will give you some errors. To find out which CUDA version you're using, you can run the command !nvidia-smi in a cell. To choose the correct PyTorch version according to your CUDA version, you can visit this page.
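One way to make that comparison concrete (a hypothetical helper, not part of the repo): parse the cuXXX tag out of `torch.__version__` and compare it to the "CUDA Version" that !nvidia-smi reports.

```python
from typing import Optional

def wheel_cuda_version(torch_version: str) -> Optional[str]:
    """Extract the CUDA version from a PyTorch wheel tag,
    e.g. '1.9.0+cu111' -> '11.1'. Returns None for CPU-only builds."""
    _, sep, tag = torch_version.partition("+")
    if not sep or not tag.startswith("cu"):
        return None  # no local tag, or not a CUDA build
    digits = tag[2:]                        # 'cu111' -> '111'
    return f"{digits[:-1]}.{digits[-1]}"    # '111' -> '11.1'
```

Then `wheel_cuda_version(torch.__version__)` should match (or at least be supported by) the CUDA version shown in the nvidia-smi header; for example, a 1.9.0+cu111 wheel expects CUDA 11.1, not 10.x.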
