
When I set activate='relu', I got a RuntimeError. (Solved) #247

Open
liuwang0713 opened this issue Apr 22, 2022 · 1 comment

@liuwang0713

When I set activate='relu' in CSPDarknet53.py, line 35, I got the following RuntimeError.
(NVIDIA A100, CUDA 11.4, PyTorch 1.10.1; the same code runs fine on another server with different versions.)

Traceback (most recent call last):
  File "train.py", line 308, in <module>
    Trainer(
  File "train.py", line 196, in train
    loss.backward()
  File "/usr/local/lib/python3.8/dist-packages/torch/_tensor.py", line 307, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/usr/local/lib/python3.8/dist-packages/torch/autograd/__init__.py", line 154, in backward
    Variable._execution_engine.run_backward(
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [12, 512, 13, 13]], which is output 0 of ReluBackward0, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
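As the error hint suggests, anomaly detection can be enabled to make the backward-pass error point at the forward operation that produced the offending tensor. A minimal sketch of that workflow (the tensor shapes and the `out += x` line here are a made-up stand-in for the real network, not the actual CSPDarknet53 code):

```python
import torch

# With anomaly detection on, PyTorch records a forward-pass traceback for
# every op, so the backward error also reports *where* the bad tensor was
# produced. This slows training, so use it only for debugging.
with torch.autograd.set_detect_anomaly(True):
    x = torch.randn(2, 3, requires_grad=True)
    out = torch.relu(x * 2)   # ReluBackward0 saves `out` for the backward pass
    out += x                  # in-place add bumps the saved tensor's version
    try:
        out.sum().backward()
    except RuntimeError as e:
        print("caught:", type(e).__name__)
```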

Finally, I solved the problem by changing `out += residual` to `out = out + residual` in CSPDarknet53.py, line 108.
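The cause is that ReLU's backward pass needs its saved output, and the in-place `+=` modifies that very tensor, bumping its version counter. A minimal repro of the failure and the fix (the `block` function below is a hypothetical simplification of the residual block, not the actual CSPDarknet53 layer):

```python
import torch

def block(x, inplace_add):
    # Hypothetical stand-in for the residual connection around line 108.
    residual = x
    out = torch.relu(x * 2.0)   # ReluBackward0 saves `out` for backward
    if inplace_add:
        out += residual         # modifies the saved tensor in place -> version 2
    else:
        out = out + residual    # allocates a new tensor; saved output untouched
    return out

x = torch.randn(2, 3, requires_grad=True)

# In-place add: backward fails with the "modified by an inplace operation" error.
try:
    block(x, inplace_add=True).sum().backward()
except RuntimeError as e:
    print("in-place add fails:", type(e).__name__)

# Out-of-place add: backward succeeds and gradients are populated.
block(x, inplace_add=False).sum().backward()
print("out-of-place add ok, grad shape:", tuple(x.grad.shape))
```

The out-of-place version costs one extra allocation but leaves every tensor saved for backward untouched, which is why it fixes the error.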

@argusswift
Owner


Thanks.
