adam optim ERROR:If capturable=False, state_steps should not be CUDA tensors. #31
Comments
And my versions:
PyTorch version: 1.12.0+cu102
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A
OS: Ubuntu 16.04.7 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~16.04) 7.5.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.23
Python version: 3.9.12 (main, Jun 1 2022, 11:38:51) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-4.4.0-210-generic-x86_64-with-glibc2.23
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: NVIDIA TITAN Xp
GPU 1: NVIDIA TITAN Xp
GPU 2: NVIDIA TITAN Xp
GPU 3: NVIDIA TITAN Xp
Nvidia driver version: 465.19.01
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.23.0
[pip3] pytorchvideo==0.1.5
[pip3] torch==1.12.0
[pip3] torchvision==0.13.0
[conda] numpy 1.23.0 pypi_0 pypi
[conda] pytorchvideo 0.1.5 pypi_0 pypi
[conda] torch 1.12.0 pypi_0 pypi
[conda] torchvision 0.13.0 pypi_0 pypi
I have now updated CUDA to 11.3, but the result doesn't change.
@yqi19 It seems that the training fails when trying to load the optimizer states. Could you set
I have the same problem. I tried setting the capturable=True flag in the AdamW optimizer, but nothing changed. I still get this error: "AssertionError: If capturable=False, state_steps should not be CUDA tensors."
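For anyone landing here with the same assertion: as far as I understand, in PyTorch 1.12.0 loading an optimizer checkpoint moves the per-parameter step counters onto the GPU together with the rest of the state, and Adam/AdamW then refuses them when capturable=False. Below is a minimal sketch of one possible workaround, assuming that is what happens in your setup. The helper name move_adam_steps_to_cpu is hypothetical (it is not part of PyTorch or of this repository). It relies only on optimizer.state and on the tensor attributes is_cuda and cpu().

```python
def move_adam_steps_to_cpu(optimizer):
    """Move every CUDA `step` counter in the optimizer state back to the CPU.

    Hypothetical helper, not part of PyTorch or of this repository.
    Assumption: the optimizer keeps a per-parameter state dict with a
    `step` entry (as Adam and AdamW do) that may have ended up on the GPU
    after `optimizer.load_state_dict(...)`.
    Returns the number of step counters that were moved.
    """
    moved = 0
    for state in optimizer.state.values():
        step = state.get("step")
        # Tensors expose `is_cuda`; plain Python numbers (used by older
        # checkpoints) do not, so they are left untouched.
        if getattr(step, "is_cuda", False):
            state["step"] = step.cpu()
            moved += 1
    return moved
```

The intended use is to call it once, right after the optimizer state has been restored from the checkpoint and before the first optimizer.step(). Upgrading PyTorch to a release newer than 1.12.0 may make this unnecessary; I have not verified that against this repository.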
Hi, congratulations on your excellent work!
I would really appreciate it if you could help me through this.
So I run
and trigger the auto-resume mode to continue my last training, and this error occurs.
I am 100% sure that cuDNN is enabled and all GPUs are available, and nothing went wrong when I first trained this.
And here's another problem: do you guys have a clue if the training process is slow?
Thanks sooooo much!