adam optim ERROR: If capturable=False, state_steps should not be CUDA tensors. #31

Open
yqi19 opened this issue Jul 3, 2022 · 4 comments

yqi19 commented Jul 3, 2022

Hi, congratulations on your excellent work!
I would really appreciate it if you could help me through this.
So I run

PYTHONWARNINGS="ignore" cvnets-train --common.config-file config/classification/imagenet/mobilevit_v2.yaml --common.results-loc mobilevitv2_results/width_1_0_0 --common.override-kwargs scheduler.cosine.max_lr=0.0075 scheduler.cosine.min_lr=0.00075 optim.weight_decay=0.013 model.classification.mitv2.width_multiplier=1.00 --common.tensorboard-logging --common.accum-freq 4 --common.auto-resume 

which triggers the auto-resume mode to continue my last training, and this error occurs:

2022-07-03 06:06:18 - LOGS    - Exception occurred that interrupted the training. If capturable=False, state_steps should not be CUDA tensors.
If capturable=False, state_steps should not be CUDA tensors.

Traceback (most recent call last):                                                                           
  File "/home/yu/projects/mobilevit/ml-cvnets/engine/training_engine.py", line 682, in run
    train_loss, train_ckpt_metric = self.train_epoch(epoch)
  File "/home/yu/projects/mobilevit/ml-cvnets/engine/training_engine.py", line 353, in train_epoch
    self.gradient_scalar.step(optimizer=self.optimizer)
  File "/home/yu/anaconda3/envs/mobilevit/lib/python3.8/site-packages/torch/cuda/amp/grad_scaler.py", line 338, in step
    retval = self._maybe_opt_step(optimizer, optimizer_state, *args, **kwargs)
  File "/home/yu/anaconda3/envs/mobilevit/lib/python3.8/site-packages/torch/cuda/amp/grad_scaler.py", line 285, in _may
be_opt_step
    retval = optimizer.step(*args, **kwargs)
  File "/home/yu/anaconda3/envs/mobilevit/lib/python3.8/site-packages/torch/optim/optimizer.py", line 109, in wrapper
    return func(*args, **kwargs)
  File "/home/yu/anaconda3/envs/mobilevit/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorat
e_context
    return func(*args, **kwargs)
  File "/home/yu/anaconda3/envs/mobilevit/lib/python3.8/site-packages/torch/optim/adamw.py", line 161, in step
    adamw(params_with_grad,
  File "/home/yu/anaconda3/envs/mobilevit/lib/python3.8/site-packages/torch/optim/adamw.py", line 218, in adamw
    func(params,
  File "/home/yu/anaconda3/envs/mobilevit/lib/python3.8/site-packages/torch/optim/adamw.py", line 259, in _single_tenso
r_adamw
    assert not step_t.is_cuda, "If capturable=False, state_steps should not be CUDA tensors."

And I am 100% sure that cuDNN is enabled and all GPUs are available; nothing went wrong when I first started training.
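
For reference, the resume path restores the optimizer state roughly like this (a minimal sketch, not the exact ml-cvnets code; the file name and checkpoint key are illustrative):

import torch

# Toy model/optimizer so the sketch is self-contained.
model = torch.nn.Linear(4, 4).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=0.0075, weight_decay=0.013)

# Conceptually what auto-resume does: load the checkpoint onto the GPU and
# restore the optimizer state from it.
checkpoint = torch.load("checkpoint.pt", map_location="cuda")
optimizer.load_state_dict(checkpoint["optim_state_dict"])

# In PyTorch 1.12.0 the restored per-parameter "step" entries can come back as
# CUDA tensors, which is exactly what the assertion in torch/optim/adamw.py
# complains about when capturable=False (the default).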

And here's another problem: do you have any idea why the training process is slow?
Thanks so much!

yqi19 changed the title from "A problem encountered when loading checkpoint.pt to continue training" to "adam optim ERROR: If capturable=False, state_steps should not be CUDA tensors." on Jul 3, 2022

yqi19 (Author) commented Jul 3, 2022

And here are my versions:

PyTorch version: 1.12.0+cu102
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

OS: Ubuntu 16.04.7 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~16.04) 7.5.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.23

Python version: 3.9.12 (main, Jun  1 2022, 11:38:51)  [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-4.4.0-210-generic-x86_64-with-glibc2.23
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: 
GPU 0: NVIDIA TITAN Xp
GPU 1: NVIDIA TITAN Xp
GPU 2: NVIDIA TITAN Xp
GPU 3: NVIDIA TITAN Xp

Nvidia driver version: 465.19.01
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.23.0
[pip3] pytorchvideo==0.1.5
[pip3] torch==1.12.0
[pip3] torchvision==0.13.0
[conda] numpy                     1.23.0                   pypi_0    pypi
[conda] pytorchvideo              0.1.5                    pypi_0    pypi
[conda] torch                     1.12.0                   pypi_0    pypi
[conda] torchvision               0.13.0                   pypi_0    pypi

yqi19 (Author) commented Jul 3, 2022

Now I have updated my CUDA to 11.3, but the result doesn't change.

sacmehta (Collaborator) commented Jul 6, 2022

@yqi19 It seems that the training fails when trying to load the optimizer state. Could you set the capturable=True flag in the AdamW optimizer and see if that resolves the issue?
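
Something along these lines (a minimal sketch of the suggestion, not the exact cvnets code; the model here is just a placeholder):

import torch

model = torch.nn.Linear(4, 4).cuda()  # placeholder model

# Either construct AdamW with the flag directly (available in PyTorch >= 1.12)...
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=0.0075,
    weight_decay=0.013,
    capturable=True,
)

# ...or, if the optimizer has already been built and its state restored from a
# checkpoint, flip the flag on the existing param groups:
for group in optimizer.param_groups:
    group["capturable"] = True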

prsbsvrn commented May 9, 2023

I have the same problem. I tried setting the capturable=True flag in the AdamW optimizer, but nothing changed; I still receive the error: "AssertionError: If capturable=False, state_steps should not be CUDA tensors."
