RuntimeError: operation does not have an identity #123

Open
mengxia1994 opened this issue Oct 16, 2022 · 8 comments
Labels
feature request new features good first issue Good for newcomers help wanted Extra attention is needed lane detection

Comments

@mengxia1994

mengxia1994 commented Oct 16, 2022

I hit this error when I try to train LSTR.
Loading targets into memory...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2037/2037 [00:01<00:00, 1385.99it/s]
[1, 202] training loss: 32.5229
[1, 202] loss label: 0.7416
[1, 202] loss curve: 2.3253
[1, 202] loss upper: 0.1422
[1, 202] loss lower: 0.7414
[1, 202] training loss aux0: 16.9041
[1, 202] loss label aux0: 0.7343
[1, 202] loss curve aux0: 2.5827
[1, 202] loss upper aux0: 0.1302
[1, 202] loss lower aux0: 0.7638
[1, 405] training loss: 18.8799
[1, 405] loss label: 0.6991
[1, 405] loss curve: 1.3303
[1, 405] loss upper: 0.0926
[1, 405] loss lower: 0.2203
[1, 405] training loss aux0: 9.5051
[1, 405] loss label aux0: 0.7070
[1, 405] loss curve aux0: 1.3525
[1, 405] loss upper aux0: 0.0942
[1, 405] loss lower aux0: 0.2166
[1, 608] training loss: 15.3990
[1, 608] loss label: 0.6896
[1, 608] loss curve: 1.0405
[1, 608] loss upper: 0.0851
[1, 608] loss lower: 0.1731
[1, 608] training loss aux0: 7.6113
[1, 608] loss label aux0: 0.6941
[1, 608] loss curve aux0: 1.0002
[1, 608] loss upper aux0: 0.0848
[1, 608] loss lower aux0: 0.1792
Traceback (most recent call last):
  File "main_landet.py", line 65, in <module>
    runner.run()
  File "/home/mengxia/pytorch-auto-drive/utils/runners/lane_det_trainer.py", line 55, in run
    self.model)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/mengxia/pytorch-auto-drive/utils/losses/hungarian_loss.py", line 124, in forward
    loss, log_dict = self.calc_full_loss(outputs=outputs, targets=targets)
  File "/home/mengxia/pytorch-auto-drive/utils/losses/hungarian_loss.py", line 136, in calc_full_loss
    indices = self.matcher(outputs=outputs, targets=targets)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 15, in decorate_context
    return func(*args, **kwargs)
  File "/home/mengxia/pytorch-auto-drive/utils/losses/hungarian_loss.py", line 71, in forward
    norm_weights, valid_points = lane_normalize_in_batch(target_keypoints)  # G, G x N
  File "/opt/conda/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 15, in decorate_context
    return func(*args, **kwargs)
  File "/home/mengxia/pytorch-auto-drive/utils/losses/hungarian_loss.py", line 24, in lane_normalize_in_batch
    norm_weights /= norm_weights.max()
RuntimeError: operation does not have an identity.

@mengxia1994
Author

Sometimes the error appears at the very beginning, and sometimes it appears after several iterations, as above. I looked at #76; it may be the same problem. I checked the dataset but found nothing. I'm using a custom dataset organized in TuSimple format, with the input size adjusted to (540, 960). So far I have successfully trained all the algorithms except LSTR. I need help.

@voldemortX
Owner

@mengxia1994 Do you have many no-lane images in your dataset?

@mengxia1994
Author

I found the problem too. The images are not actually lane-free. A few lanes have only 2 or 3 valid points (the others are -2). However, after conversion to txt, they appear as 0 0 0 0 0 0 (which I believe is because the main direction is left-right and the lane is short). I will delete these cases and try again.
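A check like the one described above can be sketched in plain Python. This is only an illustration, assuming TuSimple-style JSON-lines labels (one JSON object per line with `lanes`, `h_samples`, and `raw_file`, where `-2` marks a row with no lane point). The function name `filter_short_lanes`, the `min_points=4` threshold, and the `drop_empty` flag are hypothetical choices, not part of pytorch-auto-drive.

```python
import json

INVALID_X = -2  # TuSimple-style marker for "no lane point at this row"


def filter_short_lanes(json_lines, min_points=4, drop_empty=True):
    """Drop lanes that have fewer than `min_points` valid x-coordinates.

    Returns (kept_records, flagged), where `flagged` holds one
    (raw_file, lane_index, n_valid) tuple per dropped lane.
    `min_points` is an arbitrary threshold chosen for illustration.
    """
    kept, flagged = [], []
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the label file
        record = json.loads(line)
        good_lanes = []
        for idx, lane in enumerate(record["lanes"]):
            n_valid = sum(1 for x in lane if x != INVALID_X)
            if n_valid >= min_points:
                good_lanes.append(lane)
            else:
                flagged.append((record.get("raw_file"), idx, n_valid))
        record["lanes"] = good_lanes
        # Optionally drop images that are left with no lanes at all.
        if good_lanes or not drop_empty:
            kept.append(record)
    return kept, flagged


# Tiny made-up example: two label lines, three lanes in total.
sample = [
    json.dumps({"raw_file": "a.jpg", "h_samples": [240, 250, 260, 270, 280],
                "lanes": [[-2, 100, 110, 120, 130], [-2, -2, -2, 300, 310]]}),
    json.dumps({"raw_file": "b.jpg", "h_samples": [240, 250, 260, 270, 280],
                "lanes": [[-2, -2, -2, -2, 50]]}),
]
kept, flagged = filter_short_lanes(sample, min_points=4)
for raw_file, lane_idx, n_valid in flagged:
    print(f"{raw_file}: lane {lane_idx} has only {n_valid} valid points")
```

Printing the flagged lanes before deleting anything makes it easier to confirm that the short lanes really are the ones that turn into all-zero rows after conversion.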

@voldemortX
Owner

Good luck! Do tell me if the issue persists.

@mengxia1994
Author

It works. Thanks for your help!
By the way, I have some questions:
1. Did you use the default configs to get the best results shown in the benchmark?
In my experience, almost all the default learning rates are too large and training does not converge. Since the default LSTR lr and SCNN lr are so different, I assume they were not set casually. Although I am using custom data, the dataset size is similar to TuSimple, and I think the data distribution (scenes) in lane detection projects is fairly similar, compared to other deep learning tasks.
I just want some advice on training and tuning hyperparameters, since I am short on time and machines.
2. I haven't found examples of image augmentation in any of the configs. Is it implemented?
3. How can I add a validation step during training? How can I print more information, such as accuracy/F1, during training? Can I save more checkpoints, not just the last one? Sometimes the last several models overfit very easily.

@voldemortX
Owner

@mengxia1994

  1. We set the learning rate and other hyperparameters based on validation set performance. Because TuSimple can be a rather curious dataset, the best lr found on it may be off for other data. You could try a lower lr that is frequently used across the configs. Remember that lr should be scaled with batch size (usually linearly: bigger bs, higher lr).

  2. Augmentations are implemented independently in this repo. You can find aug configs in `configs/datasets/`, and you can search for the corresponding code by class name.

  3. We have a `val-num-steps` arg for checkpoint selection. However, it is not supported for lane detection, since a lane detection network often performs best (on the final test set) at the end of training. We could really use a checkpointing option, though. If you would care to add it yourself, it shouldn't be very complex.
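The linear lr/batch-size relationship from point 1 can be sketched as follows. The helper name `scale_lr` and the numbers in the example are hypothetical; they are not taken from any config in this repo.

```python
def scale_lr(base_lr, base_batch_size, new_batch_size):
    """Linear scaling: keep lr / batch_size constant when the batch size changes."""
    if base_batch_size <= 0 or new_batch_size <= 0:
        raise ValueError("batch sizes must be positive")
    return base_lr * new_batch_size / base_batch_size


# Made-up example: a config tuned with batch size 16, rerun with batch size 8.
base_lr = 0.2
new_lr = scale_lr(base_lr, base_batch_size=16, new_batch_size=8)
print(new_lr)  # half the batch size, so half the learning rate
```

This is only a starting point. On a custom dataset it is still worth trying a couple of lower values around the scaled lr, as suggested above.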

@mengxia1994
Author

OK, I will try. Thanks for your help!

@voldemortX
Owner

Online validation for lane networks currently uses segmentation IoU as the metric, which doesn't really show much.
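Since the online metric is not very informative, one simple form of the checkpointing option discussed in this thread is to save periodically and keep only the last few files, rather than selecting by a validation score. The sketch below is a hypothetical helper, not existing pytorch-auto-drive code: the class name `PeriodicCheckpointer`, its arguments, and the file naming are all assumptions. `save_fn` is any callable that writes a checkpoint to a given path; in a PyTorch loop it could be something like `lambda p: torch.save(model.state_dict(), p)`.

```python
import os
from collections import deque


class PeriodicCheckpointer:
    """Call `save_fn(path)` every `every_n_steps` steps and keep the newest `keep_last` files."""

    def __init__(self, save_fn, out_dir, every_n_steps, keep_last=3):
        if every_n_steps <= 0 or keep_last <= 0:
            raise ValueError("every_n_steps and keep_last must be positive")
        self.save_fn = save_fn
        self.out_dir = out_dir
        self.every_n_steps = every_n_steps
        self.keep_last = keep_last
        self._saved = deque()  # paths of checkpoints still on disk, oldest first
        os.makedirs(out_dir, exist_ok=True)

    def step(self, step):
        """Save a checkpoint if `step` is a multiple of the interval; return its path or None."""
        if step % self.every_n_steps != 0:
            return None
        path = os.path.join(self.out_dir, f"step_{step}.pt")
        self.save_fn(path)
        self._saved.append(path)
        # Remove the oldest checkpoints beyond the retention limit.
        while len(self._saved) > self.keep_last:
            old = self._saved.popleft()
            if os.path.exists(old):
                os.remove(old)
        return path
```

Inside a training loop this would be called once per iteration, for example `checkpointer.step(global_step)`. The retained files can then be evaluated offline with the proper lane detection metric to pick a checkpoint that has not overfit.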

@voldemortX voldemortX added feature request new features good first issue Good for newcomers help wanted Extra attention is needed lane detection labels Oct 16, 2022