
KeyError: "BEVFusion: 'encoders.camera.backbone.stages.0.blocks.0.attn.w_msa.relative_position_bias_table'" #597

Open
fdy61 opened this issue Feb 13, 2024 · 11 comments


fdy61 commented Feb 13, 2024

I used my own lidar-only and camera-only .pth checkpoints to train the fusion model and ran into this error. How can I solve it?
[screenshot of the KeyError traceback]

xmutyjs commented Apr 26, 2024

Please, did you solve it? I also have the same problem.

kongwah commented May 28, 2024

I faced the same problem when trying to use the newly saved checkpoints from training a cam-only CenterHead detector.
I am not sure if it has anything to do with the CenterHead being different from the TransFusion head.

Let me be specific.

  1. I first train a new cam-only detector using the following:

     torchpack dist-run -np 1 python tools/train.py \
       configs/nuscenes/det/centerhead/lssfpn/camera/256x704/swint/default.yaml \
       --model.encoders.camera.backbone.init_cfg.checkpoint pretained/swint-nuimages-pretrained.pth

  2. The checkpoints from (1) are saved into the runs/ folder.

  3. Then I try to train a cam+lidar detector using one of the checkpoints saved in (2):

     torchpack dist-run -np 1 python tools/train.py \
       configs/nuscenes/det/transfusion/secfpn/camera+lidar/swint_v0p075/convfuser.yaml \
       --model.encoders.camera.backbone.init_cfg.checkpoint runs/run-531bf67d-d3138be2/epoch_20.pth \
       --load_from pretrained/lidar-only-det.pth

And the error is as follows:

Traceback (most recent call last):
File "tools/train.py", line 68, in main
model = build_model(cfg.model)
File "/home/bevfusion/mmdet3d/models/builder.py", line 41, in build_model
return build_fusion_model(cfg, train_cfg=train_cfg, test_cfg=test_cfg)
File "/home/bevfusion/mmdet3d/models/builder.py", line 35, in build_fusion_model
return FUSIONMODELS.build(
File "/opt/conda/lib/python3.8/site-packages/mmcv/utils/registry.py", line 212, in build
return self.build_func(*args, **kwargs, registry=self)
File "/opt/conda/lib/python3.8/site-packages/mmcv/utils/registry.py", line 55, in build_from_cfg
raise type(e)(f'{obj_cls.name}: {e}')
KeyError: "BEVFusion: 'encoders.camera.backbone.stages.0.blocks.0.attn.w_msa.relative_position_bias_table'"

fdy61 commented May 28, 2024

The model you trained in (1) is a single-modality model that uses only camera images for object detection. That means you cannot use it directly as the camera backbone when training the fusion model.

kongwah commented May 28, 2024

Hi fdy61,

I notice that the same "pretained/swint-nuimages-pretrained.pth" was used as the checkpoint for training the cam-only detector, and also as the checkpoint for training the cam+lidar detector.

This is why I was under the impression that the saved cam-only checkpoint "runs/run-531bf67d-d3138be2/epoch_20.pth" would be usable for training the cam+lidar detector.

May I have your kind advice: how then can I train an appropriate camera backbone for the cam+lidar fusion model?

Thanks

fdy61 commented May 28, 2024

pretained/swint-nuimages-pretrained.pth is just an image backbone model which, if I remember correctly, has the same structure as SwinTransformer. But runs/run-531bf67d-d3138be2/epoch_20.pth is an entire model, and its weight names are completely different. You can print out the model weight names to see.
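The suggestion above can be sketched in a few lines of Python. This is a minimal sketch, assuming mmdet3d-style checkpoints nest their weights under a "state_dict" key (plain backbone files may store the weights at the top level); the helper names are made up for illustration:

```python
def unwrap(ckpt):
    """mmcv-style checkpoints nest weights under 'state_dict';
    plain backbone files may store them at the top level."""
    return ckpt.get("state_dict", ckpt)

def dump_names(ckpt):
    """Return the sorted parameter names of a loaded checkpoint dict."""
    return sorted(unwrap(ckpt).keys())

# In practice (assuming PyTorch is installed and the file exists):
#   import torch
#   ckpt = torch.load("runs/run-531bf67d-d3138be2/epoch_20.pth", map_location="cpu")
#   for name in dump_names(ckpt):
#       print(name)
```

Running this on both files should make any difference in the key names obvious.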

kongwah commented May 28, 2024

Hi fdy61,

Thanks for your tips. Indeed, when I "ls -l" the two files, they are very different in size:

ls -l pretrained/swint-nuimages-pretrained.pth run/run-531bf67d-d3138be2/epoch_20.pth
-rw-r--r-- 1 root root 110370759 Sep 26 2022 pretrained/swint-nuimages-pretrained.pth
-rw-r--r-- 1 root root 523728374 May 27 03:45 run/run-531bf67d-d3138be2_epoch_20.pth

I am not sure how else I can examine their model weight names. Can you kindly advise me?

So the question then becomes: how do I re-train the cam+lidar fusion model using my new image data? I believe you may have the same goal, since you mentioned in your first post, "I use my own lidar-only and camera-only pth to train the fusion model....". Do you also have a camera-only pth checkpoint? Did you manage to solve it? If so, can you advise me please?

Thanks

kongwah commented May 29, 2024

Can I do the following instead:

torchpack dist-run -np 8 python tools/train.py \
  configs/nuscenes/det/transfusion/secfpn/camera+lidar/swint_v0p075/convfuser.yaml \
  --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth \
  --load_from runs/run-531bf67d-d3138be2/epoch_20.pth \
  --load_from pretrained/lidar-only-det.pth

That is, pass "--load_from" twice, loading both the cam-only checkpoint and the lidar-only checkpoint.

Thanks

fdy61 commented May 29, 2024

torchpack dist-run -np 8 python tools/train.py \
  configs/nuscenes/det/transfusion/secfpn/camera+lidar/swint_v0p075/convfuser.yaml \
  --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth \
  --load_from pretrained/lidar-only-det.pth

kongwah commented May 29, 2024

After some debugging, it seems that the issue is that the state_dict keys for

  pretrained/swint-nuimages-pretrained.pth 

differ from those of

runs/run-531bf67d-d3138be2/epoch_20.pth 

by the prefix "encoders.camera.backbone".

I am hoping that if I change the key to skip this prefix, then this error will go away!
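A minimal sketch of that idea, assuming the weights have already been loaded into an ordinary dict (in practice via torch.load(path, map_location="cpu"), unwrapping a nested "state_dict" entry if present); the function name is hypothetical:

```python
PREFIX = "encoders.camera.backbone."

def extract_backbone(state_dict, prefix=PREFIX):
    """Keep only the camera-backbone weights and strip the prefix,
    so the keys line up with a plain SwinTransformer checkpoint."""
    return {
        key[len(prefix):]: value
        for key, value in state_dict.items()
        if key.startswith(prefix)
    }
```

The re-keyed dict could then be saved with torch.save({"state_dict": new_sd}, out_path) and passed to --model.encoders.camera.backbone.init_cfg.checkpoint. Whether those weights are a sensible initialization for the fusion model is a separate question (see fdy61's comments).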

fdy61 commented May 29, 2024

Just use pretrained/swint-nuimages-pretrained.pth and it's done.
As I said, runs/run-531bf67d-d3138be2/epoch_20.pth is the camera-only model, which was trained only on images, and its state_dict keys are totally different from those of pretrained/swint-nuimages-pretrained.pth.

kongwah commented May 29, 2024

Hi fdy61,
Thanks.
The background is that I have trained the camera-only model, and using the "runs/run-531bf67d-d3138be2/epoch_20.pth" checkpoint I have obtained mAP improvements over the "pretrained/camera-only-det.pth" baseline.

Hence, my thinking is to use this "epoch_20.pth" checkpoint to train the cam+lidar fusion model.

OK, I will try to check/confirm if the state_dict keys of "epoch_20.pth" are totally different from the "camera-only-det.pth", or if they only differ by the prefix "encoders.camera.backbone". I will post the update here.
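That check can be sketched directly, assuming both sets of state_dict keys have been collected (e.g. by printing the weight names as discussed earlier); the helper name is hypothetical:

```python
def same_keys_after_strip(full_keys, backbone_keys,
                          prefix="encoders.camera.backbone."):
    """True if the prefixed backbone keys of the full model match the
    plain backbone checkpoint's keys exactly (no missing, no extra)."""
    stripped = {k[len(prefix):] for k in full_keys if k.startswith(prefix)}
    return stripped == set(backbone_keys)
```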

Thanks
