
How to obtain base detector weights pretrained on ILSVRC2015 and ILSVRC #103

Open · yuyangyangji opened this issue Oct 15, 2024 · 5 comments


@yuyangyangji

Thanks for the great work on YOLOV and YOLOV++. In this repo you explain how to fine-tune the model from a pre-trained YOLOX model. However, I find that in the paper the base detector is trained on the ILSVRC2015 and ILSVRC datasets. Does this repo provide the code for us to obtain those pre-trained weights? Thanks, hoping for your answer!

@YuHengsss
Owner

Thanks for your interest in our work. We indeed provide code to train the base detector. Taking ImageNet VID as an example, the experiment file to train a base detector (e.g. YOLOX-S) can be found here:

class Exp(MyExp):

You can use tools/train.py together with the experiment file to train the base detector. Their usage is the same as in YOLOX.
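For reference, a minimal sketch of what such a YOLOX-style experiment file might look like for a YOLOX-S base detector on ImageNet VID. The values below (class count, dataset path, schedule) are illustrative assumptions, not the repo's actual settings; see the experiment file linked above for the real configuration.

```python
# Hedged sketch of a YOLOX-style Exp file; placeholder values only.
import os
from yolox.exp import Exp as MyExp

class Exp(MyExp):
    def __init__(self):
        super(Exp, self).__init__()
        self.depth = 0.33          # YOLOX-S depth multiplier
        self.width = 0.50          # YOLOX-S width multiplier
        self.num_classes = 30      # ImageNet VID has 30 categories
        self.data_dir = "datasets/ILSVRC2015"  # hypothetical dataset root
        self.max_epoch = 7                     # illustrative schedule
        self.exp_name = os.path.split(os.path.realpath(__file__))[1].split(".")[0]
```

Training is then launched with tools/train.py, passing this file via -f, the COCO-pretrained YOLOX checkpoint via -c, and the usual -d/-b device and batch-size flags, just as in YOLOX.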

@yuyangyangji
Author

yuyangyangji commented Oct 16, 2024

Thanks! About the training stages: the first stage starts from COCO-pretrained weights, freezes the backbone, and only fine-tunes the linear projection layers in the YOLOX prediction head on the sampled ILSVRC2015 and ILSVRC dataset. The second stage uses the full ILSVRC2015, freezes the backbone, and fine-tunes the prediction head, the newly added video object classification branch, and the FAM. Does this description deviate from the original paper? I also wonder whether the first-stage training needs to train the FAM module and the newly added classification branch. Hoping for your answer, thanks!

@YuHengsss
Owner

Hello, you need to fine-tune all the COCO-pretrained weights in the first stage, NOT only the linear projection head. The procedure for the second stage is correct.
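To make the split of trainable parameters concrete, here is a rough sketch (not the repo's actual training code) of the freezing pattern described for the second stage: backbone frozen, with the prediction head, the added video classification branch, and the FAM left trainable. The attribute names backbone, head, cls_branch, and fam are hypothetical placeholders.

```python
import torch

def build_stage2_optimizer(model, lr=1e-3):
    # Freeze the backbone so no gradients are computed for it.
    for p in model.backbone.parameters():
        p.requires_grad = False

    # Collect only the modules that the second stage fine-tunes.
    trainable = []
    for module in (model.head, model.cls_branch, model.fam):
        trainable.extend(p for p in module.parameters() if p.requires_grad)

    return torch.optim.SGD(trainable, lr=lr, momentum=0.9)
```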

@yuyangyangji
Author

Hello, I have now successfully trained YOLOV++ and I have some questions about the feature selection module. In the paper you mention that a threshold is used to pick which proposals are selected for the FAM module, and that the number is always under 100 per frame. However, when training the second stage mentioned above (the v++ base decoupledreg_2x version) with the repo's default settings, I find that the number of chosen proposals is very high; I observe that more than 70% of the proposals are selected. Is this within expectation? What about directly using the proposals selected by SimOTA per frame? Hoping for your answer, thanks!

@YuHengsss
Owner

This phenomenon is intriguing. The number of candidates selected by the feature selection module depends on both the quality of the base detector and the characteristics of the image. Could you provide more details about the dataset you are using? With such a large number of candidates, the GPU memory cost would be extremely high and you should expect to hit an OOM error.
Additionally, the proposal number reported in Table 2 of our paper is an average value, not a minimum.
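For anyone debugging the same behaviour, below is a small sketch (not the repo's actual feature-selection code) of threshold-based proposal picking with an optional per-frame cap, plus a log line for the selection ratio discussed above. The threshold and cap values are assumptions.

```python
import torch

def select_proposals(scores, threshold=0.001, max_per_frame=100):
    """scores: per-proposal confidence tensor for one frame."""
    idx = torch.nonzero(scores > threshold, as_tuple=False).squeeze(1)
    if idx.numel() > max_per_frame:
        # Optional hard cap: keep only the highest-scoring proposals.
        top = scores[idx].topk(max_per_frame).indices
        idx = idx[top]
    ratio = idx.numel() / max(scores.numel(), 1)
    print(f"selected {idx.numel()}/{scores.numel()} proposals ({ratio:.1%})")
    return idx
```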
