LLaVA-Video-7B reproduce and Qwen2-VL-7B result #11
Comments
Hi!
Thanks for your response! I will run with 64 frames, and I would sincerely appreciate it if you have time to provide the QwenVL result.
Hi!
If you have more requirements, please let me know, thanks!
Hi!
Thanks for your reply! I also got a similar result for Qwen2.5-VL + Video-RAG. I am wondering what the peak performance of Video-RAG is when it takes more frames. Since the base model Qwen2-VL can achieve 65.7 on MLVU by taking in more frames, do you have any insight into whether taking in more frames will make Video-RAG outperform the original model?
Hi! Hope this result can help you.
Hi, thanks for sharing the code for your excellent work.
I’m curious why the results for Qwen2-VL-7B+Video-RAG aren’t reported on the VideoMME or MLVU benchmarks. I tried using this repository and replaced the base model with Qwen2.5-VL-7B, but the results were not good.
Qwen2.5-VL-7B+Video-RAG:
{'count': 21.844660194174757, 'ego': 50.85227272727273, 'findNeedle': 63.38028169014085, 'order': 47.87644787644788, 'plotQA': 63.63636363636363, 'anomaly_reco': 66.0, 'topic_reasoning': 86.31178707224335, 'Acc': 58.647654093836245}
I also tried to reproduce the result of LLaVA-Video-7B+Video-RAG, but it only achieves 67.29, while the performance reported in the paper is 72.4.
LLaVA-Video-7B+Video-RAG:
{'plotQA': 72.35621521335807, 'ego': 60.22727272727273, 'findNeedle': 76.61971830985915, 'anomaly_reco': 71.5, 'topic_reasoning': 87.83269961977186, 'count': 38.83495145631068, 'order': 52.123552123552116, 'Acc': 67.2953081876725}
Here is the code for LLaVA-Video-7B+Video-RAG; I made no changes to the official code and only wrote a dataloader.
Could you help with reproducing the LLaVA-Video-7B result? And is it possible for you to provide an official result for Qwen2-VL-7B+Video-RAG or Qwen2.5-VL-7B+Video-RAG?
Thanks very much!
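For anyone comparing the two runs, here is a minimal sketch that lines up the per-task numbers quoted in this issue and computes the shortfall against the paper's 72.4. The variable and function names (`qwen25_rag`, `llava_rag`, `per_task_gap`) are hypothetical and only for illustration; nothing here comes from the Video-RAG repository.

```python
# Per-task accuracies copied verbatim from the two result dicts in this issue.
qwen25_rag = {
    'count': 21.844660194174757, 'ego': 50.85227272727273,
    'findNeedle': 63.38028169014085, 'order': 47.87644787644788,
    'plotQA': 63.63636363636363, 'anomaly_reco': 66.0,
    'topic_reasoning': 86.31178707224335, 'Acc': 58.647654093836245,
}
llava_rag = {
    'plotQA': 72.35621521335807, 'ego': 60.22727272727273,
    'findNeedle': 76.61971830985915, 'anomaly_reco': 71.5,
    'topic_reasoning': 87.83269961977186, 'count': 38.83495145631068,
    'order': 52.123552123552116, 'Acc': 67.2953081876725,
}
REPORTED_LLAVA_ACC = 72.4  # paper number as quoted in this issue


def per_task_gap(a, b):
    """Return {task: a[task] - b[task]} for shared tasks, largest gap first.

    The overall 'Acc' entry is skipped so only per-task scores are compared.
    """
    shared = (a.keys() & b.keys()) - {'Acc'}
    gaps = {task: a[task] - b[task] for task in shared}
    return dict(sorted(gaps.items(), key=lambda kv: kv[1], reverse=True))


gaps = per_task_gap(llava_rag, qwen25_rag)
for task, gap in gaps.items():
    print(f"{task:16s} {gap:+.2f}")

# Distance between the reproduced overall accuracy and the reported one.
shortfall = REPORTED_LLAVA_ACC - llava_rag['Acc']
print(f"LLaVA-Video-7B+Video-RAG shortfall vs. paper: {shortfall:.2f}")
```

On these numbers the two runs differ most on count and least on topic_reasoning, which may help narrow down where the setups diverge.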