Issue Description
Following some issues in the Qwen2.5-VL official repo,
- Experimental Details on Videos QwenLM/Qwen3-VL#925 (comment)
- Prompts and evaluation process for Video Grounding Tasks on the technical report QwenLM/Qwen3-VL#837
I evaluated Qwen2.5-VL-7B on the Charades-STA benchmark using the default lmms-eval settings.
However, the results are significantly lower than those reported in the paper.
Expected Results
- Paper reports: 43.6 mIoU
Actual Results
- My evaluation: 29.46 mIoU
- With fps 2: 29.78 mIoU
This is a substantial gap of ~14 mIoU.
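For reference, here is a minimal sketch of how mIoU is commonly computed for temporal grounding, so the numbers above can be sanity-checked outside the harness. It assumes each prediction and ground truth is a (start, end) pair in seconds with start <= end, and that the score is the mean IoU scaled to 0-100. The function names are hypothetical, and this is not necessarily how lmms-eval parses model outputs or scores them.

```python
from typing import Sequence, Tuple

# A temporal segment as (start_seconds, end_seconds); assumes start <= end.
Segment = Tuple[float, float]


def temporal_iou(pred: Segment, gt: Segment) -> float:
    """Intersection over union of two time intervals."""
    pred_start, pred_end = pred
    gt_start, gt_end = gt
    # Overlap is zero when the intervals are disjoint.
    inter = max(0.0, min(pred_end, gt_end) - max(pred_start, gt_start))
    union = (pred_end - pred_start) + (gt_end - gt_start) - inter
    return inter / union if union > 0 else 0.0


def mean_iou(preds: Sequence[Segment], gts: Sequence[Segment]) -> float:
    """Mean IoU over all samples, scaled to 0-100 (assumed reporting scale)."""
    if len(preds) != len(gts):
        raise ValueError("preds and gts must have the same length")
    if not preds:
        return 0.0
    total = sum(temporal_iou(p, g) for p, g in zip(preds, gts))
    return 100.0 * total / len(preds)


if __name__ == "__main__":
    # Toy data for illustration only, not taken from Charades-STA.
    preds = [(0.0, 10.0), (0.0, 5.0)]
    gts = [(5.0, 15.0), (0.0, 5.0)]
    print(f"mIoU: {mean_iou(preds, gts):.2f}")
```

Under this definition, any sample whose predicted span fails to parse and is scored as no overlap contributes 0, so inspecting per-sample outputs may help tell parsing failures apart from genuinely poor localization.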
Setup Details
- Model: Qwen2.5-VL-7B-Instruct ('qwen2_5_vl' in lmms-eval)
- Benchmark: Charades-STA (task 'temporal_grounding_charades' in lmms-eval)
- Evaluation tool: lmms-eval with default settings
- Script (same as the default):
accelerate launch --num_processes=8 --main_process_port=12346 -m lmms_eval \
--model qwen2_5_vl \
--model_args=pretrained=Qwen/Qwen2.5-VL-7B-Instruct,max_pixels=12845056,attn_implementation=flash_attention_2,interleave_visuals=False \
--tasks temporal_grounding_charades \
--batch_size 1
Questions
- Are there any implementation details that I'm missing?