Qwen2.5-VL-7B Charades-STA results significantly lower than reported (29.46 vs 43.6 mIoU) #857

## Issue Description

Following these issues in the official Qwen2.5-VL repo:
- https://github.com/QwenLM/Qwen3-VL/issues/925#issuecomment-2716599307
- https://github.com/QwenLM/Qwen3-VL/issues/837

I evaluated Qwen2.5-VL-7B on the Charades-STA benchmark using the default lmms-eval settings.
However, the results I get are significantly lower than those reported in the paper.

## Expected Results
- **Paper reports:** 43.6 mIoU

## Actual Results
- **My evaluation:** 29.46 mIoU 
- **With fps 2:** 29.78 mIoU

This is a substantial gap of ~14 mIoU.
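
For reference, the numbers above are mIoU scores: the temporal IoU between each predicted and ground-truth time span, averaged over all queries. The sketch below shows the standard definition only. It is not taken from the lmms-eval source, and the names `temporal_iou` and `mean_iou` are hypothetical; the scorer behind `temporal_grounding_charades` may parse or aggregate predictions differently.

```python
from typing import Sequence, Tuple

# A time span in seconds: (start, end). Hypothetical type for illustration.
Span = Tuple[float, float]


def temporal_iou(pred: Span, gt: Span) -> float:
    """Intersection over union of two 1-D time spans."""
    pred_len = max(0.0, pred[1] - pred[0])
    gt_len = max(0.0, gt[1] - gt[0])
    # Overlap is zero when the spans do not intersect.
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = pred_len + gt_len - inter
    return inter / union if union > 0 else 0.0


def mean_iou(preds: Sequence[Span], gts: Sequence[Span]) -> float:
    """Mean temporal IoU over all queries, as a percentage."""
    if len(preds) != len(gts):
        raise ValueError("preds and gts must have the same length")
    if not preds:
        return 0.0
    total = sum(temporal_iou(p, g) for p, g in zip(preds, gts))
    return 100.0 * total / len(preds)
```

Under this definition, every prediction that does not overlap its ground-truth span contributes 0 to the mean.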

## Setup Details
- **Model:** Qwen2.5-VL-7B-Instruct (`qwen2_5_vl` in lmms-eval)
- **Benchmark:** Charades-STA (task `temporal_grounding_charades` in lmms-eval)
- **Evaluation tool:** lmms-eval with default settings
- **Script (same as default):**
```
accelerate launch --num_processes=8 --main_process_port=12346 -m lmms_eval \
    --model qwen2_5_vl \
    --model_args=pretrained=Qwen/Qwen2.5-VL-7B-Instruct,max_pixels=12845056,attn_implementation=flash_attention_2,interleave_visuals=False \
    --tasks temporal_grounding_charades \
    --batch_size 1
```

## Questions
1. Are there any implementation details that I'm missing?
