推理qwen2-vl-7b, vllm模式失败，到底应该用什么版本？

**Describe the bug**
What the bug is, and how to reproduce, better with screenshots(描述bug以及复现过程，最好有截图)

**scripts:**
  # 76GiB
  CUDA_VISIBLE_DEVICES=0,1,2,3 \
  MAX_PIXELS=401408 \
  swift infer \
      --model /home/data/gouqi/ms-swift/swift-qwen2vl-7b/v4-20250108-174040/checkpoint-2382 \
      --max_batch_size 64 \
      --infer_backend vllm \
      --val_dataset '/home/data/gouqi/ms-swift/swift/llm/dataset/data/4k_bad_test_new.json' \
      --template qwen2_vl \
      --max_length 2048 \
      --stream true \
      --temperature 0 \
      --top_k 1 \
      --max_new_tokens 300

**error:**
<img width="1001" alt="image" src="https://github.com/user-attachments/assets/9faa725c-fddb-4a81-bd9c-2937c3fc2434" />



**Your hardware and system info**
Write your system info like CUDA version/system/GPU/torch version here(在这里给出硬件信息和系统信息，如CUDA版本，系统，GPU型号和torch版本等)

<img width="725" alt="image" src="https://github.com/user-attachments/assets/3b212a6d-c729-4c86-b50e-edfc951958a9" />
<img width="729" alt="image" src="https://github.com/user-attachments/assets/a04a2d22-06d7-485e-a4c3-5dd54be2898b" />

**Additional context**
Add any other context about the problem here(在这里补充其他信息)

试了vllm几个版本（0.6.0，0.6.6）都不行，没有看到应该用哪个版本，还有vllm 推理和 default推理模式 都不能batch 推理吗？

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

推理qwen2-vl-7b, vllm模式失败，到底应该用什么版本？ #2890

76GiB

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

推理qwen2-vl-7b, vllm模式失败，到底应该用什么版本？ #2890

Description

76GiB

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions