Your current environment
Error processing images with Qwen3-VL
🐛 Describe the bug
I am running vLLM with two RTX 5070 Ti GPUs and the qwen3-vl-4b-instruct model. After roughly 10 sequential image-processing requests the service crashes, and some responses degenerate into the same text repeated endlessly.
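The client code is not part of this report, so the loop below is only a rough sketch of the kind of sequential image request that triggers the crash. The image file, prompt, and request count are placeholders; the model name matches --served-model-name in the compose file below.

```python
# Hedged reproduction sketch: send sequential image requests to the
# OpenAI-compatible endpoint served by vLLM. Paths and prompt are placeholders.
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Encode a local test image as a base64 data URL.
with open("sample.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

for i in range(20):  # the crash reportedly appears after ~10 sequential requests
    resp = client.chat.completions.create(
        model="qwen3-vl",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Describe this image."},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
                    },
                ],
            }
        ],
        max_tokens=512,
    )
    print(i, resp.choices[0].message.content[:80])
```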
My docker-compose:
```yaml
services:
  vllm-server:
    image: dockerhub.timeweb.cloud/vllm/vllm-openai:latest
    container_name: vllm-server
    ports:
      - "8000:8000"
    volumes:
      - /home/notires/huggingface:/root/.local/share/vllm
      - ./cache:/root/.cache/huggingface
    environment:
      - CUDA_VISIBLE_DEVICES=0,1
      - TOKENIZERS_PARALLELISM=false
      - HF_HOME=/root/.cache/huggingface
    command:
      - --model=/root/.local/share/vllm/qwen3-vl-4b-instruct
      - --served-model-name=qwen3-vl
      - --host=0.0.0.0
      - --port=8000
      - --tensor-parallel-size=2
      - --download-dir=/root/.local/share/vllm
      - --max-num-seqs=128
      - --gpu-memory-utilization=0.85
      - --max-model-len=16384
      - --enable-chunked-prefill
      - --enable-prefix-caching
      - --trust-remote-code
      - --mm-encoder-tp-mode=data
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 2
              capabilities: [gpu]
    restart: unless-stopped
```
Error log:
vllm-server | (APIServer pid=1) INFO: 172.19.0.1:44742 - "POST /v1/chat/completions HTTP/1.1" 200 OK
vllm-server | (Worker_TP1 pid=117) INFO 11-04 11:30:20 [multiproc_executor.py:558] Parent process exited, terminating worker
vllm-server | (Worker_TP0 pid=116) INFO 11-04 11:30:20 [multiproc_executor.py:558] Parent process exited, terminating worker
vllm-server | (APIServer pid=1) ERROR 11-04 11:30:20 [core_client.py:564] Engine core proc EngineCore_DP0 died unexpectedly, shutting down client.
vllm-server | (Worker_TP1 pid=117) INFO 11-04 11:30:20 [multiproc_executor.py:599] WorkerProc shutting down.
vllm-server | (Worker_TP0 pid=116) INFO 11-04 11:30:20 [multiproc_executor.py:599] WorkerProc shutting down.
vllm-server | (APIServer pid=1) ERROR 11-04 11:30:20 [async_llm.py:480] AsyncLLM output_handler failed.
vllm-server | (APIServer pid=1) ERROR 11-04 11:30:20 [async_llm.py:480] Traceback (most recent call last):
vllm-server | (APIServer pid=1) ERROR 11-04 11:30:20 [async_llm.py:480] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 439, in output_handler
vllm-server | (APIServer pid=1) ERROR 11-04 11:30:20 [async_llm.py:480] outputs = await engine_core.get_output_async()
vllm-server | (APIServer pid=1) ERROR 11-04 11:30:20 [async_llm.py:480] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm-server | (APIServer pid=1) ERROR 11-04 11:30:20 [async_llm.py:480] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 846, in get_output_async
vllm-server | (APIServer pid=1) ERROR 11-04 11:30:20 [async_llm.py:480] raise self._format_exception(outputs) from None
vllm-server | (APIServer pid=1) ERROR 11-04 11:30:20 [async_llm.py:480] vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
vllm-server | (APIServer pid=1) INFO: 172.19.0.1:44742 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
vllm-server | (APIServer pid=1) INFO: 172.19.0.1:35672 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
vllm-server | (APIServer pid=1) INFO: Shutting down
vllm-server | (APIServer pid=1) INFO: Waiting for application shutdown.
vllm-server | (APIServer pid=1) INFO: Application shutdown complete.
vllm-server | (APIServer pid=1) INFO: Finished server process [1]
vllm-server | /usr/lib/python3.12/multiprocessing/resource_tracker.py:279: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
vllm-server | warnings.warn('resource_tracker: There appear to be %d '
vllm-server exited with code 0 (restarting)
vllm-server | INFO 11-04 11:30:30 [__init__.py:216] Automatically detected platform cuda.
vllm-server | (APIServer pid=1) INFO 11-04 11:30:31 [api_server.py:1839] vLLM API server version 0.11.0
vllm-server | (APIServer pid=1) The argument trust_remote_code is to be used with Auto classes. It has no effect here and is ignored.
vllm-server | (APIServer pid=1) INFO 11-04 11:30:31 [utils.py:233] non-default args: {'host': '0.0.0.0', 'model': '/root/.local/share/vllm/qwen3-vl-4b-instruct', 'trust_remote_code': True, 'max_model_len': 16384, 'served_model_name': ['qwen3-vl'], 'download_dir': '/root/.local/share/vllm', 'tensor_parallel_size': 2, 'gpu_memory_utilization': 0.85, 'enable_prefix_caching': True, 'mm_encoder_tp_mode': 'data', 'max_num_seqs': 128, 'enable_chunked_prefill': True}
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.