
[Bug]: Error processing images with Qwen3-VL #28063

@carlosdcuba1

Description


Your current environment

Error processing images with Qwen3-VL

🐛 Describe the bug

I am running vLLM with two RTX 5070 Ti GPUs and the qwen3-vl-4b-instruct model. After approximately 10 sequential image-processing requests, the service crashes; in addition, some responses repeat the same text infinitely.

My docker-compose:

```yaml
services:
  vllm-server:
    image: dockerhub.timeweb.cloud/vllm/vllm-openai:latest
    container_name: vllm-server
    ports:
      - "8000:8000"
    volumes:
      - /home/notires/huggingface:/root/.local/share/vllm
      - ./cache:/root/.cache/huggingface
    environment:
      - CUDA_VISIBLE_DEVICES=0,1
      - TOKENIZERS_PARALLELISM=false
      - HF_HOME=/root/.cache/huggingface
    command:
      - --model=/root/.local/share/vllm/qwen3-vl-4b-instruct
      - --served-model-name=qwen3-vl
      - --host=0.0.0.0
      - --port=8000
      - --tensor-parallel-size=2
      - --download-dir=/root/.local/share/vllm
      - --max-num-seqs=128
      - --gpu-memory-utilization=0.85
      - --max-model-len=16384
      - --enable-chunked-prefill
      - --enable-prefix-caching
      - --trust-remote-code
      - --mm-encoder-tp-mode=data
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 2
              capabilities: [gpu]
    restart: unless-stopped
```
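The crash is triggered by sequential image requests against the OpenAI-compatible endpoint. Below is a minimal client sketch of that request pattern, assuming the `openai` Python package is installed and the server above is reachable on localhost:8000; the image URL and prompt are placeholders, not taken from the original report.

```python
# Minimal repro sketch (assumptions: `openai` client installed, vLLM server
# from the compose file above reachable at http://localhost:8000).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

IMAGE_URL = "https://example.com/sample.jpg"  # placeholder; use any real image

for i in range(20):  # crash is reported after roughly 10 sequential requests
    resp = client.chat.completions.create(
        model="qwen3-vl",  # matches --served-model-name in the compose file
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Describe this image."},
                    {"type": "image_url", "image_url": {"url": IMAGE_URL}},
                ],
            }
        ],
        max_tokens=256,
    )
    print(i, resp.choices[0].message.content[:80])
```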

ERROR

```text
vllm-server | (APIServer pid=1) INFO: 172.19.0.1:44742 - "POST /v1/chat/completions HTTP/1.1" 200 OK
vllm-server | (Worker_TP1 pid=117) INFO 11-04 11:30:20 [multiproc_executor.py:558] Parent process exited, terminating worker
vllm-server | (Worker_TP0 pid=116) INFO 11-04 11:30:20 [multiproc_executor.py:558] Parent process exited, terminating worker
vllm-server | (APIServer pid=1) ERROR 11-04 11:30:20 [core_client.py:564] Engine core proc EngineCore_DP0 died unexpectedly, shutting down client.
vllm-server | (Worker_TP1 pid=117) INFO 11-04 11:30:20 [multiproc_executor.py:599] WorkerProc shutting down.
vllm-server | (Worker_TP0 pid=116) INFO 11-04 11:30:20 [multiproc_executor.py:599] WorkerProc shutting down.
vllm-server | (APIServer pid=1) ERROR 11-04 11:30:20 [async_llm.py:480] AsyncLLM output_handler failed.
vllm-server | (APIServer pid=1) ERROR 11-04 11:30:20 [async_llm.py:480] Traceback (most recent call last):
vllm-server | (APIServer pid=1) ERROR 11-04 11:30:20 [async_llm.py:480]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 439, in output_handler
vllm-server | (APIServer pid=1) ERROR 11-04 11:30:20 [async_llm.py:480]     outputs = await engine_core.get_output_async()
vllm-server | (APIServer pid=1) ERROR 11-04 11:30:20 [async_llm.py:480]               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vllm-server | (APIServer pid=1) ERROR 11-04 11:30:20 [async_llm.py:480]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 846, in get_output_async
vllm-server | (APIServer pid=1) ERROR 11-04 11:30:20 [async_llm.py:480]     raise self._format_exception(outputs) from None
vllm-server | (APIServer pid=1) ERROR 11-04 11:30:20 [async_llm.py:480] vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
vllm-server | (APIServer pid=1) INFO: 172.19.0.1:44742 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
vllm-server | (APIServer pid=1) INFO: 172.19.0.1:35672 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
vllm-server | (APIServer pid=1) INFO: Shutting down
vllm-server | (APIServer pid=1) INFO: Waiting for application shutdown.
vllm-server | (APIServer pid=1) INFO: Application shutdown complete.
vllm-server | (APIServer pid=1) INFO: Finished server process [1]
vllm-server | /usr/lib/python3.12/multiprocessing/resource_tracker.py:279: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
vllm-server |   warnings.warn('resource_tracker: There appear to be %d '
vllm-server exited with code 0 (restarting)
vllm-server | INFO 11-04 11:30:30 [init.py:216] Automatically detected platform cuda.
vllm-server | (APIServer pid=1) INFO 11-04 11:30:31 [api_server.py:1839] vLLM API server version 0.11.0
vllm-server | (APIServer pid=1) The argument trust_remote_code is to be used with Auto classes. It has no effect here and is ignored.
vllm-server | (APIServer pid=1) INFO 11-04 11:30:31 [utils.py:233] non-default args: {'host': '0.0.0.0', 'model': '/root/.local/share/vllm/qwen3-vl-4b-instruct', 'trust_remote_code': True, 'max_model_len': 16384, 'served_model_name': ['qwen3-vl'], 'download_dir': '/root/.local/share/vllm', 'tensor_parallel_size': 2, 'gpu_memory_utilization': 0.85, 'enable_prefix_caching': True, 'mm_encoder_tp_mode': 'data', 'max_num_seqs': 128, 'enable_chunked_prefill': True}
```

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
