
How can I pass a video directly through the API for inference after deploying with vLLM? #742

Open

XyWzzZ opened this issue Feb 8, 2025 · 7 comments

@XyWzzZ

XyWzzZ commented Feb 8, 2025

I am serving Qwen2-VL-72B-Instruct with vLLM and trying to run video inference through the API. My current API call looks like this:

from openai import OpenAI

openai_api_key = "None"
openai_api_base = "http://xxxxxx/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

chat_response = client.chat.completions.create(
    model="Qwen2-VL-72B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": [
            {"type": "video_url", "video_url": {"url": "https://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/baby.mp4"}},
            {"type": "text", "text": "Please describe what happens in this video."},
        ]},
    ],
)
Questions
1. The API currently accepts a video_url. Does it also support uploading a video file directly, without relying on an external URL?
2. If not, is there an officially recommended way to pass a local video file for inference, rather than splitting it into frames manually?
3. In a vLLM deployment, are any specific configuration options or parameters needed for the API to recognize and process video input?
Thanks in advance! 🙏

@948024326

Same question +1

@gymbeijing

gymbeijing commented Feb 10, 2025

Hi, may I ask which Python, PyTorch, and CUDA versions you are using? I am also deploying vLLM but am hitting a version mismatch:

python -m xformers.info
WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
PyTorch 2.5.1 with CUDA 1201 (you have 2.5.1+cu121)
Python 3.10.15 (you have 3.10.16)
Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
Memory-efficient attention, SwiGLU, sparse and more won't be available.
Set XFORMERS_MORE_DETAILS=1 for more details

@wulipc

wulipc commented Feb 12, 2025

Yes, you can pass a local video file by using file:// plus its absolute path, for example:

video_url_for_local = "file:///your/local/path/to/v_3l7quTy4c2s.mp4"   #  file:// + your local absolute path
video_url_for_remote = "https://url/path/to/v_3l7quTy4c2s.mp4"

video_url = video_url_for_remote # or video_url_for_local
video_message = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": [
            {"type": "text", "text": "Could you go into detail about the content of this video?"},
            {"type": "video", "video": video_url, "total_pixels": 20480 * 28 * 28, "min_pixels": 16 * 28 * 28},
        ]
    }
]

For more video examples, see the Video inference section at https://github.com/QwenLM/Qwen2.5-VL?tab=readme-ov-file#using---transformers-to-chat
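As an aside, the total_pixels and min_pixels values above are written as multiples of 28 * 28 because Qwen2-VL's vision encoder consumes 28x28-pixel patches; a quick sketch of what those budgets work out to (the patch size is the only assumption here, the rest is arithmetic):

```python
PATCH = 28  # Qwen2-VL vision encoder patch edge, in pixels

# The budgets used in the message above: (number of patches) * (pixels per patch)
total_pixels = 20480 * PATCH * PATCH  # cap on total pixels across sampled frames
min_pixels = 16 * PATCH * PATCH       # floor on pixels per frame

print(total_pixels)  # 16056320
print(min_pixels)    # 12544
```

Raising total_pixels lets the sampler keep more or larger frames, at the cost of more visual tokens per request.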

@XyWzzZ
Author

XyWzzZ commented Feb 13, 2025

[quotes @wulipc's reply above]

Hello, thank you for your answer, but I still haven't succeeded after trying. I am using vLLM with the OpenAI API, and the error messages are as follows:

Traceback (most recent call last):
File "/home/xueyanwen/Qwen/test_qwen_video.py", line 24, in <module>
chat_response = client.chat.completions.create(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/xueyanwen/miniconda3/lib/python3.12/site-packages/openai/_utils/_utils.py", line 279, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/xueyanwen/miniconda3/lib/python3.12/site-packages/openai/resources/chat/completions.py", line 859, in create
return self._post(
^^^^^^^^^^^
File "/home/xueyanwen/miniconda3/lib/python3.12/site-packages/openai/_base_client.py", line 1283, in post
return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/xueyanwen/miniconda3/lib/python3.12/site-packages/openai/_base_client.py", line 960, in request
return self._request(
^^^^^^^^^^^^^^
File "/home/xueyanwen/miniconda3/lib/python3.12/site-packages/openai/_base_client.py", line 1049, in _request
return self._retry_request(
^^^^^^^^^^^^^^^^^^^^
File "/home/xueyanwen/miniconda3/lib/python3.12/site-packages/openai/_base_client.py", line 1098, in _retry_request
return self._request(
^^^^^^^^^^^^^^
File "/home/xueyanwen/miniconda3/lib/python3.12/site-packages/openai/_base_client.py", line 1049, in _request
return self._retry_request(
^^^^^^^^^^^^^^^^^^^^
File "/home/xueyanwen/miniconda3/lib/python3.12/site-packages/openai/_base_client.py", line 1098, in _retry_request
return self._request(
^^^^^^^^^^^^^^
File "/home/xueyanwen/miniconda3/lib/python3.12/site-packages/openai/_base_client.py", line 1064, in _request
raise self._make_status_error_from_response(err.response) from None
openai.InternalServerError: Internal Server Error
(base) [xueyanwen@A2-H3C-R5300-0201-0304-xx Qwen]$ python test_qwen_video.py
Traceback (most recent call last):
File "/home/xueyanwen/Qwen/test_qwen_video.py", line 25, in <module>
chat_response = client.chat.completions.create(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/xueyanwen/miniconda3/lib/python3.12/site-packages/openai/_utils/_utils.py", line 279, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/xueyanwen/miniconda3/lib/python3.12/site-packages/openai/resources/chat/completions.py", line 859, in create
return self._post(
^^^^^^^^^^^
File "/home/xueyanwen/miniconda3/lib/python3.12/site-packages/openai/_base_client.py", line 1283, in post
return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/xueyanwen/miniconda3/lib/python3.12/site-packages/openai/_base_client.py", line 960, in request
return self._request(
^^^^^^^^^^^^^^
File "/home/xueyanwen/miniconda3/lib/python3.12/site-packages/openai/_base_client.py", line 1064, in _request
raise self._make_status_error_from_response(err.response) from None
openai.BadRequestError: Error code: 400 - {'object': 'error', 'message': "6 validation errors for ValidatorIterator\n0.typed-dict.text\n Field required [type=missing, input_value={'type': 'video', 'video'...n/video/狼来了.mp4'}}, input_type=dict]\n For further information visit https://errors.pydantic.dev/2.10/v/missing\n0.typed-dict.type\n Input should be 'text' [type=literal_error, input_value='video', input_type=str]\n For further information visit https://errors.pydantic.dev/2.10/v/literal_error\n0.typed-dict.image_url\n Field required [type=missing, input_value={'type': 'video', 'video'...n/video/狼来了.mp4'}}, input_type=dict]\n For further information visit https://errors.pydantic.dev/2.10/v/missing\n0.typed-dict.type\n Input should be 'image_url' [type=literal_error, input_value='video', input_type=str]\n For further information visit https://errors.pydantic.dev/2.10/v/literal_error\n0.typed-dict.input_audio\n Field required [type=missing, input_value={'type': 'video', 'video'...n/video/狼来了.mp4'}}, input_type=dict]\n For further information visit https://errors.pydantic.dev/2.10/v/missing\n0.typed-dict.type\n Input should be 'input_audio' [type=literal_error, input_value='video', input_type=str]\n For further information visit https://errors.pydantic.dev/2.10/v/literal_error", 'type': 'BadRequestError', 'param': None, 'code': 400}
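Reading the 400 above: the server validated each content part against the OpenAI-style types text, image_url, and input_audio, so a part with "type": "video" was rejected before inference ever ran. A small client-side pre-flight check in that spirit (the allowed set is taken from the error text, plus video_url as an assumption for vLLM builds with video support):

```python
# Content-part types the server's schema accepts, per the 400 error above;
# "video_url" is an assumption for vLLM versions that support video input.
ALLOWED_PART_TYPES = {"text", "image_url", "input_audio", "video_url"}

def check_content_parts(messages):
    """Fail fast locally instead of round-tripping to a server-side 400."""
    for message in messages:
        content = message.get("content")
        if not isinstance(content, list):
            continue  # plain-string content is always fine
        for part in content:
            if part.get("type") not in ALLOWED_PART_TYPES:
                raise ValueError(f"unsupported content part type: {part.get('type')!r}")

# The shape that triggered the 400 above: type "video" instead of "video_url".
bad = [{"role": "user", "content": [{"type": "video", "video": "file:///path/to/clip.mp4"}]}]
try:
    check_content_parts(bad)
except ValueError as err:
    print(err)  # unsupported content part type: 'video'
```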

@XyWzzZ
Author

XyWzzZ commented Feb 13, 2025

[quotes @gymbeijing's question above]

Python 3.10.16, torch 2.5.1+cu124, CUDA 12.4

@wulipc

wulipc commented Feb 13, 2025

[quotes @gymbeijing's question above]

It looks like flash attention is not installed, which is why the backend fell back to xformers. I recommend using flash attention, which is more efficient. You can check for it with the following code:

from transformers.utils import is_flash_attn_2_available
print(is_flash_attn_2_available())

@wulipc

wulipc commented Feb 13, 2025

[quotes @XyWzzZ's error report above]

It looks like the message is written incorrectly. I suggest first testing with the message format from the README. Best wishes.
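Following up on that advice: given the schema in the 400 error, one plausible fix is to use the video_url content-part type (mirroring image_url) rather than video. Whether the server accepts video_url depends on your vLLM version, so treat this as a sketch to verify rather than a guaranteed recipe; the file path is a placeholder:

```python
# Build an OpenAI-style message list using "video_url" instead of "video".
# Once this shape passes the server's schema check, pass `messages` to
# client.chat.completions.create(model=..., messages=messages) against
# your vLLM endpoint.
video_url = "file:///your/local/path/to/clip.mp4"  # placeholder local file

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": [
        {"type": "video_url", "video_url": {"url": video_url}},
        {"type": "text", "text": "Please describe what happens in this video."},
    ]},
]

print(messages[1]["content"][0]["type"])  # video_url
```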
